How the .NET Debugger Works

Breaking Things

Before I start I'd like to mention Whidbey, which is the next version of .NET and Visual Studio. So far everything that I've covered has been accurate for Whidbey too, and the only breaking change that I've come across is that you can no longer call CoInitialize(NULL) and expect the debugging APIs to work. I assume that the debugger APIs were never meant to run under that COM setting, and so this has been fixed in the new version.

To do much at all when you are running under a debugger you need to be able to break into the execution of the debuggee, and there are many ways in which you can do this under VS.NET. The most obvious are breakpoints and the unhandled exceptions raised when something goes wrong. There is also the Break command on the Debug menu, which is useful when you want to know what your application is doing at any point. The easiest way is to call System.Diagnostics.Debugger.Break() from inside the debuggee, and in this article I will use this method to show how to retrieve the source file and line number for the location of the break.

To keep the code clean and simple, all of the processing for the break is handled in the Break debugger callback. In a real system the processing would be better spread across some of the other callbacks, such as LoadModule, but this greatly complicates the process. Error checking is also mostly omitted.

ICorDebugManagedCallback::Break(AppDomain, Thread)

In order to get the source file and line number for a break we need to do the following:

  1. Get the active frame (ICorDebugFrame)
  2. Get the IL Frame by QueryInterfacing ICorDebugFrame for ICorDebugILFrame
  3. Use the frame to get the function
  4. Get the module for the function
  5. Create an ISymUnmanagedReader for the module's debugging information
  6. Pass the function token into the ISymUnmanagedReader to retrieve an ISymUnmanagedMethod pointer
  7. Retrieve the sequence points for this method
  8. Retrieve the instruction pointer (IP) from the IL frame
  9. Search through the sequence points until you match the IP
  10. Return the associated line and filename

As you can see, a lot is involved in retrieving the information that we need, which is why in a real system much of this would be cached. The module symbol reader could be loaded in the LoadModule callback, for example.
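The caching idea can be sketched without any of the debugging interfaces. The following is only an illustration: ModuleSymbols and SymbolCache are hypothetical names that do not appear in the article's code, and the loader callback stands in for whatever actually creates the symbol reader.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-module symbol data that would be kept
// alive between breaks.
struct ModuleSymbols
{
    std::wstring path;        // full path of the module on disk
    bool hasSymbols = false;  // whether debug symbols could be loaded
};

// Keeps one entry per module path so that the expensive symbol load happens
// once (for example when the module is loaded) rather than on every break.
class SymbolCache
{
public:
    using Loader =
        std::function<std::shared_ptr<ModuleSymbols>(const std::wstring&)>;

    explicit SymbolCache(Loader loader) : loader_(std::move(loader))
    {
        assert(static_cast<bool>(loader_));  // a loader is required
    }

    // Returns the cached entry, calling the loader only on first use.
    std::shared_ptr<ModuleSymbols> Get(const std::wstring& path)
    {
        auto it = cache_.find(path);
        if (it != cache_.end())
        {
            return it->second;
        }
        std::shared_ptr<ModuleSymbols> loaded = loader_(path);
        cache_[path] = loaded;
        return loaded;
    }

    // Drops an entry, for example when the module is unloaded.
    void Remove(const std::wstring& path) { cache_.erase(path); }

    std::size_t Size() const { return cache_.size(); }

private:
    Loader loader_;
    std::map<std::wstring, std::shared_ptr<ModuleSymbols>> cache_;
};
```

A break handler built this way would ask the cache for the module's symbols by name and only pay the loading cost the first time that module is seen.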

Finding out where we are

When we break we only know two pieces of information: the AppDomain we are in and the Thread we are running in. To begin with we need to know where we are in the call stack, which we retrieve through ICorDebugThread::GetActiveFrame(). The call stack is made up of a series of chains, and each chain is made up of a series of frames. We will cover this in more detail when we get onto displaying the call stack in a future instalment. For now we only need to care about the current frame, which will be at the very top of the call stack for our current thread.

The object we receive from GetActiveFrame is of the type ICorDebugFrame, but this isn't enough for our needs. Each frame can be either a managed or an unmanaged frame depending on the nature of the code inside it, and since we know that we will be breaking into a managed frame we can QueryInterface for the more useful ICorDebugILFrame, which contains more information. The following code, from the start of the break handler supplied with this article, does this:

HRESULT TestDebugger::Break(
    /* [in] */ ICorDebugAppDomain *pAppDomain,
    /* [in] */ ICorDebugThread *thread)
{
    ICorDebugFrame* frame = NULL;
    thread->GetActiveFrame(&frame);
    ICorDebugILFrame* ilFrame;
    frame->QueryInterface(IID_ICorDebugILFrame, (void**)&ilFrame);

Now that we have an ICorDebugILFrame object we can get the current function that we are broken in.

ICorDebugFunction* function;
ilFrame->GetFunction(&function);

This code returns an ICorDebugFunction that contains the information about where we actually are in the application. The first step is to retrieve the module that contains this function. The module is needed in order to get the symbol reader that will allow us to take the function and look up the source.

ICorDebugModule* module = NULL;
function->GetModule(&module);
wchar_t moduleName[256];
ULONG32 len = 0;
module->GetName(256, &len, moduleName);
Module m(module, moduleName);
if (m.Init())
{

I have created a helper class called Module which handles retrieving all of the information we need for this module. The constructor just stores the ICorDebugModule pointer and the module name, which is the full path to this module on disk. We next call the Init function on the module, which I have made return true if debug symbols are available and false if not.

bool Module::Init(void)
{
    // Attempt to load debug symbols
    // This code is a slight modification of the code in the
    // Debugger sample in the .NET Framework SDK
    HRESULT hr = module->GetMetaDataInterface(IID_IMetaDataImport,
                                                (IUnknown**)&import);
    if(FAILED(hr))
    {
        // Without the metadata interface we cannot load symbols
        return false;
    }
    BOOL isDynamic = FALSE;
    hr = module->IsDynamic(&isDynamic);
    BOOL isInMemory = FALSE;
    hr = module->IsInMemory(&isInMemory);
    if(isDynamic || isInMemory)
    {
        // Dynamic and in-memory assemblies are a special case
        // which we will ignore for now
        return false;
    }
    // We now need a binder object that will take the module and return a
    // reader object
    ISymUnmanagedBinder *binder = NULL;
    hr = CoCreateInstance(CLSID_CorSymBinder, NULL,
        CLSCTX_INPROC_SERVER,
        IID_ISymUnmanagedBinder,
        (void**)&binder);
    if(FAILED(hr))
    {
        // No binder was created, so there is nothing to release
        return false;
    }
    reader = NULL;
    hr = binder->GetReaderForFile(import,
        name,
        L".", // Search the current directory only for now
        &reader);
    // The binder is no longer needed whether or not the call succeeded
    binder->Release();
    if(FAILED(hr))
    {
        // There is a case where a valid reader can be returned from
        // binder->GetReaderForFile() even though it is considered an error.
        // Handle that by releasing if we need to
        if(NULL != reader)
        {
            reader->Release();
            reader = NULL;
        }
        return false;
    }
    return true;
}

The process here is simple enough: we get the metadata interface for the module, create an ISymUnmanagedBinder and then use the two to create an ISymUnmanagedReader for this module through a call to GetReaderForFile on the binder. If this all succeeds then we have a reader available so we return true, and if not we return false. We can now return to the break handler and the code for when we have symbols available.

if (m.Init())
{
    // Source is available
    Function func(function);
    func.Init(&m);

Again I have moved some code into a separate class for clarity here, and it is in the Function class where the important logic resides. The constructor just stores the ICorDebugFunction pointer, and the actual initialisation takes place in the Init function.

void Function::Init(Module* module)
{
    mdMethodDef methodToken = 0;
   
    function->GetToken(&methodToken);
    module->GetMethod(methodToken, &symMethod);

The CLR debugger uses tokens to represent items such as functions, so we retrieve the token for this function and then pass it on to another helper function in the module. All this does is call GetMethod on the ISymUnmanagedReader, which returns an ISymUnmanagedMethod object. Note that the “Unmanaged” in that name comes from the “ISymUnmanaged” prefix shared by all of the symbol interfaces; it doesn't mean that the function being described is unmanaged. We can now use this to return the sequence points.

    HRESULT hr = symMethod->GetSequencePointCount(&spCount);
    if(spCount > 0)
    {
        spOffsets = new ULONG32[spCount];
        spDocuments = new ISymUnmanagedDocument*[spCount];
        spLines = new ULONG32[spCount];
        ULONG32 actualCount = 0;
        hr = symMethod->GetSequencePoints(spCount, &actualCount, spOffsets,
                                          spDocuments, spLines, NULL, NULL, NULL);
    }
}
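As an aside on the method token retrieved in Function::Init: a metadata token is a 32-bit value whose top byte identifies the metadata table and whose low 24 bits are the row id within that table, with method definitions using table value 0x06000000. The sketch below mirrors that layout in portable C++ for illustration only. The names are my own rather than the macros and constants from the SDK headers, which real debugger code would use instead.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the SDK's token types in this sketch.
using MetadataToken = std::uint32_t;

// Table identifier used by method definition tokens.
const std::uint32_t kMethodDefTable = 0x06000000u;

// The top byte of a token selects the metadata table.
inline std::uint32_t TokenTable(MetadataToken token)
{
    return token & 0xFF000000u;
}

// The low 24 bits are the row id (RID) inside that table.
inline std::uint32_t TokenRid(MetadataToken token)
{
    return token & 0x00FFFFFFu;
}

inline bool IsMethodDef(MetadataToken token)
{
    return TokenTable(token) == kMethodDefTable;
}

// Builds a method definition token from a row id.
inline MetadataToken MakeMethodDef(std::uint32_t rid)
{
    assert(rid <= 0x00FFFFFFu);  // the row id must fit in 24 bits
    return kMethodDefTable | rid;
}
```

Checking that a token really is a method definition before handing it to the symbol reader is a cheap sanity test when debugging the debugger itself.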

Sequence Points

One of the most important concepts for what we are doing is the sequence point. The sequence points for a method are a list of offsets into its Intermediate Language (IL), each marking a point in the code that the debugger can stop at. Where these points fall depends on the compiler and language, but they usually map runs of IL back to elements of the source language. For instance, if an “if” statement in C# compiled to 5 bytes of IL then the offset of the next sequence point would be that of the first plus 5. If you code directly in IL then your sequence points will typically be one instruction apart.

Since they exist to let the debugger map logical elements of the code, such as keywords, onto chunks that can be stepped over (among other things), think of them as defining where each step command in the debugger should go next. We will also use them for setting breakpoints in the future, since breakpoints are associated with sequence points. They are why you can set a breakpoint on just the “i++” section of “for(int i = 0; i < 10; i++)” in C#, while in unmanaged C++ you can only set one on the whole line.

In the code above we allocate three parallel arrays: one for the sequence point offsets, and one each for the line number and document associated with each offset. We only retrieve the start line of each sequence point here; passing arrays in place of the three NULL arguments would also return the start column, end line and end column.
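If the column and end line information were requested as well, the parallel arrays could be folded into one record per sequence point, which is often easier to work with. The sketch below is portable C++ with names of my own choosing (SequencePoint, ZipSequencePoints, kHiddenLine); it is not part of the article's code. It also flags the special line number 0xFEEFEE, which compilers use to mark hidden sequence points that have no source line to show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One record per sequence point, replacing the parallel arrays.
struct SequencePoint
{
    std::uint32_t offset;       // IL offset within the method
    std::uint32_t startLine;
    std::uint32_t startColumn;
    std::uint32_t endLine;
    std::uint32_t endColumn;
};

// Line number that marks a hidden sequence point.
const std::uint32_t kHiddenLine = 0x00FEEFEEu;

inline bool IsHidden(const SequencePoint& sp)
{
    return sp.startLine == kHiddenLine;
}

// Combines the parallel arrays (as filled in by a GetSequencePoints-style
// call) into a vector of records. All arrays must hold 'count' entries.
std::vector<SequencePoint> ZipSequencePoints(
    std::size_t count,
    const std::uint32_t* offsets,
    const std::uint32_t* lines,
    const std::uint32_t* columns,
    const std::uint32_t* endLines,
    const std::uint32_t* endColumns)
{
    assert(count == 0 ||
           (offsets && lines && columns && endLines && endColumns));
    std::vector<SequencePoint> points;
    points.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        points.push_back(
            {offsets[i], lines[i], columns[i], endLines[i], endColumns[i]});
    }
    return points;
}
```

A debugger front end would normally skip hidden points when deciding which source line to highlight.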

Once our arrays are filled with the sequence point information we return to the break handler and retrieve the current instruction pointer from the IL frame. With that done, we can release the IL frame that we queried for earlier in the function.

ULONG32 ip = 0;
CorDebugMappingResult mappingResult;
ilFrame->GetIP(&ip, &mappingResult);
ilFrame->Release();
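Because every interface pointer obtained along the way has to be released, and the article's code omits most of that for brevity, a small scope guard can help. The following is a minimal sketch in the spirit of ATL's CComPtr; ScopedRelease is a hypothetical name, and the only thing it assumes of the wrapped type is a Release() method.

```cpp
#include <cassert>

// Owns one reference to a COM-style object and calls Release() on it when
// the holder goes out of scope.
template <typename T>
class ScopedRelease
{
public:
    ScopedRelease() : ptr_(nullptr) {}
    ~ScopedRelease() { Reset(); }

    // Non-copyable so that the single owned reference is released once.
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    // For passing as an out parameter to a function that returns an
    // already-referenced pointer. The holder must be empty.
    T** Receive()
    {
        assert(ptr_ == nullptr);
        return &ptr_;
    }

    T* operator->() const
    {
        assert(ptr_ != nullptr);
        return ptr_;
    }

    T* Get() const { return ptr_; }

    // Releases the held reference early, if there is one.
    void Reset()
    {
        if (ptr_ != nullptr)
        {
            ptr_->Release();
            ptr_ = nullptr;
        }
    }

private:
    T* ptr_;
};
```

With such a holder, the start of the break handler could declare a ScopedRelease<ICorDebugFrame> and pass its Receive() result to GetActiveFrame, and the frame would then be released on every exit path, including early returns.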

Having the instruction pointer (IP) means that we now have all of the information we need to find where we are in a source file. Since the IP is just an offset into the function's IL, and we have an array of sequence points that map such offsets to lines, we can perform a lookup:

HRESULT Function::FindLineFromIP(UINT_PTR ip, unsigned int* line,
                        wchar_t* fileName)
{
    if(spCount > 0)
    {
        if(spOffsets[0] <= ip)
        {
            unsigned int i;
            for (i = 0; i < spCount; i++)
            {
                if (spOffsets[i] >= ip)
                {
                    break;
                }
            }
            // If not an exact match, then we're one too high.
            if (((i == spCount) || (spOffsets[i] != ip)) && (i > 0))
            {
                i--;
            }
            *line = spLines[i];
            ISymUnmanagedDocument* doc = spDocuments[i];
            ULONG32 length = 0;
            doc->GetURL(MAX_PATH, &length, fileName);
           
            return S_OK;
        }
    }
    return E_FAIL;
}

The mechanics of the search are straightforward: we loop through the sequence points until we find the one that contains our IP. Once we have its index we can use it on the lines and documents arrays to return the correct values. The document holds the full name and path of our source file, which we retrieve through the GetURL function. We return both values to our handler, which can either load the file and show where we are or, as in the example code, just display them.
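The linear search relies on the offsets being in ascending order, and under that same assumption the lookup can also be written as a binary search. The sketch below is a portable illustration using std::upper_bound rather than the code supplied with the article; FindSequencePointIndex is a name of my own, and it returns -1 in the case where the function above returns E_FAIL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the last sequence point whose offset is <= ip, or -1
// if ip lies before the first sequence point or there are none at all.
// Assumes the offsets are sorted in ascending order.
std::ptrdiff_t FindSequencePointIndex(
    const std::vector<std::uint32_t>& offsets, std::uint32_t ip)
{
    assert(std::is_sorted(offsets.begin(), offsets.end()));

    // upper_bound finds the first offset strictly greater than ip, so the
    // sequence point containing ip is the one just before it.
    auto it = std::upper_bound(offsets.begin(), offsets.end(), ip);
    if (it == offsets.begin())
    {
        return -1;
    }
    return (it - offsets.begin()) - 1;
}
```

The returned index can then be used on the lines and documents arrays exactly as before. For the handful of sequence points in a typical method the difference in speed is negligible, so the main benefit is that the "one too high" adjustment disappears.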

This has been a whistle-stop tour of how to retrieve the source file and line number when you break, but in doing it this way I hope that I've made it a lot clearer than it is in the debugger example in the Framework SDK, from which most of this code has been adapted.
