Surviving the Release Version

Bounds Errors

There are many valid optimizations which uncover bugs that are masked in the debug version. Yes, sometimes it is a compiler bug, but 99% of the time it is a genuine logic error that just happens to be harmless in the absence of optimization, but fatal when it is in place. For example, if you have an off-by-one array access, consider code of the following general form

void func()
    {
     char buffer[10];
     int counter;

     lstrcpy(buffer, "abcdefghik"); // 11-byte copy, including NUL
     ...

In the debug version, the NUL byte at the end of the string overwrites the high-order byte of counter, but unless counter gets > 16M, this is harmless even if counter is active. But in the optimizing compiler, counter is moved to a register, and never appears on the stack. There is no space allocated for it. The NUL byte overwrites the data which follows buffer, which may be the return address from the function, causing an access error when the function returns.

Of course, this is sensitive to all sorts of incidental features of the layout. If instead the program had been

void func()
    {
     char buffer[10];
     int counter;
     char result[20];

     wsprintf(result, _T("Result = %d"), counter);
     lstrcpy(buffer, _T("abcdefghik")); // 11-byte copy, including NUL

then the NUL byte, which used to overlap the high order byte of counter (which doesn't matter in this example because counter is obviously no longer needed after the line using it is printed) now overwrites the first byte of result, with the consequence that the string result now appears to be an empty string, with no explanation of why it is so. If result had been a char * variable or some other pointer you would be getting an access fault trying to access through it. Yet the program "worked in the debug version"! Well, it didn't, it was wrong, but the error was masked.

In such cases you will need to create a version of the executable with debug information, then use the break-on-value-changed feature to look for the bogus overwrite. Sometimes you have to get very creative to trap these errors.

Been there, done that. I once got a company award at the monthly company meeting for finding a fatal memory overwrite error that was a "seventh-level bug", that is, the pointer that was clobbered by overwriting it with another valid (but incorrect) pointer caused another pointer to be clobbered which caused an index to be computed incorrectly which caused...and seven levels of damage later it finally blew up with a fatal access error. In that system, it was impossible to generate a release version with symbols, so I spent 17 straight hours single-stepping instructions, working backward through the link map, and gradually tracking it down. I had two terminals, one running the debug version and one running the release version. It was obvious in the debug version what had gone wrong, after I found the error, but in the unoptimized code the phenomenon shown above masked the actual error.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“In theory, theory and practice are the same. In practice, they're not.”