Master of Memory Mysteries

Mere mortals can fix bugs in computer programs, but sometimes it takes a master to pinpoint the real problem. Here is a real-life example.

I was working on a project that integrated a number of different packages under the umbrella of a visualization system. My main role was to make sure my own little package did not break the whole thing. Things were going pretty much as expected. We dealt with problems such as one package changing its public interface between versions and another software layer not handling all the data types demanded by the application: pretty routine stuff. I even fixed an intermittent bug in the package that I am responsible for. I was just about to pat myself on the back for that when the lead developer of the visualization system reported a crash apparently caused by my program.

The report was very precise: one object was initially allocated at address 0x8a0b750; then, with no sign that this object had been destroyed, another object of a different type was allocated at the same location a little while later. The memory management system should not be doing this sort of thing. What could be causing it?

Since most of the packages were written in C++, an obvious suspect was that someone had allocated the first object using the new operator but released it with free. However, this was unlikely because that package had gone through quite a bit of testing, and such an obvious problem would have been spotted already. Next, I tried a little bit of blame-it-on-someone-else: maybe the glue code standing between my code and the visualization system was broken. It might have allocated an array of objects but deleted only one. I could see the raw pointers in the glue code, but I did not see any mismatched memory management functions. Oh well, there went that theory.

The bug report came in late on a Friday afternoon. Since I was not able to figure anything out, I had to leave it alone over the weekend. There were a few more email exchanges by the following Monday, and there still wasn’t anything obvious as to what caused the problem. Furthermore, no one else working on the integration project had experienced a similar problem. This was starting to look like a mystery to me. Before long, however, I was told that the cause of the problem had been found. I couldn’t believe it at first, but it made perfect sense after a few minutes of thinking.

There are a number of compiler macros in my code, one of which is named HAVE_GCC_ATOMIC32. It indicates whether the GCC extension for atomic operations is available. If the atomic operations are not available, the code needs a mutual exclusion lock in order to increment some global counters. Clearly, the presence or absence of this macro affects how much space the global counter object needs. The trouble was that when my code was configured and compiled, this macro was defined; but when the glue code was compiled, the option ‘-DHAVE_CONFIG_H’ was left out. This is a common option that tells code to look in a config.h file for compiler macros. On the platforms I am familiar with, my code can detect the GCC extension for atomic operations even without HAVE_CONFIG_H. Therefore, I had never seen anything like this in my own testing, and neither had anyone else on the team. Obviously, the fail-safe was not foolproof, and we were caught by surprise: without the macro HAVE_CONFIG_H, the glue code and the visualization system believed the global counter object needed a mutual exclusion lock, while inside my own code the global counter had no such lock. The two sides therefore disagreed about the size and layout of the same object, and this split personality caused the memory management system to misbehave.

Once I was told of the problem, it was obvious that it could cause trouble, but I would not have thought of it had my life depended on it. Thank goodness there was a master around.


© John Wu
Disclaimer

The views and opinions expressed in this page are strictly those of the page author.
The contents of this page have not been reviewed or approved by the University of Minnesota.