Wednesday, May 11, 2016

Memory Corruption Detection – An approach using Remote Debugging


While working to identify memory corruption issues in a C++ application, it was becoming difficult and time-consuming to find the memory corruptions and segmentation violations that happened only on rare, specific call-frame orders. The application was built on a highly asynchronous I/O methodology with time-sensitive and time-dependent processing legs. Its call stacks were typically 20 or more frames deep, so the number of possible orderings of call frames leading to a particular function was large.
The quest to be efficient led to learning something new. Before describing how the corruptions were handled, a few terms need to be defined.

Definitions

Memory Corruption

Memory corruption (as defined in Wikipedia [3]) occurs in a computer program when the contents of a memory location are unintentionally modified due to programming errors; this is termed violating memory safety. When the corrupted memory contents are used later in that program, the result is either a program crash or strange and bizarre program behavior. Nearly 10% of application crashes on Windows systems are due to heap corruption.

Why is detecting memory corruption a tiresome activity?

Memory corruption is one of the most intractable classes of programming errors, for the following reasons [3]:
  1. The source of the memory corruption and its manifestation may be far apart, making it hard to correlate the cause and the effect.
  2. Symptoms appear under unusual conditions, making it hard to consistently reproduce the error.
  3. Often, normal runs of the program, and even runs under a debugger, complete successfully even though the program has lurking corruption issues.

Valgrind

Memory profiling tools like valgrind, in addition to finding memory leaks, find issues like a) illegal reads/writes, b) use of uninitialized and un-addressable values, c) illegal frees, d) mismatched de-allocation functions, e) overlapping source and destination blocks in copy operations and f) suspicious allocation sizes.
But valgrind on its own only produces a report. It offers no effective way to inspect the state of the system and process at the moment the corruption started, i.e. when one of the above-mentioned error points was hit.

gdbserver

[3] gdbserver is a computer program that makes it possible to remotely debug other programs. Running on the same system as the program to be debugged, it allows the GNU Debugger to connect from another system; that is, only the executable to be debugged needs to be resident on the target system ("target"), while the source code and a copy of the binary file to be debugged reside on the developer’s local computer ("host").

Requirement

What was required was the ability to use the memory profiler (valgrind) together with the debugger (gdb), so that the program breaks automatically whenever valgrind hits one of the errors it can monitor, allowing inspection of the state of the objects and threads that led to the scenario. Note that manual or conditional breakpoints were ineffective: in most cases the process flow worked fine, and breaking manually affected other time-sensitive legs.
No single tool was expected to provide this combined functionality. Unexpectedly, though, the two tools can be used together in an integrated way.

Approach

The goal, then, is to combine a debugger (like gdb) with a memory profiler (like valgrind) so that the process automatically pauses on hitting any possible memory corruption point. This allows one to inspect the state of the process and thread variables/objects that led to the condition.
This integration is easily achieved using valgrind's remote debugging hook (its gdbserver implementation). Though remote debugging is originally meant for debugging a program from a separate machine, it can also be used on the same machine as the program being run. This is what was used to detect nasty memory corruptions in a wild asynchronous multi-threaded world in a very simple and efficient way.
The steps below describe briefly what has to be done.
Run the program as below:
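              $ valgrind --vgdb=yes --vgdb-error=0 <program name>
where --vgdb=yes enables the gdbserver built into valgrind (vgdb is the small utility that relays between gdb and that gdbserver), and
--vgdb-error specifies the number of errors after which the gdbserver should become active. With a value of 0, valgrind stops at startup and waits for gdb to connect before the program executes.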
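The launch command can be wrapped in a small shell function so the error threshold is easy to vary between runs. This is a minimal sketch, assuming a POSIX shell; the function name `vg_debug` and the `DRY_RUN` variable are hypothetical helpers, not part of valgrind.

```shell
# vg_debug ERRORS PROGRAM [ARGS...]
# Hypothetical wrapper: starts PROGRAM under valgrind with the built-in
# gdbserver enabled. ERRORS is passed to --vgdb-error (0 = wait for gdb
# at startup, N = wait after N reported errors).
vg_debug() {
    errors="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command line, so it can be checked before a real run.
        echo valgrind --vgdb=yes --vgdb-error="$errors" "$@"
    else
        valgrind --vgdb=yes --vgdb-error="$errors" "$@"
    fi
}
```

For example, `vg_debug 1 ./my_app` (with `./my_app` standing in for the real program) would let the program run freely until valgrind reports its first error, which may disturb time-sensitive legs less than stopping at startup.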
In another shell, issue the following commands:
              $ gdb <program name>
              (gdb) target remote | vgdb
              (gdb) continue
When valgrind reports an error, the gdb client in this shell receives a SIGTRAP from the valgrind run in the other shell.
That is, on encountering any suspected or valid error (like an invalid write or a double free), valgrind signals its gdbserver, which in turn informs gdb. The program then breaks, allowing inspection of the state of the process, threads and objects to drill down on the situation leading to the corruption.
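Once gdb has stopped on a reported error, the usual inspection commands apply. The sketch below, assuming a POSIX shell, writes a gdb command file that attaches through vgdb, resumes the program, and dumps the thread state at the first stop. The function name `write_gdb_script` and the file name used afterwards are hypothetical; `monitor v.info last_error` is one of the valgrind gdbserver monitor commands described in [2].

```shell
# write_gdb_script FILE
# Hypothetical helper: writes a gdb command file to FILE. The commands run
# in order; those after "continue" execute once the program stops again.
write_gdb_script() {
    cat > "$1" <<'EOF'
# Connect to the valgrind gdbserver through the vgdb relay.
target remote | vgdb
# Resume the program; gdb regains control when valgrind reports an error.
continue
# Call stack of the thread that triggered the error.
backtrace
# All threads, then a backtrace for each of them.
info threads
thread apply all backtrace
# Ask valgrind to print the error it just detected.
monitor v.info last_error
EOF
}
```

After running `write_gdb_script attach.gdb`, the file can be passed to gdb in the second shell as `gdb -x attach.gdb <program name>`. Because the file is plain gdb commands, any further inspection commands can be appended to it.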

Conclusion

Try the approach to experience its power. It was found to be very effective in resolving memory corruption issues quickly.

References

[1]    Debugging remote programs - http://davis.lbl.gov/Manuals/GDB/gdb_17.html
[2]    Using and understanding the Valgrind core: Advanced Topics - http://valgrind.org/docs/manual/manual-core-adv.html
[3]    Memory Corruption - https://en.wikipedia.org/wiki/Memory_corruption
