During the development of large C++ projects, one inevitably encounters bugs that crash the program without much information, such as segmentation faults or uncaught exceptions. To debug such problems, especially in large multithreaded programs with frequent interaction between modules, one typically wants a backtrace of the call stack that leads to the crash site, with function names, filenames, and line numbers.
The GNU Debugger, GDB, was designed for such purposes and much more. The built-in debugging features of many popular IDEs are actually GUI wrappers around GDB. Without an IDE, one can still use GDB in the terminal to debug a program interactively. For example, Law@2015 and Law@2016 provide good tutorials on using GDB interactively.
However, when developing a C++ program that interacts with other programs running on realtime systems, there can be additional difficulties. For instance, the other program might periodically check how often a client program interacts with it; if the interaction rate is not consistent, it might terminate the interaction and return an error status. This happens quite often in robotics software development.
In such cases, one cannot use GDB to debug a program interactively, since stepping through a program pauses it in the middle of execution. Even without stepping, running a program under GDB is typically resource heavy. Either effect can cause the program to miss its realtime requirements and fail before it even reaches the crash point.
To address these issues, one can configure system core dumps, run the program, let it crash and generate a core dump file, then use GDB to backtrace the crashing call stack from that file. This post summarizes the steps necessary to achieve this on a Linux system with a simple example project. For reference, see Lll@2022, Evans@2018, and Aleksander@2009.
- CMake build types: Debug, RelWithDebInfo, Release, and MinSizeRel.
  - Use Debug or RelWithDebInfo for debugging purposes, since they produce executables with debug symbols, which GDB needs to map a core dump back to function names, filenames, and line numbers.
  - Use Release or MinSizeRel in production for program efficiency and size.
  - See Cmake@BuildType for more information.
- core_pattern specifies the core dump file's path and name pattern, typically stored in /proc/sys/kernel/core_pattern. For ways to configure it, see the "Core dumps and systemd" section in Lmp@core.
- ulimit can specify the maximum size of core files created (via ulimit -c).
File structure, a standard CMake project
example-project/
|-- build/
|-- include/foo.hpp
|-- CMakeLists.txt
|-- main.cpp
Contents of CMakeLists.txt, setting the executable name to example-project and the build type to Debug
cmake_minimum_required(VERSION 3.14 FATAL_ERROR)
project(example-project)
set(CMAKE_BUILD_TYPE Debug) # or RelWithDebInfo
add_executable(${PROJECT_NAME} main.cpp)
target_include_directories(${PROJECT_NAME} PUBLIC ${CMAKE_SOURCE_DIR}/include)
Contents of include/foo.hpp, with segmentation fault source
#pragma once
#include <cstdio>

void bar() {
  int* i = nullptr;
  *i = 1; // segmentation fault source
  return;
}
Contents of main.cpp, with segmentation fault call stack
#include "foo.hpp"

int main() {
  bar(); // segmentation fault call stack
  return 0;
}
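The introduction also mentions uncaught exceptions. As a hedged sketch (the function baz and its rate check are hypothetical and not part of the example project), an exception that nothing catches ends in std::terminate, whose default handler calls std::abort and raises SIGABRT, so the same core dump workflow should apply:

```cpp
#include <stdexcept>

// Hypothetical function, for illustration only: rejects a non-positive loop rate.
void baz(int rate_hz) {
    if (rate_hz <= 0) {
        // If nothing catches this exception, std::terminate runs and,
        // by default, calls std::abort, which raises SIGABRT.
        throw std::invalid_argument("rate_hz must be positive");
    }
}
```

Calling baz(0) from main without a try/catch block should abort the program. Whether the stack is unwound before std::terminate is called is implementation-defined, so the throw site may or may not still be visible in the backtrace.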
Check core_pattern configuration, path and name pattern
# Check system core_pattern configuration
cat /proc/sys/kernel/core_pattern
# By default, it's typically set to a system default path and name pattern, for example in Ubuntu:
# |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
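The leading `|` in the Ubuntu default means the kernel pipes the core dump to a helper program (apport here) instead of writing a file directly, which is why the pattern is changed in the next step. A minimal sketch for checking this from C++; the helper names read_core_pattern and is_piped_core_pattern are made up for illustration:

```cpp
#include <fstream>
#include <string>

// Reads the first line of a core_pattern file.
// Returns an empty string if the file cannot be read.
std::string read_core_pattern(
    const std::string& path = "/proc/sys/kernel/core_pattern") {
    std::ifstream file(path);
    std::string pattern;
    std::getline(file, pattern);
    return pattern;
}

// A leading '|' means the core dump is piped to a program
// rather than written to a file named by the pattern.
bool is_piped_core_pattern(const std::string& pattern) {
    return !pattern.empty() && pattern.front() == '|';
}
```

A program could call is_piped_core_pattern(read_core_pattern()) at startup and log a warning when the core dump will not land in the working directory.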
Change and verify the core_pattern configuration. For the meaning of the placeholders, see the "Naming of core dump files" section in Lmp@core.
# Write core files to the working directory from which the binary is executed
sudo sysctl -w kernel.core_pattern=./core.%E.%p.%h.%t
# Verify the configuration; it should now show `./core.%E.%p.%h.%t`.
cat /proc/sys/kernel/core_pattern
The above setting lasts only for the current operating system session, i.e., if the system reboots, it is reset to the system default value.
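To make the placeholders concrete, here is a rough sketch that expands only the subset used in this post: %E (executable path with `/` replaced by `!`), %p (PID), %h (hostname), and %t (time of the dump in seconds since the Epoch), plus %% for a literal percent sign. The function name expand_core_pattern and its parameters are hypothetical; the kernel does the real expansion, and this sketch leaves any other specifier untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: predicts a core file name for a subset of
// core_pattern placeholders (%E, %p, %h, %t, %%).
std::string expand_core_pattern(const std::string& pattern,
                                const std::string& exe_path,
                                long long pid,
                                const std::string& hostname,
                                long long epoch_seconds) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] != '%') {
            out += pattern[i];
            continue;
        }
        if (++i == pattern.size()) {
            break;  // a single trailing '%' is dropped
        }
        switch (pattern[i]) {
            case '%': out += '%'; break;
            case 'E':
                // Executable path, with every '/' replaced by '!'
                for (char c : exe_path) out += (c == '/') ? '!' : c;
                break;
            case 'p': out += std::to_string(pid); break;
            case 'h': out += hostname; break;
            case 't': out += std::to_string(epoch_seconds); break;
            default:
                // Not handled by this sketch: keep the specifier as-is
                out += '%';
                out += pattern[i];
                break;
        }
    }
    return out;
}
```

For instance, with the pattern `./core.%E.%p.%h.%t` and an executable at `/home/user/example-project/build/example-project`, the expanded name starts with `./core.!home!user!example-project!build!example-project.` followed by the PID, hostname, and timestamp.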
Check core file maximum size
# Check core file maximum size; 0 means no core file is written
ulimit -c
Change it and verify
# Change it to unlimited
ulimit -c unlimited
# Verify the change; it should now show `unlimited`.
ulimit -c
The above setting applies only to the current terminal session; it has to be set again in each new terminal session.
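Since the limit is per process, a program can also raise it for itself at startup instead of relying on the terminal session. The following is a minimal sketch using the POSIX getrlimit and setrlimit calls; the helper name raise_core_limit is hypothetical. It only raises the soft limit up to the hard limit, so it matches `ulimit -c unlimited` only when the hard limit is itself unlimited.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raises the soft core file size limit of the
// calling process to its hard limit. Returns true on success.
bool raise_core_limit() {
    struct rlimit limit{};
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    // An unprivileged process cannot go above the hard limit.
    limit.rlim_cur = limit.rlim_max;
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Calling raise_core_limit() at the top of main would be one way to use it. If the hard limit is 0, core dumps stay disabled and the limit still has to be changed outside the program.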
For example, in ~/example-project/build
# Compile
cmake .. && make
# Run program
./example-project
# Terminal should show `Segmentation fault (core dumped)`.
# In the current folder, there should be a `core.!launch-absolute-path!example-project.xx.yy.zz` file, as configured in core_pattern (xx: PID, yy: hostname, zz: timestamp).
Now we are ready to launch GDB with the executable and the core dump file to backtrace the bug, for example in ~/example-project/build.
# gdb [executable_generating_core_dump] [core_dump_file]
# Single quotes keep an interactive bash from treating `!` as history expansion
gdb ./example-project \
  'core.!launch-absolute-path!example-project.xx.yy.zz'
The last few lines of GDB's output should look similar to the following, which indicates the segmentation fault point.
Reading symbols from ./example-project...
[New LWP 9970]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./example-project'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 bar () at /{example-project-absolute-path}/include/foo.hpp:6
6 *i = 1; // segmentation fault source
(gdb)
Issue the command backtrace or bt; GDB will print backtrace information similar to the following, showing the function, filename, and line number of each call stack frame.
(gdb) bt
#0 bar () at /{example-project-absolute-path}/include/foo.hpp:6
#1 main () at /{example-project-absolute-path}/main.cpp:4
(gdb)
The above method also applies to other operating systems. For example, the procedure is similar on QNX. However, QNX uses dumper to configure core dump files, see QNX dumper. QNX also provides GDB executables for various target platforms for cross-platform debugging, see QNX GDB.
See gdb --help for all options; below are some convenient ones.
# Run executable in GDB directly, without having to type `run` in GDB
gdb -ex run /{example-project-absolute-path}/example-project
# Run executable in GDB directly, with input arguments
gdb -ex run --args /{example-project-absolute-path}/example-project -a FOO -b BAR
The Debug build type typically offers the most debug information when tracing the call stack. However, its code is typically not optimized. Such code can run much slower than a Release build and may overrun the time budget in realtime settings with hard loop rate constraints. The RelWithDebInfo build type offers a compromise between code efficiency and debug information.
My personal preferences for debug options are:
- First try the core dump from the existing Release build; it may produce sufficient call stack information and does not require a rebuild.
- Otherwise, rebuild with the Debug build type and debug with a core dump.
- If the Debug build type results in a violation of realtime loop rate constraints, use RelWithDebInfo.
- If RelWithDebInfo still violates realtime constraints, consider temporarily relaxing the realtime loop rate constraints to make debugging possible.
If executable file size is a concern, one can reduce it by stripping the debug symbols via
strip -g ./example-project -o ./example-project.strip
Run the stripped executable with ./example-project.strip to generate a core dump file, then analyze that core dump file with the original, unstripped executable via
gdb ./example-project -c 'core.!launch-absolute-path!example-project.strip.xx.yy.zz'
This typically still gives enough debug information, since the unstripped executable passed to GDB has the debug symbols.