Dealing with Large Symbol Files

Large applications can produce very large symbol files when debug information is enabled (especially at the higher, more verbose levels of debug info!). This article describes some techniques for reducing debug symbol file sizes!

Building With Debug Info Enabled

Let’s start with a small C program:

#include <stdio.h>

int main(int argc, char **argv) {
  printf("hello!\n");
  return 0;
}

Compile it with gcc hello.c -o hello, and let’s take a look at the output size:

❯ size -Ax hello
hello  :
section               size     addr
.interp               0x1c    0x318
.note.gnu.property    0x30    0x338
.note.gnu.build-id    0x24    0x368
.note.ABI-tag         0x20    0x38c
.gnu.hash             0x24    0x3b0
.dynsym               0xa8    0x3d8
.dynstr               0x8d    0x480
.gnu.version           0xe    0x50e
.gnu.version_r        0x30    0x520
.rela.dyn             0xc0    0x550
.rela.plt             0x18    0x610
.init                 0x1b   0x1000
.plt                  0x20   0x1020
.plt.got              0x10   0x1040
.plt.sec              0x10   0x1050
.text                0x112   0x1060
.fini                  0xd   0x1174
.rodata                0xb   0x2000
.eh_frame_hdr         0x34   0x200c
.eh_frame             0xac   0x2040
.init_array            0x8   0x3db8
.fini_array            0x8   0x3dc0
.dynamic             0x1f0   0x3dc8
.got                  0x48   0x3fb8
.data                 0x10   0x4000
.bss                   0x8   0x4010
.comment              0x25      0x0
Total                0x7e9

Phew! lots of stuff in there. We can see a few typical sections like .text, .data, .bss, and several other ones.

Overall file size of this binary is:

ls -lh hello
-rwxrwxr-x 1 noah noah 16K Mar  2 12:26 hello

Wow, 16kB for printing hello!!

Loading this executable into gdb, we get the following warning:

Reading symbols from hello...
(No debugging symbols found in hello)

🥺

Ok, let’s now compile with debug information enabled. Checking the manual for GCC, we see several different options for debug information. Let’s go with the highest level, ggdb3:

❯ gcc -ggdb3 hello.c -o hello.debug
❯ ls -lh hello*
-rwxrwxr-x 1 noah noah  16K Mar  2 12:26 hello
-rw-rw-r-- 1 noah noah   90 Mar  2 12:25 hello.c
-rwxrwxr-x 1 noah noah  42K Mar  2 12:30 hello.debug

Ok, our 90 byte source file compiled to 16kB without debug information, and 42kB with debug information 😮!

Checking the section sizes again, we see some new sections prefixed with .debug_ that account for all of the added size:

# diff the result of the 'size' command for the 'hello' and 'hello.debug' files
❯ diff -duw <(size -Ax hello) <(size -Ax hello.debug)
--- /proc/self/fd/13    2022-03-02 12:34:14.796360776 -0500
+++ /proc/self/fd/15    2022-03-02 12:34:14.796360776 -0500
@@ -1,4 +1,4 @@
-hello  :
+hello.debug  :
 section               size     addr
 .interp               0x1c    0x318
 .note.gnu.property    0x30    0x338
@@ -27,6 +27,13 @@
 .data                 0x10   0x4000
 .bss                   0x8   0x4010
 .comment              0x25      0x0
-Total                0x7e9
+.debug_aranges         0x30      0x0
+.debug_info            0xb9      0x0
+.debug_abbrev          0x66      0x0
+.debug_line            0xe8      0x0
+.debug_str           0x4e94      0x0
+.debug_line_str       0x1f5      0x0
+.debug_macro         0x1233      0x0
+Total                0x6cdc

Note: if you’re using the arm-none-eabi tools from ARM, you might want to use those variants of the binutils referenced in this article (size, strip, objcopy), for example arm-none-eabi-size here instead of size

Strategies for Reducing Debug Info

Let’s go through some techniques for reducing the debug info section sizes.

Remove Debug Info with strip

It’s possible to entirely remove debug info after building, using the strip binutil:

❯ strip hello.debug -o hello.strip
# now diff the original, non-debug build with the stripped build
❯ diff -duw <(size -Ax hello) <(size -Ax hello.strip)
--- /proc/self/fd/13    2022-03-02 13:43:27.298081401 -0500
+++ /proc/self/fd/15    2022-03-02 13:43:27.298081401 -0500
@@ -1,4 +1,4 @@
-hello  :
+hello.strip  :
 section               size     addr
 .interp               0x1c    0x318
 .note.gnu.property    0x30    0x338

The debug data is entirely removed from the binary:

ls -l hello*
-rwxrwxr-x 1 noah noah     15952 Mar  2 12:26 hello
-rw-rw-r-- 1 noah noah        90 Mar  2 12:25 hello.c
-rwxrwxr-x 1 noah noah     42336 Mar  2 12:30 hello.debug
-rwxrwxr-x 1 noah noah     14464 Mar  2 13:43 hello.strip

The stripped binary is actually a bit smaller than the one compiled without debug sections (stripping that binary results in an identical file size). Identifying the additional data that’s been stripped is left for a future exercise 😬.

Since the debug info is removed, as in the original binary, it’s no longer possible to get rich backtraces etc. when debugging the binary.

Extract Debug Info with objcopy

It’s possible to copy the debug info sections into a separate file using objcopy:

❯ objcopy --only-keep-debug hello.debug hello.debug.dbg
❯ ls -lh hello.debug*
-rwxrwxr-x 1 noah noah 42K Mar  2 12:30 hello.debug
-rwxrwxr-x 1 noah noah 31K Mar  2 14:00 hello.debug.dbg

Note that the section headers for the non-debug sections are still present in the file, so using size to examine the binary will show deceptive information (eg, .text, .data, etc). From the manual:

Note - the section headers of the stripped sections are preserved, including their sizes, but the contents of the section are discarded. The section headers are preserved so that other tools can match up the debuginfo file with the real executable, even if that executable has been relocated to a different address space.

You can then strip the debug binary:

❯ strip -o hello.debug.stripped hello.debug

And in GDB, you can manually load a symbol file:

Reading symbols from hello.debug.stripped...
(No debugging symbols found in hello.debug.stripped)
>>> symbol-file hello.debug.dbg  # using this command to load the symbol file
Reading symbols from hello.debug.dbg...

From the same manual page, it’s suggested to add a debuglink so gdb can automatically locate the debug information:

objcopy --add-gnu-debuglink=hello.debug.dbg hello.debug.stripped

Then GDB auto-loads the debug info file:

Reading symbols from hello.debug.stripped...
Reading symbols from /home/noah/dev/test/misc/hello.debug.dbg...

This only works if the position of the files remains consistent.

Note that there’s a newish approach to using debug info files called debuginfod, which enables GCC to retrieve debug info from a remote host. You can read more about that here: https://developers.redhat.com/blog/2019/10/14/introducing-debuginfod-the-elfutils-debuginfo-server

Compress Debug Info with objcopy

Instead of striping or splitting out the debug info, it can be compressed in-place:

❯ objcopy --compress-debug-sections hello.debug hello.debug.compressed
❯ ls -lh hello.debug*
-rwxrwxr-x 1 noah noah 42K Mar  2 12:30 hello.debug
-rwxrwxr-x 1 noah noah 26K Mar  2 14:09 hello.debug.compressed

The binary size is greatly reduced, even for a small program like this (still quite a bit larger than the stripped binary though).

View of the debug info section reduction:

❯ diff -duw <(size -Ax hello.debug) <(size -Ax hello.debug.compressed)
--- /proc/self/fd/13    2022-03-02 14:09:36.028276594 -0500
+++ /proc/self/fd/15    2022-03-02 14:09:36.024276703 -0500
@@ -1,4 +1,4 @@
-hello.debug  :
+hello.debug.compressed  :
 section                size     addr
 .interp                0x1c    0x318
 .note.gnu.property     0x30    0x338
@@ -28,12 +28,12 @@
 .bss                    0x8   0x4010
 .comment               0x25      0x0
 .debug_aranges         0x30      0x0
-.debug_info            0xb9      0x0
+.debug_info            0x98      0x0
 .debug_abbrev          0x66      0x0
-.debug_line            0xe8      0x0
-.debug_str           0x4e94      0x0
-.debug_line_str       0x1f5      0x0
-.debug_macro         0x1233      0x0
-Total                0x6cdc
+.debug_line            0xb6      0x0
+.debug_str           0x1853      0x0
+.debug_line_str       0x103      0x0
+.debug_macro          0xabe      0x0
+Total                0x2de1

We can see the .debug_str section is most aggressively reduced, but all around improvement.

There shouldn’t be a significant performance hit when loading a compressed vs. uncompressed binary into GDB; on my computer, it’s a few ms for this small program (using the excellent hyperfine benching tool):

❯ hyperfine 'gdb -batch -ex quit hello.debug' 'gdb -batch -ex quit hello.debug.compressed'
Benchmark 1: gdb -batch -ex quit hello.debug
  Time (mean ± σ):     418.8 ms ±   9.3 ms    [User: 350.2 ms, System: 67.6 ms]
  Range (min … max):   399.8 ms … 428.9 ms    10 runs

Benchmark 2: gdb -batch -ex quit hello.debug.compressed
  Time (mean ± σ):     425.9 ms ±  14.7 ms    [User: 365.6 ms, System: 60.0 ms]
  Range (min … max):   407.7 ms … 451.3 ms    10 runs

Summary
  'gdb -batch -ex quit hello.debug' ran
    1.02 ± 0.04 times faster than 'gdb -batch -ex quit hello.debug.compressed'

With larger files it might be more significant, but I haven’t tested that.

Note that it’s possible to decompress the debug sections with objcopy using the --decompress-debug-sections flag.

Compress Debug Info with dwz

The dwz tool available here (and from many package managers) performs some nice optimizations of DWARF info (the fundamental format of the debug info data). You can read some more details in the man page, either by downloading the source and loading it:

❯ man -l dwz.1

Or from one of the many online copies, for example Debian Linux.

Running it on our test binary:

❯ dwz hello.debug -o hello.debug.dwz
❯ ls -l hello.debug*
-rwxrwxr-x 1 noah noah 42336 Mar  2 12:30 hello.debug
-rwxrwxr-x 1 noah noah 26144 Mar  2 14:11 hello.debug.compressed
-rwxrwxr-x 1 noah noah 42304 Mar  2 14:22 hello.debug.dwz
-rwxrwxr-x 1 noah noah 14568 Mar  2 14:07 hello.debug.stripped

Ok, that didn’t really improve much 😕 it’s possible the debug info for such a small program is already pretty optimized, or maybe dwz doesn’t bother optimizing below a threshold (I haven’t checked).

Let’s try it again on a large binary, the cmake program compiled with CMAKE_BUILD_TYPE=Debug:

# use dwz to optimize the debug information in 'bin/cmake'
❯ dwz bin/cmake -o cmake.dwz
# use objcopy to compress debug info in the result
❯ objcopy --compress-debug-sections cmake.dwz cmake.dwz.compressed
# and for completeness, also compress debug info in the original cmake file
❯ objcopy --compress-debug-sections bin/cmake cmake.compressed
# let's take a look at file sizes nowls -lh bin/cmake cmake.dwz cmake.dwz.compressed cmake.compressed
-rwxrwxr-x 1 noah noah 179M Mar  2 14:46 bin/cmake
-rwxrwxr-x 1 noah noah  88M Mar  2 15:45 cmake.compressed
-rwxrwxr-x 1 noah noah 129M Mar  2 14:47 cmake.dwz
-rwxrwxr-x 1 noah noah  70M Mar  2 14:48 cmake.dwz.compressed

To summarize:

Strategy File Size Size Compared to Base
base 179MB 100%
dwz 129MB 72%
--compress-debug-sections 88MB 49%
dwz + --compress-debug-sections 70MB 39%

I also tested on another large C++ program, and the savings were very impressive:

ls -lh program.*
-rwxrwxr-x 1 noah noah 146M Mar  2 16:51 program
-rw-rw-r-- 1 noah noah  66M Mar  2 16:52 program.compressed
-rwxrwxr-x 1 noah noah  37M Mar  2 16:53 program.dwz
-rwxrwxr-x 1 noah noah  16M Mar  2 16:53 program.dwz.compressed

146MB for the original binary, down to 16M for the dwz and compressed one, or 11% of the original!

Analyzing Binaries for Bloat Using bloaty

An interesting tool for examining binary bloat is bloaty:

https://github.com/google/bloaty

It’s pretty straightforward to build, I followed the instructions here:

https://github.com/google/bloaty#install

It has a nice diff view, for example:

# run bloaty on the cmake binary with debug symbols.
# this output should look pretty similar to 'size -Ax'
❯ ~/dev/github/bloaty/build/bloaty bin/cmake
    FILE SIZE        VM SIZE
 --------------  --------------
  68.2%   121Mi   0.0%       0    .debug_info
  11.2%  19.9Mi   0.0%       0    .debug_str
   5.9%  10.5Mi  71.9%  10.5Mi    .text
   5.1%  9.04Mi   0.0%       0    .strtab
   3.5%  6.28Mi   0.0%       0    .debug_line
   1.2%  2.21Mi  15.1%  2.21Mi    .eh_frame
   1.2%  2.08Mi   0.0%       0    .symtab
   1.0%  1.81Mi   0.0%       0    .debug_abbrev
   1.0%  1.73Mi   0.0%       0    .debug_aranges
   0.7%  1.17Mi   0.0%       0    .debug_rnglists
   0.4%   724Ki   4.8%   724Ki    .rodata
   0.3%   543Ki   3.6%   543Ki    .eh_frame_hdr
   0.2%   307Ki   2.0%   307Ki    .gcc_except_table
   0.1%   149Ki   1.0%   149Ki    .rela.dyn
   0.0%  63.4Ki   0.4%  63.4Ki    .data.rel.ro
   0.0%  60.4Ki   0.0%       0    .debug_line_str
   0.0%  59.9Ki   0.3%  49.5Ki    [24 Others]
   0.0%       0   0.3%  49.8Ki    .bss
   0.0%  23.4Ki   0.2%  23.4Ki    .dynstr
   0.0%  21.4Ki   0.1%  21.4Ki    .dynsym
   0.0%  19.4Ki   0.1%  19.4Ki    .rela.plt
 100.0%   178Mi 100.0%  14.6Mi    TOTAL

# now run it against the compressed debug info version, and
# also passing a '-- BASE_FILE' to show the difference
❯ ~/dev/github/bloaty/build/bloaty cmake.compressed -- bin/cmake
    FILE SIZE        VM SIZE
 --------------  --------------
  +8.1%      +3  [ = ]       0    .comment
 -85.3% -51.5Ki  [ = ]       0    .debug_line_str
 -66.6%  -801Ki  [ = ]       0    .debug_rnglists
 -78.2% -1.35Mi  [ = ]       0    .debug_aranges
 -79.1% -1.43Mi  [ = ]       0    .debug_abbrev
 -68.1% -4.28Mi  [ = ]       0    .debug_line
 -89.8% -17.9Mi  [ = ]       0    .debug_str
 -53.2% -64.7Mi  [ = ]       0    .debug_info
 -50.7% -90.5Mi  [ = ]       0    TOTAL

It’s pretty handy for quickly examining binaries, but has a lot of other features for pretty deep analysis, check out the manual here:

https://github.com/google/bloaty/blob/master/doc/using.md

upx: the “Ultimate Packer for eXecutables”

This tool compresses a binary file, adds a wrapper that’s capable of unpacking it into memory at runtime, and bundles that into a standalone executable that when run behaves like the original binary.

It achieves great compression ratios:

❯ upx --preserve-build-id -o cmake.upx bin/cmake
❯ ls -lh bin/cmake cmake.upx*
-rwxrwxr-x 1 noah noah 179M Mar  2 14:46 bin/cmake
-rwxrwxr-x 1 noah noah  42M Mar  2 14:46 cmake.upx

It also preserves debug information (you have to decompress the file with upx -d first though), what’s not to love! (it does require certain features of the host runtime environment so the executable can unpack itself into memory on launch and before executing the compressed binary 😅).

GDB Index Files for Large Symbol Files

Something you may notice when manipulating large binaries in GDB: lots of operations are slowwww, including the initial load. Fortunately, there’s an easy way to speed this up: pre-generate the GDB index.

For the cmake executable, this works like so:

cp bin/cmake bin/cmake.gdbindex
❯ gdb-add-index bin/cmake.gdbindex
# let's take a look at file sizes:ls -lh bin/cmake*
-rwxrwxr-x 1 noah noah 179M Mar  2 14:46 bin/cmake
-rwxrwxr-x 1 noah noah 203M Mar  2 17:44 bin/cmake.gdbindex
# 😓 ouch, that added a lot to the file size

Now benchmarking the speed at which gdb can load the files:

❯ hyperfine 'gdb -batch -ex quit bin/cmake' 'gdb -batch -ex quit bin/cmake.gdbindex'
Benchmark 1: gdb -batch -ex quit bin/cmake
  Time (mean ± σ):      4.105 s ±  0.045 s    [User: 4.814 s, System: 0.146 s]
  Range (min … max):    4.040 s …  4.193 s    10 runs

Benchmark 2: gdb -batch -ex quit bin/cmake.gdbindex
  Time (mean ± σ):      1.046 s ±  0.023 s    [User: 1.847 s, System: 0.128 s]
  Range (min … max):    1.013 s …  1.079 s    10 runs

Summary
  'gdb -batch -ex quit bin/cmake.gdbindex' ran
    3.93 ± 0.09 times faster than 'gdb -batch -ex quit bin/cmake'

😮 impressive speed up! It’s somewhat of a niche operation to optimize for, but since it’s part of the edit-compile-test loop, it could be worth doing.

Splitting Debug Info at Build Time

GCC and Clang both support an option to split out DWARF info when compiling source files into object files:

https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#index-gsplit-dwarf

Here’s an example:

# compile + link with '-gsplit-dwarf'
❯ gcc -ggdb3 -gsplit-dwarf hello.c -o hello.debug

# examine file sizes- note the .dwo file,
# and the smaller hello.debug file sizels -l hello.debug*
-rwxrwxr-x 1 noah noah 17776 Mar  2 18:03 hello.debug
-rw-rw-r-- 1 noah noah 31208 Mar  2 18:03 hello.debug-hello.dwo

# test that symbols are working correctly
❯ gdb -batch --ex 'info line main' hello.debug
threads module disabled
Line 3 of "hello.c" starts at address 0x1149 <main> and ends at 0x115c <main+19>.

# delete the dwo file and retestrm hello.debug-hello.dwo
❯ gdb -batch --ex 'info line main' hello.debug
threads module disabled

warning: Could not find DWO CU hello.debug-hello.dwo(0xef39648d198fbc2d) referenced by CU at offset 0x0 [in module /home/noah/dev/test/misc/hello.debug]
Dwarf Error: unexpected tag 'DW_TAG_skeleton_unit' at offset 0x0 [in module /home/noah/dev/test/misc/hello.debug]
No line number information available for address 0x1149 <main>

This has a couple of advantages:

  • small (essentially stripped) binaries are produced without post-processing
  • link times can be dramatically improved, because the linker has to load much less data when it memmap’s the input object files

The disadvantage is you’ll need to keep those .dwo files around in order for GDB to locate them when you load the binary, and they’re not very portable.

In theory there is the dwp tool to pack up the .dwo files into a single archive that can be shipped with the binary, but I had a bad time trying to get it to work 😞.

https://gcc.gnu.org/wiki/DebugFissionDWP

Clang’s -gembed-source

This is a very interesting feature that Clang provides (alas, it’s not present in GCC, at least as of GCC 11):

https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-gembed-source

Using it goes something like this:

# build with clang, with -gembed-source.
# also strip the current working directory from the debug prefixes:
# https://interrupt.memfault.com/blog/reproducible-firmware-builds#fdebug-prefix-map
❯ clang -gdwarf-5 -gembed-source -fdebug-prefix-map=$(pwd)=. hello.c -o hello.embedsource

(This actually increases the size of the binary, unlike the other tools in this article, but it’s such a nice feature I couldn’t resist including it here).

You can access the source data using llvm-dwarfdump or llvm-objcopy:

❯ llvm-dwarfdump-12 --color --debug-line-str hello.embedsource
hello.embedsource:      file format elf64-x86-64

.debug_line_str contents:
0x00000000: "."
0x00000002: "hello.c"
0x0000000a: "#include <stdio.h>\n\nint main(int argc, char **argv) {\n  printf(\"hello!\\n\");\n  return 0;\n}\n"
❯ llvm-objdump-13 --disassemble-symbols=main --source hello.embedsource

hello.embedsource:      file format elf64-x86-64

Disassembly of section .text:

0000000000401130 <main>:
; int main(int argc, char **argv) {
  401130: 55                            pushq   %rbp
  401131: 48 89 e5                      movq    %rsp, %rbp
  401134: 48 83 ec 10                   subq    $16, %rsp
  401138: c7 45 fc 00 00 00 00          movl    $0, -4(%rbp)
  40113f: 89 7d f8                      movl    %edi, -8(%rbp)
  401142: 48 89 75 f0                   movq    %rsi, -16(%rbp)
;   printf("hello!\n");
  401146: 48 bf 04 20 40 00 00 00 00 00 movabsq $4202500, %rdi          # imm = 0x402004
  401150: b0 00                         movb    $0, %al
  401152: e8 d9 fe ff ff                callq   0x401030 <printf@plt>
;   return 0;
  401157: 31 c0                         xorl    %eax, %eax
  401159: 48 83 c4 10                   addq    $16, %rsp
  40115d: 5d                            popq    %rbp
  40115e: c3                            retq

(Unfortunately I couldn’t convince lldb to show this information automatically, but hopefully that will be available eventually!)

Summary

In general, I think the best option to use is:

objcopy --compress-debug-sections

This requires no changes when using the file- debuggers and binutils are all good about unpacking the data, and worst case it can be reveresed with --decompress-debug-sections.

The dwz tool definitely performs nice optimizations on larger binaries, but it seems to be very dependent on the specific file. In some cases it provides superb optimization.

The remaining strategies covered in this article are more situation-specific.

Hope you enjoyed reading this! I certainly learned a lot investigating these tools.

See anything you'd like to change? Submit a pull request or open an issue at GitHub

References

Noah Pendleton is an embedded software engineer at Memfault. Noah previously worked on embedded software teams at Fitbit and Markforged