CTF conversion fails with large files
While continuing the rollout of CTF to more of OmniOS userland, I hit a memory scaling problem with a 30MiB shared library containing 3,816 compilation units.
ctfconvert cannot even initialise the die list without running out of memory (ctfconvert is 32-bit).
When I talked to Jonathan Perkin, it turned out he had hit the same issue with the same shared library (libicudata.so) a while back, and had written a patch for ctfconvert. This is open as https://smartos.org/bugview/OS-6485 and https://github.com/joyent/illumos-joyent/pull/215, but CTF in gate has moved on a bit since then.
Here's the write-up from his issue:
During the work to CTF convert pkgsrc, libicudata.so exposed memory scaling issues with ctfconvert. It is a 29MB shared library containing 3,449 DIEs.
The current conversion process allocates memory as follows:
- A new ctf_die_t is created for each DIE during the initialisation process.
- Each ctf_die_t holds a DWARF handle open on the input file.
- Each ctf_die_t includes a new ctf_file_t allocation, each of which mmap()s its own private copy of the CTF data, symtab, and strtab from the object.
- During the merge process, an extra ctf_file_t is allocated for each DIE as the destination for the merge output, again with its own mmap()ed private copies.
- These allocations are only freed at the end of the entire conversion process.
- With all these allocations, a 32-bit ctfconvert process runs out of available memory while processing libicudata.so.
With the proposed patch the conversion process is changed to be as follows:
- Instead of allocating a full ctf_die_t for every DIE up-front, we defer full initialisation to the main conversion process in ctf_dwarf_convert_one().
- DIEs are processed in batches, defaulting to a batch size of 256 (configurable via -b on the command line if necessary).
- Each batch is converted and merged, producing a single merged ctf_file_t as the result.
- After processing a batch, the input DIEs for that batch are freed.
- The merged ctf_file_t is added as an input to the next batch.
- The process continues until we have processed all batches and end with a final merged ctf_file_t, or a failure.
The default batchsize of 256 is based on a few constraints:
- Processing DIEs in multiple batches means that ctf_id_t values will differ from those generated by a previous ctfconvert. If we're able to choose a batchsize which is larger than the number of DIEs in most objects then we will avoid changing ctf_id_t's. Whilst mostly cosmetic, it's still nice to avoid differences if possible.
- Performance goes up as the batchsize increases, at least when using the default of 4 threads.
- The batchsize needs to be well below the number of DIEs that can be processed without hitting memory limits.
Updated by Andy Fiddaman about 1 month ago
I've tested the change in the attached Gerrit review.
wsdiff (recently updated to compare CTF) on two gate builds showed no CTF differences apart from in libctf itself.
For the single-object conversions done in gate, the new version is measurably faster, most likely because it no longer allocates two libdwarf handles and does not use the task queue for processing.
Testing with two sample objects:
sunddi.o - 741K
Old:
% hyperfine -w1 -r50 'ctfconvert -o /dev/null sunddi.o'
  Time (mean): 225.8 ms +- 16.4 ms    [User: 215.2 ms, System: 8.2 ms]
  Range (min | max): 200.5 ms | 253.4 ms    50 runs

New:
% hyperfine -w1 -r50 'ctfconvert -o /dev/null sunddi.o'
  Time (mean): 213.4 ms +- 21.2 ms    [User: 203.1 ms, System: 8.3 ms]
  Range (min | max): 184.3 ms | 255.5 ms    50 runs
unix.o - 29.1M
Old:
% hyperfine -w1 -r10 'ctfconvert -o /dev/null unix.o'
  Time (mean): 26.849 s +- 0.613 s    [User: 66.796 s, System: 2.360 s]
  Range (min | max): 25.582 s | 27.639 s    10 runs

New:
% hyperfine -w1 -r10 'ctfconvert -o /dev/null unix.o'
  Time (mean): 17.836 s +- 0.467 s    [User: 52.066 s, System: 2.359 s]
  Range (min | max): 17.049 s | 18.467 s    10 runs
I also tested converting libcrypto.so.1.1, 13MiB, which contains 659 DWARF CUs. The current gate version cannot convert this file because one of the CUs (from aesni-mb-x86_64.s) has no debug data, and that aborts the whole process.
I did this test to get a better feel for the default number of threads and batch size.
ctfconvert has always defaulted to 4 threads and, with this change, defaults to batching the CUs up into blocks of 256.
Threads  Batch size  User (s)  System (s)   CPU  Total (s)
-------  ----------  --------  ----------  ----  ---------
      1          16     29.74        1.10   99%     30.924
      1          32     19.45        1.17   99%     20.693
      1          64     15.64        1.18   99%     16.893
      1         128     13.79        1.12   99%     14.977
      1         256     12.87        1.19   99%     14.140
      1         512     11.58        1.17   99%     12.844
      4          16     41.40        1.94  140%     30.875
      4          32     28.28        1.85  169%     17.817
      4          64     23.80        1.83  191%     13.408
      4         128     21.26        1.82  213%     10.810
      4         256     19.23        1.75  235%      8.911
      4         512     18.59        1.87  243%      8.413
      8          16     47.24        2.76  163%     30.494
      8          32     34.63        2.60  205%     18.153
      8          64     28.96        2.54  245%     12.819
      8         128     25.95        2.52  282%     10.079
      8         256     23.45        2.46  321%      8.063
      8         512     23.43        2.56  333%      7.801
     16          16     53.59        3.65  172%     33.138
     16          32     40.53        3.56  232%     18.947
     16          64     35.39        3.36  285%     13.591
     16         128     32.44        3.31  335%     10.650
     16         256     30.04        3.37  382%      8.744
     16         512     29.72        3.38  406%      8.135
     32          16     52.85        4.06  176%     32.242
     32          32     43.53        4.21  239%     19.897
     32          64     38.01        3.91  295%     14.200
     32         128     34.79        3.65  353%     10.878
     32         256     33.24        3.63  416%      8.862
     32         512     32.23        3.74  428%      8.402
     64          16     54.10        4.46  176%     33.251
     64          32     42.43        4.40  236%     19.833
     64          64     38.14        4.28  290%     14.591
     64         128     33.67        4.03  339%     11.104
     64         256     33.59        3.93  401%      9.352
     64         512     32.82        3.97  424%      8.660
Four threads with batches of 256 still seems to be a sweet spot (jperkin reached the same conclusion in https://smartos.org/bugview/OS-6485 for libicudata.so, which I still can't convert).
Updated by Andy Fiddaman 18 days ago
With thanks to Jonathan Perkin, a full pkgsrc build has been done with the updated CTF tools with good results.
A total of 22,608 objects were successfully converted, slightly more than with the existing pkgsrc CTF tools (which are patched versions of older gate bits).
The summarised output from the full pkgsrc build with the updated tools is:
   1 assertion failed for thread 0xfe7e0a40, thread-id 3: kind != CTF_ERR, file /data/omnios-build/omniosorg/bloody/illumos/usr/src/lib/libctf/common/ctf_dwarf.c, line 1377
   1 ctfconvert: CTF conversion failed: Failed to mmap a needed data section
   1 ctfconvert: die main.cpp: failed to convert strong functions and variables: Invalid type identifier
   1 ctfconvert: die posix.c: failed to add inputs for merge: Failed to mmap a needed data section
   1 ctfconvert: die qrc_kdedeprecated.cxx: failed to convert strong functions and variables: Invalid type identifier
   1 ctfconvert: failed to get tag type: DW_DLE_DIE_NULL (52)
   1 ctfconvert: failed to get unsigned attribute for type: DW_DLE_BAD_REF_FORM. The form code is 0x10 which does not have an offset for dwarf_formref() to return.
   2 ctfconvert: failed to get DW_FORM_ref4 (19) value for DW_AT_upper_bound: DW_DLE_ATTR_FORM_BAD: In function formudata (internal function) on seeing form 0x13 (DW_FORM_ref4)
   4 ctfconvert: failed to add member hidden: Invalid type identifier
   5 ctfconvert: encountered unknown DWARF encoding: 16
   5 ctfconvert: failed to add member tlsh: Invalid type identifier
   5 ctfconvert: failed to add member variable: Invalid type identifier
   6 ctfconvert: failed to add member obj: Invalid type identifier
   8 CTF conversion failed: Invalid type identifier
   8 ctfconvert: failed to add member UNNAMED: Duplicate member name definition
  10 ctfconvert: failed to get unsigned attribute for type: DW_DLE_ATTR_FORM_BAD: In function formudata (internal function) on seeing form 0xd (DW_FORM_sdata)
  12 ctfconvert: failed to add member <various>: Limit on number of dynamic type members reached
  22 ctfconvert: CTF conversion failed: No such file or directory
  27 CTF conversion failed: Invalid argument
  29 /usr/bin/bash: ctfconvert: command not found
 109 truncating enumeration <various> at member <various>
 205 ctfconvert: CTF conversion failed: Invalid type identifier
 682 ctfconvert: CTF conversion failed: Invalid argument
 690 ctfconvert: file <various>.c is missing debug info
3283 ctfconvert: file does not contain DWARF data
It's difficult to compare directly with the existing tools since they produce different output and less informative messages (mostly because of 6885 CTF Everywhere Part 1; the old tools were forked before this came in), but Jonathan tells me this is comparable, and that the additional information in the messages is helpful in working out why some objects do not convert.
Updated by Andy Fiddaman 17 days ago
Sorry, I didn't phrase that very well.
It's the existing pkgsrc tools (which were forked before the CTF everywhere changes) that produce less useful error messages.
That is what makes it difficult to directly compare an old pkgsrc build with a newer one that uses the gate tools + the four reviews I have open.
Updated by Electric Monk 15 days ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit effb27ee30c48fe502152c38487ced379d9f8693
Author: Andy Fiddaman <email@example.com>
Date: 2020-11-12T21:15:16.000Z

13247 CTF conversion fails with large files
13251 CTF conversion fails if any CU is missing DWARF data
Portions Contributed by: Jonathan Perkin <firstname.lastname@example.org>
Reviewed by: Robert Mustacchi <email@example.com>
Approved by: Dan McDonald <firstname.lastname@example.org>