ctf symbol mapping needs work
CTF symbol mapping as implemented in the new round of CTF tools is broken. CTF symbol mapping refers to the act of taking information in a debug format, such as DWARF, and associating it with the corresponding symbol table entries while converting it into CTF.
When I introduced the new tools in OS-4548 (CTF Everywhere Part 1), the way that this mapping was performed was rather naive. In the case of running ctfconvert on a final binary, it was entirely broken and would result in few data objects ending up in the CTF data.
The first issue is most easily seen when we convert multiple object files that all have the same static symbol, but with different types. Imagine each file has a static function of the same name that returns a different type. Unfortunately, the existing logic would just naively match them based on the name of the symbol and a few other things. This is fundamentally flawed, as that example shows.
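To make the failure concrete, here is a minimal sketch of name-only matching. The records, field names, and the `match_by_name` helper are hypothetical simplifications for illustration; they are not the data structures used by the CTF tools.

```python
# Hypothetical, simplified records: two object files each define a
# static function called `helper`, but with different return types.
dwarf_funcs = [
    {"file": "a.c", "name": "helper", "ret": "int"},
    {"file": "b.c", "name": "helper", "ret": "char *"},
]
symtab = [
    {"file": "a.c", "name": "helper", "bind": "LOCAL"},
    {"file": "b.c", "name": "helper", "bind": "LOCAL"},
]


def match_by_name(sym, funcs):
    """Naive matching: return the first debug entry with the same name."""
    for func in funcs:
        if func["name"] == sym["name"]:
            return func
    return None


# Both symbols resolve to the entry from a.c, so b.c's `helper`
# is recorded with the wrong return type.
naive = [match_by_name(sym, dwarf_funcs)["ret"] for sym in symtab]
print(naive)
```

Because the name is the only key, the second `helper` can never reach its own debug entry.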
The second issue is that when building our lists of symbols and functions, we were adding entries that didn't have mappings into the lists, causing them to grow in size, and then also taking the first ones that we saw. Together, these meant that we ended up without valid type data for a number of entries: when converting everything at once, the first DWARF compilation unit would have mappings for only a handful of entries in the symbol table.
The solution here is to really overhaul how we do symbol mapping. First, we need to track a lot more of the ELF information that's present. In essence, when matching local symbols we need to use the corresponding file that's indicated by the STT_FILE symbol. This stricter checking has some knock-on effects.
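A minimal sketch of file-aware matching follows. It relies on the ELF convention that an STT_FILE symbol precedes the local symbols belonging to that file. The record layout, the `external` flag, and the `map_symbols` helper are assumptions made for illustration and do not reflect the actual implementation.

```python
# Hypothetical, simplified symbol table in ELF order: each STT_FILE
# entry is followed by that file's local symbols; globals come last.
symtab = [
    {"name": "a.c", "type": "FILE", "bind": "LOCAL"},
    {"name": "helper", "type": "FUNC", "bind": "LOCAL"},
    {"name": "b.c", "type": "FILE", "bind": "LOCAL"},
    {"name": "helper", "type": "FUNC", "bind": "LOCAL"},
    {"name": "main", "type": "FUNC", "bind": "GLOBAL"},
]
dwarf_funcs = [
    {"file": "a.c", "name": "helper", "external": False, "ret": "int"},
    {"file": "b.c", "name": "helper", "external": False, "ret": "char *"},
    {"file": "b.c", "name": "main", "external": True, "ret": "int"},
]


def map_symbols(symtab, funcs):
    """Match local symbols on (file, name) and global symbols on name."""
    current_file = None
    mapped = []
    for sym in symtab:
        if sym["type"] == "FILE":
            # Remember which file the following local symbols belong to.
            current_file = sym["name"]
            continue
        is_local = sym["bind"] == "LOCAL"
        for func in funcs:
            if func["name"] != sym["name"]:
                continue
            if is_local:
                ok = not func["external"] and func["file"] == current_file
            else:
                ok = func["external"]
            if ok:
                owner = current_file if is_local else None
                mapped.append((owner, sym["name"], func["ret"]))
                break
    return mapped


mapped = map_symbols(symtab, dwarf_funcs)
print(mapped)
```

With the file as part of the key, each `helper` maps to the debug entry from its own source file.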
These arise in the case of a mapfile reducing symbol visibility. Consider that most libraries in illumos use a mapfile to limit symbol visibility to explicitly exported symbols. This causes the DWARF debug data and the symbol table to disagree. Such symbols were previously global, which means that they had external visibility; however, the reduction gives them a binding of STB_LOCAL. Fortunately, we know that such symbols will always be associated with the first STT_FILE symbol, because they were previously not associated with any file and were then reduced. Therefore, we can use this as a key as to when to perform such a mapping. However, when performing a dedup as part of the conversion of a single DWARF DIE, we can't perform this mapping.
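The reduced-symbol rule can be sketched as a fallback on top of the strict file check. Everything here is hypothetical: the `match_local` helper, the record fields, and the name `libfoo.so.1` used for the first STT_FILE symbol are illustrative assumptions, not the tools' actual logic.

```python
# Hypothetical debug entries: `internal_api` is external in DWARF but
# was reduced to a local symbol by a mapfile at link time.
dwarf_funcs = [
    {"file": "a.c", "name": "internal_api", "external": True, "ret": "void"},
    {"file": "a.c", "name": "helper", "external": False, "ret": "int"},
]


def match_local(sym_name, owner_file, first_file, funcs):
    """Match a local symbol, allowing for mapfile-reduced globals.

    owner_file is the STT_FILE the symbol follows in the symbol table;
    first_file is the first STT_FILE in that table.
    """
    # Strict case: a static entry from the same file.
    for func in funcs:
        if (func["name"] == sym_name and not func["external"]
                and func["file"] == owner_file):
            return func
    # Reduced case: only local symbols under the first STT_FILE may
    # match an entry that the debug data still describes as external.
    if owner_file == first_file:
        for func in funcs:
            if func["name"] == sym_name and func["external"]:
                return func
    return None


first = "libfoo.so.1"  # assumed name of the first STT_FILE symbol
reduced = match_local("internal_api", first, first, dwarf_funcs)
static = match_local("helper", "a.c", first, dwarf_funcs)
rejected = match_local("internal_api", "a.c", first, dwarf_funcs)
```

Restricting the fallback to the first STT_FILE keeps an ordinary static symbol from accidentally picking up an unrelated global's type.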
This stricter checking also means that weak symbol mapping needs to be improved. The current version isn't perfect and we miss a few weak symbols, but not all of them. This is unfortunate; however, the general improvements in static symbols make this a worthwhile tradeoff. Solving weak symbols is tricky because while we can note the association of a weak symbol when we first convert a single compilation unit, the general conditions we use for weak symbol mapping start to break down when we end up merging, because of how the link-editor manipulates them.
Honestly, if we want weak symbol processing to be useful, we either need to ask that ld put in a section of weak-to-strong symbol mappings so we can just do this once and for all, or we need to wait until we actually do conversion in the link-editor.
Updated by John Levon about 4 years ago
- Description updated (diff)
Updated by Electric Monk about 4 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 37e82d12edcc0e950e771c6cda32e45f1ecd7f8d
commit 37e82d12edcc0e950e771c6cda32e45f1ecd7f8d
Author: Robert Mustacchi <email@example.com>
Date: 2019-05-03T09:15:57.000Z

    10812 ctf tools shouldn't add blank labels
    10813 ctf symbol mapping needs work
    Reviewed by: Jerry Jelinek <firstname.lastname@example.org>
    Reviewed by: Gordon Ross <email@example.com>
    Reviewed by: Toomas Soome <firstname.lastname@example.org>
    Approved by: Gordon Ross <email@example.com>