dis does not handle sections with unassigned virtual addresses well
When disassembling an ELF object, dis attempts to map each instruction to a 'symbol+offset' value. It does this by taking the offset of the instruction from the start of the ELF section and looking for the entry in the symbol table with the closest value that is less than or equal to that offset. It also caches the last symbol used. Unfortunately, it does not pay attention to the section index. This can cause dis to look up the wrong symbol when disassembling NOP padding between functions. If a symbol in another section (such as .data) happens to have an ELF value (i.e. ELF section offset) that falls within the padding, dis will use that symbol, cache it, and continue to use it.
For example, for an object with the following symbols:
    index  value               size                type  bind  oth  ver  shndx  name
    ...
           0x0000000000000250  0x0000000000000014  FUNC  LOCL  D    0    .text  bridge_m_start
           0x0000000000000270  0x0000000000000012  FUNC  LOCL  D    0    .text  bridge_m_stop
    ...
           0x0000000000000260  0x0000000000000050  OBJT  LOCL  D    0    .data  bridge_rinit
When dis is run on the object, it produces
    bridge_m_start()
        bridge_m_start:       55                 pushq  %rbp
        bridge_m_start+0x1:   31 c0              xorl   %eax,%eax
        bridge_m_start+0x3:   48 89 e5           movq   %rsp,%rbp
        bridge_m_start+0x6:   48 83 ec 10        subq   $0x10,%rsp
        bridge_m_start+0xa:   83 4f 20 02        orl    $0x2,0x20(%rdi)
        bridge_m_start+0xe:   48 89 7d f8        movq   %rdi,-0x8(%rbp)
        bridge_m_start+0x12:  c9                 leave
        bridge_m_start+0x13:  c3                 ret
        bridge_rinit+0x4:     66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                              84 00 00 00 00 00
        bridge_rinit+0x10:    55                 pushq  %rbp
        bridge_rinit+0x11:    48 89 e5           movq   %rsp,%rbp
        bridge_rinit+0x14:    48 83 ec 10        subq   $0x10,%rsp
The 'nopw %cs:0x0(%rax,%rax)' instruction lies at .text+0x264. When dis attempts to locate a symbol for it, it sees that the address is beyond the range of bridge_m_start (0x250 + 0x14 = 0x264) and looks for the next symbol whose value is <= 0x264. It finds 'bridge_rinit', which resides at .data+0x260, disregarding the fact that it is in .data and not .text. It then caches this symbol, so even though the next instruction is the beginning of bridge_m_stop, dis continues to use the incorrect symbol until it reaches .text+0x2b0 (0x260 + 0x50).
The expected output should be something similar to:
    bridge_m_start()
        bridge_m_start:       55                 pushq  %rbp
        bridge_m_start+0x1:   31 c0              xorl   %eax,%eax
        bridge_m_start+0x3:   48 89 e5           movq   %rsp,%rbp
        bridge_m_start+0x6:   48 83 ec 10        subq   $0x10,%rsp
        bridge_m_start+0xa:   83 4f 20 02        orl    $0x2,0x20(%rdi)
        bridge_m_start+0xe:   48 89 7d f8        movq   %rdi,-0x8(%rbp)
        bridge_m_start+0x12:  c9                 leave
        bridge_m_start+0x13:  c3                 ret
                              66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                              84 00 00 00 00 00
    bridge_m_stop()
        bridge_m_stop:        55                 pushq  %rbp
        bridge_m_stop+0x1:    48 89 e5           movq   %rsp,%rbp
        bridge_m_stop+0x4:    48 83 ec 10        subq   $0x10,%rsp
        ...
This also causes problems when an invalid opcode is encountered. In that case dis simply dumps the bytes until it reaches the next symbol, but it runs into the same issue if it locates a symbol in the wrong section.
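To make the failure mode concrete, here is a minimal sketch of a range-based symbol lookup over the three symbols from the report. It is not the code in dis: the sym_t type, the SEC_* indices, and the lookup() helper are all made up for illustration, and the sketch does not model the symbol cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record; not the structure dis uses. */
typedef struct {
	const char *name;
	uint64_t value;		/* st_value: a section offset in a .o file */
	uint64_t size;		/* st_size */
	int shndx;		/* section the symbol belongs to */
} sym_t;

enum { SEC_ANY = -1, SEC_TEXT = 1, SEC_DATA = 2 };	/* made-up indices */

/* The three symbols from the report. */
static const sym_t example_syms[] = {
	{ "bridge_m_start", 0x250, 0x14, SEC_TEXT },
	{ "bridge_m_stop",  0x270, 0x12, SEC_TEXT },
	{ "bridge_rinit",   0x260, 0x50, SEC_DATA },
};

/*
 * Return the symbol whose [value, value + size) range contains off,
 * or NULL if there is none. Passing SEC_ANY skips the section check,
 * which mimics the behaviour described in the report.
 */
static const sym_t *
lookup(const sym_t *syms, size_t nsyms, uint64_t off, int shndx)
{
	assert(syms != NULL || nsyms == 0);

	for (size_t i = 0; i < nsyms; i++) {
		if (shndx != SEC_ANY && syms[i].shndx != shndx)
			continue;
		if (off >= syms[i].value && off - syms[i].value < syms[i].size)
			return (&syms[i]);
	}
	return (NULL);
}
```

With SEC_ANY, a lookup of offset 0x264 lands on bridge_rinit, because 0x264 falls inside [0x260, 0x2b0). Restricting the same lookup to SEC_TEXT finds no symbol, which is the right answer for padding bytes.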
Updated by Jason King about 8 years ago
After a bit more digging, it appears that this is only an issue when the sh_addr field of the ELF section header is 0 (or, more generally, when the [sh_addr, sh_addr + sh_size) ranges of multiple sections overlap).
Kernel modules and .o files appear to have this property; executables and .so files appear to have the field set.
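The overlap condition can be written down as a small check on half-open ranges. This is only a sketch: range_t and ranges_overlap() are hypothetical names, not anything taken from dis or libelf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sh_addr and sh_size fields of a section. */
typedef struct {
	uint64_t addr;
	uint64_t size;
} range_t;

/*
 * Return true if [a->addr, a->addr + a->size) and
 * [b->addr, b->addr + b->size) share at least one address.
 * Comparing differences avoids overflow in addr + size.
 */
static bool
ranges_overlap(const range_t *a, const range_t *b)
{
	assert(a != NULL && b != NULL);

	if (a->size == 0 || b->size == 0)
		return (false);
	if (a->addr <= b->addr)
		return (b->addr - a->addr < a->size);
	return (a->addr - b->addr < b->size);
}
```

Two non-empty sections that both have sh_addr set to 0 always overlap under this test, which matches the .o and kernel module case.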
Updated by Jason King about 8 years ago
- Assignee set to Jason King
- % Done changed from 0 to 80
The simplest solution appears to be to maintain a table of (virtual address, size) tuples for each section, then assign an unused address range to any sections that have not been assigned a virtual address, adjusting the value of any symbol in that section in the symbol table as appropriate.
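A rough sketch of that idea follows. It is not the committed change: sec_t, sym_t, and assign_addrs() are hypothetical, alignment is ignored, and an sh_addr of 0 is assumed to mean "unassigned".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section and symbol records. */
typedef struct {
	uint64_t addr;		/* sh_addr; 0 is treated as unassigned */
	uint64_t size;		/* sh_size */
} sec_t;

typedef struct {
	uint64_t value;		/* st_value */
	size_t shndx;		/* index into the section array */
} sym_t;

/*
 * Give every unassigned section its own address range and rebase the
 * symbols in that section so they no longer collide across sections.
 */
static void
assign_addrs(sec_t *secs, size_t nsecs, sym_t *syms, size_t nsyms)
{
	uint64_t next = 0;

	assert(secs != NULL || nsecs == 0);
	assert(syms != NULL || nsyms == 0);

	/* Begin past the end of every range that is already assigned. */
	for (size_t i = 0; i < nsecs; i++) {
		uint64_t end = secs[i].addr + secs[i].size;
		if (secs[i].addr != 0 && end > next)
			next = end;
	}

	for (size_t i = 0; i < nsecs; i++) {
		if (secs[i].addr != 0)
			continue;
		secs[i].addr = next;
		/* Rebase the symbols that live in this section. */
		for (size_t j = 0; j < nsyms; j++) {
			if (syms[j].shndx == i)
				syms[j].value += next;
		}
		next += secs[i].size;
	}
}
```

For example, with two unassigned sections of (assumed) sizes 0x300 and 0x2b0, the second section is placed at 0x300, so a symbol at offset 0x260 within it moves to 0x560 and can no longer be mistaken for an address in the first section.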
Updated by Albert Lee about 8 years ago
- Status changed from New to Resolved
- % Done changed from 80 to 100
user: Jason King <firstname.lastname@example.org>
date: Wed Sep 14 11:28:33 2011 -0500
files: usr/src/cmd/dis/dis_main.c usr/src/cmd/dis/dis_target.c usr/src/cmd/dis/dis_target.h
1393 dis does not handle sections with unassigned virtual addresses well
1409 dis is overly restrictive when looking for it's next symbol
Reviewed by: Rich Lowe <email@example.com>
Reviewed by: Eric Schrock <firstname.lastname@example.org>
Approved by: Albert Lee <email@example.com>