Bug #1393

closed

dis does not handle sections with unassigned virtual addresses well

Added by Jason King over 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cmd - userland programs
Start date: 2011-08-19
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

When disassembling an ELF object, dis attempts to map each instruction to a 'symbol+offset' value. It does this by taking the offset of the instruction from the start of its ELF section and looking for the symbol table entry with the closest value that is less than or equal to that offset. It also caches the last symbol used. Unfortunately, it does not pay attention to the section index. This can cause dis to look up the wrong symbol when disassembling NOP padding between functions. If a symbol in another section (such as .data) has an ELF value (i.e. ELF section offset) and size that cover the padding offset, dis will use that symbol, cache it, and continue to use it.

For example, for an object with the following symbols:

     index    value              size              type bind oth ver shndx    name
... 
      [50]  0x0000000000000250 0x0000000000000014  FUNC LOCL  D    0 .text          bridge_m_start
      [51]  0x0000000000000270 0x0000000000000012  FUNC LOCL  D    0 .text          bridge_m_stop
...    
     [110]  0x0000000000000260 0x0000000000000050  OBJT LOCL  D    0 .data          bridge_rinit

When dis is run on the object, it produces:

bridge_m_start()
   bridge_m_start:         55                 pushq  %rbp
   bridge_m_start+0x1:     31 c0              xorl   %eax,%eax
   bridge_m_start+0x3:     48 89 e5           movq   %rsp,%rbp
   bridge_m_start+0x6:     48 83 ec 10        subq   $0x10,%rsp
   bridge_m_start+0xa:     83 4f 20 02        orl    $0x2,0x20(%rdi)
   bridge_m_start+0xe:     48 89 7d f8        movq   %rdi,-0x8(%rbp)
   bridge_m_start+0x12:    c9                 leave
   bridge_m_start+0x13:    c3                 ret
   bridge_rinit+0x4:       66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                           84 00 00 00 00 00
   bridge_rinit+0x10:      55                 pushq  %rbp
   bridge_rinit+0x11:      48 89 e5           movq   %rsp,%rbp
   bridge_rinit+0x14:      48 83 ec 10        subq   $0x10,%rsp

The 'nopw %cs:0x0(%rax,%rax)' instruction lies at .text+0x264. When dis attempts to locate a symbol for this, it sees that the offset is beyond the range of bridge_m_start, and looks for the next symbol whose value is <= 0x264. It finds 'bridge_rinit', which resides at .data+0x260, disregarding that the symbol is in .data and not .text. It then caches this symbol, so even though the next instruction is the beginning of bridge_m_stop, it continues to use the incorrect symbol until it gets to .text+0x2b0 (0x260 + 0x50).
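The lookup described above can be sketched in C using the three symbols from the example. This is only an illustration, not the code in dis: the sym_t structure, the lookup() function, and the SHN_TEXT/SHN_DATA section index values are all assumptions made up for this sketch. Passing a negative section index disables the section check, which mimics the behaviour reported here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record; not the structure dis uses. */
typedef struct {
	const char *name;
	uint64_t value;		/* offset from the start of the section */
	uint64_t size;
	int shndx;		/* index of the section the symbol is in */
} sym_t;

/* Illustrative section indices; real values come from the ELF file. */
enum { SHN_TEXT = 1, SHN_DATA = 2 };

static const sym_t symtab[] = {
	{ "bridge_m_start", 0x250, 0x14, SHN_TEXT },
	{ "bridge_m_stop",  0x270, 0x12, SHN_TEXT },
	{ "bridge_rinit",   0x260, 0x50, SHN_DATA },
};

/*
 * Return the symbol whose range [value, value + size) contains off,
 * preferring the one with the closest starting value. If shndx is
 * non-negative, only symbols in that section are considered; a
 * negative shndx ignores the section entirely (the reported bug).
 */
static const sym_t *
lookup(uint64_t off, int shndx)
{
	const sym_t *best = NULL;
	size_t i;

	for (i = 0; i < sizeof (symtab) / sizeof (symtab[0]); i++) {
		const sym_t *s = &symtab[i];

		if (shndx >= 0 && s->shndx != shndx)
			continue;	/* symbol is in another section */
		if (off < s->value || off >= s->value + s->size)
			continue;	/* offset is outside the symbol */
		if (best == NULL || s->value > best->value)
			best = s;
	}
	return (best);
}
```

With these definitions, lookup(0x264, -1) returns bridge_rinit, while lookup(0x264, SHN_TEXT) returns NULL, which a caller could treat as "no symbol" and print the padding bytes without a label.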

The expected output should be something similar to:

bridge_m_start()
   bridge_m_start:         55                 pushq  %rbp
   bridge_m_start+0x1:     31 c0              xorl   %eax,%eax
   bridge_m_start+0x3:     48 89 e5           movq   %rsp,%rbp
   bridge_m_start+0x6:     48 83 ec 10        subq   $0x10,%rsp
   bridge_m_start+0xa:     83 4f 20 02        orl    $0x2,0x20(%rdi)
   bridge_m_start+0xe:     48 89 7d f8        movq   %rdi,-0x8(%rbp)
   bridge_m_start+0x12:    c9                 leave
   bridge_m_start+0x13:    c3                 ret
                           66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                           84 00 00 00 00 00
bridge_m_stop()
   bridge_m_stop:          55                 pushq  %rbp
   bridge_m_stop+0x1:      48 89 e5           movq   %rsp,%rbp
   bridge_m_stop+0x4:      48 83 ec 10        subq   $0x10,%rsp
...

This also causes issues when an invalid opcode is encountered. dis attempts to merely dump the bytes until it reaches the next symbol; however, it has the same problem if it locates a symbol in the wrong section.
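A section-aware search for the next symbol might look like the following. Again, this is a sketch under assumptions rather than the dis implementation: symloc_t, next_symbol_offset(), and the sec_end parameter are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal symbol record for this sketch only. */
typedef struct {
	uint64_t value;		/* section-relative offset */
	int shndx;		/* owning section index */
} symloc_t;

/*
 * Return the offset of the first symbol in section shndx that starts
 * after off, or sec_end if no such symbol exists. Bytes in the range
 * [off, result) would be dumped raw.
 */
static uint64_t
next_symbol_offset(const symloc_t *syms, size_t nsyms, int shndx,
    uint64_t off, uint64_t sec_end)
{
	uint64_t next = sec_end;
	size_t i;

	for (i = 0; i < nsyms; i++) {
		if (syms[i].shndx != shndx)
			continue;	/* symbol lives in another section */
		if (syms[i].value > off && syms[i].value < next)
			next = syms[i].value;
	}
	return (next);
}
```

Using the symbol values from the example and a hypothetical bad opcode at .text+0x254, restricting the search to .text gives 0x270 (bridge_m_stop). Without the section check, the .data symbol at 0x260 would end the dump early.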
