Bug #1393

dis does not handle sections with unassigned virtual addresses well

Added by Jason King about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cmd - userland programs
Start date: 2011-08-19
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

When disassembling an ELF object, dis attempts to map each instruction to a 'symbol+offset' value. It does this by taking the offset of the instruction from the start of its ELF section and searching the symbol table for the entry with the closest value that is less than or equal to the instruction offset. It also caches the last symbol used. Unfortunately, it does not pay attention to the section index. This can cause dis to look up the wrong symbol when disassembling NOP padding between functions. If a symbol in another section (such as .data) happens to have an ELF value (i.e. ELF section offset) that lies within the padding, dis will use that symbol, cache it, and continue to use it.

For example, for an object with the following symbols:

     index    value              size              type bind oth ver shndx    name
... 
      [50]  0x0000000000000250 0x0000000000000014  FUNC LOCL  D    0 .text          bridge_m_start
      [51]  0x0000000000000270 0x0000000000000012  FUNC LOCL  D    0 .text          bridge_m_stop
...    
     [110]  0x0000000000000260 0x0000000000000050  OBJT LOCL  D    0 .data          bridge_rinit

When dis is run on the object, it produces:

bridge_m_start()
   bridge_m_start:         55                 pushq  %rbp
   bridge_m_start+0x1:     31 c0              xorl   %eax,%eax
   bridge_m_start+0x3:     48 89 e5           movq   %rsp,%rbp
   bridge_m_start+0x6:     48 83 ec 10        subq   $0x10,%rsp
   bridge_m_start+0xa:     83 4f 20 02        orl    $0x2,0x20(%rdi)
   bridge_m_start+0xe:     48 89 7d f8        movq   %rdi,-0x8(%rbp)
   bridge_m_start+0x12:    c9                 leave
   bridge_m_start+0x13:    c3                 ret
   bridge_rinit+0x4:       66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                           84 00 00 00 00 00
   bridge_rinit+0x10:      55                 pushq  %rbp
   bridge_rinit+0x11:      48 89 e5           movq   %rsp,%rbp
   bridge_rinit+0x14:      48 83 ec 10        subq   $0x10,%rsp

The 'nopw %cs:0x0(%rax,%rax)' instruction lies at .text+0x264. When dis attempts to locate a symbol for this address, it sees that the address is beyond the range of bridge_m_start and looks for the next symbol whose value is <= 0x264. It finds 'bridge_rinit', which resides at .data+0x260, disregarding that the symbol is in .data and not .text. It then caches this symbol, so even though the next instruction is the beginning of bridge_m_stop, dis continues to use the incorrect symbol until it gets to .text+0x2b0 (0x260 + 0x50).
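
The lookup described above can be modeled in a few lines. This is only a sketch of the behaviour as reported here, not the dis source: the Sym, Resolver, and label names are hypothetical, and section names stand in for section indexes.

```python
from typing import List, NamedTuple, Optional


class Sym(NamedTuple):
    name: str
    value: int   # section-relative offset, since sh_addr is 0 in this object
    size: int
    shndx: str   # section name, standing in for the section index


# The three symbols from the report.
SYMS = [
    Sym("bridge_m_start", 0x250, 0x14, ".text"),
    Sym("bridge_m_stop", 0x270, 0x12, ".text"),
    Sym("bridge_rinit", 0x260, 0x50, ".data"),
]


class Resolver:
    """Maps an address to a symbol, caching the last symbol used."""

    def __init__(self, syms: List[Sym], check_section: bool) -> None:
        self.syms = syms
        self.check_section = check_section  # False models the reported bug
        self.cached: Optional[Sym] = None

    def _covers(self, s: Sym, addr: int, section: str) -> bool:
        if self.check_section and s.shndx != section:
            return False
        return s.value <= addr < s.value + s.size

    def resolve(self, addr: int, section: str) -> Optional[Sym]:
        # Reuse the cached symbol for as long as its range covers addr.
        if self.cached is not None and self._covers(self.cached, addr, section):
            return self.cached
        # Otherwise pick the covering symbol with the closest value <= addr.
        hits = [s for s in self.syms if self._covers(s, addr, section)]
        self.cached = max(hits, key=lambda s: s.value, default=None)
        return self.cached


def label(sym: Optional[Sym], addr: int) -> str:
    """Format a symbol as 'name' or 'name+0xoff'; empty if there is no symbol."""
    if sym is None:
        return ""
    off = addr - sym.value
    return sym.name if off == 0 else f"{sym.name}+{off:#x}"


if __name__ == "__main__":
    for check in (False, True):
        r = Resolver(SYMS, check_section=check)
        print("section-aware" if check else "section-blind")
        for addr in (0x263, 0x264, 0x270, 0x271):
            print(f"  {addr:#x}: {label(r.resolve(addr, '.text'), addr)}")
```

With check_section=False the model labels .text+0x264 as bridge_rinit+0x4 and keeps using bridge_rinit for the first instructions of bridge_m_stop, matching the output above. With check_section=True the padding gets no label and .text+0x270 resolves to bridge_m_stop.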

The expected output should be something similar to:

bridge_m_start()
   bridge_m_start:         55                 pushq  %rbp
   bridge_m_start+0x1:     31 c0              xorl   %eax,%eax
   bridge_m_start+0x3:     48 89 e5           movq   %rsp,%rbp
   bridge_m_start+0x6:     48 83 ec 10        subq   $0x10,%rsp
   bridge_m_start+0xa:     83 4f 20 02        orl    $0x2,0x20(%rdi)
   bridge_m_start+0xe:     48 89 7d f8        movq   %rdi,-0x8(%rbp)
   bridge_m_start+0x12:    c9                 leave
   bridge_m_start+0x13:    c3                 ret
                           66 66 66 2e 0f 1f  nopw   %cs:0x0(%rax,%rax)
                           84 00 00 00 00 00
bridge_m_stop()
   bridge_m_stop:          55                 pushq  %rbp
   bridge_m_stop+0x1:      48 89 e5           movq   %rsp,%rbp
   bridge_m_stop+0x4:      48 83 ec 10        subq   $0x10,%rsp
...

This also causes issues when an invalid opcode is encountered. dis attempts to merely dump the bytes until it reaches the next symbol; however, it will have the same issue if it locates a symbol in the wrong section.

History

#1

Updated by Jason King about 8 years ago

Doing a bit more digging, it appears that this is only an issue when the sh_addr field of the ELF section header is 0 (or, more generally, if the [sh_addr, sh_addr + sh_size) ranges of multiple sections overlap).

It appears that kernel modules as well as .o files have this property, while executables and .so files appear to have the field set.
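
The overlap condition can be checked directly from the section headers. The sketch below is illustrative only: the Section type, the overlapping_pairs helper, and the section sizes and linked addresses are all made up for the example.

```python
from typing import List, NamedTuple, Tuple


class Section(NamedTuple):
    name: str
    sh_addr: int
    sh_size: int


def overlapping_pairs(sections: List[Section]) -> List[Tuple[str, str]]:
    """Return name pairs whose [sh_addr, sh_addr + sh_size) ranges intersect."""
    pairs = []
    for i, a in enumerate(sections):
        for b in sections[i + 1:]:
            if a.sh_size == 0 or b.sh_size == 0:
                continue  # an empty section cannot overlap anything
            if a.sh_addr < b.sh_addr + b.sh_size and b.sh_addr < a.sh_addr + a.sh_size:
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical headers: a .o or kernel module with sh_addr left at 0 ...
RELOCATABLE = [Section(".text", 0x0, 0x300), Section(".data", 0x0, 0x2B0)]
# ... and a linked object where each section has its own address.
LINKED = [Section(".text", 0x400000, 0x300), Section(".data", 0x600000, 0x2B0)]

if __name__ == "__main__":
    print(overlapping_pairs(RELOCATABLE))
    print(overlapping_pairs(LINKED))
```

Any pair reported for an object means a symbol value alone is not enough to tell which section a symbol belongs to.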

#2

Updated by Jason King about 8 years ago

  • Assignee set to Jason King
  • % Done changed from 0 to 80

The simplest solution appears to be to maintain a table of (virtual address, size) tuples for each section, then assign an unused address range to any sections that have not been assigned a virtual address, adjusting the value of any symbol in that section in the symbol table as appropriate.
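
A minimal sketch of that approach follows. It is an assumption-laden illustration rather than the committed fix: the type and function names, the 0x10 alignment, and the section sizes are hypothetical, and a section is treated as unassigned when its sh_addr is 0.

```python
from typing import Dict, List, NamedTuple, Tuple


class Section(NamedTuple):
    name: str
    sh_addr: int
    sh_size: int


class Symbol(NamedTuple):
    name: str
    value: int
    shndx: str  # section name, standing in for the section index


def assign_addresses(
    sections: List[Section], symbols: List[Symbol], align: int = 0x10
) -> Tuple[List[Section], List[Symbol]]:
    """Give each sh_addr == 0 section an unused range and rebase its symbols."""
    # Table of (virtual address, size) tuples for sections already placed.
    used = [(s.sh_addr, s.sh_size) for s in sections if s.sh_addr != 0]
    # Everything at or above the highest used address is free.
    next_free = max((addr + size for addr, size in used), default=0)

    base: Dict[str, int] = {}
    placed: List[Section] = []
    for s in sections:
        if s.sh_addr == 0 and s.sh_size > 0:
            addr = (next_free + align - 1) // align * align  # round up
            base[s.name] = addr
            next_free = addr + s.sh_size
            s = s._replace(sh_addr=addr)
        placed.append(s)

    # Shift each symbol by the address given to its section.
    rebased = [y._replace(value=y.value + base.get(y.shndx, 0)) for y in symbols]
    return placed, rebased


# Symbol values are from the report; section sizes are hypothetical.
SECTIONS = [Section(".text", 0x0, 0x300), Section(".data", 0x0, 0x2B0)]
SYMBOLS = [
    Symbol("bridge_m_start", 0x250, ".text"),
    Symbol("bridge_m_stop", 0x270, ".text"),
    Symbol("bridge_rinit", 0x260, ".data"),
]

if __name__ == "__main__":
    new_sections, new_symbols = assign_addresses(SECTIONS, SYMBOLS)
    for s in new_sections:
        print(f"{s.name}: {s.sh_addr:#x}")
    for y in new_symbols:
        print(f"{y.name}: {y.value:#x}")
```

In this example .text keeps address 0 and .data moves to 0x300, so bridge_rinit becomes 0x560 and can no longer be mistaken for a .text symbol by a lookup that compares values only.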

#3

Updated by Rich Lowe about 8 years ago

  • Subject changed from dis ignores section index on symbol lookup to dis does not handle sections with unassigned virtual addresses well

#4

Updated by Albert Lee about 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 80 to 100

Fixed in:
changeset: 13451:4349f1403dad
tag: tip
user: Jason King
date: Wed Sep 14 11:28:33 2011 -0500
files: usr/src/cmd/dis/dis_main.c usr/src/cmd/dis/dis_target.c usr/src/cmd/dis/dis_target.h
description:
1393 dis does not handle sections with unassigned virtual addresses well
1409 dis is overly restrictive when looking for it's next symbol
Reviewed by: Rich Lowe
Reviewed by: Eric Schrock
Approved by: Albert Lee
