Project

General

Profile

Bug #3771

kmdb hangs when reading an invalid address

Added by Christopher Siden over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Category:
mdb - modular debugger
Start date:
2013-05-16
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

Jeremy Jones did some heroic investigation of this. His write-up of the problem is:

This is commonly seen when executing a command like "::walk thread |
::findstack" but can be easily reproduced by simply trying to examine an
invalid address using a formatter that reads more than 1 byte at a time.

Once in this state I noticed that the VM was consuming 100% of CPU so it was
obviously stuck in a tight loop.  Using vprobes I was able to determine that
the guest OS was repeatedly page faulting on the same address with the
following stack:
...
GUEST_0xfffffffffbd39f1a_0xfffffffffbd39f8c_0xfffffffffbd39e7f_0xfffffffffbd3a031_0xfffffffffbd4eecf_0xfffffffffbd1d25c_0xfffffffffbd1fe35_0xfffffffffbd1ff4a_0xfffffffffbd22f7e_0xfffffffffbd23182_0xfffffffffbd22a59_0xfffffffffbd59c44_...
CR2: 0x400
...

Decoding the stack addresses in mdb we have:
...
kmdbmod`kmt_bcopy+0x5a:         movl   (%rdi),%eax
kmdbmod`kmt_reader+0x2c:        movq   %rbx,%rax
kmdbmod`kmt_rw+0xef:            cmpq   %rbx,%rax
kmdbmod`kmt_read+0x61:          movq   -0x30(%rbp),%rbx
kmdbmod`mdb_tgt_vread+0x1f:     addq   $0x20,%rsp
kmdbmod`mdb_amd64_kvm_stack_iter+0xbc:          cmpq   %r14,%rax
kmdbmod`kmt_stack_common+0xa5:  xorl   %eax,%eax
kmdbmod`kmt_stackv+0x2a:        addq   $0x20,%rsp
kmdbmod`dcmd_invoke+0x7e:       cmpl   $0x2,%eax
kmdbmod`mdb_call_idcmd+0x112:   cmpl   $0x4,%eax
kmdbmod`mdb_call+0x3f9:         call   +0x932   <kmdbmod`mdb_intr_disable>
kmdbmod`yyparse+0xdb4:          testl  %eax,%eax
...

I tried using dynamic vprobes to debug the problem further but apparently they
aren't supported under ESX ?  Now that I knew where the general problem was I
was able to do some iterative printf debugging to locate the root cause of the
problem.

In this case we are trying to read form an invalid address (0x400) which will
generate a page fault.  The way that kmdb handles read page faults is by using
setjmp/longjmp.  If we hit a page fault during a read we will longjmp back to
the setjmp location to handle the error.  Here is the relevant c code from
kmt_rw():
...
        /*
         * Try to process the entire buffer, as requested.  If we catch a
fault,
         * try smaller chunks.  This allows us to handle regions that cross
         * mapping boundaries.
         */
        chunksz = nbytes;
        ndone = 0;
        if (setjmp(pcb) != 0) {
                if (chunksz == 1) {
                        /* We failed with the smallest chunk - give up */
                        kmdb_dpi_restore_fault_hdlr(oldpcb);
                        return (ndone > 0 ? ndone : -1); /* errno set for us */
                } else if (chunksz > 4) 
                        chunksz = 4;
                else 
                        chunksz = 1;
        }
...

So after a few iterations we should return from this function with an error but
what I discovered through instrumentation was that the value of chunksz was
never changing ?  Obviously if chunksz not changing (and was initially > 1) we
will never error out of kmt_rw().

Again from the above c code we can see that chunksz (a local variable) is set
to the function's nbytes parameter.  If we disassemble the kmt_rw code we can
see how the chunksz local variable is modified:
...
kmt_rw+0x77:                    call   +0x30d44 <setjmp>                     
kmt_rw+0x7c:                    testl  %eax,%eax                  (A)
kmt_rw+0x7e:                    movq   -0x98(%rbp),%r12           (B)
kmt_rw+0x85:                    je     +0x1c    <kmt_rw+0xa3>                
kmt_rw+0x87:                    decq   %r12
kmt_rw+0x8a:                    je     +0x90    <kmt_rw+0x120>               
kmt_rw+0x90:                    cmpq   $0x5,-0x98(%rbp)                      
kmt_rw+0x98:                    sbbq   %r12,%r12
kmt_rw+0x9b:                    andq   $0xfffffffffffffffd,%r12              
kmt_rw+0x9f:                    addq   $0x4,%r12
kmt_rw+0xa3:                    leaq   -0x90(%rbp),%rdi                      
kmt_rw+0xaa:                    xorl   %r13d,%r13d
kmt_rw+0xad:                    call   -0xc192  <kmdb_dpi_set_fault_hdlr>    
kmt_rw+0xb2:                    movq   %rax,%r14
kmt_rw+0xb5:                    jmp    +0xf     <kmt_rw+0xc6>                
kmt_rw+0xb7:                    nop    
kmt_rw+0xb8:                    addq   %rax,-0xa8(%rbp)                      
kmt_rw+0xbf:                    addq   %rax,-0xa0(%rbp)                      
kmt_rw+0xc6:                    cmpq   %r12,-0x98(%rbp)                      
kmt_rw+0xcd:                    movq   -0x98(%rbp),%rbx                      
kmt_rw+0xd4:                    movq   -0xa8(%rbp),%rdx                      
kmt_rw+0xdb:                    movq   -0xa0(%rbp),%rdi                      
kmt_rw+0xe2:                    cmovq.a %r12,%rbx                            
kmt_rw+0xe6:                    movq   %rbx,%rsi
kmt_rw+0xe9:                    call   *-0xb0(%rbp)               (C)
...

(A) - This is where we will branch on the page fault.
(B) - This is where chunksz is set to the nbytes parameter value.
(C) - The page fault occurs in this call.

So from this code we can see that chunksz is a register and is always loaded
from the stack.  Nowhere between (A) and (C) is the value of -0x98(%rbp)
modified.  This explains why the value is never changing when we hit a page
fault.

If we add an mdb_printf statement to print the address of the chunksz local
variable this forces the variable to be placed on the stack.  From the
disassembled code we can see chunksz is now being stored in the stack:
...
kmt_rw+0x7a:                    call   +0x30d61 <setjmp>
kmt_rw+0x7f:                    testl  %eax,%eax
kmt_rw+0x81:                    je     +0x33    <kmt_rw+0xb6>
kmt_rw+0x83:                    movq   -0x58(%rbp),%rax
kmt_rw+0x87:                    cmpq   $0x1,%rax
kmt_rw+0x8b:                    je     +0xaf    <kmt_rw+0x140>
kmt_rw+0x91:                    cmpq   $0x5,%rax
kmt_rw+0x95:                    leaq   -0x58(%rbp),%rsi
kmt_rw+0x99:                    movq   $0xfffffffffbd78b12,%rdi
kmt_rw+0xa0:                    sbbq   %rax,%rax
kmt_rw+0xa3:                    andq   $0xfffffffffffffffd,%rax
kmt_rw+0xa7:                    addq   $0x4,%rax
kmt_rw+0xab:                    movq   %rax,-0x58(%rbp)          (A)
...

(A) - Store of chunksz.

With this change I'm no longer able to reproduce the problem.  Looking at a
version of the kmbdmod that was compiled with the Sun compiler shows that it
does not have this problem:
...
kmt_rw+0x7f:                    call   +0x38d04 <setjmp>
kmt_rw+0x84:                    testl  %eax,%eax
kmt_rw+0x86:                    je     +0x58    <kmt_rw+0xe0>
kmt_rw+0x88:                    movq   -0x90(%rbp),%r8           (A)
kmt_rw+0x8f:                    cmpq   $0x1,%r8
kmt_rw+0x93:                    je     +0x22    <kmt_rw+0xb7>
kmt_rw+0x95:                    movq   $0x4,%r9
kmt_rw+0x9c:                    cmpq   %r8,%r9
kmt_rw+0x9f:                    jb     +0xd     <kmt_rw+0xae>
kmt_rw+0xa1:                    movq   $0x1,-0x90(%rbp)          (B)
kmt_rw+0xac:                    jmp    +0x32    <kmt_rw+0xe0>
kmt_rw+0xae:                    movq   %r9,-0x90(%rbp)           (C)
kmt_rw+0xb5:                    jmp    +0x29    <kmt_rw+0xe0>
kmt_rw+0xb7:                    movq   -0x98(%rbp),%rdi
...

(A) - Load into chunksz.
(B) - Store of chunksz into the stack.
(C) - Store of chunksz into the stack.

After reading the setjmp documentation it appears that the bug is in the
kmt_rw() code and is not in the compiler.

"All accessible objects have values as of the time longjmp() was called, except
that the values of objects of automatic storage duration which are local to the
function containing the invocation of the corresponding setjmp() which do not
have volatile-qualified type and which are changed between the setjmp()
invocation and longjmp() call are indeterminate." 

In our code we are modifying a non-volatile variable (chunksz) between the
setjmp and longjmp.  After making chunksz volatile the disassmebly looked good
and I was no longer able to reproduce the problem.

History

#1

Updated by Rich Lowe over 6 years ago

Wonderful analysis, thanks for that.

The compiler should be issueing a warning in this case (perhaps another that is gagged in this specific bit of code), it should be emitting "clobbered by longjmp or vfork" warnings (it does this for anything where this is possible, not necessarily just those two functions).

If it is not emitting that warning, it's possible that the mdb/kmdb/kernel setjmp is missing an annotation (it should have __RETURNS_TWICE).

#2

Updated by Christopher Siden over 6 years ago

Just quickly glancing through the code it looks like kmdb_kvm.c does eventually include setjmp.h which does define setjmp() as a #defines to something that does have the __RETURNS_TWICE annotation, but it looks like the make file for mdb (which I assume affects kmdb as well) uses -Wno-clobbered, which I assume silences that warning. It is probably worth investigating if we can remove the -Wno-clobbered flag at least from mdb after this fix.

#3

Updated by Rich Lowe over 6 years ago

  • Status changed from In Progress to Resolved
  • Tags deleted (needs-triage)

Resolved in 982e63f

Also available in: Atom PDF