Bug #3771

kmdb hangs when reading an invalid address

Added by Christopher Siden over 7 years ago. Updated over 7 years ago.

mdb - modular debugger
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


Jeremy Jones did some heroic investigation of this. His write-up of the problem is:

This is commonly seen when executing a command like "::walk thread |
::findstack" but can be easily reproduced by simply trying to examine an
invalid address using a formatter that reads more than 1 byte at a time.

Once in this state I noticed that the VM was consuming 100% of CPU so it was
obviously stuck in a tight loop.  Using vprobes I was able to determine that
the guest OS was repeatedly page faulting on the same address with the
following stack:
CR2: 0x400

Decoding the stack addresses in mdb we have:
kmdbmod`kmt_bcopy+0x5a:         movl   (%rdi),%eax
kmdbmod`kmt_reader+0x2c:        movq   %rbx,%rax
kmdbmod`kmt_rw+0xef:            cmpq   %rbx,%rax
kmdbmod`kmt_read+0x61:          movq   -0x30(%rbp),%rbx
kmdbmod`mdb_tgt_vread+0x1f:     addq   $0x20,%rsp
kmdbmod`mdb_amd64_kvm_stack_iter+0xbc:          cmpq   %r14,%rax
kmdbmod`kmt_stack_common+0xa5:  xorl   %eax,%eax
kmdbmod`kmt_stackv+0x2a:        addq   $0x20,%rsp
kmdbmod`dcmd_invoke+0x7e:       cmpl   $0x2,%eax
kmdbmod`mdb_call_idcmd+0x112:   cmpl   $0x4,%eax
kmdbmod`mdb_call+0x3f9:         call   +0x932   <kmdbmod`mdb_intr_disable>
kmdbmod`yyparse+0xdb4:          testl  %eax,%eax

I tried using dynamic vprobes to debug the problem further but apparently they
aren't supported under ESX ?  Now that I knew where the general problem was I
was able to do some iterative printf debugging to locate the root cause of the

In this case we are trying to read form an invalid address (0x400) which will
generate a page fault.  The way that kmdb handles read page faults is by using
setjmp/longjmp.  If we hit a page fault during a read we will longjmp back to
the setjmp location to handle the error.  Here is the relevant c code from
         * Try to process the entire buffer, as requested.  If we catch a
         * try smaller chunks.  This allows us to handle regions that cross
         * mapping boundaries.
        chunksz = nbytes;
        ndone = 0;
        if (setjmp(pcb) != 0) {
                if (chunksz == 1) {
                        /* We failed with the smallest chunk - give up */
                        return (ndone > 0 ? ndone : -1); /* errno set for us */
                } else if (chunksz > 4) 
                        chunksz = 4;
                        chunksz = 1;

So after a few iterations we should return from this function with an error but
what I discovered through instrumentation was that the value of chunksz was
never changing ?  Obviously if chunksz not changing (and was initially > 1) we
will never error out of kmt_rw().

Again from the above c code we can see that chunksz (a local variable) is set
to the function's nbytes parameter.  If we disassemble the kmt_rw code we can
see how the chunksz local variable is modified:
kmt_rw+0x77:                    call   +0x30d44 <setjmp>                     
kmt_rw+0x7c:                    testl  %eax,%eax                  (A)
kmt_rw+0x7e:                    movq   -0x98(%rbp),%r12           (B)
kmt_rw+0x85:                    je     +0x1c    <kmt_rw+0xa3>                
kmt_rw+0x87:                    decq   %r12
kmt_rw+0x8a:                    je     +0x90    <kmt_rw+0x120>               
kmt_rw+0x90:                    cmpq   $0x5,-0x98(%rbp)                      
kmt_rw+0x98:                    sbbq   %r12,%r12
kmt_rw+0x9b:                    andq   $0xfffffffffffffffd,%r12              
kmt_rw+0x9f:                    addq   $0x4,%r12
kmt_rw+0xa3:                    leaq   -0x90(%rbp),%rdi                      
kmt_rw+0xaa:                    xorl   %r13d,%r13d
kmt_rw+0xad:                    call   -0xc192  <kmdb_dpi_set_fault_hdlr>    
kmt_rw+0xb2:                    movq   %rax,%r14
kmt_rw+0xb5:                    jmp    +0xf     <kmt_rw+0xc6>                
kmt_rw+0xb7:                    nop    
kmt_rw+0xb8:                    addq   %rax,-0xa8(%rbp)                      
kmt_rw+0xbf:                    addq   %rax,-0xa0(%rbp)                      
kmt_rw+0xc6:                    cmpq   %r12,-0x98(%rbp)                      
kmt_rw+0xcd:                    movq   -0x98(%rbp),%rbx                      
kmt_rw+0xd4:                    movq   -0xa8(%rbp),%rdx                      
kmt_rw+0xdb:                    movq   -0xa0(%rbp),%rdi                      
kmt_rw+0xe2:                    cmovq.a %r12,%rbx                            
kmt_rw+0xe6:                    movq   %rbx,%rsi
kmt_rw+0xe9:                    call   *-0xb0(%rbp)               (C)

(A) - This is where we will branch on the page fault.
(B) - This is where chunksz is set to the nbytes parameter value.
(C) - The page fault occurs in this call.

So from this code we can see that chunksz is a register and is always loaded
from the stack.  Nowhere between (A) and (C) is the value of -0x98(%rbp)
modified.  This explains why the value is never changing when we hit a page

If we add an mdb_printf statement to print the address of the chunksz local
variable this forces the variable to be placed on the stack.  From the
disassembled code we can see chunksz is now being stored in the stack:
kmt_rw+0x7a:                    call   +0x30d61 <setjmp>
kmt_rw+0x7f:                    testl  %eax,%eax
kmt_rw+0x81:                    je     +0x33    <kmt_rw+0xb6>
kmt_rw+0x83:                    movq   -0x58(%rbp),%rax
kmt_rw+0x87:                    cmpq   $0x1,%rax
kmt_rw+0x8b:                    je     +0xaf    <kmt_rw+0x140>
kmt_rw+0x91:                    cmpq   $0x5,%rax
kmt_rw+0x95:                    leaq   -0x58(%rbp),%rsi
kmt_rw+0x99:                    movq   $0xfffffffffbd78b12,%rdi
kmt_rw+0xa0:                    sbbq   %rax,%rax
kmt_rw+0xa3:                    andq   $0xfffffffffffffffd,%rax
kmt_rw+0xa7:                    addq   $0x4,%rax
kmt_rw+0xab:                    movq   %rax,-0x58(%rbp)          (A)

(A) - Store of chunksz.

With this change I'm no longer able to reproduce the problem.  Looking at a
version of the kmbdmod that was compiled with the Sun compiler shows that it
does not have this problem:
kmt_rw+0x7f:                    call   +0x38d04 <setjmp>
kmt_rw+0x84:                    testl  %eax,%eax
kmt_rw+0x86:                    je     +0x58    <kmt_rw+0xe0>
kmt_rw+0x88:                    movq   -0x90(%rbp),%r8           (A)
kmt_rw+0x8f:                    cmpq   $0x1,%r8
kmt_rw+0x93:                    je     +0x22    <kmt_rw+0xb7>
kmt_rw+0x95:                    movq   $0x4,%r9
kmt_rw+0x9c:                    cmpq   %r8,%r9
kmt_rw+0x9f:                    jb     +0xd     <kmt_rw+0xae>
kmt_rw+0xa1:                    movq   $0x1,-0x90(%rbp)          (B)
kmt_rw+0xac:                    jmp    +0x32    <kmt_rw+0xe0>
kmt_rw+0xae:                    movq   %r9,-0x90(%rbp)           (C)
kmt_rw+0xb5:                    jmp    +0x29    <kmt_rw+0xe0>
kmt_rw+0xb7:                    movq   -0x98(%rbp),%rdi

(A) - Load into chunksz.
(B) - Store of chunksz into the stack.
(C) - Store of chunksz into the stack.

After reading the setjmp documentation it appears that the bug is in the
kmt_rw() code and is not in the compiler.

"All accessible objects have values as of the time longjmp() was called, except
that the values of objects of automatic storage duration which are local to the
function containing the invocation of the corresponding setjmp() which do not
have volatile-qualified type and which are changed between the setjmp()
invocation and longjmp() call are indeterminate." 

In our code we are modifying a non-volatile variable (chunksz) between the
setjmp and longjmp.  After making chunksz volatile the disassmebly looked good
and I was no longer able to reproduce the problem.

Also available in: Atom PDF