Bug #3771
closedkmdb hangs when reading an invalid address
100%
Description
Jeremy Jones did some heroic investigation of this. His write-up of the problem is:
This is commonly seen when executing a command like "::walk thread | ::findstack" but can be easily reproduced by simply trying to examine an invalid address using a formatter that reads more than 1 byte at a time. Once in this state I noticed that the VM was consuming 100% of CPU so it was obviously stuck in a tight loop. Using vprobes I was able to determine that the guest OS was repeatedly page faulting on the same address with the following stack: ... GUEST_0xfffffffffbd39f1a_0xfffffffffbd39f8c_0xfffffffffbd39e7f_0xfffffffffbd3a031_0xfffffffffbd4eecf_0xfffffffffbd1d25c_0xfffffffffbd1fe35_0xfffffffffbd1ff4a_0xfffffffffbd22f7e_0xfffffffffbd23182_0xfffffffffbd22a59_0xfffffffffbd59c44_... CR2: 0x400 ... Decoding the stack addresses in mdb we have: ... kmdbmod`kmt_bcopy+0x5a: movl (%rdi),%eax kmdbmod`kmt_reader+0x2c: movq %rbx,%rax kmdbmod`kmt_rw+0xef: cmpq %rbx,%rax kmdbmod`kmt_read+0x61: movq -0x30(%rbp),%rbx kmdbmod`mdb_tgt_vread+0x1f: addq $0x20,%rsp kmdbmod`mdb_amd64_kvm_stack_iter+0xbc: cmpq %r14,%rax kmdbmod`kmt_stack_common+0xa5: xorl %eax,%eax kmdbmod`kmt_stackv+0x2a: addq $0x20,%rsp kmdbmod`dcmd_invoke+0x7e: cmpl $0x2,%eax kmdbmod`mdb_call_idcmd+0x112: cmpl $0x4,%eax kmdbmod`mdb_call+0x3f9: call +0x932 <kmdbmod`mdb_intr_disable> kmdbmod`yyparse+0xdb4: testl %eax,%eax ... I tried using dynamic vprobes to debug the problem further but apparently they aren't supported under ESX ? Now that I knew where the general problem was I was able to do some iterative printf debugging to locate the root cause of the problem. In this case we are trying to read form an invalid address (0x400) which will generate a page fault. The way that kmdb handles read page faults is by using setjmp/longjmp. If we hit a page fault during a read we will longjmp back to the setjmp location to handle the error. Here is the relevant c code from kmt_rw(): ... /* * Try to process the entire buffer, as requested. If we catch a fault, * try smaller chunks. This allows us to handle regions that cross * mapping boundaries. */ chunksz = nbytes; ndone = 0; if (setjmp(pcb) != 0) { if (chunksz == 1) { /* We failed with the smallest chunk - give up */ kmdb_dpi_restore_fault_hdlr(oldpcb); return (ndone > 0 ? ndone : -1); /* errno set for us */ } else if (chunksz > 4) chunksz = 4; else chunksz = 1; } ... So after a few iterations we should return from this function with an error but what I discovered through instrumentation was that the value of chunksz was never changing ? Obviously if chunksz not changing (and was initially > 1) we will never error out of kmt_rw(). Again from the above c code we can see that chunksz (a local variable) is set to the function's nbytes parameter. If we disassemble the kmt_rw code we can see how the chunksz local variable is modified: ... kmt_rw+0x77: call +0x30d44 <setjmp> kmt_rw+0x7c: testl %eax,%eax (A) kmt_rw+0x7e: movq -0x98(%rbp),%r12 (B) kmt_rw+0x85: je +0x1c <kmt_rw+0xa3> kmt_rw+0x87: decq %r12 kmt_rw+0x8a: je +0x90 <kmt_rw+0x120> kmt_rw+0x90: cmpq $0x5,-0x98(%rbp) kmt_rw+0x98: sbbq %r12,%r12 kmt_rw+0x9b: andq $0xfffffffffffffffd,%r12 kmt_rw+0x9f: addq $0x4,%r12 kmt_rw+0xa3: leaq -0x90(%rbp),%rdi kmt_rw+0xaa: xorl %r13d,%r13d kmt_rw+0xad: call -0xc192 <kmdb_dpi_set_fault_hdlr> kmt_rw+0xb2: movq %rax,%r14 kmt_rw+0xb5: jmp +0xf <kmt_rw+0xc6> kmt_rw+0xb7: nop kmt_rw+0xb8: addq %rax,-0xa8(%rbp) kmt_rw+0xbf: addq %rax,-0xa0(%rbp) kmt_rw+0xc6: cmpq %r12,-0x98(%rbp) kmt_rw+0xcd: movq -0x98(%rbp),%rbx kmt_rw+0xd4: movq -0xa8(%rbp),%rdx kmt_rw+0xdb: movq -0xa0(%rbp),%rdi kmt_rw+0xe2: cmovq.a %r12,%rbx kmt_rw+0xe6: movq %rbx,%rsi kmt_rw+0xe9: call *-0xb0(%rbp) (C) ... (A) - This is where we will branch on the page fault. (B) - This is where chunksz is set to the nbytes parameter value. (C) - The page fault occurs in this call. So from this code we can see that chunksz is a register and is always loaded from the stack. Nowhere between (A) and (C) is the value of -0x98(%rbp) modified. This explains why the value is never changing when we hit a page fault. If we add an mdb_printf statement to print the address of the chunksz local variable this forces the variable to be placed on the stack. From the disassembled code we can see chunksz is now being stored in the stack: ... kmt_rw+0x7a: call +0x30d61 <setjmp> kmt_rw+0x7f: testl %eax,%eax kmt_rw+0x81: je +0x33 <kmt_rw+0xb6> kmt_rw+0x83: movq -0x58(%rbp),%rax kmt_rw+0x87: cmpq $0x1,%rax kmt_rw+0x8b: je +0xaf <kmt_rw+0x140> kmt_rw+0x91: cmpq $0x5,%rax kmt_rw+0x95: leaq -0x58(%rbp),%rsi kmt_rw+0x99: movq $0xfffffffffbd78b12,%rdi kmt_rw+0xa0: sbbq %rax,%rax kmt_rw+0xa3: andq $0xfffffffffffffffd,%rax kmt_rw+0xa7: addq $0x4,%rax kmt_rw+0xab: movq %rax,-0x58(%rbp) (A) ... (A) - Store of chunksz. With this change I'm no longer able to reproduce the problem. Looking at a version of the kmbdmod that was compiled with the Sun compiler shows that it does not have this problem: ... kmt_rw+0x7f: call +0x38d04 <setjmp> kmt_rw+0x84: testl %eax,%eax kmt_rw+0x86: je +0x58 <kmt_rw+0xe0> kmt_rw+0x88: movq -0x90(%rbp),%r8 (A) kmt_rw+0x8f: cmpq $0x1,%r8 kmt_rw+0x93: je +0x22 <kmt_rw+0xb7> kmt_rw+0x95: movq $0x4,%r9 kmt_rw+0x9c: cmpq %r8,%r9 kmt_rw+0x9f: jb +0xd <kmt_rw+0xae> kmt_rw+0xa1: movq $0x1,-0x90(%rbp) (B) kmt_rw+0xac: jmp +0x32 <kmt_rw+0xe0> kmt_rw+0xae: movq %r9,-0x90(%rbp) (C) kmt_rw+0xb5: jmp +0x29 <kmt_rw+0xe0> kmt_rw+0xb7: movq -0x98(%rbp),%rdi ... (A) - Load into chunksz. (B) - Store of chunksz into the stack. (C) - Store of chunksz into the stack. After reading the setjmp documentation it appears that the bug is in the kmt_rw() code and is not in the compiler. "All accessible objects have values as of the time longjmp() was called, except that the values of objects of automatic storage duration which are local to the function containing the invocation of the corresponding setjmp() which do not have volatile-qualified type and which are changed between the setjmp() invocation and longjmp() call are indeterminate." In our code we are modifying a non-volatile variable (chunksz) between the setjmp and longjmp. After making chunksz volatile the disassmebly looked good and I was no longer able to reproduce the problem.
Updated by Rich Lowe about 10 years ago
Wonderful analysis, thanks for that.
The compiler should be issueing a warning in this case (perhaps another that is gagged in this specific bit of code), it should be emitting "clobbered by longjmp or vfork" warnings (it does this for anything where this is possible, not necessarily just those two functions).
If it is not emitting that warning, it's possible that the mdb/kmdb/kernel setjmp is missing an annotation (it should have __RETURNS_TWICE).
Updated by Christopher Siden about 10 years ago
Just quickly glancing through the code it looks like kmdb_kvm.c does eventually include setjmp.h which does define setjmp() as a #defines to something that does have the __RETURNS_TWICE annotation, but it looks like the make file for mdb (which I assume affects kmdb as well) uses -Wno-clobbered, which I assume silences that warning. It is probably worth investigating if we can remove the -Wno-clobbered flag at least from mdb after this fix.
Updated by Rich Lowe almost 10 years ago
- Status changed from In Progress to Resolved
- Tags deleted (
needs-triage)
Resolved in 982e63f