Bug #3916
elfcore() can induce panic in bcopy()
Status: Closed
Start date: 2013-07-28
% Done: 100%
Difficulty: Medium
Description
From the Joyent bug report:
We died in bcopy() called from elfcore() (called from do_core()):
    0xfffffffffb85a6ca()
    do_core+0x246(ffffff19b3687940, b, 1, ffffff19c26f1dd0)
    dump_one_core+0xe9(b, 7fffffffffffffff, 1, ffffff19c26f1dd0, ffffff00baa0acf0)
    core+0x40b(b, 0)
    psig+0x4e1()
    trap+0x1408(ffffff00baa0af10, 408260, 16)
    0xfffffffffb8001d6()
We died copying our lwp_curinfo:
    > *panic_thread::print kthread_t t_lwp->lwp_curinfo
    t_lwp->lwp_curinfo = 0xffffff96365fef50
But the pointer itself looks valid:
    > 0xffffff96365fef50::whatis
    ffffff96365fef50 is allocated from kmem_alloc_96
The problem is that the size passed to bcopy() is wrong:
    > elfcore+0xc0b::dis
    elfcore+0xbd6:  jmp    +0x17    <elfcore+0xbef>
    elfcore+0xbd8:  movq   %gs:0x18,%rax
    elfcore+0xbe1:  movq   0x190(%rax),%r8
    elfcore+0xbe8:  movq   0x5e0(%r8),%rdi
    elfcore+0xbef:  testq  %rdi,%rdi
    elfcore+0xbf2:  je     +0x19    <elfcore+0xc0d>
    elfcore+0xbf4:  addq   $0x8,%rdi
    elfcore+0xbf8:  leaq   0xfffffffffffffe18(%rbp),%rsi
    elfcore+0xbff:  movq   $0x100,%rdx
    elfcore+0xc06:  call   +0x3beb4dd       <bcopy>
    elfcore+0xc0b:  jmp    +0x1f    <elfcore+0xc2c>
    elfcore+0xc0d:  movq   0xfffffffffffffd08(%rbp),%r8
    elfcore+0xc14:  movzbl 0x448(%r8),%eax
    elfcore+0xc1c:  movl   %eax,0xfffffffffffffe18(%rbp)
    elfcore+0xc22:  movl   $0x7fff,0xfffffffffffffe1c(%rbp)
    elfcore+0xc2c:  movq   %gs:0x18,%rax
    elfcore+0xc35:  movq   0x190(%rax),%r8
    elfcore+0xc3c:  cmpl   $0x100000,0x4e8(%r8)
    elfcore+0xc47:  jne    +0x2d    <elfcore+0xc76>
    elfcore+0xc49:  leaq   0xfffffffffffffe18(%rbp),%rdi
    elfcore+0xc50:  leaq   -0xe0(%rbp),%rsi
Note the size of 0x100 (256 bytes), even though the source buffer came from the kmem_alloc_96 cache. The problem is this code:
    if (sq != NULL) {
        bcopy(&sq->sq_info, &killinfo.prk_info,
            sizeof (killinfo.prk_info));
    } else {
        killinfo.prk_info.si_signo = lwp->lwp_cursig;
        killinfo.prk_info.si_code = SI_NOINFO;
    }
sq_info is a k_siginfo_t, not a siginfo_t. The only difference between the two is the excessive padding in the latter, so sizeof (killinfo.prk_info) is larger than the object being copied from. Normally we get lucky and just copy some garbage into prk_info's padding, but occasionally the k_siginfo_t sits immediately below an unmapped page, and the copy faults, as in this case.
Updated by Robert Mustacchi over 10 years ago
- Status changed from New to Resolved
- % Done changed from 90 to 100
Resolved in ea2603507f6302005d1c10815c4f0458ae3ec803.