Bug #4567
Long RPC messages could confuse kernel and cause panic
Description
During testing we encountered this panic:
> ::status
debugging crash dump vmcore.1 (64-bit) from host-codenom
operating system: 5.11 NexentaOS_4:88852b476f (i86pc)
image uuid: 06e843c2-ef9e-6434-bc74-d81907d475ee
panic message: segkp_fault: accessing redzone
dump content: kernel pages and pages from PID -1
> ::stack
vpanic()
segkp_fault+0x1fd(ffffff06f118be78, fffffffffbc31d80, ffffff0030471000, 1000, 0, 1)
as_fault+0x31a(ffffff06f118be78, fffffffffbc30ac0, ffffff0030471ad0, 1, 0, 1)
pagefault+0x96(ffffff0030471ad0, 0, 1, 1)
trap+0x136f(ffffff0030463f30, ffffff0030471ad0, 16)
0xfffffffffb8001d6()
do_copy_fault_nta+0x2c()
uiocopy+0x16f(ffffff01a3921000, 20000, 1, ffffff0030464580, ffffff0030464268)
zfs_write+0x778(ffffff0746c82b80, ffffff0030464580, 0, ffffff06f6cb7680, ffffff00304645b0)
fop_write+0x5b(ffffff0746c82b80, ffffff0030464580, 0, ffffff06f6cb7680, ffffff00304645b0)
do_io+0x9e(2, ffffff0746c82b80, ffffff0030464580, 0, ffffff06f6cb7680, ffffff00304645b0)
rfs4_op_write+0x418(ffffff072f8ff570, ffffff0746cab050, ffffff0030464c00, ffffff0030464640)
rfs4_compound+0x1e9(ffffff0030464a00, ffffff00304647d0, 0, ffffff0030464c00, 0, ffffff0030464808)
rfs4_dispatch+0x21d(ffffffffc0173010, ffffff0030464c00, ffffff0708e15e00, ffffff0030464a00)
common_dispatch+0x765(ffffff0030464c00, ffffff0708e15e00, 2, 4, fffffffff8546376, ffffffffc0173060)
rfs_dispatch+0x2d(ffffff0030464c00, ffffff0708e15e00)
svc_getreq+0x1c1(ffffff0708e15e00, ffffff0700e275a0)
svc_run+0xe0(ffffff07136f6430)
svc_do_run+0x8e(1)
nfssys+0xf1(e, fece0fbc)
dtrace_systrace_syscall32+0xe4(e, fece0fbc, 43fce880, 55f820, 1, 0)
_sys_sysenter_post_swapgs+0x149()
>
Updated by Marcel Telka about 8 years ago
Root cause:
We received a very long write request:
> ::stack
vpanic()
segkp_fault+0x1fd(ffffff06f118be78, fffffffffbc31d80, ffffff00301aa000, 1000, 0, 1)
as_fault+0x31a(ffffff06f118be78, fffffffffbc30ac0, ffffff00301aa000, 1, 0, 1)
pagefault+0x96(ffffff00301aa000, 0, 1, 1)
trap+0x136f(ffffff003019ff20, ffffff00301aa000, 5)
0xfffffffffb8001d6()
0xfffffffffb853f8a()
uiocopy+0x16f(ffffff00e6e0c000, 20000, 1, ffffff00301a0580, ffffff00301a0268)
zfs_write+0x778(ffffff0b9c902d40, ffffff00301a0580, 0, ffffff0729797840, ffffff00301a05b0)
fop_write+0x5b(ffffff0b9c902d40, ffffff00301a0580, 0, ffffff0729797840, ffffff00301a05b0)
do_io+0x9e(2, ffffff0b9c902d40, ffffff00301a0580, 0, ffffff0729797840, ffffff00301a05b0)
rfs4_op_write+0x418(ffffff0726f0ccf0, ffffff0730f22e50, ffffff00301a0c00, ffffff00301a0640)
rfs4_compound+0x1e9(ffffff00301a0a00, ffffff00301a07d0, 0, ffffff00301a0c00, 0, ffffff00301a0808)
rfs4_dispatch+0x21d(fffffffffbd39670, ffffff00301a0c00, ffffff072674e200, ffffff00301a0a00)
common_dispatch+0x765(ffffff00301a0c00, ffffff072674e200, 2, 4, fffffffff8857376, fffffffffbd396c0)
rfs_dispatch+0x2d(ffffff00301a0c00, ffffff072674e200)
svc_getreq+0x1c1(ffffff072674e200, ffffff0700bc28c0)
svc_run+0xe0(ffffff0761362850)
svc_do_run+0x8e(1)
nfssys+0xf1(e, fed00fbc)
dtrace_systrace_syscall32+0xe4(e, fed00fbc, 9fb96b00, 26591760, 1, 0)
_sys_sysenter_post_swapgs+0x149()
> ffffff0726f0ccf0::print -t nfs_argop4 nfs_argop4_u.opwrite
WRITE4args nfs_argop4_u.opwrite = {
    stateid4 nfs_argop4_u.opwrite.stateid = {
        uint32_t seqid = 0
        char [12] other = [ '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' ]
    }
    offset4 nfs_argop4_u.opwrite.offset = 0xffffffff
    stable_how4 nfs_argop4_u.opwrite.stable = 0 (0)
    uint_t nfs_argop4_u.opwrite.data_len = 0xffffffff
    char *nfs_argop4_u.opwrite.data_val = 0
    mblk_t *nfs_argop4_u.opwrite.mblk = 0xffffff0700bc28c0
    struct clist *nfs_argop4_u.opwrite.rlist = 0
    CONN *nfs_argop4_u.opwrite.conn = 0
}
>
Please note data_len = 0xffffffff above. It means we received a write request for 2^32 - 1 bytes.
The problem is that various parts of the kernel treat such big numbers as signed, while they should be treated as unsigned (as specified in RFC 4506, Section 4.10, Variable-Length Opaque Data).
For example, in rpcmod such a big message should be discarded and never passed up to the NFS server; see mir_check_len(). But because 0xffffffff is evaluated as -1, mir_check_len() is confused and accepts the message. A similar signed/unsigned issue exists in xdrmblk_getmblk().
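To illustrate the signed/unsigned confusion, here is a minimal C sketch. The functions len_ok_signed() and len_ok_unsigned() are hypothetical and are not the actual mir_check_len() code; they only show how a length of 0xffffffff (or anything above 0x7fffffff, i.e. more than 2 GB) slips past a "too big?" check once it is stored in a signed 32-bit variable. Converting an out-of-range value to int32_t is implementation-defined in C; the comments assume the usual two's-complement wraparound (as with gcc and clang).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical illustration only; NOT the illumos mir_check_len() code.
 * Buggy variant: the wire length is held in a signed 32-bit integer.
 */
bool
len_ok_signed(uint32_t wire_len, int32_t max_len)
{
	/* Assuming two's complement: 0xffffffff becomes -1, 0x80000000 INT32_MIN. */
	int32_t len = (int32_t)wire_len;

	/* A negative len is never "greater than" max_len, so it is accepted. */
	return (!(len > max_len));
}

/* Fixed variant: the length stays unsigned, as RFC 4506 specifies. */
bool
len_ok_unsigned(uint32_t wire_len, uint32_t max_len)
{
	return (wire_len <= max_len);
}
```

With the signed variant, a 4 GB - 1 request is reported as acceptable even though the limit is far smaller; the unsigned variant rejects it.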
We need to go through all RPC- and XDR-related source code in the kernel to make sure all lengths of opaque data are treated as unsigned, not signed (this will be time-consuming).
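The direction described above can be sketched as an XDR decoder for variable-length opaque data that keeps the length unsigned from end to end. This is a minimal sketch assuming a flat, contiguous buffer; decode_opaque() and its parameters are hypothetical, and the real kernel code (for example xdrmblk_getmblk()) works on mblk chains instead. Per RFC 4506, Section 4.10, the data is a 4-byte unsigned length n followed by n bytes, padded with zero bytes to a multiple of 4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: decode XDR variable-length opaque data from a flat
 * buffer. On success, *datap/*lenp describe the payload and *pos is
 * advanced past the data and its padding (padding bytes are not checked).
 */
bool
decode_opaque(const uint8_t *buf, size_t buflen, size_t *pos,
    uint32_t maxlen, const uint8_t **datap, uint32_t *lenp)
{
	uint32_t n;
	uint64_t padded;
	size_t avail;

	/* Need at least the 4-byte length field. */
	if (*pos > buflen || buflen - *pos < 4)
		return (false);

	/* Big-endian unsigned length; never stored in a signed type. */
	n = ((uint32_t)buf[*pos] << 24) | ((uint32_t)buf[*pos + 1] << 16) |
	    ((uint32_t)buf[*pos + 2] << 8) | (uint32_t)buf[*pos + 3];
	*pos += 4;

	/* Unsigned comparison: 0xffffffff is treated as 4 GB - 1, not -1. */
	if (n > maxlen)
		return (false);

	/* Round up to a multiple of 4 in 64 bits so it cannot wrap. */
	padded = ((uint64_t)n + 3) & ~(uint64_t)3;
	avail = buflen - *pos;
	if (padded > avail)
		return (false);

	*datap = buf + *pos;
	*lenp = n;
	*pos += (size_t)padded;
	return (true);
}
```

Doing the round-up in 64-bit arithmetic matters: in 32 bits, (0xffffffff + 3) & ~3 wraps to 0 and the bounds check would wrongly pass.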
It looks like the problem is reproducible only with very long RPC messages (> 2 GB). Such messages are not very common: the issue seems to have been in the kernel for a very long time (very likely since day zero, so for about 20, maybe 30 years now) and nobody has noticed it yet.
Updated by Marcel Telka about 8 years ago
Code review: https://www.illumos.org/rb/r/53/
Updated by Marcel Telka about 8 years ago
- Status changed from In Progress to Pending RTI
Updated by Electric Monk about 8 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
git commit 3b3b7026bde850c59ef70bb86cf2ca9e8d8011fc
commit 3b3b7026bde850c59ef70bb86cf2ca9e8d8011fc
Author: Marcel Telka <marcel.telka@nexenta.com>
Date: 2015-05-14T17:14:57.000Z

    4567 Long RPC messages could confuse kernel and cause panic
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Jan Kryl <jan.kryl@nexenta.com>
    Reviewed by: Rich Lowe <richlowe@richlowe.net>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Approved by: Albert Lee <trisk@omniti.com>