Bug #14098
handle failure when muxing non-device streams
Closed
100%
Description
A panic from a process using I_LINK to arrange STREAMS multiplexing:
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe0040e6c8a0 addr=20 occurred in module "genunix" due to a NULL pointer dereference
dump content: kernel pages only
> $C
fffffe0040e6c9b0 handle_release+0xf(0)
fffffe0040e6ca20 ldi_munlink_fp+0xd4(fffffe2cc429f930, fffffe30c030a2d0, 1)
fffffe0040e6caf0 munlink+0x2f0(fffffe2cc429f930, fffffe2cba51cc08, 5, fffffe312f312e10, fffffe0040e6cba4, fffffe30c098e3c8)
fffffe0040e6cb60 munlinkall+0x4f(fffffe2cc429f930, 5, fffffe312f312e10, fffffe0040e6cba4, fffffe30c098e3c8)
fffffe0040e6cc00 strclose+0x509(fffffe30aa73b880, 3, fffffe312f312e10)
fffffe0040e6cc80 sotpi_close+0xab(fffffe30892802e8, 3, fffffe312f312e10)
fffffe0040e6ccb0 socket_close_internal+0x18(fffffe30892802e8, 3, fffffe312f312e10)
fffffe0040e6cd30 socket_vop_close+0xf6(fffffe30aa73b880, 3, 1, 0, fffffe312f312e10, 0)
fffffe0040e6cdb0 fop_close+0x66(fffffe30aa73b880, 3, 1, 0, fffffe312f312e10, 0)
fffffe0040e6cdf0 closef+0x68(fffffe312a51bef8)
fffffe0040e6ce30 closeall+0x57(fffffe3095647110)
fffffe0040e6cec0 proc_exit+0x429(1, 0)
fffffe0040e6cee0 exit+0xb(1, 0)
fffffe0040e6cf00 rexit+0x15(0)
fffffe0040e6cf10 sys_syscall+0x1a8()
Related issues
Updated by Joshua M. Clulow 8 months ago
Notes from Rich
I_LINK will fail when attempting to link a (device) stream and a non-device file of some kind, such as a socket. We check that the lower stream is a device (in mlink_fp), and handle that correctly if it's not. In the non-persistent I_LINK case, we also end up checking that the upper stream is a device (in ldi_ident_from_stream). If that second check fails, we either ASSERT (and crash immediately), or keep going (which works) and crash later, when we close the file descriptor and fail to unlink the streams.
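The failure mode described above can be illustrated with a small userland model. This is only a sketch: every `toy_*` name and structure below is hypothetical and stands in for the kernel paths named in the notes; none of it is the actual illumos code. It assumes the link proceeds without a handle when the upper-stream check fails, and that the later unlink is where the missing handle is noticed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; not the real illumos structures. */
struct toy_stream {
	bool is_device;			/* false for e.g. a socket */
};

struct toy_handle {
	int refcnt;
};

struct toy_link {
	bool linked;
	struct toy_handle *handle;	/* stays NULL if the ident lookup failed */
};

/* Models the upper-stream check: only a device can supply an identity. */
int
toy_ident_from_stream(const struct toy_stream *upper, struct toy_handle *out)
{
	if (!upper->is_device)
		return (EINVAL);
	out->refcnt = 1;
	return (0);
}

/* Models the pre-fix link path: the lower check is handled, the upper is not. */
int
toy_link_unfixed(const struct toy_stream *upper, const struct toy_stream *lower,
    struct toy_handle *storage, struct toy_link *lk)
{
	assert(lk != NULL);

	if (!lower->is_device)
		return (EINVAL);	/* first check: failure is reported */

	/* Second check: a failure is not reported, so the link goes ahead. */
	if (toy_ident_from_stream(upper, storage) == 0)
		lk->handle = storage;
	else
		lk->handle = NULL;
	lk->linked = true;
	return (0);
}

/*
 * Models the unlink done at close time. The real path would dereference
 * the handle; the toy reports the condition instead of faulting.
 */
int
toy_unlink(struct toy_link *lk)
{
	if (lk->handle == NULL)
		return (-1);	/* where the NULL dereference would occur */
	lk->handle->refcnt--;
	lk->linked = false;
	return (0);
}
```

With a socket-like upper stream (`is_device == false`) and a device lower stream, `toy_link_unfixed` returns 0 and `toy_unlink` then returns -1, mirroring a link that appears to work followed by a failure at close.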
Conservatively, we should handle errors propagating up from ldi_mlink_fp into mlink_fp and mlink, and return EINVAL when one occurs.
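As a rough illustration of that conservative approach, the following userland sketch propagates an inner failure up through two callers and reports EINVAL at the top. The `toy_*` functions are hypothetical models named after the kernel functions mentioned above, not the committed change. The inner error value (ENXIO) and the unwinding of a tentative link count are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; not the real illumos structures. */
struct toy_stream {
	bool is_device;		/* false for e.g. a socket */
	int nlinks;		/* links currently recorded on this stream */
};

/*
 * Innermost layer: models the step that needs a device upper stream.
 * ENXIO is an arbitrary choice for the toy, not the kernel's value.
 */
int
toy_ldi_mlink_fp(const struct toy_stream *upper)
{
	return (upper->is_device ? 0 : ENXIO);
}

/* Middle layer: records the link, then undoes it if the inner step fails. */
int
toy_mlink_fp(struct toy_stream *upper, const struct toy_stream *lower)
{
	int err;

	assert(upper != NULL && lower != NULL);

	if (!lower->is_device)
		return (EINVAL);

	upper->nlinks++;		/* tentative link state */
	if ((err = toy_ldi_mlink_fp(upper)) != 0) {
		upper->nlinks--;	/* unwind: nothing left to unlink at close */
		return (err);
	}
	return (0);
}

/* Outer layer: what the I_LINK caller would see. Any failure is EINVAL. */
int
toy_mlink(struct toy_stream *upper, const struct toy_stream *lower)
{
	return (toy_mlink_fp(upper, lower) != 0 ? EINVAL : 0);
}
```

In this model, linking under a socket-like upper stream returns EINVAL and leaves `nlinks` at 0, so a later close has no half-built link to tear down.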
It is possible that muxing in a socket is actually OK, but the comments in ldi_mlink_fp lead us to believe that a non-persistent mux that only has a major number to differentiate it (as we'd get from an unbound socket to get to ip(7D), rather than opening the clone device) is potentially a problem.
Updated by Joshua M. Clulow 8 months ago
Testing Notes
Tested using the modified version of wireguard from Nahum that discovered this bug (the link attempt now fails and no longer panics). A minimal reproduction was also created to isolate the specific failure.
Tested using a modified version of wireguard from Nahum that instead does what we are determining to be correct in this bug (this works, and establishes links over the VPN correctly).
Updated by Electric Monk 8 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 489c4c5ee49bb6dca6f69e0c79b358fd79799b73
commit 489c4c5ee49bb6dca6f69e0c79b358fd79799b73
Author: Richard Lowe <richlowe@richlowe.net>
Date: 2021-09-22T18:22:46.000Z

    14098 handle failure when muxing non-device streams
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>
Updated by Joshua M. Clulow 8 months ago
- Related to Bug #14133: rlogind: ioctl I_LINK of tcp connection failed added