xdrmblk_getpos() is unreliable
The xdrmblk_getpos() implementation does not work as expected and it could easily panic the system, if not used carefully.
To reproduce the issue try the following (use the attached module.c file)::
# /opt/gcc/4.4.4/bin/gcc -Wall -D_KERNEL -m64 -mcmodel=kernel -mno-red-zone -ffreestanding -nodefaultlibs -c module.c # /usr/ccs/bin/ld -r -o module module.o # modload module
After the modload this will appear in the /var/adm/messages:
May 5 01:26:44 t1 genunix: [ID 344344 kern.info] NOTICE: Position #0: 0 May 5 01:26:46 t1 genunix: [ID 344344 kern.info] NOTICE: Position #1: 4 May 5 01:26:48 t1 genunix: [ID 344344 kern.info] NOTICE: Position #2: 0 May 5 01:26:50 t1 genunix: [ID 344344 kern.info] NOTICE: Position #3: 4
and the machine panices with this:
> ::status debugging crash dump vmcore.1 (64-bit) from t1 operating system: 5.11 illumos-4e90188 (i86pc) image uuid: 312165a1-d276-c83b-981d-9451c4004354 panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0002e68c10 addr=18 occurred in module "genunix" due to a NULL pointer dereference dump content: kernel pages only > ::stack xdrmblk_getpos+0x28(ffffff0002e68d30) module`_init+0xee() modinstall+0x8a(ffffff00d24c2bf0) mod_hold_installed_mod+0x79(ffffff00c885e580, 0, 0, ffffff0002e68e3c) modctl_modload+0xa0(0, fffffd7fffdff910, fffffd7fffdffd1c) modctl+0xb9(0, 0, fffffd7fffdff910, fffffd7fffdffd1c, 0, 0) sys_syscall+0x17a() >
The immediate root cause for the panic is that the xdrmblk_getint32() could leave the x_base set to zero and xdrmblk_getpos() does not check that.
In addition to the panic, the correct log after the module is loaded should look like this:
May 5 01:26:44 t1 genunix: [ID 344344 kern.info] NOTICE: Position #0: 0 May 5 01:26:46 t1 genunix: [ID 344344 kern.info] NOTICE: Position #1: 4 May 5 01:26:48 t1 genunix: [ID 344344 kern.info] NOTICE: Position #2: 8 May 5 01:26:50 t1 genunix: [ID 344344 kern.info] NOTICE: Position #3: 12 ...
IOW, the position should monotonously increase.
Updated by Marcel Telka about 5 years ago
First, I needed to remember more internal data than before, so I created the xdrmblk_params structure and let the x_private point to it. Since previously sz was held in x_private I added sz to xdrmblk_params as well. The xdrmblk works with a string of mblks connected using b_cont. To properly implement xdr_getpos() I had to remember the absolute position of the current (that one I'm working with) mblk - in apos - and the relative position in the current mblk - in rpos.
The xdrmblk_params structure is allocated using kmem_alloc() in xdrmblk_init() and deallocated in xdrmblk_destroy(). Previously, the xdrmblk_destroy() was just a noop, so all xdrmblk consumers were free to not call XDR_DESTROY() for the xdrmblk (or call it several times). All changes in other files than xdr_mblk.c are to fix these missing (or extra) XDR_DESTROY() calls. Note: I noticed this issue by running ::findleaks.
I introduced xdrmblk_skip_fully_read_mblks() helper to properly skip the fully read (or empty) mblks. This functionality was already there, but it was not used everywhere where it was needed.
The last thing one needs to get familiar with is x_handy. It contains the number of available bytes (either for read or for write) in the current mblk. This semantics was already there, so I just followed to use it.
Updated by Electric Monk about 5 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
commit cf98b944cdc2063fc14f3fd525e284de3ed29fd0 Author: Marcel Telka <email@example.com> Date: 2015-09-03T17:56:17.000Z 5907 xdrmblk_getpos() is unreliable Reviewed by: Josef 'Jeff' Sipek <firstname.lastname@example.org> Reviewed by: Yuri Pankov <email@example.com> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Approved by: Dan McDonald <firstname.lastname@example.org>