Project

General

Profile

Bug #5907

xdrmblk_getpos() is unreliable

Added by Marcel Telka over 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
2015-05-04
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

The xdrmblk_getpos() implementation does not work as expected and it could easily panic the system, if not used carefully.

To reproduce the issue try the following (use the attached module.c file)::

# /opt/gcc/4.4.4/bin/gcc -Wall -D_KERNEL -m64 -mcmodel=kernel -mno-red-zone -ffreestanding -nodefaultlibs -c module.c
# /usr/ccs/bin/ld -r -o module module.o
# modload module

After the modload this will appear in the /var/adm/messages:

May  5 01:26:44 t1 genunix: [ID 344344 kern.info] NOTICE: Position #0: 0
May  5 01:26:46 t1 genunix: [ID 344344 kern.info] NOTICE: Position #1: 4
May  5 01:26:48 t1 genunix: [ID 344344 kern.info] NOTICE: Position #2: 0
May  5 01:26:50 t1 genunix: [ID 344344 kern.info] NOTICE: Position #3: 4

and the machine panices with this:

> ::status
debugging crash dump vmcore.1 (64-bit) from t1
operating system: 5.11 illumos-4e90188 (i86pc)
image uuid: 312165a1-d276-c83b-981d-9451c4004354
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff0002e68c10 addr=18 occurred in module "genunix" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
xdrmblk_getpos+0x28(ffffff0002e68d30)
module`_init+0xee()
modinstall+0x8a(ffffff00d24c2bf0)
mod_hold_installed_mod+0x79(ffffff00c885e580, 0, 0, ffffff0002e68e3c)
modctl_modload+0xa0(0, fffffd7fffdff910, fffffd7fffdffd1c)
modctl+0xb9(0, 0, fffffd7fffdff910, fffffd7fffdffd1c, 0, 0)
sys_syscall+0x17a()
>

The immediate root cause for the panic is that the xdrmblk_getint32() could leave the x_base set to zero and xdrmblk_getpos() does not check that.

In addition to the panic, the correct log after the module is loaded should look like this:

May  5 01:26:44 t1 genunix: [ID 344344 kern.info] NOTICE: Position #0: 0
May  5 01:26:46 t1 genunix: [ID 344344 kern.info] NOTICE: Position #1: 4
May  5 01:26:48 t1 genunix: [ID 344344 kern.info] NOTICE: Position #2: 8
May  5 01:26:50 t1 genunix: [ID 344344 kern.info] NOTICE: Position #3: 12
...

IOW, the position should monotonously increase.


Files

module.c (1.07 KB) module.c Marcel Telka, 2015-05-15 05:42 PM

Related issues

Related to illumos gate - Bug #6747: xdr_READDIR4res() bypass the XDR mblk APIClosed2016-03-13

Actions
Blocks illumos gate - Bug #6090: IOPS, bandwidth, and latency kstats for NFS serverRejected2015-07-29

Actions

History

#1

Updated by Marcel Telka over 5 years ago

  • File deleted (module.c)
#2

Updated by Marcel Telka over 5 years ago

#3

Updated by Marcel Telka about 5 years ago

The fix

First, I needed to remember more internal data than before, so I created the xdrmblk_params structure and let the x_private point to it. Since previously sz was held in x_private I added sz to xdrmblk_params as well. The xdrmblk works with a string of mblks connected using b_cont. To properly implement xdr_getpos() I had to remember the absolute position of the current (that one I'm working with) mblk - in apos - and the relative position in the current mblk - in rpos.

The xdrmblk_params structure is allocated using kmem_alloc() in xdrmblk_init() and deallocated in xdrmblk_destroy(). Previously, the xdrmblk_destroy() was just a noop, so all xdrmblk consumers were free to not call XDR_DESTROY() for the xdrmblk (or call it several times). All changes in other files than xdr_mblk.c are to fix these missing (or extra) XDR_DESTROY() calls. Note: I noticed this issue by running ::findleaks.

I introduced xdrmblk_skip_fully_read_mblks() helper to properly skip the fully read (or empty) mblks. This functionality was already there, but it was not used everywhere where it was needed.

The last thing one needs to get familiar with is x_handy. It contains the number of available bytes (either for read or for write) in the current mblk. This semantics was already there, so I just followed to use it.

#4

Updated by Marcel Telka about 5 years ago

  • Blocks Bug #6090: IOPS, bandwidth, and latency kstats for NFS server added
#6

Updated by Marcel Telka about 5 years ago

  • Status changed from In Progress to Pending RTI
#7

Updated by Electric Monk about 5 years ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 0 to 100

git commit cf98b944cdc2063fc14f3fd525e284de3ed29fd0

commit  cf98b944cdc2063fc14f3fd525e284de3ed29fd0
Author: Marcel Telka <marcel.telka@nexenta.com>
Date:   2015-09-03T17:56:17.000Z

    5907 xdrmblk_getpos() is unreliable
    Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#8

Updated by Marcel Telka over 4 years ago

  • Related to Bug #6747: xdr_READDIR4res() bypass the XDR mblk API added

Also available in: Atom PDF