Project

General

Profile

Actions

Bug #11922

closed

ipmi_open looks for wrong return value

Added by Jason King over 1 year ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

We had a system that paniced with the following stack:

fffffcc26c7f4990 vpanic()
fffffcc26c7f49c0 vmem_hash_delete+0x9b(fffffe2386923000, 0, 1)
fffffcc26c7f4a20 vmem_xfree+0x4b(fffffe2386923000, 0, 1)
fffffcc26c7f4a50 vmem_free+0x23(fffffe2386923000, 0, 1)
fffffcc26c7f4a70 id_free+0x1f(fffffe2386923000, ffffffff)
fffffcc26c7f4ac0 ipmi_close+0x14d(111ffffffff, 2003, 2, fffffe27c4dda328)
fffffcc26c7f4af0 dev_close+0x31(111ffffffff, 2003, 2, fffffe27c4dda328)
fffffcc26c7f4b40 device_close+0xd8(fffffe28c2983800, 2003, fffffe27c4dda328)
fffffcc26c7f4bd0 spec_close+0x17b(fffffe28c2983800, 2003, 1, 0, fffffe27c4dda328, 0)
fffffcc26c7f4c50 fop_close+0x61(fffffe28c2983800, 2003, 1, 0, fffffe27c4dda328, 0)
fffffcc26c7f4c90 closef+0x5e(fffffe290ea738a0)
fffffcc26c7f4cd0 closeall+0x67(fffffe28558650e8)
fffffcc26c7f4d60 proc_exit+0x40e(2, d)
fffffcc26c7f4d80 exit+0x15(2, d)
fffffcc26c7f4e00 psig+0x38c()
fffffcc26c7f4ec0 post_syscall+0x805(20, 0)
fffffcc26c7f4f00 syscall_exit+0x68(fffffe2844511ba0, 20, 0)
fffffcc26c7f4f10 0xfffffffffb800f9a()

The id value being freed (0xffffffff) is suspicious -- in fact from ipmi_attach(), we can see that we should only ever create minor ids in the range [1,128]:

/* Create ID space for open devs.  ID 0 is reserved. */
        minor_ids = id_space_create("ipmi_id_space", 1, 128);

Looking at ipmi_open, we see the following:

        if ((minor = (minor_t)id_alloc_nosleep(minor_ids)) == 0)
                return (ENODEV);

From id_space(9F):

It is a programmer error to call the id_free() function on an identifier that has not been allocated.

and

Upon successful completion, the id_alloc_nosleep(), id_allocff_nosleep(), and id_alloc_specific_nosleep() functions will return an identifier that's in the range of the specified identifier space. Otherwise, -1 is returned to indicate that no identifier was available, or in the case of the id_alloc_specific_nosleep() function, that the specified identifier was not available.

The fix is pretty simple -- ENODEV should be returned when id_alloc_nosleep() returns -1, not 0. Since the id space is initialize to [1, 128] and there is no code to expand the id space, 0 is never a possible return value from id_alloc_nosleep().


Related issues

Related to illumos gate - Bug #11929: mac_minor_hold() gets id_alloc_nosleep() wrongClosedJohn Levon

Actions
Actions #1

Updated by Jason King over 1 year ago

From further investigation, it was determined that the IPMI device on this particular device was especially slow, causing some monitoring bits to have multiple ipmitool invocations running and stacking on top of each other. This makes sense -- one can more directly recreate the issue by running dtrace -wn 'ipmi_open:return { chill(10000) }' in one session and

i=0
while ((i < 130)); do
    ipmitool sensor > /dev/null &
   (( i = i + 1 ))
done

in another.

Actions #2

Updated by Jason King over 1 year ago

To test, I first reproduced the panic using the steps documented above. I then booted the same system with a fixed platform image and repeated the same steps. The panic did not occur, and while the while loop was running, eventually:

Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

was seen, which would match the expected ENODEV failure path after exhausting the minor id space.

Actions #4

Updated by Electric Monk over 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 22f89f96cd7b45b9686231ed7d98e610077df6c6

commit  22f89f96cd7b45b9686231ed7d98e610077df6c6
Author: Jason King <jason.king@joyent.com>
Date:   2019-11-07T00:52:16.000Z

    11922 ipmi_open looks for wrong return value
    Reviewed by: Marcel Telka <marcel@telka.sk>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Stormont <AStormont@racktopsystems.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: John Levon <john.levon@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions #5

Updated by Marcel Telka over 1 year ago

  • Related to Bug #11929: mac_minor_hold() gets id_alloc_nosleep() wrong added
Actions

Also available in: Atom PDF