Project

General

Profile

Bug #3902

Race between ipmi_submit_driver_request() and kcs_loop()

Added by Marcel Telka over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
driver - device drivers
Start date:
2013-07-21
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

There are three issues related to ipmi_attach().

Issue 1

There is a race window between lines 128 and 129 here in ipmi_submit_driver_request():

124    IPMI_LOCK(sc);
125    error = sc->ipmi_enqueue_request(sc, req);
126    if (error == 0) {
127        /* Wait for result - see ipmi_complete_request */
128             IPMI_UNLOCK(sc);
129             mutex_enter(&slpmutex);

After the line 128 (and before line 129) the req is already enqueued via ipmi_polled_enqueue_request().

The kcs_loop() is responsible for the req dequeue. The kcs_loop() function/thread is started in kcs_startup() here in ipmi_startup() at line 208:

208     error = sc->ipmi_startup(sc);  <----------------------
209     if (error) {
210             cmn_err(CE_WARN, "Failed to initialize interface: %d", error);
211             return (B_FALSE);
212     }

Almost immediately after that, the request is enqueued:

218     error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);

Since the kcs_loop() is started immediately before the req is enqueued, it is possible that during the req enqueue in the ipmi_polled_enqueue_request() the kcs_loop() is not fully started yet and it is waiting at line 453 here:

447kcs_loop(void *arg)
448{
449     struct ipmi_softc *sc = arg;
450     struct ipmi_request *req;
451     int i, ok;
452
453     IPMI_LOCK(sc);    <----------------------

because the sc lock (the sc->ipmi_lock mutex) is held during the ipmi_polled_enqueue_request() here:

119ipmi_submit_driver_request(struct ipmi_softc *sc, struct ipmi_request *req,
120    int timo)
121{
122     int error;
123
124     IPMI_LOCK(sc);
125     error = sc->ipmi_enqueue_request(sc, req);    <-----------
126     if (error == 0) {
127             /* Wait for result - see ipmi_complete_request */
128             IPMI_UNLOCK(sc);
129             mutex_enter(&slpmutex);

Note: ipmi_enqueue_request points to ipmi_polled_enqueue_request.

After the req is enqueued (at line 125) the sc lock is released (at line 128). Now the kcs_loop() continues and does its job:

453     IPMI_LOCK(sc);
454     while ((req = ipmi_dequeue_request(sc)) != NULL) {  <---------
455             ok = 0;
456             for (i = 0; i < 3 && !ok; i++)
457                     ok = kcs_polled_request(sc, req);
458             if (ok)
459                     req->ir_error = 0;
460             else
461                     req->ir_error = EIO;
462             ipmi_complete_request(sc, req);  <-------
463     }
464     IPMI_UNLOCK(sc);

The req is dequeued at line 454, processed at line 457 and at line 462 it is signaled that the processing is complete.

94ipmi_complete_request(struct ipmi_softc *sc, struct ipmi_request *req)

...

105             mutex_enter(&slpmutex);
106             cv_signal(&slplock);
107             mutex_exit(&slpmutex);

Back to the race window:

128             IPMI_UNLOCK(sc);
129             mutex_enter(&slpmutex);
130             if (timo == 0)
131                     cv_wait(&slplock, &slpmutex);
132             else
133                     error = cv_timedwait(&slplock, &slpmutex, <-------
134                         ddi_get_lbolt() + timo);

The ipmi_submit_driver_request() will continue at line 129 and then wait and timeout at line 133 even the request was fully processed and signaled.

This is easily reproducible with slightly modified ipmi module. Just add a delay (let say 5 sec) at beginning of the kcs_loop(). Add another delay (let say 10 sec) into the race window (between lines 128 and 129) and try to attach the ipmi.

Issue 2

Once the issue 1 is reproduced and this appears in the log:

WARNING: Timed out waiting for GET_DEVICE_ID

the ipmi_attach() hangs in ipmi_shutdown() waiting for the kcs_loop() termination:

> ::stacks -m ipmi
THREAD           STATE    SOBJ                COUNT
ffffff001ef8ec40 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 ipmi_dequeue_request+0x38
                 kcs_loop+0x85
                 taskq_thread+0x2d0
                 thread_start+8

ffffff0520426400 SLEEP    CV                      1
                 swtch+0x141
                 cv_wait+0x70
                 taskq_wait+0x43
                 taskq_destroy+0x6c
                 ipmi_shutdown+0x1b
                 ipmi_attach+0xfb
                 devi_attach+0x92
                 attach_node+0xa7
                 i_ndi_config_node+0x86
                 i_ddi_attachchild+0x48
                 devi_attach_node+0x5e
                 devi_config_one+0x294
                 ndi_devi_config_one+0xa8
                 dv_find+0x233
                 devfs_lookup+0x37
                 fop_lookup+0xa2
                 lookuppnvp+0x1f6
                 lookuppnatcred+0x15e
                 lookupnameatcred+0xad
                 lookupnameat+0x39
                 vn_openat+0x315
                 copen+0x20c
                 openat32+0x27
                 open32+0x25          
                 _sys_sysenter_post_swapgs+0x149
> ffffff0520426400::thread -d
            ADDR         DISPTIME BOUND PR SWITCH
ffffff0520426400             1155    -1  0 t-29537
>

The kcs_loop() won't terminate, because something like this:

552    /* poke the taskq so that it can terminate */
553    sc->ipmi_detaching = 1;
554    cv_signal(&sc->ipmi_request_added);

is missing right before line 531 here in ipmi_attach():

530    if (ipmi_startup(sc) != B_TRUE) {
531        ipmi_shutdown(sc);
532        return (DDI_FAILURE);
533    }

Issue 3

This issue is extremely unlikely, but theoretically possible: In a case the kcs_loop() start is very slow and it will take more than 3 seconds for kcs_loop() to call ipmi_dequeue_request() first time, the ipmi_submit_driver_request() will timeout here in ipmi_startup():

218    error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
219    if (error == EWOULDBLOCK) {
220        cmn_err(CE_WARN, "Timed out waiting for GET_DEVICE_ID");
221        ipmi_free_request(req);
222        return (B_FALSE);

After the timeout the request is freed at line 221, but it will still sit in the queue because nobody dequeued it. This might cause use after free later.


Related issues

Related to illumos gate - Bug #4282: Race between ipmi_close() and kcs_loop() could lead to panicResolvedMarcel Telka2013-10-31

Actions
#1

Updated by Marcel Telka over 7 years ago

Issue 4

In a case the cv_wait() (or cv_timedwait()) will return prematurely in ipmi_submit_driver_request() we will end up in a similar case as in Issue 3 - the request will be freed but still left on the queue.

#2

Updated by Robert Mustacchi over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in 1e3934778d15dd08e911e1d050dd7a4949348d93.

Also available in: Atom PDF