Bug #3902
closedRace between ipmi_submit_driver_request() and kcs_loop()
100%
Description
There are three issues related to ipmi_attach().
Issue 1
There is a race window between lines 128 and 129 here in ipmi_submit_driver_request():
124 IPMI_LOCK(sc); 125 error = sc->ipmi_enqueue_request(sc, req); 126 if (error == 0) { 127 /* Wait for result - see ipmi_complete_request */ 128 IPMI_UNLOCK(sc); 129 mutex_enter(&slpmutex);
After the line 128 (and before line 129) the req is already enqueued via ipmi_polled_enqueue_request().
The kcs_loop() is responsible for the req dequeue. The kcs_loop() function/thread is started in kcs_startup() here in ipmi_startup() at line 208:
208 error = sc->ipmi_startup(sc); <---------------------- 209 if (error) { 210 cmn_err(CE_WARN, "Failed to initialize interface: %d", error); 211 return (B_FALSE); 212 }
Almost immediately after that, the request is enqueued:
218 error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT);
Since the kcs_loop() is started immediately before the req is enqueued, it is possible that during the req enqueue in the ipmi_polled_enqueue_request() the kcs_loop() is not fully started yet and it is waiting at line 453 here:
447kcs_loop(void *arg) 448{ 449 struct ipmi_softc *sc = arg; 450 struct ipmi_request *req; 451 int i, ok; 452 453 IPMI_LOCK(sc); <----------------------
because the sc lock (the sc->ipmi_lock mutex) is held during the ipmi_polled_enqueue_request() here:
119ipmi_submit_driver_request(struct ipmi_softc *sc, struct ipmi_request *req, 120 int timo) 121{ 122 int error; 123 124 IPMI_LOCK(sc); 125 error = sc->ipmi_enqueue_request(sc, req); <----------- 126 if (error == 0) { 127 /* Wait for result - see ipmi_complete_request */ 128 IPMI_UNLOCK(sc); 129 mutex_enter(&slpmutex);
Note: ipmi_enqueue_request points to ipmi_polled_enqueue_request.
After the req is enqueued (at line 125) the sc lock is released (at line 128). Now the kcs_loop() continues and does its job:
453 IPMI_LOCK(sc); 454 while ((req = ipmi_dequeue_request(sc)) != NULL) { <--------- 455 ok = 0; 456 for (i = 0; i < 3 && !ok; i++) 457 ok = kcs_polled_request(sc, req); 458 if (ok) 459 req->ir_error = 0; 460 else 461 req->ir_error = EIO; 462 ipmi_complete_request(sc, req); <------- 463 } 464 IPMI_UNLOCK(sc);
The req is dequeued at line 454, processed at line 457 and at line 462 it is signaled that the processing is complete.
94ipmi_complete_request(struct ipmi_softc *sc, struct ipmi_request *req) ... 105 mutex_enter(&slpmutex); 106 cv_signal(&slplock); 107 mutex_exit(&slpmutex);
Back to the race window:
128 IPMI_UNLOCK(sc); 129 mutex_enter(&slpmutex); 130 if (timo == 0) 131 cv_wait(&slplock, &slpmutex); 132 else 133 error = cv_timedwait(&slplock, &slpmutex, <------- 134 ddi_get_lbolt() + timo);
The ipmi_submit_driver_request() will continue at line 129 and then wait and timeout at line 133 even the request was fully processed and signaled.
This is easily reproducible with slightly modified ipmi module. Just add a delay (let say 5 sec) at beginning of the kcs_loop(). Add another delay (let say 10 sec) into the race window (between lines 128 and 129) and try to attach the ipmi.
Issue 2
Once the issue 1 is reproduced and this appears in the log:
WARNING: Timed out waiting for GET_DEVICE_ID
the ipmi_attach() hangs in ipmi_shutdown() waiting for the kcs_loop() termination:
> ::stacks -m ipmi THREAD STATE SOBJ COUNT ffffff001ef8ec40 SLEEP CV 1 swtch+0x141 cv_wait+0x70 ipmi_dequeue_request+0x38 kcs_loop+0x85 taskq_thread+0x2d0 thread_start+8 ffffff0520426400 SLEEP CV 1 swtch+0x141 cv_wait+0x70 taskq_wait+0x43 taskq_destroy+0x6c ipmi_shutdown+0x1b ipmi_attach+0xfb devi_attach+0x92 attach_node+0xa7 i_ndi_config_node+0x86 i_ddi_attachchild+0x48 devi_attach_node+0x5e devi_config_one+0x294 ndi_devi_config_one+0xa8 dv_find+0x233 devfs_lookup+0x37 fop_lookup+0xa2 lookuppnvp+0x1f6 lookuppnatcred+0x15e lookupnameatcred+0xad lookupnameat+0x39 vn_openat+0x315 copen+0x20c openat32+0x27 open32+0x25 _sys_sysenter_post_swapgs+0x149 > ffffff0520426400::thread -d ADDR DISPTIME BOUND PR SWITCH ffffff0520426400 1155 -1 0 t-29537 >
The kcs_loop() won't terminate, because something like this:
552 /* poke the taskq so that it can terminate */ 553 sc->ipmi_detaching = 1; 554 cv_signal(&sc->ipmi_request_added);
is missing right before line 531 here in ipmi_attach():
530 if (ipmi_startup(sc) != B_TRUE) { 531 ipmi_shutdown(sc); 532 return (DDI_FAILURE); 533 }
Issue 3
This issue is extremely unlikely, but theoretically possible: In a case the kcs_loop() start is very slow and it will take more than 3 seconds for kcs_loop() to call ipmi_dequeue_request() first time, the ipmi_submit_driver_request() will timeout here in ipmi_startup():
218 error = ipmi_submit_driver_request(sc, req, MAX_TIMEOUT); 219 if (error == EWOULDBLOCK) { 220 cmn_err(CE_WARN, "Timed out waiting for GET_DEVICE_ID"); 221 ipmi_free_request(req); 222 return (B_FALSE);
After the timeout the request is freed at line 221, but it will still sit in the queue because nobody dequeued it. This might cause use after free later.
Related issues
Updated by Marcel Telka almost 9 years ago
Issue 4
In a case the cv_wait() (or cv_timedwait()) will return prematurely in ipmi_submit_driver_request() we will end up in a similar case as in Issue 3 - the request will be freed but still left on the queue.
Updated by Robert Mustacchi almost 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
- Tags deleted (
needs-triage)
Resolved in 1e3934778d15dd08e911e1d050dd7a4949348d93.