Bug #12383
Slow down and lock up in mlxcx receive interrupt path
Description
In our testing we observed that the receive path would at times slow down and occasionally appear to lock up entirely.
The root cause is in mlxcx_buf_take_n(), which is called in the receive data path to refill the receive work queue with DMA buffers. The code in question is:
	mutex_enter(&s->mlbs_mtx);
	while (done < nbufs) {
		while (list_is_empty(&s->mlbs_free)) {
			(void) cv_reltimedwait(&s->mlbs_free_nonempty,
			    &s->mlbs_mtx, wtime, TR_MILLISEC);
			if (list_is_empty(&s->mlbs_free) &&
			    empty++ >= MLXCX_BUF_TAKE_N_MAX_RETRIES) {
				mutex_exit(&s->mlbs_mtx);
				return (done);
			}
		}
		b = list_remove_head(&s->mlbs_free);
		ASSERT3U(b->mlb_state, ==, MLXCX_BUFFER_FREE);
		b->mlb_state = MLXCX_BUFFER_ON_WQ;
		list_insert_tail(&s->mlbs_busy, b);
		bp[done++] = b;
	}
	mutex_exit(&s->mlbs_mtx);
When the system slows down or hangs, it is waiting in cv_reltimedwait(), which is called from an interrupt thread. It is not a good idea to call any form of cv_wait() in interrupt context, as it blocks all interrupts at the same or lower priority. Before the function gives up, it has to endure 3 * 5ms timeouts, each of which is rounded up to a tick (so 10ms).
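As a rough illustration of the arithmetic above, the following sketch computes how long the interrupt thread can stay blocked before the retry loop gives up. The helper names are hypothetical, and the inputs (3 waits, a 5ms timeout, a 10ms tick) are simply the figures quoted in this report, not values read from the driver.

```c
#include <assert.h>

/* Round a timeout up to a whole number of clock ticks (milliseconds). */
static unsigned int
round_up_to_tick_ms(unsigned int timeout_ms, unsigned int tick_ms)
{
	assert(tick_ms > 0);
	return (((timeout_ms + tick_ms - 1) / tick_ms) * tick_ms);
}

/* Total time spent blocked if every timed wait runs to its full timeout. */
static unsigned int
worst_case_block_ms(unsigned int waits, unsigned int timeout_ms,
    unsigned int tick_ms)
{
	return (waits * round_up_to_tick_ms(timeout_ms, tick_ms));
}
```

With the figures from this report, worst_case_block_ms(3, 5, 10) gives 30, i.e. about 30ms during which interrupts at the same or lower priority are held off.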
In the cases we saw, the interrupt thread appeared to be permanently blocked here.
It is looking for buffers on the free list to restock the receive queue, and in the lock-up cases the free list is empty. Examining the buffers on the busy list, however, showed:
> ffffd0948fb413a0::walk list|::print -t mlxcx_buffer_t mlb_state
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
. . .
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
MLXCX_BUFFER_ON_LOAN means the buffer has been loaned to an mblk, and MLXCX_BUFFER_ON_WQ means the buffer is assigned to a slot in the receive queue to receive data. So blocking because the free list is empty is unjustified: the receive queue still has buffers posted and is able to receive data. Meanwhile, the mblks whose buffers are MLXCX_BUFFER_ON_LOAN are all chained together waiting to be sent up the IP stack, but they are not being sent because we appear to be stuck inside mlxcx_buf_take_n(), and hence nothing ends up back on the free list.
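The distinction between "the free list is empty" and "the receive queue has nothing posted" can be sketched as a tally over buffer states. This is only an illustration: the type and function names are hypothetical, the values 2 and 3 are taken from the mdb output above, and the value used for the free state is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Values 2 and 3 match the mdb output; the value of BUF_FREE is assumed. */
typedef enum buf_state {
	BUF_FREE = 1,
	BUF_ON_WQ = 2,
	BUF_ON_LOAN = 3
} buf_state_t;

/* Count how many buffers are in the requested state. */
static size_t
count_state(const buf_state_t *bufs, size_t n, buf_state_t want)
{
	size_t i, count = 0;

	assert(bufs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (bufs[i] == want)
			count++;
	}
	return (count);
}

/* The queue is only starved when no buffer at all is posted on it. */
static int
rq_is_starved(const buf_state_t *bufs, size_t n)
{
	return (count_state(bufs, n, BUF_ON_WQ) == 0);
}
```

For a busy list like the one shown above (some buffers on loan, the rest on the work queue), count_state() reports zero free buffers while rq_is_starved() is false, which is the situation in which waiting for the free list gains nothing.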
To compound this, there is an issue in the code which causes mlxcx_buf_take_n() to be called on every interrupt. mlxcx_buf_take_n() is called by mlxcx_rq_refill(), and the code that calls mlxcx_rq_refill() is:
	if (!mlxcx_buf_loan(mlxp, buf)) {
		mlxcx_warn(mlxp, "!loan failed, dropping packet");
		mlxcx_buf_return(mlxp, buf);
		return (NULL);
	}
	.
	.
	.
	.
	/*
	 * Don't check if a refill is needed on every single completion,
	 * since checking involves taking the RQ lock.
	 */
	if ((buf->mlb_wqe_index & 0x7) == 0) {
		mlxcx_work_queue_t *wq = mlcq->mlcq_wq;
		ASSERT(wq != NULL);
		mutex_enter(&wq->mlwq_mtx);
		if (!(wq->mlwq_state & MLXCX_WQ_TEARDOWN))
			mlxcx_rq_refill(mlxp, wq);
		mutex_exit(&wq->mlwq_mtx);
	}
The check (buf->mlb_wqe_index & 0x7) == 0 is always true, because the call to mlxcx_buf_loan(mlxp, buf) sets buf->mlb_wqe_index to zero. This exacerbates the issue: a refill is attempted on every completion, rather than only occasionally as the comment intends.
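The effect of the zeroed index on the mask test can be shown with a small counting sketch. The function name and the index_zeroed flag are hypothetical; the sketch assumes that WQE indexes would otherwise increase by one per completion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many of n consecutive completions pass the
 * (index & 0x7) == 0 test. If index_zeroed is non-zero, the index
 * is forced to 0 before the test, mimicking the behaviour described.
 */
static unsigned int
count_refill_checks(unsigned int n, int index_zeroed)
{
	unsigned int i, checks = 0;

	for (i = 0; i < n; i++) {
		uint32_t wqe_index = index_zeroed ? 0 : i;

		if ((wqe_index & 0x7) == 0)
			checks++;
	}
	return (checks);
}
```

Over 64 completions, count_refill_checks(64, 0) gives 8 (one check in eight), while count_refill_checks(64, 1) gives 64 (a check on every completion).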
We have code in test which removes all the code relating to cv_reltimedwait() and instead dispatches a taskq task to perform the refill, and only when the receive queue is getting low on buffers.
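The general shape of that approach might look like the following user-space sketch. It is not the code under test: every name here (rq_sketch_t, rx_completion(), dispatch_refill(), refill_task(), the low-water field) is hypothetical, and dispatch_refill() only records the request where a driver would hand the work to a kernel task queue, for example with taskq_dispatch().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a receive queue; not the driver's real type. */
typedef struct rq_sketch {
	unsigned int rq_posted;		/* buffers currently on the queue */
	unsigned int rq_low_water;	/* refill threshold (assumed) */
	bool rq_refill_pending;		/* a refill task is already queued */
	unsigned int rq_dispatches;	/* number of tasks dispatched so far */
} rq_sketch_t;

/* Stand-in for queueing a refill task; it only records the request. */
static void
dispatch_refill(rq_sketch_t *rq)
{
	rq->rq_refill_pending = true;
	rq->rq_dispatches++;
}

/* Completion path: consumes one buffer and never blocks. */
static void
rx_completion(rq_sketch_t *rq)
{
	assert(rq->rq_posted > 0);
	rq->rq_posted--;
	if (rq->rq_posted < rq->rq_low_water && !rq->rq_refill_pending)
		dispatch_refill(rq);
}

/* Runs later, outside interrupt context, where waiting is acceptable. */
static void
refill_task(rq_sketch_t *rq, unsigned int nbufs)
{
	rq->rq_posted += nbufs;
	rq->rq_refill_pending = false;
}
```

The point of the design is that the completion path only decrements a counter and, at most, queues work; any waiting for free buffers happens in the deferred task rather than in the interrupt thread.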