Bug #12383
Slow down and lock up in mlxcx receive interrupt path
Description
In our testing we observed that the receive path would at times slow down and occasionally appear to lock up entirely.
The root cause is in mlxcx_buf_take_n(), which is called in the receive data path to refill the receive work queue with DMA buffers. The code in question is:
	mutex_enter(&s->mlbs_mtx);
	while (done < nbufs) {
		while (list_is_empty(&s->mlbs_free)) {
			(void) cv_reltimedwait(&s->mlbs_free_nonempty,
			    &s->mlbs_mtx, wtime, TR_MILLISEC);
			if (list_is_empty(&s->mlbs_free) &&
			    empty++ >= MLXCX_BUF_TAKE_N_MAX_RETRIES) {
				mutex_exit(&s->mlbs_mtx);
				return (done);
			}
		}
		b = list_remove_head(&s->mlbs_free);
		ASSERT3U(b->mlb_state, ==, MLXCX_BUFFER_FREE);
		b->mlb_state = MLXCX_BUFFER_ON_WQ;
		list_insert_tail(&s->mlbs_busy, b);
		bp[done++] = b;
	}
	mutex_exit(&s->mlbs_mtx);
When the system slows down or hangs, it is waiting in this cv_reltimedwait(), which is called from an interrupt thread. It is not a good idea to call any form of cv_wait() in interrupt context, as it blocks all interrupts at the same or lower priority. And before the loop gives up it has to endure three 5ms timeouts, each of which is rounded up to a clock tick (so 10ms apiece, roughly 30ms in total).
In the cases we saw, the interrupt thread appeared to be permanently blocked here.
It is looking for buffers on the free list to restock the receive queue. In the lock-up cases the free list is empty, but examining the buffers on the busy list showed:

> ffffd0948fb413a0::walk list|::print -t mlxcx_buffer_t mlb_state
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN)
. . .
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
mlxcx_buffer_state_t mlb_state = 2 (MLXCX_BUFFER_ON_WQ)
MLXCX_BUFFER_ON_LOAN means the buffer has been loaned out to an mblk, and MLXCX_BUFFER_ON_WQ means the buffer is assigned to a slot in the receive queue, ready to receive data. So the test for an empty free list, and the blocking that follows it, is somewhat fallacious: the receive queue is already in a state to receive data. Worse, the mblks which are MLXCX_BUFFER_ON_LOAN are all chained together waiting to be sent up the IP stack, but they are not being sent because we appear to be stuck inside mlxcx_buf_take_n(), and hence nothing ever makes it back onto the free list.
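For illustration only (a sketch, not the committed fix), the loop could take however many buffers happen to be free without ever sleeping, return a short count, and leave any further refill to be scheduled outside interrupt context:

	/*
	 * Illustrative sketch only. Take up to nbufs buffers from the
	 * free list without blocking, and return how many we actually
	 * got. The caller, running in interrupt context, can then
	 * arrange a deferred refill if this came up short.
	 */
	size_t done = 0;

	mutex_enter(&s->mlbs_mtx);
	while (done < nbufs && !list_is_empty(&s->mlbs_free)) {
		b = list_remove_head(&s->mlbs_free);
		ASSERT3U(b->mlb_state, ==, MLXCX_BUFFER_FREE);
		b->mlb_state = MLXCX_BUFFER_ON_WQ;
		list_insert_tail(&s->mlbs_busy, b);
		bp[done++] = b;
	}
	mutex_exit(&s->mlbs_mtx);
	return (done);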
To compound this, there is an issue in the code which causes mlxcx_buf_take_n() to be called on every interrupt. mlxcx_buf_take_n() is called by mlxcx_rq_refill(), and:
	if (!mlxcx_buf_loan(mlxp, buf)) {
		mlxcx_warn(mlxp, "!loan failed, dropping packet");
		mlxcx_buf_return(mlxp, buf);
		return (NULL);
	}
	...
	/*
	 * Don't check if a refill is needed on every single completion,
	 * since checking involves taking the RQ lock.
	 */
	if ((buf->mlb_wqe_index & 0x7) == 0) {
		mlxcx_work_queue_t *wq = mlcq->mlcq_wq;
		ASSERT(wq != NULL);
		mutex_enter(&wq->mlwq_mtx);
		if (!(wq->mlwq_state & MLXCX_WQ_TEARDOWN))
			mlxcx_rq_refill(mlxp, wq);
		mutex_exit(&wq->mlwq_mtx);
	}
The check (buf->mlb_wqe_index & 0x7) == 0 is always true, because the call to mlxcx_buf_loan(mlxp, buf) just above it sets buf->mlb_wqe_index to zero. So the refill, and with it the potentially blocking mlxcx_buf_take_n(), runs on every single completion rather than on every eighth one, which exacerbates the issue.
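One way to make the heuristic do what its comment intends is to latch the index before the loan clobbers it. This is a sketch, assuming mlb_wqe_index is a plain integer index that nothing else needs after the loan; it is not necessarily the change that was committed:

	/*
	 * Sketch only: save the WQE index before mlxcx_buf_loan()
	 * zeroes it, so the "every 8th completion" check below sees
	 * the real value instead of always seeing zero.
	 */
	uint32_t wqe_index = buf->mlb_wqe_index;

	if (!mlxcx_buf_loan(mlxp, buf)) {
		mlxcx_warn(mlxp, "!loan failed, dropping packet");
		mlxcx_buf_return(mlxp, buf);
		return (NULL);
	}
	...
	if ((wqe_index & 0x7) == 0) {
		/* take the RQ lock and refill, as before */
	}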
We have code in test which removes all of the code relating to the cv_reltimedwait() and instead dispatches a taskq task to perform the refill, and only when the receive queue is getting low on buffers.
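A rough sketch of that shape follows. The taskq handle (mlx_refill_tq), the state flag (MLXCX_WQ_REFILL), the back-pointer (mlwq_mlx) and the low-water predicate are all invented here for illustration; only ddi_taskq_dispatch() and the lock, teardown check, and refill call seen in the excerpt above are taken as given:

	/*
	 * Sketch with invented names (mlx_refill_tq, MLXCX_WQ_REFILL,
	 * mlwq_mlx, wq_is_low_on_buffers): defer the refill to a taskq
	 * thread, which is allowed to block, instead of refilling (and
	 * possibly waiting) in the interrupt thread.
	 */
	static void
	mlxcx_refill_task(void *arg)
	{
		mlxcx_work_queue_t *wq = arg;
		mlxcx_t *mlxp = wq->mlwq_mlx;	/* assumed back-pointer */

		mutex_enter(&wq->mlwq_mtx);
		if (!(wq->mlwq_state & MLXCX_WQ_TEARDOWN))
			mlxcx_rq_refill(mlxp, wq);
		wq->mlwq_state &= ~MLXCX_WQ_REFILL;
		mutex_exit(&wq->mlwq_mtx);
	}

	/* In the completion path, with mlwq_mtx held: */
	if (wq_is_low_on_buffers(wq) &&		/* hypothetical predicate */
	    !(wq->mlwq_state & MLXCX_WQ_REFILL)) {
		wq->mlwq_state |= MLXCX_WQ_REFILL;
		(void) ddi_taskq_dispatch(mlxp->mlx_refill_tq,
		    mlxcx_refill_task, wq, DDI_NOSLEEP);
	}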
Related issues
Updated by Paul Winder 10 months ago
- Related to Bug #12438: mlxcx should pass receive messages to mac layer more frequently added
Updated by Paul Winder 10 months ago
- Related to Bug #12440: mlxcx should not block in the send path added
Updated by Paul Winder 10 months ago
- Related to Bug #12441: mlxcx default queue sizes are a bit on the small size added
Updated by Paul Winder 10 months ago
There is a series of related tickets (listed above). The combined change for all of these tickets produced the following throughput figures, using iperf on a point-to-point 25 Gb link with a 9000-byte MTU:
Threads |    1 |    2 |    4 |    8 |   16 |   32 |   64 |  128
--------|------|------|------|------|------|------|------|-----
Gb/s    | 20.4 | 24.7 | 24.7 | 24.7 | 24.7 | 24.7 | 23.7 | 23.4
Similar runs were done with smaller MTUs. A high thread count at 1500 MTU would tend to fill the transmit rings, and the larger MTUs would fill the receive side. It was by varying MTU and thread count that I was able to induce the different issues.
Updated by Electric Monk 9 months ago
- Status changed from In Progress to Closed
- % Done changed from 80 to 100
git commit 22d052287ba7ed169757650e2eec25fedbae163a
commit 22d052287ba7ed169757650e2eec25fedbae163a
Author: Paul Winder <pwinder@racktopsystems.com>
Date:   2020-04-14T15:40:07.000Z

    12383 Slow down and lock up in mlxcx receive interrupt path
    12438 mlxcx should pass receive messages to mac layer more frequently
    12439 mlxcx send rings can overflow
    12440 mlxcx should not block in the send path
    12441 mlxcx default queue sizes are a bit on the small size
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Approved by: Garrett D'Amore <garrett@damore.org>
Updated by Electric Monk 9 months ago
git commit fd7fa860de2ce9f847175f3d39dfd19f8d5735f9
commit fd7fa860de2ce9f847175f3d39dfd19f8d5735f9
Author: Dan McDonald <danmcd@joyent.com>
Date:   2020-04-14T21:50:44.000Z

    12383 Slow down and lock up in mlxcx receive interrupt path (fix smatch)