Bug #12084
closedidm_conn_event_handler mishandles CA_DROP action
100%
Description
While testing a fix for #4454, I got:
> ::status debugging crash dump vmcore.12 (64-bit) from omni-2 operating system: 5.11 omni-physio-0-g59a9ce0e70 (i86pc) build version: gfx-drm - heads/master-0-gbdc58b1-dirty image uuid: 6c62e915-bc81-4cd5-cc07-ecb85608e869 panic message: assertion failed: 0, file: ../../common/io/idm/idm_conn_sm.c, line: 424 dump content: kernel pages only > $C ffffff001ef78ac0 vpanic() ffffff001ef78b10 0xfffffffffbe2a425() ffffff001ef78b50 idm_conn_event_handler+0x139(ffffff076697e4d8) ffffff001ef78c00 taskq_thread+0x315(ffffff0766989628) ffffff001ef78c10 thread_start+0xb()
The test involves having a zpool with a mirror vdev and a spare sitting atop 3 iscsi disks. A dd loop writes to a file in that pool while another loop offlines an LU, sleeps 15 seconds, onlines it, then sleeps 45 seconds before repeating the loop. Scripts are in #12076.
Near line 424 we have:
405 if (event_ctx->iec_pdu_event_type != CT_NONE) { 406 switch (action) { 407 case CA_TX_PROTOCOL_ERROR: 408 idm_pdu_tx_protocol_error(ic, pdu); 409 break; 410 case CA_RX_PROTOCOL_ERROR: 411 idm_pdu_rx_protocol_error(ic, pdu); 412 break; 413 case CA_FORWARD: 414 if (!event_ctx->iec_pdu_forwarded) { 415 if (event_ctx->iec_pdu_event_type == 416 CT_RX_PDU) { 417 idm_pdu_rx_forward(ic, pdu); 418 } else { 419 idm_pdu_tx_forward(ic, pdu); 420 } 421 } 422 break; 423 default: 424 ASSERT(0); 425 break; 426 } 427 }
The `if` block and `case` statements are:
idm_conn_event_handler+0x100: movl 0x18(%rbx),%eax idm_conn_event_handler+0x103: testl %eax,%eax idm_conn_event_handler+0x105: je +0x5c <idm_conn_event_handler+0x163> idm_conn_event_handler+0x107: cmpl $0x1,%r14d idm_conn_event_handler+0x10b: je +0x2af <idm_conn_event_handler+0x3c0> idm_conn_event_handler+0x111: jb +0x291 <idm_conn_event_handler+0x3a8> idm_conn_event_handler+0x117: cmpl $0x2,%r14d idm_conn_event_handler+0x11b: je +0x15f <idm_conn_event_handler+0x280> idm_conn_event_handler+0x121: movl $0x1a8,%edx idm_conn_event_handler+0x126: movq $0xfffffffff82479b0,%rsi idm_conn_event_handler+0x12d: movq $0xfffffffff824692b,%rdi idm_conn_event_handler+0x134: call +0x3be9d97 <assfail> idm_conn_event_handler+0x139: movl 0x18(%rbx),%eax
Thus it seems that the value of action is in r14
> <r14=D 3
That is CA_DROP.
203 typedef enum { 204 CA_TX_PROTOCOL_ERROR, /* Send "protocol error" to state machine */ 205 CA_RX_PROTOCOL_ERROR, /* Send "protocol error" to state machine */ 206 CA_FORWARD, /* State machine event and forward to client */ 207 CA_DROP /* Drop PDU */ 208 } idm_pdu_event_action_t;
Earlier CA_DROP is handled.
338 case CA_DROP: 339 /* 340 * It never even happened 341 */ 342 IDM_SM_LOG(CE_NOTE, "*** drop PDU %p", (void *) pdu); 343 idm_pdu_complete(pdu, IDM_STATUS_FAIL); 344 break;
Which is consistent with the fact that the pdu has been freed by idm_pdu_complete().
> <event_ctx::print idm_conn_event_ctx_t iec_info | ::whatis ffffff0729887030 is freed from idm_rx_pdu_cache: ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS ffffff072efccea8 ffffff0729887030 22e24f71bc7 ffffff001ef78c20 ffffff0726beb008 ffffff06e402e800 ffffff06fdecf768 kmem_cache_free_debug+0xfa kmem_cache_free+0x86 idm_sorx_cache_pdu_cb+0x2a idm_sorx_addl_pdu_cb+0x68 idm_pdu_complete+0x17 idm_conn_event_handler+0xc2 taskq_thread+0x315 thread_start+0xb
Updated by Mike Gerdts almost 4 years ago
Updated by Mike Gerdts almost 4 years ago
- Category set to kernel
I think that idm is on the initiator side, so setting category to kernel rather than comstar. But maybe comstar is more appropriate because of the general area that this is in. :shrug:
Updated by Mike Gerdts almost 4 years ago
- Subject changed from use after free in idm_conn_event_handler to idm_conn_event_handler mishandles CA_DROP action
- Description updated (diff)
- Assignee set to Mike Gerdts
Updated by Mike Gerdts almost 4 years ago
- Category changed from kernel to comstar - iSCSI/FC/SAS target
The fix:
From the switch statement at the top of the function, it is clear that CA_DROP is expected and in non-debug builds it is ignored in the second switch statement. The fix will make it so that the second switch statement is aware of CA_DROP but takes no action on the pdu as it has already been freed. To make it more clear that the pdu is freed, the pointer to it (event_ctx->iec_info) is set to NULL near where it is freed and ASSERTed to be NULL later.
The test plan:
Since I've hit this a couple times while testing the fix for #4454, my ongoing testing of #4454 includes the following dtrace script to be sure this path is visited:
#! /usr/sbin/dtrace -s idm_conn_event_handler:entry { self->interest = 1; } idm_conn_sm_validate_pdu:return /self->interest && arg1 == 3/ { self->survived = 1; printf("\n%Y in problematic code path\n", walltimestamp); stack(); } idm_conn_event_handler:return /self->survived/ { printf("Survived!\n"); self->survived = 0; } idm_conn_event_handler:return { self->interest = 0; }
With the problematic condition is encountered, this script should give a stack trace that looks similar to that found in the description.
Updated by Jerry Jelinek over 3 years ago
- Assignee changed from Mike Gerdts to Jerry Jelinek
Updated by Jerry Jelinek over 3 years ago
For testing, I followed Mike's procedures from the bug description which he used to hit this bug. I'm copying that here since there is a lot of text above.
The test involves having a zpool with a mirror vdev and a spare sitting atop 3 iscsi disks. A dd loop writes to a file in that pool while another loop offlines an LU, sleeps 15 seconds, onlines it, then sleeps 45 seconds before repeating the loop. Scripts are in #12076.
Updated by Electric Monk over 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 5fe3b0929d8a195dbfa4196d72a3f9b15b745171
commit 5fe3b0929d8a195dbfa4196d72a3f9b15b745171 Author: Mike Gerdts <mike.gerdts@joyent.com> Date: 2020-02-04T14:09:56.000Z 12084 idm_conn_event_handler mishandles CA_DROP action Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> Reviewed by: C Fraire <cfraire@me.com> Approved by: Dan McDonald <danmcd@joyent.com>