Bug #2549
closed
iscsi: idm unexpectedly closes socket connection due to pagefault in uiomove/kcopy_nta
0%
Description
OS: oi_151a
When issuing SCSI read commands to an iSCSI target on Linux (SCST 2.1.0, scst_cdrom module), I regularly see the connection being dropped by OI:
Mar 25 17:58:10 linuxdev kernel: [0]: iscsi-scst: __iscsi_state_change:444:***ERROR***: Connection with initiator iqn.1986-03.com.sun:01:00900d4901ff.4f62ff12 unexpectedly closed!
After doing a network trace with Wireshark, I noticed that OI indeed closes the socket connection unexpectedly.
I then started investigating the idm module using DTrace and found that a page fault occurs in kcopy_nta(), which then
returns 14 (EFAULT):
0 > idm_fill_iov ro 0 dlength 8192
0 < idm_fill_iov 8192
0 > idm_iov_sorecv so ffffff014a6f5288 iop ffffff014c1c36e0 iovlen 1 total_len 8192
0 -> ksocket_recvmsg ks ffffff014a6f5288 msg ffffff000498eab0 flags 64 cr ffffff0148468db0
0 -> socket_recvmsg so ffffff014a6f5288 msg ffffff000498eab0 uiop ffffff000498ea10, cr ffffff0148468db0
0 -> so_recvmsg so ffffff014a6f5288 msg ffffff000498eab0 uiop ffffff000498ea10, cr ffffff0148468db0
0 -> so_dequeue_msg so ffffff014a6f5288
0 -> uiomove p ffffff01812d50e2 size 1400 uiop ffffff000498ea10 resid 8192, uio_segflg 1
0 -> kcopy_nta from ffffff01812d50e2 to ffffff0170012000 size 1400, copy_cached 1
0 -> pagefault page fault on addr 0xffffff0170012000, fault type = 1
0 < kcopy_nta 14 uiop ffffff000498ea10 from ffffff01812d50e2 to ffffff0170012000
0 <- uiomove 14 uiop ffffff000498ea10 (UIOMOVE ERROR!)
0 <- socopyoutuio 0
0 <- so_dequeue_msg 14
0 <- so_recvmsg 14
0 <- socket_recvmsg 14
0 <- ksocket_recvmsg 14
0 <- idm_iov_sorecv 14
0 <- idm_sorecvdata 4
0 <- idm_sorecv_scsidata 4
0 -> idm_conn_event
0 -> idm_conn_event_handler
0 -> idm_so_conn_disconnect
The EFAULT error causes a connection-failed status in idm, and the socket connection is then closed.
The SCSI read command terminates with a transport error.
I can reproduce this very easily.
I would appreciate it if someone could give me some hints on how to investigate why this page fault
occurs.
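
For anyone reading the trace above, the following is a minimal sketch (not part of my original DTrace setup) of how the return probes can be summarized to see where the error first appears. It assumes the line layout of my output, i.e. "<cpu> <- <function> <return value>", with some lines printing "<" instead of "<-". The helper name failing_returns is hypothetical. Only the innermost value is decoded as an errno, because the value 4 returned by the idm functions appears to be an idm status code rather than an errno.

```python
import errno
import re

# Return probes copied from the trace above, innermost call first.
TRACE = """\
0 < kcopy_nta 14 uiop ffffff000498ea10 from ffffff01812d50e2 to ffffff0170012000
0 <- uiomove 14 uiop ffffff000498ea10 (UIOMOVE ERROR!)
0 <- socopyoutuio 0
0 <- so_dequeue_msg 14
0 <- so_recvmsg 14
0 <- socket_recvmsg 14
0 <- ksocket_recvmsg 14
0 <- idm_iov_sorecv 14
0 <- idm_sorecvdata 4
0 <- idm_sorecv_scsidata 4
"""

# Assumed layout: "<cpu> <- <function> <return value> ..." ("<" or "<-").
RETURN_LINE = re.compile(r"^\s*\d+\s+<-?\s+(\w+)\s+(\d+)")


def failing_returns(trace):
    """Collect (function, value) for every non-zero return, in trace order."""
    result = []
    for line in trace.splitlines():
        match = RETURN_LINE.match(line)
        if match and int(match.group(2)) != 0:
            result.append((match.group(1), int(match.group(2))))
    return result


failures = failing_returns(TRACE)
innermost_func, innermost_value = failures[0]

# Decode only the innermost value as an errno; later values may use a
# different status type (the idm functions return 4 here).
errno_name = errno.errorcode.get(innermost_value, "unknown")
print(f"first failure: {innermost_func} returned {innermost_value} ({errno_name})")
for func, value in failures[1:]:
    print(f"  propagated through {func} as {value}")
```

The list shows the error originating in kcopy_nta and being passed up unchanged through the socket layer until idm_iov_sorecv, after which idm reports its own status value.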