Bug #401

sdcard hang

Added by Garrett D'Amore about 10 years ago. Updated about 10 years ago.

Status:
Resolved
Priority:
Normal
Category:
kernel
Start date:
2010-11-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Tags:
Gerrit CR:

Description

This is detailed in OSOL bug 6975215:

We have a Toshiba Portege R600 laptop on which commands such as format
are hanging hard when accessing I/O devices.

This has started happening in snv_145 and didn't happen on snv_144 so
it looks like a new issue.

e.g.
Format is hanging just after 'Searching for..'
Looking at truss, it hangs trying to close /dev/rdsk/c7t0d0s2

Here is the truss output:

Searching for disks...967: write(1, " S e a r c h i n g f o".., 22) = 22
967: open(".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
967: fcntl(3, F_SETFD, 0x00000001) = 0
967: fstat64(3, 0x08047050) = 0
967: getdents64(3, 0xFEE04000, 8192) = 2232
967: sysconfig(_CONFIG_PAGESIZE) = 4096
967: getdents64(3, 0xFEE04000, 8192) = 0
967: llseek(3, 0, SEEK_CUR) = 65
967: llseek(3, 0, SEEK_SET) = 0
967: getdents64(3, 0xFEE04000, 8192) = 2232
967: open("/dev/rdsk/c7t0d0s2", O_RDWR|O_NDELAY) = 4
967: fstat(4, 0x08046B20) = 0
967: ioctl(4, DKIOCINFO, 0x08046BB0) = 0
967: ioctl(4, DKIOCREMOVABLE, 0x08046B1C) = 0
967: close(4) = 0

Looking in /dev/

/dev/rdsk/c7t0d0s2 -> ../../devices/pci@0,0/pci1179,1@1d,7/storage@3/disk@0,0:c,raw

which looks like a USB device:

Aug 5 17:27:01 oaf634 genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci1179,1@1d,7/storage@3
Aug 5 17:27:01 oaf634 genunix: [ID 408114 kern.info] /pci@0,0/pci1179,1@1d,7/storage@3 (scsa2usb0) online
Aug 5 17:27:01 oaf634 genunix: [ID 936769 kern.info] sd1 is /pci@0,0/pci1179,1@1d,7/storage@3/disk@0,0
Aug 5 17:27:01 oaf634 genunix: [ID 408114 kern.info] /pci@0,0/pci1179,1@1d,7/storage@3/disk@0,0 (sd1) online

Given there are USB changes in 145, I'll assign this bug there
as a starting point.

A crash dump of a system with multiple hung processes can be found here:

/net/irperf.ireland/export/work/crash_images/Fri.Aug.06.2010.oaf463.vdbench407.osol_1002-144

The crash dump above isn't accessible:
root@opensolaris:~# ls /net/irperf.ireland/export/work/crash_images/Fri.Aug.06.2010.oaf463.vdbench407.osol_1002-144
ls: cannot access /net/irperf.ireland/export/work/crash_images/Fri.Aug.06.2010.oaf463.vdbench407.osol_1002-144: No such file or directory
root@opensolaris:~#

Could you please fix it? Thanks.

Sorry Vincent,

we've recently moved these to a different host; the link on the old one isn't working,
it seems.
Can you try this link instead? (The path at the end of the last one was also wrong.)

/net/irperf4.ireland/export/work/crash_images/Thu.Aug.05.2010.oaf634..osol_1002-145/

Your truss results show that close(4) completed with a return value of 0:

967: open("/dev/rdsk/c7t0d0s2", O_RDWR|O_NDELAY) = 4
967: fstat(4, 0x08046B20) = 0
967: ioctl(4, DKIOCINFO, 0x08046BB0) = 0
967: ioctl(4, DKIOCREMOVABLE, 0x08046B1C) = 0
967: close(4) = 0

There are a couple of files under /net/irperf4.ireland/export/work/crash_images/Thu.Aug.05.2010.oaf634..osol_1002-145/; I assumed the relevant ones were vmcore.0.gz and unix.0.gz. Looking at this crash dump, there's no suspicious USB thread running. The format command is doing:
ffffff0191d218a0 ffffff019f722090 ffffff01832be840 1 60 ffffff0184737710
PC: _resume_from_idle+0xf1 CMD: format
stack pointer for thread ffffff0191d218a0: ffffff00067cf830
[ ffffff00067cf830 _resume_from_idle+0xf1() ]
swtch+0x145()
turnstile_block+0x760()
mutex_vector_enter+0x261()
bd_open+0x9d()
dev_open+0x3c()
spec_open+0x5dc()
fop_open+0xbf()
vn_openat+0x6ce()
copen+0x49e()
openat32+0x27()
open32+0x2e()
_sys_sysenter_post_swapgs+0x149()

ffffff0191d218a0::findstack -v

stack pointer for thread ffffff0191d218a0: ffffff00067cf830
[ ffffff00067cf830 _resume_from_idle+0xf1() ]
ffffff00067cf860 swtch+0x145()
ffffff00067cf910 turnstile_block+0x760(ffffff0186453158, 0, ffffff0184737710, fffffffffbc07df8, 0, 0)
ffffff00067cf970 mutex_vector_enter+0x261(ffffff0184737710) <=====this mutex blocks 'format'
ffffff00067cf9f0 bd_open+0x9d(ffffff00067cfa78, 7, 2, ffffff01983f6dd8)
ffffff00067cfa20 dev_open+0x3c(ffffff00067cfa78, 7, 2, ffffff01983f6dd8)
ffffff00067cfad0 spec_open+0x5dc(ffffff00067cfbc0, 7, ffffff01983f6dd8, 0)
ffffff00067cfb40 fop_open+0xbf(ffffff00067cfbc0, 7, ffffff01983f6dd8, 0)
ffffff00067cfcf0 vn_openat+0x6ce(8047560, 0, 7, b2c, ffffff00067cfd48, 0, 12, 0, 4)
ffffff00067cfe60 copen+0x49e(ffd19553, 8047560, 7, 8046b2c)
ffffff00067cfe90 openat32+0x27(ffd19553, 8047560, 6, 8046b2c)
ffffff00067cfec0 open32+0x2e(8047560, 6, 8046b2c)
ffffff00067cff10 _sys_sysenter_post_swapgs+0x149()

ffffff0184737710::mutex

ADDR TYPE HELD MINSPL OLDSPL WAITERS
ffffff0184737710 adapt ffffff0185fa1760 - - yes

So thread ffffff0185fa1760 is holding that mutex.

ffffff0185fa1760::findstack -v

stack pointer for thread ffffff0185fa1760: ffffff0006624850
[ ffffff0006624850 _resume_from_idle+0xf1() ]
ffffff0006624880 swtch+0x145()
ffffff00066248b0 cv_wait+0x61(ffffff01915adeb8, ffffff01915adeb0)
ffffff00066248f0 sda_slot_enter+0x55(ffffff01915addc0)
ffffff0006624920 sda_mem_bd_mediainfo+0x1f(ffffff01915addc0, ffffff0006624930)
ffffff0006624970 bd_update_state+0x49(ffffff0184737700)
ffffff00066249f0 bd_open+0xb1(ffffff0006624a78, 2081, 2, ffffff0189d4d8b0)
ffffff0006624a20 dev_open+0x3c(ffffff0006624a78, 2081, 2, ffffff0189d4d8b0)
ffffff0006624ad0 spec_open+0x5dc(ffffff0006624bc0, 2081, ffffff0189d4d8b0, 0)
ffffff0006624b40 fop_open+0xbf(ffffff0006624bc0, 2081, ffffff0189d4d8b0, 0)
ffffff0006624cf0 vn_openat+0x6ce(8047bc5, 0, 2081, 182, ffffff0006624d48, 0, 12, 0, 4)
ffffff0006624e60 copen+0x49e(ffd19553, 8047bc5, 2081, 182)
ffffff0006624e90 openat64+0x2d(ffd19553, 8047bc5, 80, 182)
ffffff0006624ec0 open64+0x2e(8047bc5, 80, 182)
ffffff0006624f10 _sys_sysenter_post_swapgs+0x149()

It stops inside sda_slot_enter(), in usr/src/uts/common/io/sdcard/impl/sda_slot.c.
It's the sdcard driver that is blocking the format command.

I'm transferring it to the sdcard driver for a further look.

Confirming for the bug:

I reinstalled osol_1002-144 on this laptop and confirmed this problem
is not seen there. It is new in 145.

#1

Updated by Garrett D'Amore about 10 years ago

The problem is that the slot held via sda_slot_enter is not dropped before calling the bd driver, and there is a deadlock due to the ordering of the blkdev open/close lock and the sda slot lock.

Fortunately, the solution to this is fairly straightforward.
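
The lock-ordering problem described above can be modeled in user space. What follows is only a sketch under stated assumptions, not the actual fix from the changeset: it uses POSIX threads rather than kernel mutexes, and every name in it (bd_lock, slot_enter, slot_exit, open_path, slot_event_path, run_demo) is a hypothetical stand-in. One thread takes the "bd" lock and then enters the slot, as in the bd_open stack above; the other enters the slot and later needs the "bd" lock. The second thread releases the slot before taking the "bd" lock, so the two locks are never held in opposite orders.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the blkdev open/close mutex. */
static pthread_mutex_t bd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the sda slot: a busy flag guarded by mutex + cv. */
static pthread_mutex_t slot_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_cv = PTHREAD_COND_INITIALIZER;
static bool slot_busy = false;

static int opens_done = 0;   /* protected by bd_lock */
static int events_done = 0;  /* protected by bd_lock */

/* Block until the slot is free, then claim it. */
static void slot_enter(void)
{
    pthread_mutex_lock(&slot_mtx);
    while (slot_busy)
        pthread_cond_wait(&slot_cv, &slot_mtx);
    slot_busy = true;
    pthread_mutex_unlock(&slot_mtx);
}

/* Release the slot and wake one waiter. */
static void slot_exit(void)
{
    pthread_mutex_lock(&slot_mtx);
    assert(slot_busy);
    slot_busy = false;
    pthread_cond_signal(&slot_cv);
    pthread_mutex_unlock(&slot_mtx);
}

/* Models the open path: bd lock first, then the slot. */
static void *open_path(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&bd_lock);
        slot_enter();               /* e.g. a media-info query */
        slot_exit();
        opens_done++;
        pthread_mutex_unlock(&bd_lock);
    }
    return NULL;
}

/* Models a slot-side path that must call into the bd layer. */
static void *slot_event_path(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        slot_enter();
        /* ... slot-only work would go here ... */
        slot_exit();                /* drop the slot BEFORE taking bd_lock */
        pthread_mutex_lock(&bd_lock);
        events_done++;
        pthread_mutex_unlock(&bd_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns total completed operations, or -1. */
int run_demo(int iterations)
{
    pthread_t a, b;
    opens_done = events_done = 0;
    if (pthread_create(&a, NULL, open_path, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, slot_event_path, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return opens_done + events_done;
}
```

Calling run_demo(1000) from a main() should return once both threads finish. If slot_event_path instead kept the slot while locking bd_lock, the two threads could each hold one lock and wait forever for the other, which is the shape of the hang seen in the two stacks above.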

#3

Updated by Garrett D'Amore about 10 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Fixed in 9019c04b0003
