Bug #355
pfiles of a smbd process hangs
Description
While testing changes to pfiles recently, I noticed that
    cd /proc; pfiles *
would hang. I've narrowed it down to pfiles of the smbd process, which hangs:
    pfiles $(pgrep -n smbd)
It doesn't seem to matter whether smbd is busy, and this used to work as recently as onnv_144.
It seems to me that while smbd is clearly provoking this, pfiles would ideally either grab the process and succeed, or fail to and fail, not wedge waiting for some event.
    13010:  pfiles 2181
     fffffd7fff2d03ba write    (82, fffffd7fffdff390, 18)
     fffffd7ffe7f5c25 Pstopstatus () + 14d
     fffffd7ffe7f5f7a Pstop () + 12
     fffffd7ffe808a88 Pcreate_agent () + 40
     0000000000402813 main () + 203
     00000000004024bc ???????? ()

    > ::pgrep pfiles | ::walk thread | ::findstack -v
    stack pointer for thread ffffff0232593060: ffffff000bae37a0
    [ ffffff000bae37a0 _resume_from_idle+0xf1() ]
      ffffff000bae37d0 swtch+0x145()
      ffffff000bae3830 cv_wait_sig_swap_core+0x174(ffffff023fbf91f8, ffffff023fbf91f0, 0)
      ffffff000bae3850 cv_wait_sig_swap+0x18(ffffff023fbf91f8, ffffff023fbf91f0)
      ffffff000bae38c0 cv_waituntil_sig+0x13c(ffffff023fbf91f8, ffffff023fbf91f0, 0, 0)
      ffffff000bae3900 pr_wait+0x32(ffffff023fbf91f0, 0, 0)
      ffffff000bae3980 pr_wait_stop+0x10e(ffffff028e660968, 0)
      ffffff000bae39f0 pr_control+0x4e4(4, ffffff000bae3a30, ffffff028e660968, ffffff02975a8330)
      ffffff000bae3d80 prwritectl+0x1da(ffffff03a3c11780, ffffff000bae3e90, ffffff02975a8330)
      ffffff000bae3de0 prwrite+0xa0(ffffff03a3c11780, ffffff000bae3e90, 0, ffffff02975a8330, 0)
      ffffff000bae3e50 fop_write+0x6b(ffffff03a3c11780, ffffff000bae3e90, 0, ffffff02975a8330, 0)
      ffffff000bae3f00 write+0x2e2(82, fffffd7fffdff390, 18)
      ffffff000bae3f10 sys_syscall+0x17a()

    ::pgrep smbd | ::walk thread | ::stacks
    THREAD           STATE    SOBJ                COUNT
    ffffff02270ba060 STOPPED  <NONE>                  3
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_timedwait_sig_hires+0x288
                     cv_waituntil_sig+0xba
                     nanosleep+0x120
                     sys_syscall32+0xff

    ffffff0226641440 STOPPED  <NONE>                  3
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_wait_sig_swap_core+0x1f2
                     cv_wait_sig_swap+0x18
                     cv_waituntil_sig+0x13c
                     lwp_park+0x157
                     syslwp_park+0x31
                     sys_syscall32+0xff

    ffffff0226785be0 SLEEP    SHUTTLE                 2
                     swtch_to+0xe6
                     shuttle_resume+0x325
                     door_return+0x21a
                     doorfs32+0x134
                     sys_syscall32+0xff

    ffffff023267b7e0 STOPPED  <NONE>                  2
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_timedwait_sig_hires+0x288
                     cv_waituntil_sig+0xba
                     lwp_park+0x157
                     syslwp_park+0x31
                     sys_syscall32+0xff

    ffffff022642d180 STOPPED  <NONE>                  2
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_wait_sig+0x1cb
                     so_dequeue_msg+0x2a4
                     so_recvmsg+0x1af
                     socket_recvmsg+0x3d
                     recvit+0xcf
                     recvfrom+0xd5
                     recvfrom32+0x30
                     sys_syscall32+0xff

    ffffff02264c5020 SLEEP    CV                      1
                     swtch+0x145
                     cv_wait+0x61
                     smb_server_spooldoc+0x6f
                     smb_drv_ioctl+0xd4
                     cdev_ioctl+0x45
                     spec_ioctl+0x5a
                     fop_ioctl+0x7b
                     ioctl+0x18e
                     sys_syscall32+0xff

    ffffff0232ee7440 SLEEP    SHUTTLE                 1
                     swtch+0x145
                     shuttle_swtch+0x256
                     door_return+0x242
                     doorfs32+0x134
                     sys_syscall32+0xff

    ffffff023293e4c0 STOPPED  <NONE>                  1
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_wait_sig+0x1cb
                     door_unref+0x89
                     doorfs32+0x82
                     sys_syscall32+0xff

    ffffff02279023a0 STOPPED  <NONE>                  1
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_wait_sig_swap_core+0x1f2
                     cv_wait_sig_swap+0x18
                     cv_waituntil_sig+0x13c
                     poll_common+0x47f
                     pollsys+0xea
                     sys_syscall32+0xff

    ffffff0232489140 STOPPED  <NONE>                  1
                     swtch+0x145
                     stop+0x82e
                     issig_forreal+0x20d
                     issig+0x20
                     cv_wait_sig_swap_core+0x1f2
                     cv_wait_sig_swap+0x18
                     sigsuspend+0x107
                     sys_syscall32+0xff

    ::pgrep smbd | ::walk thread | ::stacks -C stop
    THREAD           STATE    SOBJ                COUNT
    ffffff0226785be0 SLEEP    SHUTTLE                 2
                     swtch_to+0xe6
                     shuttle_resume+0x325
                     door_return+0x21a
                     doorfs32+0x134
                     sys_syscall32+0xff

    ffffff02264c5020 SLEEP    CV                      1
                     swtch+0x145
                     cv_wait+0x61
                     smb_server_spooldoc+0x6f
                     smb_drv_ioctl+0xd4
                     cdev_ioctl+0x45
                     spec_ioctl+0x5a
                     fop_ioctl+0x7b
                     ioctl+0x18e
                     sys_syscall32+0xff

    ffffff0232ee7440 SLEEP    SHUTTLE                 1
                     swtch+0x145
                     shuttle_swtch+0x256
                     door_return+0x242
                     doorfs32+0x134
                     sys_syscall32+0xff
Updated by Yuri Pankov almost 11 years ago
- Difficulty set to Medium
- Tags set to needs-triage
Works for me:
    sirius:root:~# pfiles $(pgrep -n smbd)
    228693: /usr/lib/smbsrv/smbd start
      Current rlimit: 65536 file descriptors
       0: S_IFCHR mode:0666 dev:523,0 ino:23068680 uid:0 gid:3 rdev:44,2
          O_RDONLY|O_LARGEFILE
          /devices/pseudo/mm@0:null
          offset:0
    [...]
      18: S_IFDOOR mode:0777 dev:526,0 ino:0 uid:0 gid:0 size:0
          O_RDWR FD_CLOEXEC  door to svc.configd[100005]
    sirius:root:~#
Updated by Rich Lowe almost 11 years ago
It's working for me currently also, but I'm not aware of any changes that could have fixed it (perhaps Gordon knows of something that could have helped?).
My description here is perhaps overly cryptic, but if I recall correctly the theory was that pfiles was unable to sufficiently stop the victim to insert the agent thread (hence the list of threads which aren't in stop()). It was reproducing easily for me (that much I do remember), but I'm not sure if it needed, e.g., a share from that system to be mounted (even if idle), or anything like that.
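The "threads which aren't in stop()" list came from ::stacks -C stop, which drops every stack group containing that function. If only a saved copy of the plain ::stacks output is at hand, the same filter can be approximated offline. This is a rough sketch, not part of mdb: parse_stacks and without_function are hypothetical names, the header pattern assumes the four-column layout seen in this report (with no spaces inside the SOBJ column), and matching is a simple exact comparison on the name before the "+offset".

```python
import re

# A group header looks like: "ffffff0226785be0 SLEEP    SHUTTLE   2"
# (assumed layout, taken from the output pasted in this report).
HEADER = re.compile(r"^([0-9a-f]{16})\s+(\S+)\s+(\S+)\s+(\d+)$")


def parse_stacks(text):
    """Parse saved ::stacks output into a list of dicts, one per stack group."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("THREAD"):
            continue  # skip blank separators and the column header
        m = HEADER.match(line)
        if m:
            groups.append({
                "thread": m.group(1),
                "state": m.group(2),
                "sobj": m.group(3),
                "count": int(m.group(4)),
                "frames": [],
            })
        elif groups:
            groups[-1]["frames"].append(line)  # e.g. "stop+0x82e"
    return groups


def without_function(groups, func):
    """Keep only groups with no frame in func (approximates ::stacks -C func)."""
    return [g for g in groups
            if not any(f.split("+")[0] == func for f in g["frames"])]


SAMPLE = """\
THREAD           STATE    SOBJ                COUNT
ffffff02270ba060 STOPPED  <NONE>                  3
                 swtch+0x145
                 stop+0x82e
                 issig_forreal+0x20d
                 issig+0x20
                 cv_timedwait_sig_hires+0x288
                 cv_waituntil_sig+0xba
                 nanosleep+0x120
                 sys_syscall32+0xff

ffffff02264c5020 SLEEP    CV                      1
                 swtch+0x145
                 cv_wait+0x61
                 smb_server_spooldoc+0x6f
                 smb_drv_ioctl+0xd4
                 cdev_ioctl+0x45
                 spec_ioctl+0x5a
                 fop_ioctl+0x7b
                 ioctl+0x18e
                 sys_syscall32+0xff
"""

if __name__ == "__main__":
    for g in without_function(parse_stacks(SAMPLE), "stop"):
        print(g["thread"], g["state"], g["sobj"], g["count"])
```

Run on the two groups in SAMPLE, only the smb_server_spooldoc thread survives the filter, matching what ::stacks -C stop showed for that thread in the description.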
Gordon had mentioned in conversation that he'd seen smb end up in similar states previously, I think due to quirks of its door use, but I can't recall precisely why/how, and might be misremembering.
Updated by Gordon Ross almost 11 years ago
- Status changed from New to Closed
I think this was fixed by the spoolss thread changes:
#1039 /usr/lib/smbsrv/smbd prevents suspend
There may still be problems with pfiles in smbd, depending on
the state of smbd and its door upcalls, but let's just close this
until we see a reproducible case.
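
For anyone trying to capture a reproducible case, one option is to run pfiles under a time limit so that a hang gets reported instead of wedging the shell. A minimal sketch follows; run_with_timeout and check_pfiles are hypothetical helper names, the 10 second default is arbitrary, and it assumes a hung pfiles can still be killed (the cv_wait_sig frames in its stack suggest so, but this report does not confirm it).

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv; return (status, output) with status 'ok', 'failed' or 'hung'."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              timeout=seconds, text=True)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising on timeout.
        return "hung", ""
    except OSError as exc:
        # The command could not be started at all (e.g. not found).
        return "failed", str(exc)
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


def check_pfiles(pid, seconds=10):
    """Hypothetical helper: run pfiles against one pid with a time limit."""
    return run_with_timeout(["pfiles", str(pid)], seconds)
```

A "hung" result for the smbd pid would be the point at which to collect the ::findstack and ::stacks output shown in the description.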