Bug #5653
NFS server hang with sec=dh and keyserv down
Description
When the svc:/network/rpc/keyserv:default service is down, the NFS server hangs for a long time on DH shares while it tries to contact the (non-running) keyserv:
> ::stacks -m nfs
THREAD           STATE    SOBJ                COUNT
ffffff00cfbe3040 SLEEP    CV                      2
                 swtch+0x141
                 cv_wait_sig+0x185
                 svc_wait+0xb7
                 nfssys+0xb1
                 _sys_sysenter_post_swapgs+0x149

ffffff00cfcc53a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_timedwait_sig+0x49
                 clnt_clts_kcallit_addr+0x66e
                 clnt_clts_kcallit+0x3f
                 key_call+0x201
                 key_decryptsession+0x49
                 _svcauth_des+0x201
                 sec_svc_msg+0x104
                 svc_getreq+0x193
                 svc_run+0x146
                 svc_do_run+0x8e
                 nfssys+0xf1
                 _sys_sysenter_post_swapgs+0x149

>
Steps to reproduce:
- Properly set up a DH share on a server
- Run svcadm disable keyserv on the server
- Try to mount the share from a client
Updated by Bob Lu about 8 years ago
Hi Marcel,
Does the connection time out? I hit the same issue you described. Why does it hang for longer (possibly forever) than the timeout I set?
Does cv_timedwait_sig_hires work as expected? I think it should wake up once the timeout has passed.
ffffff0199365b40::findstack -v
stack pointer for thread ffffff0199365b40: ffffff0006c2ea30
[ ffffff0006c2ea30 _resume_from_idle+0xf1() ]
ffffff0006c2ea60 swtch+0x145()
ffffff0006c2eb00 cv_timedwait_sig_hires+0x39d(ffffff019936f910, ffffff019936f7d8, 2540be400, 989680, 0)
ffffff0006c2eb30 cv_reltimedwait_sig+0x4f(ffffff019936f910, ffffff019936f7d8, 3e8, 4)
ffffff0006c2ec10 so_dequeue_msg+0x290(ffffff019936f7b8, ffffff0006c2ec78, ffffff0006c2ee10, ffffff0006c2ec80, 0)
ffffff0006c2ece0 so_recvmsg+0x1af(ffffff019936f7b8, ffffff0006c2ee40, ffffff0006c2ee10, ffffff01964aa698)
ffffff0006c2ed20 socket_recvmsg+0x3d(ffffff019936f7b8, ffffff0006c2ee40, ffffff0006c2ee10, ffffff01964aa698)
ffffff0006c2ede0 recvit+0xcf(16, ffffff0006c2ee40, ffffff0006c2ee10, 0, 0, 0, 0)
ffffff0006c2ee90 recv+0x7a(16, 8300448, 200, 0)
ffffff0006c2eec0 recv32+0x22(16, 8300448, 200, 0)
ffffff0006c2ef10 _sys_sysenter_post_swapgs+0x149()
Updated by Bob Lu about 8 years ago
BTW, the socket timeout here is 10s, yet the recv hangs for a long time, and I guess it will hang forever.
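As a sanity check on the 10s figure, the hex arguments in the cv_timedwait_sig_hires and cv_reltimedwait_sig frames of the findstack output can be decoded. The sketch below is only illustrative. It assumes that the third and fourth arguments of cv_timedwait_sig_hires are nanosecond values (timeout and resolution), that the third argument of cv_reltimedwait_sig is a count of clock ticks, and that the system runs at 100 ticks per second. None of these assumptions is stated in this report.

```python
# Decode the hex timeout arguments printed by ::findstack -v.
# Assumption: cv_timedwait_sig_hires(cvp, mp, tim, res, flag) takes tim and
# res in nanoseconds; cv_reltimedwait_sig(cvp, mp, delta, res) takes delta
# in clock ticks when res is 4.
NANOSEC = 1_000_000_000  # nanoseconds per second
HZ = 100                 # assumed clock tick rate (ticks per second)


def hex_ns_to_seconds(hex_arg: str) -> float:
    """Convert a hex nanosecond count, as printed by mdb, to seconds."""
    return int(hex_arg, 16) / NANOSEC


# Third and fourth arguments of the cv_timedwait_sig_hires frame.
tim_seconds = hex_ns_to_seconds("2540be400")
res_seconds = hex_ns_to_seconds("989680")

# Third argument of the cv_reltimedwait_sig frame.
delta_ticks = int("3e8", 16)
delta_seconds = delta_ticks / HZ

print(f"hires timeout:    {tim_seconds} s")
print(f"hires resolution: {res_seconds} s")
print(f"relative timeout: {delta_ticks} ticks = {delta_seconds} s at {HZ} Hz")
```

Under these assumptions both frames decode to a 10 second wait (0x2540be400 ns and 0x3e8 ticks), with a 10 ms resolution (0x989680 ns), which matches the 10s socket timeout mentioned above.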
Updated by Marcel Telka about 8 years ago
Bob Lu, your stacks are not related to this bug. I believe cv_timedwait_sig_hires() works as expected. This bug is in some other area.