Bug #5653

NFS server hang with sec=dh and keyserv down

Added by Marcel Telka almost 6 years ago. Updated over 5 years ago.

Status: New
Priority: Low
Assignee: -
Category: nfs - NFS server and client
Start date: 2015-02-24
% Done: 0%
Difficulty: Medium
Tags: needs-triage

Description

If the svc:/network/rpc/keyserv:default service is down, the NFS server hangs for a long time on DH shares while it attempts to connect to the (non-existent) keyserv:

> ::stacks -m nfs
THREAD           STATE    SOBJ                COUNT
ffffff00cfbe3040 SLEEP    CV                      2
                 swtch+0x141
                 cv_wait_sig+0x185
                 svc_wait+0xb7
                 nfssys+0xb1
                 _sys_sysenter_post_swapgs+0x149

ffffff00cfcc53a0 SLEEP    CV                      1
                 swtch+0x141
                 cv_timedwait_sig_hires+0x39d
                 cv_timedwait_sig+0x49
                 clnt_clts_kcallit_addr+0x66e
                 clnt_clts_kcallit+0x3f
                 key_call+0x201
                 key_decryptsession+0x49
                 _svcauth_des+0x201
                 sec_svc_msg+0x104
                 svc_getreq+0x193
                 svc_run+0x146
                 svc_do_run+0x8e
                 nfssys+0xf1
                 _sys_sysenter_post_swapgs+0x149

>

Steps to reproduce:

  1. Properly set up a DH share on a server
  2. svcadm disable keyserv on the server
  3. Try to mount the share from a client

#1

Updated by Bob Lu over 5 years ago

Hi Marcel,

Does the connection time out? I hit the same issue you described: why does it hang much longer (seemingly forever) than the timeout I set?
Does cv_timedwait_sig_hires work as expected? I think it should wake up once the timeout has passed.

ffffff0199365b40::findstack -v

: stack pointer for thread ffffff0199365b40: ffffff0006c2ea30
[ ffffff0006c2ea30 _resume_from_idle+0xf1() ]
ffffff0006c2ea60 swtch+0x145()
ffffff0006c2eb00 cv_timedwait_sig_hires+0x39d(ffffff019936f910, ffffff019936f7d8, 2540be400, 989680, 0)
ffffff0006c2eb30 cv_reltimedwait_sig+0x4f(ffffff019936f910, ffffff019936f7d8, 3e8, 4)
ffffff0006c2ec10 so_dequeue_msg+0x290(ffffff019936f7b8, ffffff0006c2ec78, ffffff0006c2ee10, ffffff0006c2ec80, 0)
ffffff0006c2ece0 so_recvmsg+0x1af(ffffff019936f7b8, ffffff0006c2ee40, ffffff0006c2ee10, ffffff01964aa698)
ffffff0006c2ed20 socket_recvmsg+0x3d(ffffff019936f7b8, ffffff0006c2ee40, ffffff0006c2ee10, ffffff01964aa698)
ffffff0006c2ede0 recvit+0xcf(16, ffffff0006c2ee40, ffffff0006c2ee10, 0, 0, 0, 0)
ffffff0006c2ee90 recv+0x7a(16, 8300448, 200, 0)
ffffff0006c2eec0 recv32+0x22(16, 8300448, 200, 0)
ffffff0006c2ef10 _sys_sysenter_post_swapgs+0x149()

#2

Updated by Bob Lu over 5 years ago

BTW, the socket timeout here is 10 s, yet the recv hangs for a long time, and I guess it will hang forever.

#3

Updated by Marcel Telka over 5 years ago

Bob Lu, your stacks are not related to this bug. I believe cv_timedwait_sig_hires() works as expected. This bug is in some other area.
