Bug #6785

nfs4_attr_cache deadlock

Added by S. Klinkert over 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: nfs - NFS server and client
Start date: 2016-03-23
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
Description

It looks like we've encountered a deadlock between nfs4_inactive_thread and setattr.

> ffffff00416c5c40::findstack -v
stack pointer for thread ffffff00416c5c40: ffffff00416c5200
[ ffffff00416c5200 _resume_from_idle+0xf4() ]
  ffffff00416c5230 swtch+0x141()
  ffffff00416c5270 cv_wait+0x70(ffffff0956868d1c, ffffff0956868bf8)
  ffffff00416c52e0 cv_wait_sig+0x82(ffffff0956868d1c, ffffff0956868bf8)
  ffffff00416c5390 nfs4_attr_cache+0x120(ffffff74f3029180, ffffff25a083ff98, 572d621bee4368, ffffff9630d8f580, 1, 0)
  ffffff00416c5690 nfs4close_otw+0x318(ffffff0956868b60, ffffff9630d8f580, ffffff0958b55960, ffffff095fc60a80, ffffff00416c589c, ffffff00416c5894, 2, ffffff00416c5950, ffffff00416c5890)
  ffffff00416c5900 nfs4close_one+0x824(ffffff74f3029180, ffffff095fc60a80, ffffff09083bcdb0, 0, 0, ffffff00416c5950, feffdc1800000002, 0, 0, ffffff0900000000)
  ffffff00416c59a0 nfs4close_all+0x162(ffffff74f3029180, ffffff09083bcdb0)
  ffffff00416c5b90 nfs4_inactive_otw+0x3e8(ffffff74f3029180, ffffff09083bcdb0)
  ffffff00416c5c20 nfs4_inactive_thread+0xca(ffffff09254f8000)
  ffffff00416c5c30 thread_start+8()

This thread is waiting for rp->r_cv in nfs4_attr_cache().

Another thread has the same rnode in its stack trace; it is the current r_serial owner:

> ffffff0956868b60::print -t rnode4_t r_serial
kthread_t *r_serial = 0xffffff0921ef9520
> ffffff0921ef9520::findstack -v
stack pointer for thread ffffff0921ef9520: ffffff004f6f0d60
[ ffffff004f6f0d60 _resume_from_idle+0xf4() ]
  ffffff004f6f0d90 swtch+0x141()
  ffffff004f6f0e30 turnstile_block+0x21a(0, 0, ffffff095fc60af8, fffffffffbc079a0, 0, 0)
  ffffff004f6f0ea0 mutex_vector_enter+0x3a3(ffffff095fc60af8)
  ffffff004f6f0f50 nfs4_get_otw_cred_by_osp+0xb1(ffffff0956868b60, ffffff09083bcdb0, ffffff004f6f10e8, ffffff004f6f10f8, ffffff004f6f10f4)
  ffffff004f6f1150 nfs4_commit+0xb0(ffffff74f3029180, 0, 130000, ffffff09083bcdb0)
  ffffff004f6f1210 nfs4_dispose+0x42c(ffffff74f3029180, ffffff00321bd6d0, 10000, 0, ffffff09083bcdb0, 0)
  ffffff004f6f12a0 fop_dispose+0x91(ffffff74f3029180, ffffff00321bd6d0, 10000, 0, ffffff09083bcdb0, 0)
  ffffff004f6f12e0 pvn_getdirty+0xe7(ffffff00321bd6d0, 10000)
  ffffff004f6f13b0 pvn_vplist_dirty+0x258(ffffff74f3029180, 0, fffffffff8156ef0, 10000, ffffff0e7398d1a0)
  ffffff004f6f1460 nfs4_putpages+0x298(ffffff74f3029180, 0, 0, 10000, ffffff0e7398d1a0)
  ffffff004f6f14f0 nfs4_putpage+0xa9(ffffff74f3029180, 0, 0, 10000, ffffff0e7398d1a0, 0)
  ffffff004f6f1570 fop_putpage+0x4c(ffffff74f3029180, 0, 0, 10000, ffffff0e7398d1a0, 0)
  ffffff004f6f15b0 nfs4_flush_pages+0x30(ffffff74f3029180, ffffff0e7398d1a0)
  ffffff004f6f1640 nfs4_purge_caches+0x1cb(ffffff74f3029180, 0, ffffff0e7398d1a0, 0)
  ffffff004f6f16f0 nfs4_attr_cache+0x2b9(ffffff74f3029180, ffffff23c9bbf9c0, 572d621c5cc7c2, ffffff0e7398d1a0, 0, 0)
  ffffff004f6f1740 nfs4_update_attrcache+0x41(0, ffffff23c9bbf9c0, 572d621c5cc7c2, ffffff74f3029180, ffffff0e7398d1a0)
  ffffff004f6f1c10 nfs4setattr+0x70a(ffffff74f3029180, ffffff004f6f1db0, 1, ffffff0e7398d1a0, 0)
  ffffff004f6f1c80 nfs4_setattr+0x8b(ffffff74f3029180, ffffff004f6f1db0, 1, ffffff0e7398d1a0, 0)
  ffffff004f6f1d00 fop_setattr+0xa8(ffffff74f3029180, ffffff004f6f1db0, 1, ffffff0e7398d1a0, 0)
  ffffff004f6f1da0 cfutimesat+0x213(ffd19553, 8883170, 1, ffffff004f6f1db0, 1, 1)
  ffffff004f6f1e80 utimensat+0x74(ffd19553, 8883170, 8046dd0, 0)
  ffffff004f6f1ec0 utimesys+0x62(1, ffd19553, 8883170, 8046dd0, 0)
  ffffff004f6f1f10 _sys_sysenter_post_swapgs+0x149()

It is waiting in mutex_enter(&next_osp->os_sync_lock); the mutex is held by the first thread:

> ffffff095fc60af8::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff095fc60af8 adapt ffffff00416c5c40      -      -     yes
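
To make the cycle explicit: the inactive thread already holds the open stream's os_sync_lock (taken in the close path) and is asleep on rp->r_cv, while the setattr thread owns r_serial for the same rnode and is the one that would eventually broadcast r_cv, but it first has to get past mutex_enter on that same os_sync_lock. Below is a minimal user-land sketch of that cycle, assuming nothing about the real kernel code beyond the two stacks above; the pthread objects are just stand-ins for the kernel locks and the program intentionally hangs:

/*
 * Minimal user-land model of the cycle above. NOT illumos code:
 * os_sync_lock, r_statelock, r_cv and r_serial_busy are stand-ins
 * for the kernel objects in the two stacks. The program hangs by
 * design. Build with: gcc model.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t os_sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t r_statelock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  r_cv         = PTHREAD_COND_INITIALIZER;
static int r_serial_busy = 1;		/* "r_serial points at the setattr thread" */

/* nfs4_inactive_thread side: holds os_sync_lock, sleeps on r_cv. */
static void *
closer(void *arg)
{
	pthread_mutex_lock(&os_sync_lock);	/* close path */
	pthread_mutex_lock(&r_statelock);
	while (r_serial_busy)			/* nfs4_attr_cache(): wait for the r_serial owner */
		pthread_cond_wait(&r_cv, &r_statelock);
	pthread_mutex_unlock(&r_statelock);
	pthread_mutex_unlock(&os_sync_lock);
	return (NULL);
}

/* setattr side: owns r_serial, needs os_sync_lock before it can finish. */
static void *
setattr(void *arg)
{
	pthread_mutex_lock(&os_sync_lock);	/* nfs4_get_otw_cred_by_osp(): blocks forever */
	pthread_mutex_unlock(&os_sync_lock);

	/* Never reached: this is the wakeup the closer is waiting for. */
	pthread_mutex_lock(&r_statelock);
	r_serial_busy = 0;
	pthread_cond_broadcast(&r_cv);
	pthread_mutex_unlock(&r_statelock);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, closer, NULL);
	sleep(1);				/* let the closer take os_sync_lock first */
	pthread_create(&t2, NULL, setattr, NULL);
	pthread_join(t1, NULL);			/* never returns */
	pthread_join(t2, NULL);
	printf("no deadlock\n");
	return (0);
}

With the head start, the first thread is parked in pthread_cond_wait() while still holding os_sync_lock and the second is parked in pthread_mutex_lock(), which is the same picture the two ::findstack outputs show.
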
#1

Updated by Marcel Telka over 4 years ago

Would you mind sharing the crash dump file? Thanks.

#2

Updated by S. Klinkert over 4 years ago

Marcel Telka wrote:

Would you mind sharing the crash dump file? Thanks.

Sorry, I can't share the whole crash dump because it contains a lot of customer-related data, but maybe I can take a look for you if you tell me what you are looking for.

#3

Updated by S. Klinkert over 4 years ago

The same bug happened on another machine:

> ffffff0925919500::findstack -v
stack pointer for thread ffffff0925919500: ffffff005980c5f0
[ ffffff005980c5f0 _resume_from_idle+0xf4() ]
  ffffff005980c620 swtch+0x141()
  ffffff005980c690 cv_wait_sig+0x185(ffffff095c10ed5c, ffffff095c10ec38)
  ffffff005980c740 nfs4_attr_cache+0x120(ffffff9dd0bab080, ffffffef512b3a58, 55e4b4a7369c5e, ffffffd6154686c0, 1, 0)
  ffffff005980ca40 nfs4close_otw+0x318(ffffff095c10eba0, ffffffd6154686c0, ffffff095488b320, ffffff3e21879780, ffffff005980cc4c, ffffff005980cc44, 0, ffffff005980cd00, ffffff005980cc40)
  ffffff005980ccb0 nfs4close_one+0x824(ffffff9dd0bab080, 0, ffffffd6154686c0, 2302, 0, ffffff005980cd00, 0, 0, 0, 0)
  ffffff005980cd70 nfs4_close+0x258(ffffff9dd0bab080, 2302, 1, f8000, ffffffd6154686c0, 0)
  ffffff005980cdf0 fop_close+0x61(ffffff9dd0bab080, 2302, 1, f8000, ffffffd6154686c0, 0)
  ffffff005980ce30 closef+0x5e(ffffff519f24e580)
  ffffff005980cea0 closeandsetf+0x398(f, 0)
  ffffff005980cec0 close+0x13(f)
  ffffff005980cf10 _sys_sysenter_post_swapgs+0x149()

> ffffff9dd0bab080::print -t vnode_t v_data  |::print -t rnode4_t r_serial
kthread_t *r_serial = 0xffffff094fba2ae0

> 0xffffff094fba2ae0::findstack -v
stack pointer for thread ffffff094fba2ae0: ffffff004c4ebfb0
[ ffffff004c4ebfb0 _resume_from_idle+0xf4() ]
  ffffff004c4ebfe0 swtch+0x141()
  ffffff004c4ec080 turnstile_block+0x21a(0, 0, ffffff3e218797f8, fffffffffbc079a0, 0, 0)
  ffffff004c4ec0f0 mutex_vector_enter+0x3a3(ffffff3e218797f8)
  ffffff004c4ec1a0 nfs4_get_otw_cred_by_osp+0xb1(ffffff095c10eba0, ffffff09083bddb0, ffffff004c4ec338, ffffff004c4ec348, ffffff004c4ec344)
  ffffff004c4ec3a0 nfs4_commit+0xb0(ffffff9dd0bab080, 0, a0000, ffffff09083bddb0)
  ffffff004c4ec460 nfs4_dispose+0x42c(ffffff9dd0bab080, ffffff000b753f80, 10000, 0, ffffff09083bddb0, 0)
  ffffff004c4ec4f0 fop_dispose+0x91(ffffff9dd0bab080, ffffff000b753f80, 10000, 0, ffffff09083bddb0, 0)
  ffffff004c4ec530 pvn_getdirty+0xe7(ffffff000b753f80, 30000)
  ffffff004c4ec600 pvn_vplist_dirty+0x258(ffffff9dd0bab080, 0, fffffffff815def0, 30000, ffffffbb309da110)
  ffffff004c4ec6b0 nfs4_putpages+0x298(ffffff9dd0bab080, 0, 0, 10000, ffffffbb309da110)
  ffffff004c4ec740 nfs4_putpage+0xa9(ffffff9dd0bab080, 0, 0, 10000, ffffffbb309da110, 0)
  ffffff004c4ec7c0 fop_putpage+0x4c(ffffff9dd0bab080, 0, 0, 10000, ffffffbb309da110, 0)
  ffffff004c4ec800 nfs4_flush_pages+0x30(ffffff9dd0bab080, ffffffbb309da110)
  ffffff004c4ec890 nfs4_purge_caches+0x1cb(ffffff9dd0bab080, 0, ffffffbb309da110, 0)
  ffffff004c4ec940 nfs4_attr_cache+0x2b9(ffffff9dd0bab080, ffffff004c4eca10, 55e4b4a736aee9, ffffffbb309da110, 0, 0)
  ffffff004c4eca00 nfs4_getattr_otw+0x1b6(ffffff9dd0bab080, ffffff004c4eca10, ffffffbb309da110, 0)
  ffffff004c4ecb60 nfs4getattr+0xc7(ffffff9dd0bab080, ffffff004c4ecd00, ffffffbb309da110)
  ffffff004c4ecbe0 nfs4_getattr+0x95(ffffff9dd0bab080, ffffff004c4ecd00, 0, ffffffbb309da110, 0)
  ffffff004c4ecc60 fop_getattr+0xa8(ffffff9dd0bab080, ffffff004c4ecd00, 0, ffffffbb309da110, 0)
  ffffff004c4ecdd0 cstat64_32+0x4a(ffffff9dd0bab080, 8047790, 0, ffffffbb309da110)
  ffffff004c4ece70 cstatat64_32+0x87(ffd19553, 8669b08, 8047790, 1000, 0)
  ffffff004c4ecea0 fstatat64_32+0x42(ffd19553, 8669b08, 8047790, 1000)
  ffffff004c4ecec0 lstat64_32+0x25(8669b08, 8047790)
  ffffff004c4ecf10 _sys_sysenter_post_swapgs+0x149()

> ffffff3e218797f8::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff3e218797f8 adapt ffffff0925919500      -      -     yes
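
These are the same two positions as in the description, only reached through close(2) and lstat(2) instead of the inactive thread and utimensat(2). As a generic illustration only (this is not a claim about what the committed fix does), the cycle in the sketch above disappears if the closing side does not sleep on r_cv while still holding the lock the r_serial owner needs; in the model the closer can drop and retake os_sync_lock around the wait:

/*
 * Drop-in replacement for closer() in the sketch above: release
 * os_sync_lock before sleeping on r_cv and retake it afterwards.
 * With this ordering the setattr thread can get os_sync_lock,
 * finish, and broadcast r_cv, so both threads run to completion.
 * Again, just an illustration of the lock-ordering principle, not
 * the actual kernel change.
 */
static void *
closer(void *arg)
{
	pthread_mutex_lock(&os_sync_lock);
	/* ... work that needs the open-stream lock ... */
	pthread_mutex_unlock(&os_sync_lock);	/* don't hold it across the wait */

	pthread_mutex_lock(&r_statelock);
	while (r_serial_busy)
		pthread_cond_wait(&r_cv, &r_statelock);
	pthread_mutex_unlock(&r_statelock);

	pthread_mutex_lock(&os_sync_lock);	/* retake it to finish the close */
	pthread_mutex_unlock(&os_sync_lock);
	return (NULL);
}

With that change the model prints "no deadlock" instead of hanging.
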
#4

Updated by S. Klinkert over 4 years ago

A resultant problem: the nfs4_ephemeral_harvester thread is stuck as well because of the deadlock. After a while we have over 30,000 inactive NFS mount points (and therefore 30,000 kernel threads), and the system runs out of memory.

#5

Updated by Electric Monk about 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 8395f20c6509566b33c66c5c8417be08f7455053

commit  8395f20c6509566b33c66c5c8417be08f7455053
Author: Simon Klinkert <simon.klinkert@gmail.com>
Date:   2016-10-23T15:42:54.000Z

    6785 nfs4_attr_cache deadlock
    Reviewed by: Marcel Telka <marcel@telka.sk>
    Approved by: Gordon Ross <gwr@nexenta.com>
