Bug #3629
closedNFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429
100%
Description
There is a kernel panic due to NFS i/o processing.
vmdump is available (probably, with debug info, unfortunately I don't remember it exactly)
some info from mdb:
witch# mdb -k unix.9 vmcore.9 Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba sd stmf stmf_sbd s1394 fctl lofs sata random nfs idm cpc fcip crypto sppp ufs logindmux ptm ipc ] > ::status debugging crash dump vmcore.9 (64-bit) from witch operating system: 5.11 oi_dev (i86pc) image uuid: 53214ce9-0ff3-48eb-a21e-edc36b25b361 panic message: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 dump content: kernel pages only > ::stack vpanic() assfail+0x89(fffffffffbfa3b78, fffffffffbfa3de0, 974) vn_free+0x111(ffffff039d062300) destroy_rnode4+0x66(ffffff01c3e00038) rp4_addfree+0x16e(ffffff01c3e00038, ffffff01c8c58c68) nfs4_inactive_otw+0x4f9(ffffff039d062300, ffffff01c8c58c68) nfs4_inactive_thread+0xb5(ffffffa6649d6000) thread_start+8() > ::panicinfo cpu 0 thread ffffff0005d13c40 message assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 rdi fffffffffbf87248 rsi ffffff0005d138e0 rdx fffffffffbfa3de0 rcx ffffff0005d13920 r8 ffffff01ce9407a9 r9 0 rax ffffff0005d13830 rbx fffffffffbfa3de0 rbp ffffff0005d13910 r10 ffffff04d1d310a8 r10 ffffff04d1d310a8 r11 7 r12 fffffffffbfa3b78 r13 974 r14 ffffffa6649d6148 r15 ffffffa4d4c555b8 fsbase 0 gsbase fffffffffbc3e420 ds 4b es 4b fs 0 gs 1c3 trapno 0 err 0 rip fffffffffb86b760 cs 30 rflags 246 rsp ffffff0005d13828 ss 38 gdt_hi 0 gdt_lo e00001ef idt_hi 0 idt_lo d0000fff ldt 0 task 70 cr0 8005003b cr2 8046cbc cr3 5400000 cr4 6f8
The file is usr/src/uts/common/fs/vnode.c:
void vn_free(vnode_t *vp) { ASSERT(vp->v_shrlocks == NULL); ASSERT(vp->v_filocks == NULL); ...
Related issues
Updated by Marcel Telka over 10 years ago
- Category set to nfs - NFS server and client
Could you please make the crash dump file available? Thanks.
Updated by Denis Kozadaev over 10 years ago
yes, of course
http://silversoft.net/~denis/vmdump.9
Updated by Marcel Telka over 10 years ago
This is the vnode that caused the panic (please note v_filocks):
> ffffff039d062300::print vnode_t { v_lock = { _opaque = [ 0 ] } v_flag = 0x10002 v_count = 0x1 v_data = 0xffffff01c3e00038 v_vfsp = 0xffffff01ad7d6930 v_stream = 0 v_type = 1 (VREG) v_rdev = 0 v_vfsmountedhere = 0 v_op = 0xffffff01a9697c40 v_pages = 0 v_filocks = 0xffffffef2b3b6700 v_shrlocks = 0 v_nbllock = { _opaque = [ 0 ] } v_cv = { _opaque = 0 } v_locality = 0 v_femhead = 0 v_path = 0xffffff01a5cb9b28 "/home/andrej/foxeye/atest.html" v_rdcnt = 0 v_wrcnt = 0 v_mmap_read = 0 v_mmap_write = 0 v_mpssdata = 0 v_fopdata = 0 v_vsd_lock = { _opaque = [ 0 ] } v_vsd = 0 v_xattrdir = 0 v_count_dnlc = 0 } >
The machine panicked at 2013 Mar 15 14:19:05:
> time/Y time: time: 2013 Mar 15 14:19:05 >
There are three mounted nfs filesystems:
> ::fsinfo!grep nfs ffffff01ad7d6760 nfs /nfs ffffff01ad7d6590 nfs /home/denis ffffff01ad7d6930 nfs /home/andrej >
and we have several recovery related messages logged around the panic time. This is for the /nfs:
> ffffff01ad7d6760::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:56:09: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:09: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
This is for /home/denis:
> ffffff01ad7d6590::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:56:09: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:09: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: File ./.xsession-errors(rnode_pt ffffff01cee9c440) on server fucker could not be recovered and was closed. Couldn't reopen: NFSv4 error NFS4ERR_OLD_STATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_OLD_STATEID causing recovery action NR_OLDSTATEID. [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
And this is for /home/andrej:
> ffffff01ad7d6930::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /home/andrej (0xffffffa6649d6000, flags 0x1) on server fucker, rnode_pt1 ./foxeye/foxeye.debug (0xffffff0499206030), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:10: Recovery done for mount /home/andrej (0xffffffa6649d6000) on server fucker, rnode_pt1 ./foxeye/foxeye.debug (0xffffff0499206030), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Op OP_RENEW got error NFS4ERR_EXPIRED causing recovery action NR_CLIENTID. [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /home/andrej (0xffffffa6649d6000, flags 0x1) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Couldn't reclaim lock for pid 151483 for file ./foxeye/logs/ugolok (rnode_pt ffffff04c39fc400) on (server fucker): error 10033 [NFS4]2013 Mar 15 14:19:04: Process 151483 lost its locks on file ./foxeye/logs/ugolok (rnode_pt ffffff04c39fc400) due to a NFS error (10033) on server fucker [NFS4]2013 Mar 15 14:19:05: File ./foxeye/atest.html(rnode_pt ffffff01c3e00038) on server fucker could not be recovered and was closed. Couldn't reopen: NFSv4 error NFS4ERR_OLD_STATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_OLD_STATEID causing recovery action NR_OLDSTATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_EXPIRED causing recovery action NR_CLIENTID. [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /home/andrej (0xffffffa6649d6000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
From the log we see that the recovery for /home/andrej/foxeye/atest.html failed (the file was closed). Immediately after that we panicked.
Updated by Piotr Jasiukajtis over 10 years ago
Denis Kozadaev wrote:
There is a kernel panic due to NFS i/o processing.
vmdump is available (probably, with debug info, unfortunately I don't remember it exactly)
some info from mdb:
witch# mdb -k unix.9 vmcore.9
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba sd stmf stmf_sbd s1394 fctl lofs sata random nfs idm cpc fcip crypto sppp ufs logindmux ptm ipc ]::status
debugging crash dump vmcore.9 (64-bit) from witch
operating system: 5.11 oi_dev (i86pc)
Denis, what is an illumos revision of this build?
Updated by Denis Kozadaev over 10 years ago
Unfortunately, I can't say anything about the revision :(
I did "hg pull -u" many times, the source tree is shared via nfs.
The only I can say now is that I booted the system when it was built,
this monday I will say you the date of some system file (/kernel/genunix, for example)
this date will be the nearest of the revision date.
I'm so sorry, guys...
Updated by Denis Kozadaev over 10 years ago
Hello, guys
the gate was built at 25 of feb.
-rwxr-xr-x 1 bin bin 3636544 Feb 25 17:41 genunix
take the latest revision before this date
and good luck
Updated by Marcel Telka almost 7 years ago
- Subject changed from kernel panic (openindiana) to NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420
Updated by Marcel Telka almost 7 years ago
Steps to reproduce
You need a debug build for the NFSv4 client (with compiled in asserts) to reproduce the panic.
- lock a file over NFSv4, leave the application running, so the file is opened and locked,
- remove the file locally at the NFS server,
- restart the NFS server (either
svcadm restart nfs/server
orreboot
), - once the NFS server is restarted the NFS client tries to reopen the file, but it fails (with the NFS4ERR_STALE error),
- try to terminate the locking application at the NFSv4 client, the system will panic.
> ::status debugging crash dump vmcore.0 (64-bit) from t3 operating system: 5.11 il-nightly-master-0-g81293f9 (i86pc) image uuid: c7393aa9-e737-4432-dbf6-d8db39fa9d4f panic message: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 dump content: kernel pages only > ::stack vpanic() 0xfffffffffbdf77e8() vn_free+0x55(ffffff01a38a8300) destroy_rnode4+0x7c(ffffff01a3c18be8) rp4_addfree+0x2de(ffffff01a3c18be8, ffffff01ac2cf430) nfs4_inactive_otw+0x479(ffffff01a38a8300, ffffff01ac2cf430) nfs4_inactive_thread+0xde(ffffff01b16b5000) thread_start+8() >
Updated by Marcel Telka almost 7 years ago
- Related to Bug #7721: Poor nfs4_frlock() performance for blocking locks added
Updated by Marcel Telka almost 7 years ago
- Subject changed from NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 to NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429
Updated by Marcel Telka almost 7 years ago
- Status changed from New to In Progress
- Assignee set to Marcel Telka
Updated by Marcel Telka almost 7 years ago
Root cause
In a case the file open recovery fails (so the R4RECOVERR
is set for a particular rnode4_t
) then the reclaim_one_lock()
does not attempt to recover lost locks for such a file and returns success. This causes that the relock_file()
leaks lost locks (in v_filocks).
The fix
The fix changes reclaim_one_lock()
so it returns error (ESTALE
) in such a case so the relock_file()
have a chance to notice a problem and thus send SIGLOST
to the particular process via nfs4_send_siglost()
. Finally, the relock_skip_pid()
is updated to unregister the lost locks.
Updated by Marcel Telka almost 7 years ago
Updated by Marcel Telka almost 7 years ago
- Status changed from In Progress to Pending RTI
Updated by Electric Monk almost 7 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
git commit faf39f174e3f2f426d74639ff702c5ba9f29f56b
commit faf39f174e3f2f426d74639ff702c5ba9f29f56b Author: Marcel Telka <marcel@telka.sk> Date: 2017-02-27T17:15:57.000Z 3629 NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 Reviewed by: Arne Jansen <arne@die-jansens.de> Approved by: Robert Mustacchi <rm@joyent.com>