Bug #3629
closedNFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429
100%
Description
There is a kernel panic due to NFS i/o processing.
vmdump is available (probably, with debug info, unfortunately I don't remember it exactly)
some info from mdb:
witch# mdb -k unix.9 vmcore.9 Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba sd stmf stmf_sbd s1394 fctl lofs sata random nfs idm cpc fcip crypto sppp ufs logindmux ptm ipc ] > ::status debugging crash dump vmcore.9 (64-bit) from witch operating system: 5.11 oi_dev (i86pc) image uuid: 53214ce9-0ff3-48eb-a21e-edc36b25b361 panic message: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 dump content: kernel pages only > ::stack vpanic() assfail+0x89(fffffffffbfa3b78, fffffffffbfa3de0, 974) vn_free+0x111(ffffff039d062300) destroy_rnode4+0x66(ffffff01c3e00038) rp4_addfree+0x16e(ffffff01c3e00038, ffffff01c8c58c68) nfs4_inactive_otw+0x4f9(ffffff039d062300, ffffff01c8c58c68) nfs4_inactive_thread+0xb5(ffffffa6649d6000) thread_start+8() > ::panicinfo cpu 0 thread ffffff0005d13c40 message assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 rdi fffffffffbf87248 rsi ffffff0005d138e0 rdx fffffffffbfa3de0 rcx ffffff0005d13920 r8 ffffff01ce9407a9 r9 0 rax ffffff0005d13830 rbx fffffffffbfa3de0 rbp ffffff0005d13910 r10 ffffff04d1d310a8 r10 ffffff04d1d310a8 r11 7 r12 fffffffffbfa3b78 r13 974 r14 ffffffa6649d6148 r15 ffffffa4d4c555b8 fsbase 0 gsbase fffffffffbc3e420 ds 4b es 4b fs 0 gs 1c3 trapno 0 err 0 rip fffffffffb86b760 cs 30 rflags 246 rsp ffffff0005d13828 ss 38 gdt_hi 0 gdt_lo e00001ef idt_hi 0 idt_lo d0000fff ldt 0 task 70 cr0 8005003b cr2 8046cbc cr3 5400000 cr4 6f8
The file is usr/src/uts/common/fs/vnode.c:
void vn_free(vnode_t *vp) { ASSERT(vp->v_shrlocks == NULL); ASSERT(vp->v_filocks == NULL); ...
Related issues
Updated by Marcel Telka about 9 years ago
- Category set to nfs - NFS server and client
Could you please make the crash dump file available? Thanks.
Updated by Denis Kozadaev about 9 years ago
yes, of course
http://silversoft.net/~denis/vmdump.9
Updated by Marcel Telka about 9 years ago
This is the vnode that caused the panic (please note v_filocks):
> ffffff039d062300::print vnode_t { v_lock = { _opaque = [ 0 ] } v_flag = 0x10002 v_count = 0x1 v_data = 0xffffff01c3e00038 v_vfsp = 0xffffff01ad7d6930 v_stream = 0 v_type = 1 (VREG) v_rdev = 0 v_vfsmountedhere = 0 v_op = 0xffffff01a9697c40 v_pages = 0 v_filocks = 0xffffffef2b3b6700 v_shrlocks = 0 v_nbllock = { _opaque = [ 0 ] } v_cv = { _opaque = 0 } v_locality = 0 v_femhead = 0 v_path = 0xffffff01a5cb9b28 "/home/andrej/foxeye/atest.html" v_rdcnt = 0 v_wrcnt = 0 v_mmap_read = 0 v_mmap_write = 0 v_mpssdata = 0 v_fopdata = 0 v_vsd_lock = { _opaque = [ 0 ] } v_vsd = 0 v_xattrdir = 0 v_count_dnlc = 0 } >
The machine panicked at 2013 Mar 15 14:19:05:
> time/Y time: time: 2013 Mar 15 14:19:05 >
There are three mounted nfs filesystems:
> ::fsinfo!grep nfs ffffff01ad7d6760 nfs /nfs ffffff01ad7d6590 nfs /home/denis ffffff01ad7d6930 nfs /home/andrej >
and we have several recovery related messages logged around the panic time. This is for the /nfs:
> ffffff01ad7d6760::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:56:09: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:09: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: Starting recovery for mount /nfs (0xffffff04c70f9000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /nfs (0xffffff04c70f9000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
This is for /home/denis:
> ffffff01ad7d6590::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:56:09: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:09: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:56:58: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:09: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /home/denis (0xffffffa20b73b000, flags 0x22) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:05: File ./.xsession-errors(rnode_pt ffffff01cee9c440) on server fucker could not be recovered and was closed. Couldn't reopen: NFSv4 error NFS4ERR_OLD_STATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_OLD_STATEID causing recovery action NR_OLDSTATEID. [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /home/denis (0xffffffa20b73b000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
And this is for /home/andrej:
> ffffff01ad7d6930::print vfs_t vfs_data|::print mntinfo4_t mi_msg_list|::nfs4_mimsg ! tail [NFS4]2013 Mar 15 11:57:09: Starting recovery for mount /home/andrej (0xffffffa6649d6000, flags 0x1) on server fucker, rnode_pt1 ./foxeye/foxeye.debug (0xffffff0499206030), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 11:57:10: Recovery done for mount /home/andrej (0xffffffa6649d6000) on server fucker, rnode_pt1 ./foxeye/foxeye.debug (0xffffff0499206030), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Op OP_RENEW got error NFS4ERR_EXPIRED causing recovery action NR_CLIENTID. [NFS4]2013 Mar 15 14:19:04: Starting recovery for mount /home/andrej (0xffffffa6649d6000, flags 0x1) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) [NFS4]2013 Mar 15 14:19:04: Couldn't reclaim lock for pid 151483 for file ./foxeye/logs/ugolok (rnode_pt ffffff04c39fc400) on (server fucker): error 10033 [NFS4]2013 Mar 15 14:19:04: Process 151483 lost its locks on file ./foxeye/logs/ugolok (rnode_pt ffffff04c39fc400) due to a NFS error (10033) on server fucker [NFS4]2013 Mar 15 14:19:05: File ./foxeye/atest.html(rnode_pt ffffff01c3e00038) on server fucker could not be recovered and was closed. Couldn't reopen: NFSv4 error NFS4ERR_OLD_STATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_OLD_STATEID causing recovery action NR_OLDSTATEID. [NFS4]2013 Mar 15 14:19:05: Op OP_OPEN got error NFS4ERR_EXPIRED causing recovery action NR_CLIENTID. [NFS4]2013 Mar 15 14:19:05: Recovery done for mount /home/andrej (0xffffffa6649d6000) on server fucker, rnode_pt1 <NULL> (0x0), rnode_pt2 <NULL> (0x0) >
From the log we see that the recovery for /home/andrej/foxeye/atest.html failed (the file was closed). Immediately after that we panicked.
Updated by Piotr Jasiukajtis about 9 years ago
Denis Kozadaev wrote:
There is a kernel panic due to NFS i/o processing.
vmdump is available (probably, with debug info, unfortunately I don't remember it exactly)
some info from mdb:
witch# mdb -k unix.9 vmcore.9
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci zfs ip hook neti sockfs arp usba sd stmf stmf_sbd s1394 fctl lofs sata random nfs idm cpc fcip crypto sppp ufs logindmux ptm ipc ]::status
debugging crash dump vmcore.9 (64-bit) from witch
operating system: 5.11 oi_dev (i86pc)
Denis, what is an illumos revision of this build?
Updated by Denis Kozadaev about 9 years ago
Unfortunately, I can't say anything about the revision :(
I did "hg pull -u" many times, the source tree is shared via nfs.
The only I can say now is that I booted the system when it was built,
this monday I will say you the date of some system file (/kernel/genunix, for example)
this date will be the nearest of the revision date.
I'm so sorry, guys...
Updated by Denis Kozadaev about 9 years ago
Hello, guys
the gate was built at 25 of feb.
-rwxr-xr-x 1 bin bin 3636544 Feb 25 17:41 genunix
take the latest revision before this date
and good luck
Updated by Marcel Telka over 5 years ago
- Subject changed from kernel panic (openindiana) to NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420
Updated by Marcel Telka over 5 years ago
Steps to reproduce
You need a debug build for the NFSv4 client (with compiled in asserts) to reproduce the panic.
- lock a file over NFSv4, leave the application running, so the file is opened and locked,
- remove the file locally at the NFS server,
- restart the NFS server (either
svcadm restart nfs/server
orreboot
), - once the NFS server is restarted the NFS client tries to reopen the file, but it fails (with the NFS4ERR_STALE error),
- try to terminate the locking application at the NFSv4 client, the system will panic.
> ::status debugging crash dump vmcore.0 (64-bit) from t3 operating system: 5.11 il-nightly-master-0-g81293f9 (i86pc) image uuid: c7393aa9-e737-4432-dbf6-d8db39fa9d4f panic message: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 dump content: kernel pages only > ::stack vpanic() 0xfffffffffbdf77e8() vn_free+0x55(ffffff01a38a8300) destroy_rnode4+0x7c(ffffff01a3c18be8) rp4_addfree+0x2de(ffffff01a3c18be8, ffffff01ac2cf430) nfs4_inactive_otw+0x479(ffffff01a38a8300, ffffff01ac2cf430) nfs4_inactive_thread+0xde(ffffff01b16b5000) thread_start+8() >
Updated by Marcel Telka over 5 years ago
- Related to Bug #7721: Poor nfs4_frlock() performance for blocking locks added
Updated by Marcel Telka over 5 years ago
- Subject changed from NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2420 to NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429
Updated by Marcel Telka over 5 years ago
- Status changed from New to In Progress
- Assignee set to Marcel Telka
Updated by Marcel Telka over 5 years ago
Root cause
In a case the file open recovery fails (so the R4RECOVERR
is set for a particular rnode4_t
) then the reclaim_one_lock()
does not attempt to recover lost locks for such a file and returns success. This causes that the relock_file()
leaks lost locks (in v_filocks).
The fix
The fix changes reclaim_one_lock()
so it returns error (ESTALE
) in such a case so the relock_file()
have a chance to notice a problem and thus send SIGLOST
to the particular process via nfs4_send_siglost()
. Finally, the relock_skip_pid()
is updated to unregister the lost locks.
Updated by Marcel Telka over 5 years ago
Updated by Marcel Telka over 5 years ago
- Status changed from In Progress to Pending RTI
Updated by Electric Monk about 5 years ago
- Status changed from Pending RTI to Closed
- % Done changed from 0 to 100
git commit faf39f174e3f2f426d74639ff702c5ba9f29f56b
commit faf39f174e3f2f426d74639ff702c5ba9f29f56b Author: Marcel Telka <marcel@telka.sk> Date: 2017-02-27T17:15:57.000Z 3629 NFSv4 client: assertion failed: vp->v_filocks == NULL, file: ../../common/fs/vnode.c, line: 2429 Reviewed by: Arne Jansen <arne@die-jansens.de> Approved by: Robert Mustacchi <rm@joyent.com>