Bug #13739
Closed
vnode reference leak in copen()
100%
Description
While testing #13380, I had a path in the global zone lofs-mounted into an NGZ and then passed to a bhyve guest using virtio-9p.
After running an rsync in the guest to write files to the filesystem, the zone would no longer shut down due to references on the lofs mount point vnode.
Having confirmed that nothing was accessing the directory by looking at zone processes and using pfiles, I changed to the directory and attempted to unmount it:
# dtrace -n 'fbt::lo_unmount:entry{li = (struct loinfo *)args[0]->vfs_data;printf(" li_refct = %d, rootvp = %d\n", li->li_refct, li->li_rootvp->v_count)}' -c "umount `pwd`"
dtrace: description 'fbt::lo_unmount:entry' matched 1 probe
umount: /zones/deb/root/data/af/bhyve busy
dtrace: pid 28664 has exited
CPU     ID                    FUNCTION:NAME
  4  78768                 lo_unmount:entry  li_refct = 1, rootvp = 3611
The high reference count on the rootvp is a problem.
I used dtrace to watch holds and releases on this vnode while running rsync in the guest. The usual pattern was something like:
3 11363 copen:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2545
3 11371 lookuppnatcred:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2546
3 13564 lo_lookup:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2547
3 11306 vn_rele:vn-rele bhyve vn-rele(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2546
3 11306 vn_rele:vn-rele bhyve vn-rele(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2545
3 11306 vn_rele:vn-rele bhyve vn-rele(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2544
3 26906 copen:return @504
However, occasionally, when lookupnameat() fails, it is more like:
3 11363 copen:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2545
3 11371 lookuppnatcred:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2546
3 13564 lo_lookup:vn-hold bhyve vn-hold(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2547
3 11306 vn_rele:vn-rele bhyve vn-rele(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2546
3 11306 vn_rele:vn-rele bhyve vn-rele(fffffe2416de4080[/zones/deb/root/data/af/bhyve]) 2545
3 31288 lookupnameat:entry ./bob/usr/lib/pymodules/python2.7/ndg_httpsclient-0.3.2.egg-info/namespace_packages.txt
3 31289 lookupnameat:return 2
3 26906 copen:return @504
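i.e. there is a reference leak in this case: only two of the three holds are released, so the count is left at 2545 instead of returning to 2544.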
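For illustration only, the shape of such a leak can be modelled in a few lines of user-space C. This is not the illumos copen() source: `fake_vnode`, `fake_hold`, `fake_rele`, `fake_lookup`, `open_leaky` and `open_fixed` are all hypothetical names, and the sketch simply assumes a hold is taken before a lookup that may fail with ENOENT (the value 2 returned by lookupnameat() in the trace above).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vnode; only the reference count is modelled. */
struct fake_vnode {
	unsigned int v_count;
};

/* Take a reference (models a vnode hold). */
static void
fake_hold(struct fake_vnode *vp)
{
	vp->v_count++;
}

/* Drop a reference (models a vnode release). */
static void
fake_rele(struct fake_vnode *vp)
{
	assert(vp->v_count > 0);
	vp->v_count--;
}

/* Hypothetical lookup: returns 0 on success or ENOENT on failure. */
static int
fake_lookup(int should_fail)
{
	return (should_fail ? ENOENT : 0);
}

/* Leaky shape: the early return on error skips the release. */
static int
open_leaky(struct fake_vnode *startvp, int should_fail)
{
	int error;

	fake_hold(startvp);
	error = fake_lookup(should_fail);
	if (error != 0)
		return (error);	/* hold on startvp is never dropped */
	fake_rele(startvp);
	return (0);
}

/* Fixed shape: the hold is dropped on every path before returning. */
static int
open_fixed(struct fake_vnode *startvp, int should_fail)
{
	int error;

	fake_hold(startvp);
	error = fake_lookup(should_fail);
	fake_rele(startvp);
	return (error);
}
```

In this model, each failed lookup through `open_leaky()` leaves `v_count` one higher than before the call, while `open_fixed()` leaves it unchanged whether or not the lookup succeeds. Many such failures during a large rsync would accumulate into a count like the 3611 seen on the lofs root vnode.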
Updated by Andy Fiddaman about 2 years ago
I tested this with a build and onu, then ran the workload that had always previously reproduced the problem. I confirmed that the vnode reference counts were the same before and after, and that the lofs filesystem could be unmounted after the test runs.
Updated by Dan McDonald about 2 years ago
Introduced in 2007 commit da6c28aaf62fa55f0fdb8004aa40f88f23bf53f0
Updated by Electric Monk about 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 99d240ec37397536063ecdea5644c8ef759c3e9a
commit 99d240ec37397536063ecdea5644c8ef759c3e9a
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date: 2021-04-22T18:30:38.000Z

    13739 vnode reference leak in copen()
    Reviewed by: Jason King <jason.king@joyent.com>
    Reviewed by: Mike Zeller <mike.zeller@joyent.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>