Bug #1588
closed
nfs4 mirror mount hang
100%
Description
The nfs4 mirror mount facility hangs when the process receives a signal during on-demand mounting. The problem is that nfs4_trigger_domount_args_create() does not check the return value of nfs4_trigger_ping_server() for EINTR, so it keeps retrying the ping and loops endlessly while a signal is pending.
Alternatively, it could flag the call with nointr.
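The loop described above can be sketched in user-space C. This is only an illustration of the control flow, not the kernel code: every name here (sketch_ping, sketch_server, sketch_wait_for_server) is hypothetical, and the max_tries cap exists only so the sketch terminates; the loop in the report has no such cap.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ping results; not the kernel's real enum. */
enum sketch_ping { SKETCH_PING_OK, SKETCH_PING_TIMEDOUT, SKETCH_PING_INTR };

/* Hypothetical fake server, used only to drive the loop. */
struct sketch_server {
    bool signal_pending;  /* true once the caller has been signalled */
    int  attempts;        /* number of pings issued so far */
};

/* A rebooting server never answers; a pending signal interrupts the ping. */
static enum sketch_ping sketch_ping_server(struct sketch_server *s)
{
    s->attempts++;
    return s->signal_pending ? SKETCH_PING_INTR : SKETCH_PING_TIMEDOUT;
}

/*
 * Retry the ping up to max_tries times.
 * check_intr == false: an interrupted ping is retried like any other
 *                      failure (the shape of the reported bug).
 * check_intr == true:  an interrupted ping ends the loop with EINTR.
 * Returns 0 on success, EINTR when interrupted, ETIMEDOUT at the cap.
 */
int sketch_wait_for_server(struct sketch_server *s, int max_tries,
    bool check_intr)
{
    for (int i = 0; i < max_tries; i++) {
        enum sketch_ping st = sketch_ping_server(s);

        if (st == SKETCH_PING_OK)
            return 0;
        if (check_intr && st == SKETCH_PING_INTR)
            return EINTR;
    }
    return ETIMEDOUT;
}
```

With a signal pending and check_intr set to false, the sketch burns through every allowed attempt; with check_intr set to true it stops after the first ping.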
Updated by Arne Jansen almost 12 years ago
A hotfix for oi_148:
echo 'nfs4_trigger_domount_args_create+0x34/v 0' | mdb -kw
This flags all calls to nfs4_trigger_ping_server as nointr.
Updated by Albert Lee almost 12 years ago
- Category set to nfs - NFS server and client
- Difficulty changed from Expert to Medium
Updated by Simon K over 11 years ago
- File 0001-Bug-fixed-nfs4-mirror-mounting-facility-hangs-when-t.patch added
I implemented better error handling in nfs4_stub_vnops.c (see attachment). The function nfs4_trigger_domount_args_create() now returns EINTR if nfs4_trigger_ping_server() reports RPC_INTR, EINVAL for other errors, and zero if everything is fine. I tested this patch on openindiana 151a and it works for me.
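The return contract described in this comment could look roughly like the following. This is a simplified sketch, not the attached patch: the enum and the function name sketch_domount_errno are hypothetical, and the real code works with the kernel's RPC status values rather than this three-value enum.

```c
#include <errno.h>

/* Hypothetical, reduced set of ping outcomes for illustration only. */
enum sketch_rpc_status {
    SKETCH_RPC_SUCCESS,
    SKETCH_RPC_INTR,          /* ping was interrupted by a signal */
    SKETCH_RPC_OTHER_FAILURE  /* any other final failure */
};

/* Map the final ping outcome to the errno the caller should see. */
int sketch_domount_errno(enum sketch_rpc_status st)
{
    switch (st) {
    case SKETCH_RPC_SUCCESS:
        return 0;
    case SKETCH_RPC_INTR:
        return EINTR;   /* let the signal abort the mount attempt */
    default:
        return EINVAL;  /* all other errors */
    }
}
```

Returning EINTR instead of retrying is what lets the interrupted system call unwind back to the process that received the signal.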
It's a git patch file but I think it's useful for you, too.
Updated by Simon K over 11 years ago
This bug is a little difficult to provoke, but here is what I did to reproduce it:
I used two openindiana 151a machines.
host1: I used dtrace to see if it works:
dtrace -n 'nfs4_trigger_domount_args_create:* { trace(arg1); }' -n 'nfs4_trigger_ping_server:* { trace(arg1); }' and/or dtrace -n 'nfs4*:return {trace(arg1);}'
host1: I mounted a filesystem via nfs from host2
host2: Typed "uadmin 1 1" to reboot
host1: I used cd to move as fast as possible(!) into a deeper filesystem from host2
host1: After some time I saw a few calls to nfs4_trigger_ping_server() in my dtrace output and I sent my cd process a SIGINT with Ctrl+C. (If you can't see the calls, it's time to start over.) cd hung, and with dtrace I was able to see the pending interrupt reported by nfs4_trigger_ping_server().
With my latest patch (http://cr.illumos.org/view/e4qg2vxm/nfs-webrev/) dtrace looks like the following:
3 65386 nfs4_ephemeral_tree_hold:return 0
2 65416 nfs4_ping_server_common:return 5
2 65402 nfs4_trigger_ping_server:return 5
2 65416 nfs4_ping_server_common:return 5
2 65402 nfs4_trigger_ping_server:return 5
2 65416 nfs4_ping_server_common:return 18
2 65402 nfs4_trigger_ping_server:return 18
2 65408 nfs4_trigger_esi_destroy:return 2
2 65396 nfs4_trigger_domount_args_create:return 4
2 65388 nfs4_ephemeral_tree_decr:return 8
2 65392 nfs4_trigger_mount:return 4
2 65364 nfs4_trigger_access:return 4
2 65921 nfs4_waitfor_purge_complete:return 0
2 65921 nfs4_waitfor_purge_complete:return 0
2 66129 nfs4_validate_caches:return 0
2 65921 nfs4_waitfor_purge_complete:return 0
...and cd aborts.
Updated by Rich Lowe over 11 years ago
- Subject changed from nfs4 Mirror Mount hang to nfs4 mirror mount hang
Updated by Rich Lowe over 11 years ago
- % Done changed from 0 to 100
- Tags deleted (needs-triage)
Resolved in r13609 commit:7442c4b86390