4518 lockd: Cannot establish NLM service over <file desc. 9, protocol udp>

Review Request #13, created March 27, 2015 (submitted)

Submitter: marcel
Repository: illumos-gate
Bugs: 4518
Groups: general

webrev: http://cr.illumos.org/~webrev/marcel/il-4518-lockd/

The fix:

  • removes the cond_wait() call from the sm_simu_crash_svc() function (file
    sm_proc.c) to avoid the EIO error,

  • implements the daemonize_init()/daemonize_fini() call sequence in main()
    (file sm_svc.c) to help address the ENOENT error,

  • moves the merge_hosts() and merge_ips() calls out of main() into a separate
    asynchronous thread (named thr_statd_merges()) and makes the
    statd_call_statd() function (in sm_statd.c) wait until thr_statd_merges()
    has completed. This fixes the ENOENT error.
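The third change is an instance of the usual "done flag plus condition variable" handshake: main() starts the merge work in a background thread and carries on, and code that needs the merged data blocks until the thread signals completion. The following is a minimal sketch of that pattern under stated assumptions, not the code from the webrev. It uses POSIX threads (the real statd code may use Solaris threads, e.g. cond_wait()), and every identifier in it (merges_done, merges_thread(), start_merges(), wait_for_merges(), and the *_stub() stand-ins for merge_hosts()/merge_ips()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Sketch only: every identifier here is hypothetical and merely stands in
 * for the statd code named in the review (thr_statd_merges(),
 * merge_hosts(), merge_ips(), statd_call_statd()).
 */
static pthread_mutex_t merges_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t merges_cv = PTHREAD_COND_INITIALIZER;
static bool merges_done = false;	/* protected by merges_lock */
static int merges_performed = 0;	/* counts completed stub merges */

/* Placeholders for merge_hosts()/merge_ips(), which may block on DNS. */
static void merge_hosts_stub(void) { merges_performed++; }
static void merge_ips_stub(void) { merges_performed++; }

/* Thread body, analogous to thr_statd_merges(). */
static void *
merges_thread(void *arg)
{
	(void) arg;
	merge_hosts_stub();
	merge_ips_stub();

	/* Publish completion and wake every thread in wait_for_merges(). */
	pthread_mutex_lock(&merges_lock);
	merges_done = true;
	pthread_cond_broadcast(&merges_cv);
	pthread_mutex_unlock(&merges_lock);
	return (NULL);
}

/* Start the merges without blocking the caller (e.g. main()). */
static int
start_merges(pthread_t *tid)
{
	return (pthread_create(tid, NULL, merges_thread, NULL));
}

/* Block until the merges finish, as statd_call_statd() is described to do. */
static void
wait_for_merges(void)
{
	pthread_mutex_lock(&merges_lock);
	while (!merges_done)	/* re-check: cond waits can wake spuriously */
		pthread_cond_wait(&merges_cv, &merges_lock);
	assert(merges_done);
	pthread_mutex_unlock(&merges_lock);
}
```

A caller would run `pthread_t tid; start_merges(&tid);` early in main() and call wait_for_merges() just before it first needs the merged host and IP data. The while loop around pthread_cond_wait() is deliberate: POSIX allows spurious wakeups, so the flag, not the wakeup itself, decides when the wait is over.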

More technical details (if anyone is interested) are in my March comments in #4518.

Note: 10.0.0.99 is the IP address of a nonexistent host.

Without the fix:
----------------

# svcadm disable nfs/nlockmgr
# svcadm disable nfs/status
# ln -s test /var/statmon/sm.bak/ipv4.10.0.0.99
# svcadm enable nfs/status ; svcadm enable nfs/nlockmgr
# sleep 70
# svcs -xv
svc:/network/nfs/nlockmgr:default (NFS lock manager)
 State: maintenance since Wed Mar 25 16:45:58 2015
Reason: Start method failed repeatedly, last exited with status 1.
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M lockd
   See: /var/svc/log/network-nfs-nlockmgr:default.log
Impact: This service is not running.
# grep "NLM service" /var/adm/messages | tail -1
Mar 25 16:45:58 openindiana /usr/lib/nfs/lockd[7174]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : I/O error. Exiting
#

# svcadm disable nfs/nlockmgr
# svcadm disable nfs/status
# gsed -i -e 's/^nameserver.*$/nameserver 10.0.0.99/' /etc/resolv.conf
# svcadm enable nfs/status ; svcadm enable nfs/nlockmgr
# sleep 70
# svcs -xv
svc:/network/nfs/nlockmgr:default (NFS lock manager)
 State: maintenance since Wed Mar 25 16:55:03 2015
Reason: Start method failed repeatedly, last exited with status 1.
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M lockd
   See: /var/svc/log/network-nfs-nlockmgr:default.log
Impact: This service is not running.
# grep "NLM service" /var/adm/messages | tail -1
Mar 25 16:55:03 openindiana /usr/lib/nfs/lockd[7233]: [ID 491006 daemon.error] Cannot establish NLM service over <file desc. 9, protocol udp> : No such file or directory. Exiting
#


With the fix:
-------------

# svcadm disable nfs/nlockmgr
# svcadm disable nfs/status
# ln -s test /var/statmon/sm.bak/ipv4.10.0.0.99
# svcadm enable nfs/status ; svcadm enable nfs/nlockmgr
# sleep 70
# svcs -xv
# grep "NLM service" /var/adm/messages | tail -1
#

# svcadm disable nfs/nlockmgr
# svcadm disable nfs/status
# gsed -i -e 's/^nameserver.*$/nameserver 10.0.0.99/' /etc/resolv.conf
# svcadm enable nfs/status ; svcadm enable nfs/nlockmgr
# sleep 70
# svcs -xv
# grep "NLM service" /var/adm/messages | tail -1
#