Bug #393

Invalid nd_hostservlist contents crash mountd

Added by Albert Lee over 3 years ago. Updated about 2 years ago.

Status: Resolved
Start date: 2010-11-02
Priority: Low
Due date:
Assignee: Vitaliy Gusev
% Done: 100%
Category: lib - userland libraries
Spent time: -
Target version: -
Difficulty: Medium
Tags:

Description

In some cases mountd crashes in do_logging_queue:

> $C
fe34ffa8 do_logging_queue+0x49(80615a0, 8060998, fe34ffc8, 8055e0a)
fe34ffc8 logging_svc+0x52(0, fef60000, fe34ffe8, feedd72e)
fe34ffe8 libc_hwcap2.so.1`_thrp_setup+0x9b(fede1a40)
fe34fff8 libc_hwcap2.so.1`_lwp_start(fede1a40, 0, 0, 0, 0, 0)
> ::regs
%cs = 0x0043            %eax = 0x4d580000 
%ds = 0x004b            %ebx = 0x00000000 
%ss = 0x004b            %ecx = 0x00000000 
%es = 0x004b            %edx = 0xfef61014 libc_hwcap2.so.1`libc_malloc_lock+0xc
%fs = 0x0000            %esi = 0x080615a0 
%gs = 0x01c3            %edi = 0x00000000 

 %eip = 0x08055d3d do_logging_queue+0x49
 %ebp = 0xfe34ffa8
%kesp = 0x00000000

%eflags = 0x00010246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=<of,df,IF,tf,sf,ZF,af,PF,cf>
> do_logging_queue+0x49::dis
do_logging_queue+0x2f:          pushl  %eax
do_logging_queue+0x30:          pushl  0x10(%esi)
do_logging_queue+0x33:          call   +0xad8   <getclientsnames_lazy>
do_logging_queue+0x38:          addl   $0x10,%esp
do_logging_queue+0x3b:          testl  %eax,%eax
do_logging_queue+0x3d:          je     +0x4     <do_logging_queue+0x43>
do_logging_queue+0x3f:          xorl   %ebx,%ebx
do_logging_queue+0x41:          jmp    +0x8     <do_logging_queue+0x4b>
do_logging_queue+0x43:          movl   -0x1c(%ebp),%eax
do_logging_queue+0x46:          movl   0x4(%eax),%eax
do_logging_queue+0x49:          movl   (%eax),%ebx
do_logging_queue+0x4b:          subl   $0x4,%esp
do_logging_queue+0x4e:          pushl  0xc(%esi)
do_logging_queue+0x51:          pushl  0x4(%esi)
do_logging_queue+0x54:          pushl  %ebx
do_logging_queue+0x55:          call   -0x11f6  <PLT:audit_mountd_mount>
do_logging_queue+0x5a:          addl   $0x10,%esp
do_logging_queue+0x5d:          movl   0x8(%esi),%eax
do_logging_queue+0x60:          testl  %eax,%eax
do_logging_queue+0x62:          je     +0xd     <do_logging_queue+0x71>
do_logging_queue+0x64:          subl   $0x8,%esp

This corresponds to http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fs.d/nfs/mountd/mountd.c#304:
host = clnames->h_hostservs[0].h_host;

With some debug info:
Nov  2 15:42:09 server nfs4cbd[1511]: [ID 867284 daemon.notice] nfsv4 cannot determine local hostname binding for transport tcp - delegations will not be available on this transport
Nov  2 15:42:39 server mountd[1453]: [ID 175181 daemon.error] lq->ld_netid = tcp
Nov  2 15:42:39 server mountd[1453]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Nov  2 15:42:39 server mountd[1453]: [ID 708950 daemon.error] clnames = 0x8071998
Nov  2 15:42:39 server mountd[1453]: [ID 589525 daemon.error] clnames->h_cnt = 4
Nov  2 15:42:39 server mountd[1453]: [ID 318296 daemon.error] clnames->h_hostservs = 0x4d580000

0x4d580000 is unmapped, so getclientsnames_lazy is returning success (0) while leaving a bogus h_hostservs pointer behind.
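
The faulting line dereferences clnames->h_hostservs without any check. A minimal sketch of a guarded access follows, assuming the usual two-field layout of struct nd_hostservlist; the structs are local stand-ins so the sketch compiles anywhere, and first_host() is a hypothetical helper, not something in mountd.c. Note that this only catches a NULL or empty list. It cannot detect a non-NULL garbage pointer such as 0x4d580000, so it is not a fix for this bug on its own.

```c
#include <stddef.h>

/*
 * Local stand-ins that mirror the layout used in the report.
 * On illumos, include <netdir.h> instead of defining these.
 */
struct nd_hostserv {
	char *h_host;	/* host name */
	char *h_serv;	/* service name */
};

struct nd_hostservlist {
	int h_cnt;				/* number of entries */
	struct nd_hostserv *h_hostservs;	/* the entries */
};

/*
 * Hypothetical helper: return the first host name, or NULL when the
 * list is missing or empty. A non-NULL garbage pointer still crashes.
 */
static const char *
first_host(const struct nd_hostservlist *clnames)
{
	if (clnames == NULL || clnames->h_cnt <= 0 ||
	    clnames->h_hostservs == NULL)
		return (NULL);
	return (clnames->h_hostservs[0].h_host);
}
```

A caller would treat a NULL return the same way as a failed lookup, for example by logging an anonymous client.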


Related issues

duplicated by illumos gate - Bug #1878: mountd crashes during unsuccessful NFS mount Closed 2011-12-10

History

Updated by Albert Lee over 3 years ago

  • Category changed from cmd - userland programs to lib - userland libraries

The nd_hostservlist pointer itself appears to not be updated after the resolver returns, so it's just garbage. The cause has been determined to be a previously referenced (but not officially reported?) issue with the mdns nss module. Will look into isolating the root cause.

Updated by Robert Gordon about 3 years ago

I have just started to see this same issue on OpenSolaris build 134. Here is what is logged around the event:

Apr  9 12:09:24 opensolaris mountd[527]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr  9 12:55:15 opensolaris nfssrv: [ID 939466 kern.warning] WARNING: nfsauth: mountd has not established door
Apr  9 12:55:16 opensolaris mountd[895]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr  9 12:55:48 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr  9 12:56:02 opensolaris last message repeated 23 times
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 mbp_13._net-assistant._udp.local. (47) RDLen 9: .w
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 GrowlServer._nssocketport._tcp.local. (47) RDLen 9: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 mbp_13._growl._tcp.local. (47) RDLen 9: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 40-37-71-72 mbp_13._sleep-proxy._udp.local. (47) RDLen 9: .l
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 mbp_13._servermgr._tcp.local. (47) RDLen 9: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    7 mbp_13._servermgr._tcp.local. (47) RDLen 7: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from  120 to 4500 for    7 mbp_13._servermgr._tcp.local. (47) RDLen 7: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from  120 to 4500 for    9 mbp_13._servermgr._tcp.local. (47) RDLen 9: ..
Apr  9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr  9 13:43:45 opensolaris mountd[915]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr  9 13:49:23 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr  9 13:49:37 opensolaris last message repeated 22 times
Apr  9 13:49:37 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to  120 for    9 mbp_13._net-assistant._udp.local. (47) RDLen 9: .w
Apr  9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr  9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from  120 to 4500 for    9 mbp_13._net-assistant._udp.local. (47) RDLen 9: ..
Apr  9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4

Updated by Robert Gordon about 3 years ago

Workaround: turn off mdns in /etc/nsswitch.conf

Updated by Vitaliy Gusev over 2 years ago

  • Assignee set to Vitaliy Gusev
  • Difficulty set to Medium
  • Tags set to needs-triage

Updated by Vitaliy Gusev over 2 years ago

_get_hostserv_inetnetdir_byname() can wrongly return success without any result, because nss_search() doesn't always care about the opaque arg.

The fix: check nss_search()'s return value and detect the error.

Updated by Vitaliy Gusev about 2 years ago

Another crash:

libc_hwcap1.so.1`_lwp_kill+0x15
libc_hwcap1.so.1`raise+0x22
libc_hwcap1.so.1`abort+0xf2
libc_hwcap1.so.1`_assert+0x82
getclientsnames_common+0x69
getclientsnames_lazy+0x28
do_logging_queue+0x3b
logging_svc+0x52
libc_hwcap1.so.1`_thrp_setup+0x9b
libc_hwcap1.so.1`_lwp_start

Related code:

getclientsnames_common(struct netconfig *nconf, struct netbuf **nbuf,
    struct nd_hostservlist **serv) {
    if (__netdir_getbyaddr_nosrv(nconf, serv, *nbuf) != 0) {
        /* ^ This call returned 0, but without setting *serv. */
        ...
        *serv = anon_client(host);
        if (*serv == NULL)
            return (ENOMEM);
    }
    assert(*serv != NULL);
    /* ^ This is the assertion that failed. */

}
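
One way to make "returned 0 without setting *serv" detectable is to clear the out-parameter before the call and treat success with a NULL result as a failure. The sketch below illustrates that pattern only. All names in it (struct result, lookup_fn, checked_lookup, the two lookup functions) are hypothetical, and the static fallback merely stands in for what anon_client() does in the excerpt above; it is not the change that went into the gate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type standing in for struct nd_hostservlist. */
struct result {
	const char *name;
};

/* Hypothetical lookup signature: returns 0 on success and sets *out. */
typedef int (*lookup_fn)(struct result **out);

static struct result real = { "client.example" };
static struct result fallback = { "anon" };

/* Well-behaved lookup: sets *out and reports success. */
static int
lookup_ok(struct result **out)
{
	*out = &real;
	return (0);
}

/* Models the bug: reports success but never writes *out. */
static int
lookup_success_no_result(struct result **out)
{
	(void) out;
	return (0);
}

/*
 * Clear *out first so an untouched out-parameter is visible, then
 * fall back when the lookup fails or succeeds without a result.
 */
static int
checked_lookup(lookup_fn fn, struct result **out)
{
	*out = NULL;
	if (fn(out) != 0 || *out == NULL)
		*out = &fallback;
	assert(*out != NULL);
	return (0);
}
```

With this shape the assertion can no longer fire on a stale pointer, because the caller never looks at whatever value *out held before the call.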

Here is the code that caused it to return 0:

_get_hostserv_inetnetdir_byaddr() {
    ...
    if ((tmphe = DOOR_GETIPNODEBYADDR_R((char *)&v4mapbuf,
        16, AF_INET6, ndbuf4host->result,
        ndbuf4host->buffer,
        ndbuf4host->buflen, &h_errnop)) != NULL)
        he = __mappedtov4(tmphe, &h_errnop);
    if (!he) {
        /* Failover case, try hosts db for v4 address */
        he = DOOR_GETHOSTBYADDR_R((char *)
            &(sa->sin_addr.s_addr), 4,
            sa->sin_family, ndbuf4host->result,
            ndbuf4host->buffer, ndbuf4host->buflen,
            &h_errnop);
        /* ^ h_errnop was set to 0 here; that is a bug. */
        if (!he) {
            NSS_XbyY_FREE(&ndbuf4host);
            if (ndbuf4serv)
                NSS_XbyY_FREE(&ndbuf4serv);
            _nderror = __herrno2netdir(h_errnop);
            return (_nderror);
        }
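
The excerpt implies that a NULL result combined with h_errnop == 0 ends up as a return value of 0, which the caller reads as success. A minimal sketch of a defensive mapping follows, under that assumption. The SK_* codes and both functions are hypothetical stand-ins invented for illustration; the real error values and __herrno2netdir() live in the netdir code and may behave differently.

```c
#include <stddef.h>

/* Hypothetical stand-ins for netdir-style status codes. */
enum { SK_OK = 0, SK_NOHOST = 1, SK_SYSTEM = 2 };

/* Hypothetical stand-in for a resolver error code. */
enum { SK_HOST_NOT_FOUND = 1 };

/* Naive mapping, assumed to resemble the problem: 0 maps to success. */
static int
sketch_herrno2err(int h_err)
{
	switch (h_err) {
	case 0:
		return (SK_OK);
	case SK_HOST_NOT_FOUND:
		return (SK_NOHOST);
	default:
		return (SK_SYSTEM);
	}
}

/*
 * Defensive mapping: a NULL lookup result never maps to success,
 * even when the resolver left the error code at 0.
 */
static int
lookup_status(const void *he, int h_err)
{
	if (he != NULL)
		return (SK_OK);
	if (h_err == 0)
		return (SK_SYSTEM);	/* failed without an error code */
	return (sketch_herrno2err(h_err));
}
```

The design choice here is to decide success from the result pointer and use the error code only to pick which failure to report.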

Updated by Rich Lowe about 2 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13580 e7b96961e15f
