Bug #393
closed
Invalid nd_hostservlist contents crash mountd
100%
Description
In some cases mountd crashes in do_logging_queue:
> $C
fe34ffa8 do_logging_queue+0x49(80615a0, 8060998, fe34ffc8, 8055e0a)
fe34ffc8 logging_svc+0x52(0, fef60000, fe34ffe8, feedd72e)
fe34ffe8 libc_hwcap2.so.1`_thrp_setup+0x9b(fede1a40)
fe34fff8 libc_hwcap2.so.1`_lwp_start(fede1a40, 0, 0, 0, 0, 0)
> ::regs
%cs = 0x0043            %eax = 0x4d580000
%ds = 0x004b            %ebx = 0x00000000
%ss = 0x004b            %ecx = 0x00000000
%es = 0x004b            %edx = 0xfef61014 libc_hwcap2.so.1`libc_malloc_lock+0xc
%fs = 0x0000            %esi = 0x080615a0
%gs = 0x01c3            %edi = 0x00000000
%eip = 0x08055d3d do_logging_queue+0x49
%ebp = 0xfe34ffa8
%kesp = 0x00000000
%eflags = 0x00010246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=<of,df,IF,tf,sf,ZF,af,PF,cf>
> do_logging_queue+0x49::dis
do_logging_queue+0x2f:  pushl  %eax
do_logging_queue+0x30:  pushl  0x10(%esi)
do_logging_queue+0x33:  call   +0xad8 <getclientsnames_lazy>
do_logging_queue+0x38:  addl   $0x10,%esp
do_logging_queue+0x3b:  testl  %eax,%eax
do_logging_queue+0x3d:  je     +0x4 <do_logging_queue+0x43>
do_logging_queue+0x3f:  xorl   %ebx,%ebx
do_logging_queue+0x41:  jmp    +0x8 <do_logging_queue+0x4b>
do_logging_queue+0x43:  movl   -0x1c(%ebp),%eax
do_logging_queue+0x46:  movl   0x4(%eax),%eax
do_logging_queue+0x49:  movl   (%eax),%ebx
do_logging_queue+0x4b:  subl   $0x4,%esp
do_logging_queue+0x4e:  pushl  0xc(%esi)
do_logging_queue+0x51:  pushl  0x4(%esi)
do_logging_queue+0x54:  pushl  %ebx
do_logging_queue+0x55:  call   -0x11f6 <PLT:audit_mountd_mount>
do_logging_queue+0x5a:  addl   $0x10,%esp
do_logging_queue+0x5d:  movl   0x8(%esi),%eax
do_logging_queue+0x60:  testl  %eax,%eax
do_logging_queue+0x62:  je     +0xd <do_logging_queue+0x71>
do_logging_queue+0x64:  subl   $0x8,%esp
This corresponds with http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fs.d/nfs/mountd/mountd.c#304
host = clnames->h_hostservs[0].h_host;
With some debug info:
Nov 2 15:42:09 server nfs4cbd[1511]: [ID 867284 daemon.notice] nfsv4 cannot determine local hostname binding for transport tcp - delegations will not be available on this transport
Nov 2 15:42:39 server mountd[1453]: [ID 175181 daemon.error] lq->ld_netid = tcp
Nov 2 15:42:39 server mountd[1453]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Nov 2 15:42:39 server mountd[1453]: [ID 708950 daemon.error] clnames = 0x8071998
Nov 2 15:42:39 server mountd[1453]: [ID 589525 daemon.error] clnames->h_cnt = 4
Nov 2 15:42:39 server mountd[1453]: [ID 318296 daemon.error] clnames->h_hostservs = 0x4d580000
0x4d580000 is unmapped, so getclientsnames_lazy is handing back a garbage h_hostservs pointer.
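To make the faulting access concrete, here is a minimal sketch of the dereference chain behind the host assignment above, with the cheap sanity checks a caller could apply. The demo_* types and demo_first_host() are hypothetical stand-ins, not the real netdir definitions or mountd code, and the checks only catch NULL or empty lists. A non-NULL garbage pointer such as 0x4d580000 would still fault, which is why the resolver side has to be fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netdir structures (not the real headers). */
struct demo_hostserv {
    char *h_host;
    char *h_serv;
};

struct demo_hostservlist {
    int h_cnt;
    struct demo_hostserv *h_hostservs;
};

/*
 * Return the first host name in the list, or `fallback` when the list is
 * obviously unusable (NULL list, no entries, NULL array, NULL name).
 * This cannot detect a non-NULL pointer to unmapped memory.
 */
const char *
demo_first_host(const struct demo_hostservlist *list, const char *fallback)
{
    assert(fallback != NULL);

    if (list == NULL || list->h_cnt <= 0 || list->h_hostservs == NULL)
        return (fallback);
    if (list->h_hostservs[0].h_host == NULL)
        return (fallback);
    return (list->h_hostservs[0].h_host);
}
```

With h_cnt = 4 and a NULL array, the sketch returns the fallback string instead of dereferencing the array.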
Updated by Albert Lee over 12 years ago
- Category changed from cmd - userland programs to lib - userland libraries
The nd_hostservlist pointer itself appears to not be updated after the resolver returns, so it's just garbage. The cause has been determined to be a previously referenced (but not officially reported?) issue with the mdns nss module. Will look into isolating the root cause.
Updated by Robert Gordon about 12 years ago
I have just started to see this same issue on OpenSolaris 134. Here is what is logged around the event:
Apr 9 12:09:24 opensolaris mountd[527]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr 9 12:55:15 opensolaris nfssrv: [ID 939466 kern.warning] WARNING: nfsauth: mountd has not established door
Apr 9 12:55:16 opensolaris mountd[895]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr 9 12:55:48 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr 9 12:56:02 opensolaris last message repeated 23 times
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 mbp_13._net-assistant._udp.local. (47) RDLen 9: .w
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 GrowlServer._nssocketport._tcp.local. (47) RDLen 9: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 mbp_13._growl._tcp.local. (47) RDLen 9: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 40-37-71-72 mbp_13._sleep-proxy._udp.local. (47) RDLen 9: .l
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 mbp_13._servermgr._tcp.local. (47) RDLen 9: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 7 mbp_13._servermgr._tcp.local. (47) RDLen 7: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 120 to 4500 for 7 mbp_13._servermgr._tcp.local. (47) RDLen 7: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 120 to 4500 for 9 mbp_13._servermgr._tcp.local. (47) RDLen 9: ..
Apr 9 12:56:02 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr 9 13:43:45 opensolaris mountd[915]: [ID 589989 daemon.error] Could not find DNS entry for tcp
Apr 9 13:49:23 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr 9 13:49:37 opensolaris last message repeated 22 times
Apr 9 13:49:37 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 4500 to 120 for 9 mbp_13._net-assistant._udp.local. (47) RDLen 9: .w
Apr 9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Apr 9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] Correcting TTL from 120 to 4500 for 9 mbp_13._net-assistant._udp.local. (47) RDLen 9: ..
Apr 9 13:49:39 opensolaris mDNSResponder: [ID 702911 daemon.error] ERROR: getOptRdata - unknown opt 4
Updated by Robert Gordon about 12 years ago
Workaround: turn off mdns in /etc/nsswitch.conf
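As an illustration only, assuming the hosts and ipnodes entries on the affected machine list mdns as their last source (the actual source list and order on a given system may differ), the change would look roughly like this:

```
# /etc/nsswitch.conf (illustrative fragment, not copied from the affected host)
# before:
#   hosts:   files dns mdns
#   ipnodes: files dns mdns
# after, with the mdns source removed:
hosts:   files dns
ipnodes: files dns
```

This only avoids the code path that triggers the crash; it does not fix the underlying bug.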
Updated by Vitaliy Gusev over 11 years ago
- Assignee set to Vitaliy Gusev
- Difficulty set to Medium
- Tags set to needs-triage
Updated by Vitaliy Gusev over 11 years ago
Fixed patch: http://cr.illumos.org/view/bm9forx6
Updated by Vitaliy Gusev over 11 years ago
The problem: _get_hostserv_inetnetdir_byname() can wrongly return
success without producing any result, because nss_search() does not
always take the opaque arg into account.
Updated by Vitaliy Gusev over 11 years ago
The fix is to check nss_search()'s return value and treat a failure as an error.
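The idea can be sketched with a toy model. Everything named demo_* below is hypothetical and stands in for the real nss and netdir code; it is not the actual patch. The sketch contrasts a wrapper that ignores the search status, and so reports success with an empty result, with one that checks both the status and the result pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the toy search backend. */
enum demo_status { DEMO_SUCCESS = 0, DEMO_NOTFOUND = 1 };

/* Hypothetical lookup argument: `result` is filled in only on a real hit. */
struct demo_arg {
    const char *key;
    const char *result;
};

/* Toy backend: on a miss it returns an error and leaves arg->result alone. */
enum demo_status
demo_search(struct demo_arg *arg)
{
    if (arg->key != NULL && arg->key[0] == 'k') {
        arg->result = "value";
        return (DEMO_SUCCESS);
    }
    return (DEMO_NOTFOUND);
}

/* Buggy pattern: the status is discarded, so a miss still returns 0. */
int
demo_lookup_unchecked(const char *key, const char **out)
{
    struct demo_arg arg = { key, NULL };

    (void) demo_search(&arg);
    *out = arg.result;      /* may be NULL */
    return (0);             /* claims success regardless */
}

/* Checked pattern: success requires a good status and a non-NULL result. */
int
demo_lookup_checked(const char *key, const char **out)
{
    struct demo_arg arg = { key, NULL };

    assert(out != NULL);
    if (demo_search(&arg) != DEMO_SUCCESS || arg.result == NULL) {
        *out = NULL;
        return (-1);
    }
    *out = arg.result;
    return (0);
}
```

For a key the toy backend does not know, the unchecked wrapper returns 0 with a NULL result, while the checked wrapper returns -1.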
Updated by Vitaliy Gusev over 11 years ago
Another crash:
libc_hwcap1.so.1`_lwp_kill+0x15
libc_hwcap1.so.1`raise+0x22
libc_hwcap1.so.1`abort+0xf2
libc_hwcap1.so.1`_assert+0x82
getclientsnames_common+0x69
getclientsnames_lazy+0x28
do_logging_queue+0x3b
logging_svc+0x52
libc_hwcap1.so.1`_thrp_setup+0x9b
libc_hwcap1.so.1`_lwp_start
Related code:
getclientsnames_common(struct netconfig *nconf, struct netbuf **nbuf,
    struct nd_hostservlist **serv)
{
    /* BUG: this call returned 0 (success) without setting *serv. */
    if (__netdir_getbyaddr_nosrv(nconf, serv, *nbuf) != 0) {
        ...
        *serv = anon_client(host);
        if (*serv == NULL)
            return (ENOMEM);
    }
    assert(*serv != NULL);  /* this is the assertion that fired */
}
Here is the code that caused it to return 0:
_get_hostserv_inetnetdir_byaddr() {
    ...
    if ((tmphe = DOOR_GETIPNODEBYADDR_R((char *)&v4mapbuf,
        16, AF_INET6, ndbuf4host->result,
        ndbuf4host->buffer,
        ndbuf4host->buflen, &h_errnop)) != NULL)
        he = __mappedtov4(tmphe, &h_errnop);
    if (!he) {
        /* Failover case, try hosts db for v4 address */
        he = DOOR_GETHOSTBYADDR_R((char *)
            &(sa->sin_addr.s_addr), 4,
            sa->sin_family, ndbuf4host->result,
            ndbuf4host->buffer, ndbuf4host->buflen,
            &h_errnop);
        /* BUG: h_errnop was set to 0 here even though the lookup failed. */
        if (!he) {
            NSS_XbyY_FREE(&ndbuf4host);
            if (ndbuf4serv)
                NSS_XbyY_FREE(&ndbuf4serv);
            _nderror = __herrno2netdir(h_errnop);
            return (_nderror);
        }
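A small model shows why a zero h_errnop turns a failed lookup into an apparent success. All demo_* names and numeric values below are hypothetical; the only assumption taken from the trace above is that the error translation maps 0 to the success code. The defensive variant is one possible guard, not necessarily what the committed fix does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status and error values, chosen only for this sketch. */
#define DEMO_ND_OK          0
#define DEMO_ND_NOHOST      1
#define DEMO_ND_SYSTEM      2
#define DEMO_HOST_NOT_FOUND 1

/* Toy translation: like the failing path above, 0 maps to success. */
int
demo_herrno2nd(int herr)
{
    switch (herr) {
    case 0:
        return (DEMO_ND_OK);
    case DEMO_HOST_NOT_FOUND:
        return (DEMO_ND_NOHOST);
    default:
        return (DEMO_ND_SYSTEM);
    }
}

/*
 * Defensive status: a NULL result never yields DEMO_ND_OK, even when the
 * backend forgot to set an error code.
 */
int
demo_lookup_status(const void *he, int herr)
{
    int err;

    if (he != NULL)
        return (DEMO_ND_OK);

    err = demo_herrno2nd(herr);
    if (err == DEMO_ND_OK)
        err = DEMO_ND_SYSTEM;

    /* Invariant the caller relies on: no result means a nonzero status. */
    assert(err != DEMO_ND_OK);
    return (err);
}
```

With he == NULL and herr == 0, the plain translation yields DEMO_ND_OK, which is the state that later trips the assert in getclientsnames_common; the guarded version returns DEMO_ND_SYSTEM instead.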
Updated by Rich Lowe over 11 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
- Tags deleted (needs-triage)
Resolved in r13580 commit:e7b96961e15f