Bug #12976

system panics with error in IP module

Added by Jason Matthews 18 days ago. Updated 17 days ago.

Status: New
Priority: High
Assignee: -
Category: networking
% Done: 0%
Difficulty: Medium

Description

Upgraded some systems to newer releases and started seeing crashes. These systems make heavy use of NAT. Let me know what other information I can provide.

root@dbb006:/var/crash/volatile# uname -a && mdb vmcore.0
SunOS dbb006 5.11 joyent_20200701T231659Z i86pc i386 i86pc
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba fctl stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata mr_sas crypto random cpc fcp logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> ::status
debugging crash dump vmcore.0 (64-bit) from dbb006
operating system: 5.11 joyent_20200701T231659Z (i86pc)
git branch: release-20200702
git rev: 4e1f938f9e4d41dcfe106b432ae2b08d3d5ab7c5
image uuid: (not set)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffbe33059a910 addr=30 occurred in module "ip" due to a NULL pointer dereference
dump content: kernel pages only

root@dbb005:/var/crash/volatile# uname -a ; mdb vmcore.1
SunOS dbb005 5.11 joyent_20200603T203505Z i86pc i386 i86pc
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba fctl stmf_sbd stmf zfs sd mm lofs idm mpt_sas mr_sas sata crypto random cpc fcp logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> ::status
debugging crash dump vmcore.1 (64-bit) from dbb005
operating system: 5.11 joyent_20200603T203505Z (i86pc)
git branch: release-20200604
git rev: 71b43f2a12f58ef8bc5a1965a3b742749bb49231
image uuid: (not set)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffbe332e29910 addr=30 occurred in module "ip" due to a NULL pointer dereference
dump content: kernel pages only

History

#1

Updated by Dan McDonald 17 days ago

Could you please make the vmdump available? If there is sensitive information in it, I can assure you (especially since it's a SmartOS dump) that we will respect the privacy of the dump in question.

#2

Updated by Dan McDonald 17 days ago

Got the coredump. Some initial observations:

> $C
fffffbe33059ab10 udp_do_bind+0x76(fffffdfa95564580, fffffbe33059ab28, 20, fffffdfca739fc88, 0)
fffffbe33059ab60 udp_implicit_bind+0x61(fffffdfa95564580, fffffdfca739fc88)
fffffbe33059ac00 udp_send+0x1a8(fffffdfa95564580, fffffdf586b34520, fffffbe33059ad18, fffffdfca739fc88)
fffffbe33059ac90 so_sendmsg+0x252(fffffdfde0f502d8, fffffbe33059ad18, fffffbe33059ae78, fffffdfca739fc88)
fffffbe33059acf0 socket_sendmsg+0x62(fffffdfde0f502d8, fffffbe33059ad18, fffffbe33059ae78, fffffdfca739fc88)
fffffbe33059ad90 socket_vop_write+0x61(fffffdfde0f4fd80, fffffbe33059ae78, 0, fffffdfca739fc88, 0)
fffffbe33059ae30 fop_write+0x111(fffffdfde0f4fd80, fffffbe33059ae78, 0, fffffdfca739fc88, 0)
fffffbe33059af00 write+0x2c6(14b, c02e326700, 350)
fffffbe33059af10 sys_syscall+0x1a8()

So it died in udp_do_bind(), as part of the implicit bind that happens when sending on an unbound UDP socket with sendto() or sendmsg() (here, as the stack shows, via write()).
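The implicit bind is visible from userland. The sketch below is a minimal illustration, not code from this bug: an unbound UDP socket reports local port 0 from getsockname(), and after one sendto() the kernel has assigned it an ephemeral port. On illumos, a send like this is the kind that ends up in udp_implicit_bind(). The helper names (local_port, implicit_bind_demo) are hypothetical, and sending to 127.0.0.1 port 9 is an arbitrary assumption.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the local port of fd (host byte order)
 * as reported by getsockname(), or -1 on error. Unbound sockets report 0.
 */
static int
local_port(int fd)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);

	memset(&sin, 0, sizeof (sin));
	if (getsockname(fd, (struct sockaddr *)&sin, &len) != 0)
		return (-1);
	return (ntohs(sin.sin_port));
}

/*
 * Hypothetical demo: record the local port before and after sending one
 * datagram to 127.0.0.1:dst_port without ever calling bind().
 * Returns 0 on success, -1 on any failure.
 */
static int
implicit_bind_demo(uint16_t dst_port, int *before, int *after)
{
	struct sockaddr_in dst;
	const char msg[] = "x";
	int fd;

	if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		return (-1);

	*before = local_port(fd);	/* never bound: expect 0 */

	memset(&dst, 0, sizeof (dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(dst_port);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* No bind(): the kernel picks an ephemeral port during the send. */
	if (sendto(fd, msg, sizeof (msg), 0,
	    (struct sockaddr *)&dst, sizeof (dst)) < 0) {
		(void) close(fd);
		return (-1);
	}

	*after = local_port(fd);	/* now a nonzero ephemeral port */
	(void) close(fd);
	return (0);
}
```

Calling implicit_bind_demo(9, &before, &after) should leave before at 0 and after at some nonzero port chosen by the kernel.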

A quick look at the code suggests that, for it to panic there, the conn_t must have a NULL netstack pointer, and sure enough:

> fffffdfa95564580::print -at conn_t conn_netstack
fffffdfa95564740 netstack_t *conn_netstack = 0
> 
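This also fits the addr=30 in both panic messages. When code loads a member through a NULL structure pointer, the faulting address is simply that member's offset (0x30 here, since the panic prints the address in hex). The sketch below uses a made-up struct: fake_stack, fs_target and fault_addr_for are hypothetical and are not the real netstack_t layout, so which member actually sits at 0x30 would still need to be checked against the real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only (NOT the real netstack_t):
 * six 8-byte members place fs_target at byte offset 0x30.
 */
struct fake_stack {
	uint64_t	fs_pad[6];
	uint64_t	fs_target;
};

/*
 * A load of p->fs_target reads from base + offsetof(...). With a NULL
 * base the address is just the member offset, which is what appears
 * as addr= in a page-fault panic.
 */
static uintptr_t
fault_addr_for(uintptr_t base)
{
	return (base + offsetof(struct fake_stack, fs_target));
}
```

fault_addr_for(0) returns 0x30, matching the faulting address in the dumps.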

So the interesting question about this bug and dump is: how did a conn_t end up with a NULL netstack pointer? To be continued...

#3

Updated by Dan McDonald 17 days ago

One more piece of data: this appears to be a use-after-free.

> fffffdfa95564580::whatis
fffffdfa95564580 is freed from udp_conn_cache

Now to figure out which two threads, running concurrently, can cause this.
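One generic pattern that produces this symptom is a thread using a conn_t without holding a reference while another thread drops the last reference and the conn_t goes back to its cache. The sketch below replays that interleaving serially, next to the safe variant that takes a reference first. All types and functions (fake_conn_t, fake_conn_rele and the replay helpers) are hypothetical stand-ins rather than the illumos conn_t or kmem API. It also assumes the free path clears the netstack pointer, which would be consistent with the NULL seen above. It illustrates the pattern; it does not identify the two threads involved in this dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the illumos conn_t or kmem API. */
typedef struct fake_netstack {
	int	ns_id;
} fake_netstack_t;

typedef struct fake_conn {
	uint32_t	fc_ref;		/* reference count */
	fake_netstack_t	*fc_netstack;	/* assumed cleared on free */
	int		fc_freed;	/* set when returned to the "cache" */
} fake_conn_t;

/* Drop a reference; the last holder "frees" the conn to its cache. */
static void
fake_conn_rele(fake_conn_t *c)
{
	assert(c->fc_ref > 0);
	if (--c->fc_ref == 0) {
		c->fc_netstack = NULL;	/* assumed free-path behavior */
		c->fc_freed = 1;
	}
}

/*
 * Bad interleaving, replayed serially:
 *   A: keeps a pointer to c WITHOUT taking a reference
 *   B: drops the last reference, so c is "freed"
 *   A: later dereferences its stale pointer
 */
static fake_netstack_t *
replay_unreferenced_use(fake_conn_t *c)
{
	fake_conn_t *stale = c;		/* A: no reference taken */

	fake_conn_rele(c);		/* B: last reference dropped */
	return (stale->fc_netstack);	/* A: reads the freed conn */
}

/* Safe interleaving: A holds its own reference across the use. */
static fake_netstack_t *
replay_referenced_use(fake_conn_t *c)
{
	fake_netstack_t *ns;

	c->fc_ref++;			/* A: take a reference first */
	fake_conn_rele(c);		/* B: drops only its own reference */
	ns = c->fc_netstack;		/* A: conn is still live */
	fake_conn_rele(c);		/* A: done; now it can be freed */
	return (ns);
}
```

Starting from a conn with one reference, the unreferenced replay observes a NULL netstack, while the referenced replay still sees the original netstack before the conn is finally freed.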
