panic in signalfd
When running a zone-reboot workload, I've been able to reliably induce failures similar to this on my memory-constrained test VM:
debugging crash dump vmcore.3 (64-bit) from 00-0c-29-18-ec-10
operating system: 5.11 joyent_20160218T004651Z (i86pc)
image uuid: (not set)
panic message:
mutex_enter: bad mutex, lp=ffffff0151934718 owner=6f767a2f76656428 thread=ffffff014a5fd460
dump content: kernel pages only
> $C
ffffff0004ae7bb0 vpanic()
ffffff0004ae7bd0 mutex_panic+0x58(fffffffffb94ceeb, ffffff0151934718)
ffffff0004ae7c40 mutex_vector_enter+0x347(ffffff0151934718)
ffffff0004ae7ca0 signalfd_pollwake_cb+0xc7(ffffff0151a5c028, 17)
ffffff0004ae7d00 sigtoproc+0x1b7(ffffff0151a5c028, 0, 17)
ffffff0004ae7d40 killall+0x96(2a, 0)
ffffff0004ae7d70 zone_empty+0x2e(ffffff01631466c0)
ffffff0004ae7db0 zone_shutdown+0xe6(2a)
ffffff0004ae7f00 zone+0x237(5, 2a, 0, 1, 540)
ffffff0004ae7f10 sys_syscall+0x1bd()
Updated by Patrick Mooney almost 6 years ago
Looking back, I believe part of the root cause of this issue was how the lifetime of a signalfd instance related to its association with the processes that used it. If a signalfd handle was shared between processes via fork(), the signalfd_close handler (which cleans up the references held in the proc_t) would fire only when the last fd reference was closed. That could produce a clean-up ordering in which stale or already-freed signalfd resources were still referenced.
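To make the ordering problem concrete, here is a deliberately simplified C model. It is not the illumos implementation: the types and functions (`sigfd_state_t`, `open_file_t`, `model_proc_t`, `proc_use`, `fork_dup`, `close_fd`, `deliver_signal`) are hypothetical. The model assumes the close handler runs only on the last close and clears only the closing process's pointer, and it marks the state as freed instead of actually freeing it, so the stale reference can be observed without undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; names do not correspond to illumos code. */
typedef struct sigfd_state {
    bool freed;            /* set instead of free() so stale use is observable */
} sigfd_state_t;

typedef struct open_file {
    int refcnt;            /* fd references held across all processes */
    sigfd_state_t *state;  /* the signalfd resources behind the handle */
} open_file_t;

typedef struct model_proc {
    sigfd_state_t *sigfd;  /* per-process association, akin to a proc_t field */
} model_proc_t;

/* A process starts using the signalfd and records the association. */
static void proc_use(model_proc_t *p, open_file_t *f) {
    p->sigfd = f->state;
}

/* fork(): the child inherits another reference to the same open file. */
static void fork_dup(open_file_t *f) {
    f->refcnt++;
}

/* close(): cleanup runs only when the last reference is dropped, and it
 * only clears the association of the process doing that final close. */
static void close_fd(model_proc_t *closer, open_file_t *f) {
    assert(f->refcnt > 0);
    if (--f->refcnt == 0) {
        closer->sigfd = NULL;
        f->state->freed = true;
    }
}

/* Signal delivery: a wakeup callback would dereference p->sigfd.
 * Returns true if that dereference would touch released state. */
static bool deliver_signal(const model_proc_t *p) {
    return p->sigfd != NULL && p->sigfd->freed;
}
```

In the scenario this models, a parent creates the signalfd and starts using it, forks, and then closes its fd. Because the child still holds a reference, no cleanup runs and the parent keeps its pointer. When the child later closes the last reference, the state is released while the parent still points at it, so a subsequent signal to the parent (compare the sigtoproc and signalfd_pollwake_cb frames in the stack above) plausibly ends up touching freed memory.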
Updated by Electric Monk almost 6 years ago
- Status changed from New to Closed
commit abf99a006172ea5aab2246bda23f9d6d935bf1ad
Author: Patrick Mooney <firstname.lastname@example.org>
Date: 2016-10-13T23:27:17.000Z

    7420 signalfd deadlock on pollwakeup
    7421 panic in signalfd
    Reviewed by: Jerry Jelinek <email@example.com>
    Reviewed by: Robert Mustacchi <firstname.lastname@example.org>
    Reviewed by: Dan McDonald <firstname.lastname@example.org>
    Approved by: Gordon Ross <firstname.lastname@example.org>