Bug #7420

signalfd deadlock on pollwakeup

Added by Robert Mustacchi about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Category: kernel
Start date: 2016-09-27
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

> ::status
debugging crash dump vmcore.0 (64-bit) from azrael
operating system: 5.11 joyent_20160603T232338Z (i86pc)
image uuid: (not set)
panic message: Deadlock: cycle in blocking chain
dump content: kernel pages only
> $C
ffffff005c2f9850 vpanic()
ffffff005c2f98f0 0xfffffffffbb0cd3a()
ffffff005c2f9960 mutex_vector_enter+0x3a3(ffffff1328641480)
ffffff005c2f99e0 signalfd_poll+0xd6(12500000001, 1, 0, ffffff005c2f9b4e, ffffff005c2f9b40)
ffffff005c2f9a20 cdev_poll+0x38(12500000001, 1, 0, ffffff005c2f9b4e, ffffff005c2f9b40)
ffffff005c2f9a60 spec_poll+0x70(ffffff139a8e3480, 1, 0, ffffff005c2f9b4e, ffffff005c2f9b40, 0)
ffffff005c2f9ad0 fop_poll+0x3c(ffffff139a8e3480, 1, 0, ffffff005c2f9b4e, ffffff005c2f9b40, 0)
ffffff005c2f9bb0 dp_pcache_poll+0x18a(ffffff134e0e33f8, ffffff2a2739a840, ffffff2a26677af0, 12, ffffff005c2f9c4c)
ffffff005c2f9cb0 dpioctl+0x4a5(2c00000004, d001, ffffff005c2f9df0, 80200003, ffffff139e93ca28, ffffff005c2f9e1c)
ffffff005c2f9cf0 cdev_ioctl+0x39(2c00000004, d001, ffffff005c2f9df0, 80200003, ffffff139e93ca28, ffffff005c2f9e1c)
ffffff005c2f9d40 spec_ioctl+0x60(ffffff2a54e40c80, d001, ffffff005c2f9df0, 80200003, ffffff139e93ca28, ffffff005c2f9e1c, 0)
ffffff005c2f9dd0 fop_ioctl+0x55(ffffff2a54e40c80, d001, ffffff005c2f9df0, 80200003, ffffff139e93ca28, ffffff005c2f9e1c, 0)
ffffff005c2f9e70 lx_epoll_wait+0xa2(2, 7fffffeff060, 12, ffffffff)
ffffff005c2f9ef0 lx_syscall_enter+0x16f()
ffffff005c2f9f10 sys_syscall+0x16c()

Here is a basic analysis of the panicking thread:

> *panic_thread::thread -p
            ADDR             PROC              LWP             CRED
ffffff1343da5b60 ffffff134c0d2040 ffffff1394eea780 ffffff139e93ca28
> ffffff134c0d2040::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  74207  74081  74207  74207      0 0x4a004000 ffffff134c0d2040 systemd

systemd panics while trying to acquire mutex ffffff1328641480 in signalfd_poll. That mutex is held by the following thread:

> ffffff1328641480::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff1328641480 adapt ffffff133cab1420      -      -     yes

> ffffff133cab1420::thread -p
            ADDR             PROC              LWP             CRED
ffffff133cab1420 ffffff2a54efb028 ffffff138c27f340 ffffff134ca3cc80
> ffffff2a54efb028::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  47858  47852  47858  47858      0 0x4a014002 ffffff2a54efb028 dash

> ffffff133cab1420::findstack -v
stack pointer for thread ffffff133cab1420: ffffff005ef88aa0
[ ffffff005ef88aa0 _resume_from_idle+0x112() ]
  ffffff005ef88ad0 swtch+0x141()
  ffffff005ef88b70 turnstile_block+0x21a(0, 0, ffffff2a26677af0,
  fffffffffbc08aa0, 0, 0)
  ffffff005ef88be0 mutex_vector_enter+0x3a3(ffffff2a26677af0)
  ffffff005ef88c50 pollwakeup+0x1ce(ffffff1375fd51c8, 41)
  ffffff005ef88cb0 signalfd_pollwake_cb+0x9a(ffffff134c0d2040, 12)
  ffffff005ef88d10 sigtoproc+0x1b7(ffffff134c0d2040, 0, 12)
  ffffff005ef88d70 sigaddq+0xcf(ffffff134c0d2040, 0, ffffff005ef88d80, 1)
  ffffff005ef88df0 post_sigcld+0xc3(ffffff13383b1028, 0)
  ffffff005ef88e30 sigcld+0xba(ffffff13383b1028, 0)
  ffffff005ef88ec0 proc_exit+0xb55(1, 0)
  ffffff005ef88ee0 exit+0x15(1, 0)
  ffffff005ef88f00 rexit+0x18(0)
  ffffff005ef88f10 sys_syscall+0x1bd()

dash holds the mutex that systemd wants, and is itself blocked on the following mutex, which systemd holds:

> ffffff2a26677af0::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff2a26677af0 adapt ffffff1343da5b60      -      -     yes
> ffffff1343da5b60::thread -p
            ADDR             PROC              LWP             CRED
ffffff1343da5b60 ffffff134c0d2040 ffffff1394eea780 ffffff139e93ca28
> ffffff134c0d2040::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  74207  74081  74207  74207      0 0x4a004000 ffffff134c0d2040 systemd
> ffffff1343da5b60::findstack -v
stack pointer for thread ffffff1343da5b60: ffffff005c2f91d0
  ffffff005c2f95f0 0xffffff005c2f9398()
  ffffff005c2f9960 0xffffff005c2f9960()
  ffffff005c2f99e0 signalfd_poll+0xd6(12500000001, 1, 0, ffffff005c2f9b4e,
  ffffff005c2f9b40)
  ffffff005c2f9a20 cdev_poll+0x38(12500000001, 1, 0, ffffff005c2f9b4e,
  ffffff005c2f9b40)
  ffffff005c2f9a60 spec_poll+0x70(ffffff139a8e3480, 1, 0, ffffff005c2f9b4e,
  ffffff005c2f9b40, 0)
  ffffff005c2f9ad0 fop_poll+0x3c(ffffff139a8e3480, 1, 0, ffffff005c2f9b4e,
  ffffff005c2f9b40, 0)
  ffffff005c2f9bb0 dp_pcache_poll+0x18a(ffffff134e0e33f8, ffffff2a2739a840,
  ffffff2a26677af0, 12, ffffff005c2f9c4c)
  ffffff005c2f9cb0 dpioctl+0x4a5(2c00000004, d001, ffffff005c2f9df0, 80200003,
  ffffff139e93ca28, ffffff005c2f9e1c)
  ffffff005c2f9cf0 cdev_ioctl+0x39(2c00000004, d001, ffffff005c2f9df0, 80200003
  , ffffff139e93ca28, ffffff005c2f9e1c)
  ffffff005c2f9d40 spec_ioctl+0x60(ffffff2a54e40c80, d001, ffffff005c2f9df0,
  80200003, ffffff139e93ca28, ffffff005c2f9e1c, 0)
  ffffff005c2f9dd0 fop_ioctl+0x55(ffffff2a54e40c80, d001, ffffff005c2f9df0,
  80200003, ffffff139e93ca28, ffffff005c2f9e1c, 0)
  ffffff005c2f9e70 lx_epoll_wait+0xa2(2, 7fffffeff060, 12, ffffffff)
  ffffff005c2f9ef0 lx_syscall_enter+0x16f()
  ffffff005c2f9f10 sys_syscall+0x16c()
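
The two ::mutex lookups above form a closed waiter/owner loop, which is what the "cycle in blocking chain" panic message reports. As a rough illustration only, the walk can be modelled in user space as below. The struct and function names (model_thread, model_mutex, blocking_chain_has_cycle, dump_chain_is_cycle) are hypothetical and are not the kernel's turnstile code; only the thread and mutex addresses are taken from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the kernel's structures. */
struct model_mutex;

struct model_thread {
	const char *name;
	struct model_mutex *blocked_on;	/* NULL if not waiting on a mutex */
};

struct model_mutex {
	const char *addr;
	struct model_thread *owner;	/* NULL if unheld */
};

/*
 * Follow waiter -> mutex -> owner links starting from `start`.
 * Returns 1 if the chain leads back to `start`, otherwise 0.
 * `max_hops` bounds the walk so that a loop which does not include
 * `start` still terminates.
 */
int
blocking_chain_has_cycle(const struct model_thread *start, int max_hops)
{
	const struct model_thread *t = start;

	assert(start != NULL);
	for (int i = 0; i < max_hops; i++) {
		if (t->blocked_on == NULL || t->blocked_on->owner == NULL)
			return (0);
		t = t->blocked_on->owner;
		if (t == start)
			return (1);
	}
	return (0);
}

/* Rebuild the two-thread chain seen in the dump and check it. */
int
dump_chain_is_cycle(void)
{
	struct model_thread systemd = { "ffffff1343da5b60 (systemd)", NULL };
	struct model_thread dash = { "ffffff133cab1420 (dash)", NULL };
	struct model_mutex m1 = { "ffffff1328641480", &dash };
	struct model_mutex m2 = { "ffffff2a26677af0", &systemd };

	systemd.blocked_on = &m1;	/* wants the mutex dash holds */
	dash.blocked_on = &m2;		/* wants the mutex systemd holds */
	return (blocking_chain_has_cycle(&systemd, 8));
}
```

In this model dump_chain_is_cycle() returns 1: starting from the systemd thread, two hops (via ffffff1328641480 and then ffffff2a26677af0) arrive back at the same thread.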

History

#1

Updated by Patrick Mooney about 3 years ago

This is specifically a lock ordering problem between signalfd_poll and signalfd_pollwake_cb. The former acquires pollcache_t`pc_lock before proc_t`p_lock. The latter (via sigtoproc) reverses the order. Deferring the pollwake into a taskq means it can be performed without proc_t`p_lock held, avoiding the deadlock.
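
A minimal single-threaded sketch of the ordering problem and of the deferral described above follows. It is a model, not the actual signalfd change: the rank values, the model_* function names, and the counter-based "taskq" are assumptions made for illustration. It only records when a lock that should come first (pc_lock) is taken while a later one (p_lock) is already held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ranks for this sketch: a lower rank must be taken first. */
enum { RANK_PC_LOCK = 1, RANK_P_LOCK = 2 };

static int held_mask;		/* one bit per rank currently held */
static int order_violations;	/* lower rank taken under a higher one */
static int pending_wakes;	/* wakeups queued for the deferred worker */
static int delivered_wakes;	/* wakeups actually performed */

static void
model_enter(int rank)
{
	/* Any held lock of higher rank means the order is inverted. */
	if (held_mask >> (rank + 1))
		order_violations++;
	held_mask |= 1 << rank;
}

static void
model_exit(int rank)
{
	held_mask &= ~(1 << rank);
}

/* Poll path: pc_lock first, then p_lock. */
void
model_poll(void)
{
	model_enter(RANK_PC_LOCK);
	model_enter(RANK_P_LOCK);
	model_exit(RANK_P_LOCK);
	model_exit(RANK_PC_LOCK);
}

/* The wakeup itself needs pc_lock. */
static void
model_pollwakeup(void)
{
	model_enter(RANK_PC_LOCK);
	delivered_wakes++;
	model_exit(RANK_PC_LOCK);
}

/* Signal path, direct: wakeup runs with p_lock held (inverted order). */
void
model_signal_direct(void)
{
	model_enter(RANK_P_LOCK);
	model_pollwakeup();
	model_exit(RANK_P_LOCK);
}

/* Signal path, deferred: only queue the wakeup while p_lock is held. */
void
model_signal_deferred(void)
{
	model_enter(RANK_P_LOCK);
	pending_wakes++;
	model_exit(RANK_P_LOCK);
}

/* Stand-in for the taskq worker: entered with no locks held. */
void
model_taskq_drain(void)
{
	assert(held_mask == 0);
	while (pending_wakes > 0) {
		pending_wakes--;
		model_pollwakeup();
	}
}

void
model_reset(void)
{
	held_mask = order_violations = pending_wakes = delivered_wakes = 0;
}

int model_violations(void) { return (order_violations); }
int model_delivered(void) { return (delivered_wakes); }
```

Calling model_poll() followed by model_signal_direct() records one ordering violation, while model_poll(), model_signal_deferred() and then model_taskq_drain() delivers the same wakeup with none, because pc_lock is never requested while p_lock is held.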

#2

Updated by Electric Monk about 3 years ago

  • Status changed from New to Closed

commit  abf99a006172ea5aab2246bda23f9d6d935bf1ad
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2016-10-13T23:27:17.000Z

    7420 signalfd deadlock on pollwakeup
    7421 panic in signalfd
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Gordon Ross <gwr@nexenta.com>
