Project

General

Profile

Bug #8642

panic due to "deadman: timed out after 50 seconds of clock inactivity"

Added by Youzhong Yang over 3 years ago. Updated over 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2017-09-08
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

This issue has haunted us quite frequently since last month, and I was eventually able to reproduce it using a C program.

Here is how:

1. build the attached C program: gcc o /var/tmp/fstatat-test-1 fstatat-test-1.c
2. create a few thousand files under /var/log: for i in {1..5000}; do touch /var/log/logfile
$i; done
3. enable deadman watchdog: mdb -kwe 'deadman_enabled/W0t1'
4. run: /var/tmp/fstatat-test-1 -f /var/log -n 30

Depending on how many cores your host has, change the above 30 to be something close to your number of cores. For example, on our two sockets (Intel(r) Xeon(r) CPU E5-2690 0 2.90GHz) system, which has 32 cores with hyper-threading enabled, I was able to reproduce the crash with -n 30. While on our another two sockets (Intel(r) Xeon(r) CPU E5-2697A v4 2.60GHz) system, which has 64 logical cores with hyper-threading enabled, the crash was reproduced by -n 48.

Quite interestingly, once hyper-threading is disabled, no matter how hard I tried, I couldn't reproduce the issue.

Here is what the test program does:

- it takes a FOLDER and a NUMBER as input
- it forks 'NUMBER' of processes
- in each forked process, it scan the given FOLDER, for each entry it finds, do fstatat() call. Repeat this 1000 times.
- when all forked processes exit, repeat it again.

Here is a crash dump generated on a "E5-2690 0" system running SmartOS platform-20170831T155808Z:
https://drive.google.com/file/d/0B6o_PKnt911fNUl3S3B6Q19oT28/view?usp=sharing

and analysis of the crash dump:

> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc43ec0  1b    1   10 109   no    no t-0    fffffe4227ee6c40 sched
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |          10 fffffe4227ee6c40
             EXISTS         |
             ENABLE         +-->  PRI THREAD           PROC
                                   15 fffffea3594060e0 fstatat-test-1

> fffffe4227ee6c40::findstack -v
stack pointer for thread fffffe4227ee6c40: fffffe4227ee6500
  fffffe4227ee6660 0xfffffea2e3c01378()
  fffffe4227ee66c0 dispdeq+0x177(fffffffffbc80da0)
  fffffe4227ee6730 disp_getbest+0x227(e)
  fffffe4227ee67c0 0xa()
  fffffe4227ee68e0 mutex_delay_default+7()
  fffffe4227ee6950 mutex_vector_enter+0xc5(fffffea31d37d500)
  fffffe4227ee69b0 clock_tick_process+0xce(fffffea327ceb040, 34b16, 1)
  fffffe4227ee6a30 clock_tick_execute_common+0x93(0, 1a, 20, 34b16, 1)
  fffffe4227ee6a70 clock_tick_schedule+0xa8(0)
  fffffe4227ee6af0 clock+0x2cb()
  fffffe4227ee6b90 cyclic_softint+0xfd(fffffffffbc398e0, 1)
  fffffe4227ee6ba0 cbe_softclock+0x17()
  fffffe4227ee6bf0 av_dispatch_softvect+0x78(a)
  fffffe4227ee6c20 apix_dispatch_softint+0x35(0, 0)
  fffffe42283b5b40 switch_sp_and_call+0x13()

> fffffea31d37d500::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffea31d37d500 adapt fffffea35f301880      -      -      no

> fffffea35f301880::findstack -v
stack pointer for thread fffffea35f301880: fffffe4229773970
  fffffe4229773b20 mutex_delay_default+7()
  fffffe4229773b90 mutex_vector_enter+0xc5(fffffea358790740)
  fffffe4229773c30 lookuppnatcred+0x115(fffffe4229773c70, 0, 1, 0, fffffe4229773e68, fffffea358790740, fffffea472018648)
  fffffe4229773d40 lookupnameatcred+0xdd(fffffd7fffdff642, 0, 1, 0, fffffe4229773e68, fffffea358790740, fffffea472018648)
  fffffe4229773d90 lookupnameat+0x39(fffffd7fffdff642, 0, 1, 0, fffffe4229773e68, fffffea358790740)
  fffffe4229773e30 cstatat_getvp+0x10f(3, fffffd7fffdff642, 1, fffffe4229773e68, fffffe4229773e60)
  fffffe4229773ed0 cstatat+0x6f(3, fffffd7fffdff642, fffffd7fffdffa70, 0, 10)
  fffffe4229773f00 fstatat+0x42(3, fffffd7fffdff642, fffffd7fffdffa70, 0)
  fffffe4229773f10 sys_syscall+0x19f()

> fffffea358790740::vnode2path
/var/log

> fffffea358790740::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffea358790740 adapt fffffea35f491b00      -      -     yes

> fffffea35f491b00::findstack -v
stack pointer for thread fffffea35f491b00: fffffe4229785700
  fffffe42297858f0 mutex_adaptive_tryenter+0x10()
  fffffe4229785920 vn_rele+0x24(fffffea358790740)
  fffffe4229785b90 lookuppnvp+0xcd3(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea327222840, fffffea358790740, fffffea472018648)
  fffffe4229785c30 lookuppnatcred+0x176(fffffe4229785c70, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
  fffffe4229785d40 lookupnameatcred+0xdd(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740, fffffea472018648)
  fffffe4229785d90 lookupnameat+0x39(fffffd7fffdff642, 0, 1, 0, fffffe4229785e68, fffffea358790740)
  fffffe4229785e30 cstatat_getvp+0x10f(3, fffffd7fffdff642, 1, fffffe4229785e68, fffffe4229785e60)
  fffffe4229785ed0 cstatat+0x6f(3, fffffd7fffdff642, fffffd7fffdffa70, 0, 10)
  fffffe4229785f00 fstatat+0x42(3, fffffd7fffdff642, fffffd7fffdffa70, 0)
  fffffe4229785f10 sys_syscall+0x19f()


Files

fstatat-test-1.c (3.48 KB) fstatat-test-1.c Youzhong Yang, 2017-09-08 05:45 PM
#3

Updated by Jerry Jelinek over 3 years ago

I have carefully gone through the dump and the analysis above looks correct. The thing that doesn't make sense is the information for thread fffffea35f491b00. If we look at the code we can see that we're trying to do a vn_rele on the vnode fffffea358790740. The only way we get to mutex_adaptive_tryenter is if the vnode's mutex was initially held by another thread when we called mutex_enter. We come out of the assembly code for mutex_enter and go into mutex_vector_enter. In that function we are looping and once we see the mutex is not held, we call mutex_adaptive_tryenter. That code is a simple block of assembly shown here

mutex_adaptive_tryenter:        movq   %gs:0x18,%rdx
mutex_adaptive_tryenter+9:      xorl   %eax,%eax
mutex_adaptive_tryenter+0xb:    lock cmpxchgq %rdx,(%rdi)
mutex_adaptive_tryenter+0x10:   jne    +0x6     <mutex_adaptive_tryenter+0x18>
mutex_adaptive_tryenter+0x12:   notl   %eax
mutex_adaptive_tryenter+0x14:   ret    
mutex_adaptive_tryenter+0x15:   nop    
mutex_adaptive_tryenter+0x16:   nop    
mutex_adaptive_tryenter+0x17:   nop    
mutex_adaptive_tryenter+0x18:   xorl   %eax,%eax
mutex_adaptive_tryenter+0x1a:   ret    

Since we can see that the vnode mutex (fffffea358790740) is currently held by this same thread (fffffea35f491b00) we know the cmpxchgq here has succeeded. What doesn't make sense is why we're stuck here. There is really nothing that should block this code at this spot until the deadman fires and nothing in the vn_rele code that should block at all, once we have the mutex.

Given that we never hit this problem when hyperthreading is disabled, I'd like to say that somehow this is a hardware problem, but I don't have any way to prove this or any errata to reference.

Also available in: Atom PDF