Bug #9760

mp_startup_common races itself

Added by Jason King about 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2018-08-21
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

History

#1

Updated by Jason King about 2 years ago

Upstream of Joyent OS-7079.

I've been experiencing frequent early-boot hangs on my own server (Supermicro 5028-TN4). After getting kmdb working correctly with SOL (serial over LAN), I've been able to drop into kmdb (via an NMI) when a hang occurs. Every time, a similar pattern is present:

[1]> ::cpustack -c 6
xc_serv+0x109(0, 0)
apix`apix_dispatch_by_vector+0x8c(f1)
apix`apix_dispatch_hilevel+0x15(f1, 0)
switch_sp_and_call+0x13()
apix`apix_do_interrupt+0x119(fffffe5929de0ac0, 3cf41ad)
_interrupt+0xba()
default_lock_delay+0x54(4)
mutex_vector_enter+0xc5(fffffffffbc8e428)
cpudrv`cpudrv_get_cpu+0x59(fffffe5929b34000)
cpudrv`cpudrv_get_speeds+0x23(fffffe5929b34000, fffffe5929de0cc8)
cpudrv`cpudrv_init+0x2c(fffffe5929b34000)
cpudrv`cpudrv_attach+0x161(fffffe5926373d58, 0)
devi_attach+0x92(fffffe5926373d58, 0)
attach_node+0xa7(fffffe5926373d58)
i_ndi_config_node+0x7d(fffffe5926373d58, 6, 0)
i_ddi_attachchild+0x48(fffffe5926373d58)
devi_attach_node+0x5e(fffffe5926373d58, 102000a)
ndi_devi_online+0x9a(fffffe5926373d58, a)
mach_cpu_create_device_node+0x9e(fffffe5929b45000, 0)
mp_startup_common+0x279(1)
mp_startup_boot+0xe()
0xfffffffffb8571de()
[1]> cpu_lock::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffffffbc8e428 adapt fffffe007a6cac40      -      -      no
[1]> ::cpustack -c 7
xc_serv+0x109(0, 0)
apix`apix_dispatch_by_vector+0x8c(f1)
apix`apix_dispatch_hilevel+0x15(f1, 0)
switch_sp_and_call+0x13()
apix`apix_do_interrupt+0x119(fffffe5929e6de20, a)
_interrupt+0xba()
sema_held+0x11(fffffffffbc2d3b0)
pause_cpus+0xa4(0, 0)
cpu_add_active+0x25(fffffe5929e0a000)
mp_startup_common+0x41f(1)
mp_startup_boot+0xe()
0xfffffffffb8571de()

The CPU numbers can vary, but when the hang occurs CPU N-1 is always waiting on the cpu_lock mutex in cpudrv_get_cpu while CPU N is in pause_cpus. cpu_add_active is called with cpu_lock held (from mp_startup_common):

mutex_enter(&cpu_lock);
cp->cpu_flags &= ~CPU_OFFLINE;
cpu_enable_intr(cp);
cpu_add_active(cp);
mutex_exit(&cpu_lock);

So, in the stacks above, CPU 7 holds cpu_lock while CPU 6 is blocked waiting to acquire it. CPU 6 will never succeed, because pause_cpus on CPU 7 is waiting for all online CPUs (which at this point are CPUs 0..N-1) to pause before it proceeds and eventually releases cpu_lock. Hence the hang.
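
The circular wait described above can be sketched as a toy wait-for model. This is only an illustration, not kernel code: MODEL_NCPU, model_is_stuck and model_startup_hangs are hypothetical names, and the model assumes each CPU waits on at most one other CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NCPU        8     /* hypothetical size of the toy model */
#define WAITS_FOR_NOBODY  (-1)

/*
 * waits_for[i] names the one CPU that CPU i cannot make progress without,
 * or WAITS_FOR_NOBODY if CPU i is runnable. Following the chain from
 * `start` either reaches a runnable CPU (no hang) or never does (a cycle).
 */
bool
model_is_stuck(const int waits_for[MODEL_NCPU], int start)
{
    int cur = start;

    assert(start >= 0 && start < MODEL_NCPU);
    for (int steps = 0; steps < MODEL_NCPU; steps++) {
        cur = waits_for[cur];
        if (cur == WAITS_FOR_NOBODY)
            return (false);
    }
    /* MODEL_NCPU hops without reaching a runnable CPU implies a cycle. */
    return (true);
}

/*
 * Build the situation from the report: `waiter` spins on cpu_lock, which
 * `holder` owns. If `holder` is inside pause_cpus, it in turn cannot
 * proceed until `waiter` pauses.
 */
bool
model_startup_hangs(int waiter, int holder, bool holder_in_pause_cpus)
{
    int waits_for[MODEL_NCPU];

    assert(waiter >= 0 && waiter < MODEL_NCPU);
    assert(holder >= 0 && holder < MODEL_NCPU && holder != waiter);
    for (int i = 0; i < MODEL_NCPU; i++)
        waits_for[i] = WAITS_FOR_NOBODY;
    waits_for[waiter] = holder;
    if (holder_in_pause_cpus)
        waits_for[holder] = waiter;
    return (model_is_stuck(waits_for, waiter));
}
```

Calling model_startup_hangs(6, 7, true) models the two stacks shown earlier and reports a hang. With the last argument false (the lock holder is not in pause_cpus) the waiter eventually gets the lock and no hang is reported.
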
The following comments from mp_startup_common are suggestive:
/*
 * Setting the bit in cpu_ready_set must be the last operation in
 * processor initialization; the boot CPU will continue to boot once
 * it sees this bit set for all active CPUs.
 */
CPUSET_ATOMIC_ADD(cpu_ready_set, cp->cpu_id);

(void) mach_cpu_create_device_node(cp, NULL);

Note that CPU N-1 in these instances is blocked in a call stack that includes mach_cpu_create_device_node. Looking further back, start_other_cpus executes the following loop:
for (who = 0; who < NCPU; who++) {
    if (!CPU_IN_SET(mp_cpus, who))
        continue;
    ASSERT(who != bootcpuid);

    mutex_enter(&cpu_lock);
    if (start_cpu(who) != 0)
        CPUSET_DEL(mp_cpus, who);
    cpu_state_change_notify(who, CPU_SETUP);
    mutex_exit(&cpu_lock);
}

and the relevant part of start_cpu is:
mutex_exit(&cpu_lock);
tempset = cpu_ready_set;
while (!CPU_IN_SET(tempset, who)) {
      drv_usecwait(1);
      tempset = *((volatile cpuset_t *)&cpu_ready_set);
}
mutex_enter(&cpu_lock);
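
For readers unfamiliar with the cpuset macros, here is a rough user-space analogue of this handshake using C11 atomics. It is a sketch under assumptions, not the illumos macro definitions: model_cpuset_t is a hypothetical 64-bit set, all model_* names are invented for illustration, and the wait is bounded so the example terminates (the loop above is unbounded).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuset_t: one bit per CPU, at most 64 CPUs. */
typedef _Atomic uint64_t model_cpuset_t;

model_cpuset_t model_ready_set;    /* zero-initialized: no CPU ready yet */

/* Analogue of CPUSET_ATOMIC_ADD: publish this CPU's bit atomically. */
void
model_cpuset_atomic_add(model_cpuset_t *set, unsigned int cpu)
{
    assert(cpu < 64);
    atomic_fetch_or(set, UINT64_C(1) << cpu);
}

/* Analogue of CPU_IN_SET on a fresh snapshot of the set. */
bool
model_cpu_in_set(model_cpuset_t *set, unsigned int cpu)
{
    assert(cpu < 64);
    return (((atomic_load(set) >> cpu) & 1) != 0);
}

/*
 * Bounded version of the polling loop: returns true once the bit for
 * `cpu` is visible, false if it was not seen within max_polls polls.
 */
bool
model_wait_for_ready(model_cpuset_t *set, unsigned int cpu, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (model_cpu_in_set(set, cpu))
            return (true);
    }
    return (false);
}
```

The point of the sketch is that the waiter observes only the bit. It learns nothing about what the starting CPU still has left to do after publishing it, so whatever follows the publish runs concurrently with the waiter's next actions.
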

This explains the problem. Before CPU N-1 has completed startup, mp_startup_common sets its bit in cpu_ready_set, which prompts start_other_cpus to begin starting the next CPU. If CPU N takes cpu_lock before CPU N-1 can (CPU N-1 needs it in cpudrv_get_cpu), CPU N will never release it: it is waiting for CPU N-1 to pause, which CPU N-1 will never do because it is blocked on cpu_lock held by CPU N.
The fix seems to be to follow the suggestion of the comment in mp_startup_common and move CPUSET_ATOMIC_ADD(cpu_ready_set, cp->cpu_id); to after the mach_cpu_create_device_node call, so that setting the bit really is the last operation in processor initialization.
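
The effect of the reordering can be checked with a small model of the startup steps. This is a sketch, not the kernel change itself: the enum and function names are hypothetical, and the model covers only the three steps discussed in this report, treating any step that takes cpu_lock after the ready bit is published as a potential race with the next CPU's startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three mp_startup_common steps discussed in this report. */
enum model_step {
    STEP_CPU_ADD_ACTIVE,       /* runs under cpu_lock, calls pause_cpus */
    STEP_SET_READY_BIT,        /* CPUSET_ATOMIC_ADD(cpu_ready_set, ...) */
    STEP_CREATE_DEVICE_NODE    /* reaches cpudrv_get_cpu, takes cpu_lock */
};

bool
model_step_takes_cpu_lock(enum model_step s)
{
    return (s == STEP_CPU_ADD_ACTIVE || s == STEP_CREATE_DEVICE_NODE);
}

/*
 * Once the ready bit is published, the boot CPU may start the next CPU.
 * Any later step on this CPU that needs cpu_lock can then collide with
 * the next CPU's cpu_add_active.
 */
bool
model_order_can_race(const enum model_step *order, size_t nsteps)
{
    bool ready_published = false;

    assert(order != NULL);
    for (size_t i = 0; i < nsteps; i++) {
        if (order[i] == STEP_SET_READY_BIT)
            ready_published = true;
        else if (ready_published && model_step_takes_cpu_lock(order[i]))
            return (true);
    }
    return (false);
}

/* Ordering described in the report: ready bit before the device node. */
bool
model_current_order_can_race(void)
{
    const enum model_step order[] = {
        STEP_CPU_ADD_ACTIVE, STEP_SET_READY_BIT, STEP_CREATE_DEVICE_NODE
    };

    return (model_order_can_race(order, sizeof (order) / sizeof (order[0])));
}

/* Proposed ordering: ready bit last. */
bool
model_proposed_order_can_race(void)
{
    const enum model_step order[] = {
        STEP_CPU_ADD_ACTIVE, STEP_CREATE_DEVICE_NODE, STEP_SET_READY_BIT
    };

    return (model_order_can_race(order, sizeof (order) / sizeof (order[0])));
}
```

In this model the current ordering is flagged as racy and the proposed ordering is not, which matches the reasoning above.
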

#2

Updated by Electric Monk about 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit af1da5d88749a7acca42afebc5b3ca9ec806c095

commit  af1da5d88749a7acca42afebc5b3ca9ec806c095
Author: Jason King <jason.king@joyent.com>
Date:   2018-08-22T14:59:46.000Z

    9760 mp_startup_common races itself
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Andy Fiddaman <omnios@citrus-it.co.uk>
    Reviewed by: Yuri Pankov <yuripv@yuripv.net>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
