Bug #9836
openEHCI appears to be causing a huge interrupt storm on my ivybridge system
0%
Description
I noticed one particular core was 40% on kernel, 60% idle. Using mpstat revealed that it's getting 100's of thousands of interrupts per second:
adam@sqlbox:/home/adam$ mpstat 1 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 63 0 14 195920 195677 255 3 14 16 0 290 1 41 0 59 1 80 0 5 145 19 200 3 16 22 0 314 0 0 0 99 2 82 0 54 164 67 193 3 15 22 0 299 0 0 0 99 3 64 0 3 139 36 172 3 15 23 0 219 0 0 0 99 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 4 0 12 198278 198066 132 0 6 3 0 284 0 40 0 60 1 0 0 13 146 34 138 0 9 1 0 20 0 0 0 100 2 0 0 15 71 8 52 0 3 1 0 41 0 0 0 100 3 0 0 0 61 9 44 0 2 0 0 6 0 0 0 100 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 2 0 12 198304 198089 134 1 5 5 0 283 0 40 0 60 1 0 0 15 147 26 118 0 2 2 0 21 0 0 0 100 2 0 0 3 94 12 69 0 3 1 0 53 0 0 0 100 3 0 0 18 53 8 44 1 3 0 0 6 0 1 0 99 ^C
Narrowed this down with a modified dtrace script to show aggregations of interrupts based on device name:
adam@sqlbox:/home/adam$ cat intby.d #!/usr/sbin/dtrace -s /* * inttimes.d - print interrupt on-cpu time. * Written using DTrace (Solaris 10 3/05) * * $Id: inttimes.d 3 2007-08-01 10:50:08Z brendan $ * * USAGE: inttimes.d # wait several seconds, then hit Ctrl-C * * FIELDS: * DEVICE instance name of device driver * TIME (ns) sum of time spent servicing interrupt (ns) * * BASED ON: /usr/demo/dtrace/intr.d * * SEE ALSO: * DTrace Guide "sdt Provider" chapter (docs.sun.com) * intrstat(1M) * * PORTIONS: Copyright (c) 2005 Brendan Gregg. * * * 28-Jun-2005 Brendan Gregg Created this. * 20-Apr-2006 " " Last update. */ #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } sdt:::interrupt-complete /arg0 != 0/ { this->devi = (struct dev_info *)arg0; /* this checks the pointer is valid, */ self->name = this->devi != 0 ? stringof(`devnamesp[this->devi->devi_major].dn_name) : "?"; this->inst = this->devi != 0 ? this->devi->devi_instance : 0; @num[self->name, this->inst] = count(); self->name = 0; } dtrace:::END { printf("%11s %16s\n", "DEVICE", "Count"); printa("%10s%-3d %@16d\n", @num); }
The out revealed ehci is the culprit:
adam@sqlbox:/home/adam$ sudo ./intby.d Password: Tracing... Hit Ctrl-C to end. ^C DEVICE Count ehci1 2 e1000g1 9 mpt_sas0 110 ehci0 292199
Please let me know if there's anything I should try in terms of bios settings or system configuration.
Updated by Adam Stylinski almost 4 years ago
I should also add uname output:
adam@sqlbox:/home/adam$ uname -a
SunOS sqlbox 5.11 illumos-2eeda98682 i86pc i386 i86pc
Updated by Adam Stylinski almost 4 years ago
As it turns out, one boot environment prior does not suffer from this issue:
adam@sqlbox:/home/adam$ sudo intrstat Password: device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 3 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 0 0.0 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 3 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 1 0.0 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 1 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 0 0.0 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 2 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 1 0.0 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 1 0.0 0 0.0 ehci#0 | 0 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 0 0.0 0 0.0 0 0.0 mpt_sas#0 | 0 0.0 0 0.0 0 0.0 132 0.1 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 3 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 1 0.0 0 0.0 0 0.0 mpt_sas#0 | 0 0.0 0 0.0 0 0.0 0 0.0 device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim -------------+------------------------------------------------------------ e1000g#1 | 0 0.0 0 0.0 2 0.0 0 0.0 ehci#0 | 1 0.0 0 0.0 0 0.0 0 0.0 ehci#1 | 0 0.0 1 0.0 0 0.0 0 0.0 mpt_sas#0 | 0 0.0 0 0.0 0 0.0 0 0.0 adam@sqlbox:/home/adam$ uname -a SunOS sqlbox 5.11 illumos-6640c13bea i86pc i386 i86pc
Updated by Adam Stylinski almost 4 years ago
Looking between those two revisions I only see two that could possibly do something like this: one that modifies an order for an atomic look during early CPU initialization, and the other more likely: a microcode update.
Updated by Adam Stylinski almost 4 years ago
Just did another pkg image-update, the interrupt storm appears to be gone in illumos-e6ae04828f. Not sure if that helps any.