Bug #3975
closed
ddi_periodic_add(9F) is entirely rubbish
100%
Description
The ixgbe driver registers a link check function to run once per second using the ddi_periodic_add(9F) subsystem. Unfortunately, this subsystem is complete tripe. The particularly unlucky combination of NIC parts and the type of transceiver-less direct-attach cable in use at the customer site appears to result in no initial link status change interrupt firing – when combined with the fact that ddi_periodic_add fails to execute a callback registered to run at DDI_IPL_0 even once, the link sits in limbo until prodded at the switch.
Specifically, that combination seems to be:
> ::prtconf ! grep -i ixgbe
ffffff42e1a65d58 pciex8086,10fb, instance #0 (driver name: ixgbe)
ffffff42e1a65ab0 pciex8086,10fb, instance #1 (driver name: ixgbe)
ffffff42e1a85d48 pciex8086,10fb, instance #2 (driver name: ixgbe)
ffffff42e1a85aa0 pciex8086,10fb, instance #3 (driver name: ixgbe)
...
hw.phy = {
    ...
    type = 9 (ixgbe_phy_sfp_passive_unknown)
    id = 0x3
    sfp_type = 3 (ixgbe_sfp_type_da_cu_core0)
    ...
i.e. IXGBE_SFF_DA_PASSIVE_CABLE and SFP_DA_CU_CORE0, an 82599-specific SFP "type" that really means "direct-attach copper cable".
The transition from the booted, in-limbo state to the first shutdown on the switch yields the following change in the ixgbe_t for the NIC:
--- ixgbe_0_broken.txt 2013-07-12 15:24:38.000000000 -0700
+++ ixgbe_1_shut.txt 2013-07-12 15:24:58.000000000 -0700
@@ -208,7 +208,7 @@
     eimc = 0
     eicr = 0
     ixgbe_state = 0x3
-    link_state = -0t1 (LINK_STATE_UNKNOWN)
+    link_state = 0 (LINK_STATE_DOWN)
     link_speed = 0
     link_duplex = 0
     reset_count = 0
@@ -559,7 +559,7 @@
     }
     watchdog_enable = 0x1 (B_TRUE)
     watchdog_start = 0x1 (B_TRUE)
-    watchdog_tid = 0x17fe967c9
+    watchdog_tid = 0x17f9aa783
     unicst_init = 0x1 (B_TRUE)
     unicst_avail = 0x7f
     unicst_total = 0x80
@@ -1017,7 +1017,7 @@
     },
     ... ]
     sys_page_size = 0x1000
-    link_check_complete = 0 (0)
+    link_check_complete = 0x1 (B_TRUE)
     link_check_hrtime = 0x5614383bd148
     periodic_id = 2
     ixgbe_ks = 0xffffff42eccdd000
At this point, we go from shutdown to no shutdown, which starts packets flowing, and we see this change:
--- ixgbe_1_shut.txt 2013-07-12 15:24:58.000000000 -0700
+++ ixgbe_2_noshut.txt 2013-07-12 15:25:14.000000000 -0700
@@ -208,9 +208,9 @@
     eimc = 0
     eicr = 0
     ixgbe_state = 0x3
-    link_state = 0 (LINK_STATE_DOWN)
-    link_speed = 0
-    link_duplex = 0
+    link_state = 1 (LINK_STATE_UP)
+    link_speed = 0x2710
+    link_duplex = 0x2
     reset_count = 0
     attach_progress = 0x1faff
     loopback_mode = 0
@@ -559,7 +559,7 @@
     }
     watchdog_enable = 0x1 (B_TRUE)
     watchdog_start = 0x1 (B_TRUE)
-    watchdog_tid = 0x17f9aa783
+    watchdog_tid = 0x17f98b203
     unicst_init = 0x1 (B_TRUE)
     unicst_avail = 0x7f
     unicst_total = 0x80
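For anyone reading the raw hex in that diff, here is a small userland C sketch that decodes the two new values. It is only an illustration: the helper names duplex_name and format_link are made up for this note, the duplex numbering (1 = half, 2 = full) is assumed to match the illumos link_duplex_t enum, and the speed is assumed to be stored in Mb/s.

```c
#include <stdio.h>

/*
 * Hypothetical helper: map a raw link_duplex value to a name.
 * The numbering (1 = half, 2 = full) is an assumption about the
 * illumos link_duplex_t enum, not something shown in this bug.
 */
static const char *
duplex_name(int duplex)
{
	switch (duplex) {
	case 1:
		return ("half");
	case 2:
		return ("full");
	default:
		return ("unknown");
	}
}

/*
 * Hypothetical helper: render link_speed (assumed to be in Mb/s)
 * and link_duplex as a human-readable string in buf.
 */
static int
format_link(char *buf, size_t len, int speed, int duplex)
{
	return (snprintf(buf, len, "%d Mb/s, %s duplex",
	    speed, duplex_name(duplex)));
}
```

Calling format_link(buf, sizeof (buf), 0x2710, 0x2) fills buf with "10000 Mb/s, full duplex", which under those assumptions means the link came up at 10 Gb/s full duplex once the switch port was re-enabled.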
The ixgbe_driver_link_check function definitely fires in response to both of these events:
  3  52891  ixgbe_driver_link_check:entry
              ixgbe`ixgbe_intr_msix+0x2f2
              unix`av_dispatch_autovect+0x7c
              unix`dispatch_hardint+0x33
              unix`switch_sp_and_call+0x13
You can use the following DTrace invocation to see that the periodic link check is not firing:
dtrace -qn '
fbt::ixgbe_link_timer:entry
{
        @[arg0] = count();
}

tick-1s
{
        printf("%Y\n%8s %s\n", walltimestamp, "COUNT", "IXGBE_T");
        printa("%@8u %p\n", @);
        trunc(@);
}'
Basically the IPL_0 ddi periodics don't work at all.
Updated by Robert Mustacchi about 10 years ago
- Status changed from New to Resolved
Resolved in a288e5a9793fdffe5e842d7e61ab45263e75eaca.