Bug #3975

ddi_periodic_add(9F) is entirely rubbish

Added by Robert Mustacchi over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Category: kernel
Start date: 2013-08-04
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

The ixgbe driver registers a link check function to run once per second using the ddi_periodic_add(9F) subsystem. Unfortunately, this subsystem is complete tripe. With the particularly unlucky combination of NIC parts and transceiver-less direct-attach cable in use at the customer site, no initial link status change interrupt appears to fire. Because ddi_periodic_add also fails to execute a callback registered to run at DDI_IPL_0 even once, the link sits in limbo until prodded at the switch.

Specifically, that combination seems to be:

> ::prtconf ! grep -i ixgbe
            ffffff42e1a65d58 pciex8086,10fb, instance #0 (driver name: ixgbe)
            ffffff42e1a65ab0 pciex8086,10fb, instance #1 (driver name: ixgbe)
            ffffff42e1a85d48 pciex8086,10fb, instance #2 (driver name: ixgbe)
            ffffff42e1a85aa0 pciex8086,10fb, instance #3 (driver name: ixgbe)
...
    hw.phy = {
...
        type = 9 (ixgbe_phy_sfp_passive_unknown)
        id = 0x3
        sfp_type = 3 (ixgbe_sfp_type_da_cu_core0)
...
That is, IXGBE_SFF_DA_PASSIVE_CABLE and SFP_DA_CU_CORE0, an 82599-specific SFP "type" that really means "direct-attach copper cable".

Going from the booted-but-in-limbo state to the first shutdown on the switch yields the following change in the ixgbe_t for the NIC:

--- ixgbe_0_broken.txt    2013-07-12 15:24:38.000000000 -0700
+++ ixgbe_1_shut.txt    2013-07-12 15:24:58.000000000 -0700
@@ -208,7 +208,7 @@
     eimc = 0
     eicr = 0
     ixgbe_state = 0x3
-    link_state = -0t1 (LINK_STATE_UNKNOWN)
+    link_state = 0 (LINK_STATE_DOWN)
     link_speed = 0
     link_duplex = 0
     reset_count = 0
@@ -559,7 +559,7 @@
     }
     watchdog_enable = 0x1 (B_TRUE)
     watchdog_start = 0x1 (B_TRUE)
-    watchdog_tid = 0x17fe967c9
+    watchdog_tid = 0x17f9aa783
     unicst_init = 0x1 (B_TRUE)
     unicst_avail = 0x7f
     unicst_total = 0x80
@@ -1017,7 +1017,7 @@
         },
     ... ]
     sys_page_size = 0x1000
-    link_check_complete = 0 (0)
+    link_check_complete = 0x1 (B_TRUE)
     link_check_hrtime = 0x5614383bd148
     periodic_id = 2
     ixgbe_ks = 0xffffff42eccdd000

Next, we go from shutdown to no shutdown on the switch, which starts packets flowing, and we see this change:

--- ixgbe_1_shut.txt    2013-07-12 15:24:58.000000000 -0700
+++ ixgbe_2_noshut.txt    2013-07-12 15:25:14.000000000 -0700
@@ -208,9 +208,9 @@
     eimc = 0
     eicr = 0
     ixgbe_state = 0x3
-    link_state = 0 (LINK_STATE_DOWN)
-    link_speed = 0
-    link_duplex = 0
+    link_state = 1 (LINK_STATE_UP)
+    link_speed = 0x2710
+    link_duplex = 0x2
     reset_count = 0
     attach_progress = 0x1faff
     loopback_mode = 0
@@ -559,7 +559,7 @@
     }
     watchdog_enable = 0x1 (B_TRUE)
     watchdog_start = 0x1 (B_TRUE)
-    watchdog_tid = 0x17f9aa783
+    watchdog_tid = 0x17f98b203
     unicst_init = 0x1 (B_TRUE)
     unicst_avail = 0x7f
     unicst_total = 0x80
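
To make the raw fields in the diff above easier to read, here is a small userland C sketch that decodes them. The link_state values are taken from the mdb output itself. The duplex mapping (1 = half, 2 = full) and the assumption that link_speed is in Mb/s are mine, and the helper names are hypothetical, not part of ixgbe.

```c
#include <stddef.h>
#include <stdio.h>

/* link_state values exactly as mdb printed them in the diffs. */
static const char *
link_state_name(int state)
{
	switch (state) {
	case -1: return ("LINK_STATE_UNKNOWN");
	case 0:  return ("LINK_STATE_DOWN");
	case 1:  return ("LINK_STATE_UP");
	default: return ("?");
	}
}

/* Assumption: 1 = half, 2 = full; anything else is treated as unknown. */
static const char *
link_duplex_name(int duplex)
{
	switch (duplex) {
	case 1:  return ("half");
	case 2:  return ("full");
	default: return ("unknown");
	}
}

/* Hypothetical helper: format one summary line; speed assumed to be Mb/s. */
static char *
link_summary(char *buf, size_t len, int state, unsigned int speed, int duplex)
{
	(void) snprintf(buf, len, "%s, %u Mb/s, %s duplex",
	    link_state_name(state), speed, link_duplex_name(duplex));
	return (buf);
}
```

Under those assumptions, link_summary(buf, sizeof (buf), 1, 0x2710, 0x2) produces "LINK_STATE_UP, 10000 Mb/s, full duplex", i.e. the link came up at 10 Gb/s only after the switch port was bounced.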

The ixgbe_driver_link_check function definitely fires in response to both of these events, but it is called from the MSI-X interrupt handler, not from the periodic:

  3  52891    ixgbe_driver_link_check:entry
              ixgbe`ixgbe_intr_msix+0x2f2
              unix`av_dispatch_autovect+0x7c
              unix`dispatch_hardint+0x33
              unix`switch_sp_and_call+0x13

You can use the following DTrace invocation to see that the periodic link check is not firing:

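dtrace -qn '
  /* Count ixgbe_link_timer calls per ixgbe_t (arg0). */
  fbt::ixgbe_link_timer:entry {
    @[arg0] = count();
  }
  /* Print and reset the counts once per second. */
  tick-1s {
    printf("%Y\n%8s %s\n", walltimestamp, "COUNT", "IXGBE_T");
    printa("%@8u %p\n", @);
    trunc(@);
  }'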

Basically, DDI periodics registered at DDI_IPL_0 don't work at all.
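
For readers unfamiliar with the interface, the registration pattern described at the top of this report can be sketched roughly as follows. This is a userland mock, not the ixgbe source. The ddi_periodic_add prototype follows the 9F man page, but its body here only records the request. The typedefs, the DDI_IPL_0 value, and all demo_* names are stand-ins I am assuming for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Userland stand-ins so the sketch compiles outside the kernel. */
typedef int64_t hrtime_t;
typedef struct mock_periodic *ddi_periodic_t;	/* opaque handle */
#define	DDI_IPL_0	0		/* assumed value: lowest level */
#define	NANOSEC		1000000000LL	/* interval is in nanoseconds */

/* Records the last registration instead of scheduling anything. */
static struct mock_periodic {
	void (*func)(void *);
	void *arg;
	hrtime_t interval;
	int level;
} mock_periodic;

/* Same prototype as ddi_periodic_add(9F); the body is a mock. */
static ddi_periodic_t
ddi_periodic_add(void (*func)(void *), void *arg, hrtime_t interval,
    int level)
{
	mock_periodic.func = func;
	mock_periodic.arg = arg;
	mock_periodic.interval = interval;
	mock_periodic.level = level;
	return (&mock_periodic);
}

/* Hypothetical, pared-down driver state. */
typedef struct demo_nic {
	ddi_periodic_t periodic_id;
	unsigned int link_checks;
} demo_nic_t;

/* Periodic callback: stands in for the once-per-second link check. */
static void
demo_link_timer(void *arg)
{
	demo_nic_t *nic = arg;

	nic->link_checks++;
}

/* Register the link check to run every second at DDI_IPL_0. */
static void
demo_start_link_timer(demo_nic_t *nic)
{
	nic->periodic_id = ddi_periodic_add(demo_link_timer, nic,
	    NANOSEC, DDI_IPL_0);
}
```

In this mock nothing ever calls demo_link_timer after registration, so link_checks stays at zero. That is the symptom the DTrace output shows for the real ixgbe_link_timer.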


Related issues

Related to illumos gate - Bug #4123: Locks should not be held across the call to ddi_periodic_delete(9f) (Resolved, 2013-09-10)

History

#1

Updated by Robert Mustacchi over 6 years ago

  • Status changed from New to Resolved

Resolved in a288e5a9793fdffe5e842d7e61ab45263e75eaca.
