Project

General

Profile

Bug #5337

igb/ixgbe mishandle raw packets if cable problem

Added by Robert Mustacchi about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Category:
driver - device drivers
Start date:
2014-11-14
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:
Gerrit CR:

Description

We discovered that we had systems using lldpd where lldpd was hanging.

The thread in the core is sitting in the kernel, in putmsg. In order to understand why it's not making progress, we'd need to debug this live.

On a lab system, it looks like a similar lldpd is at:

> 0t2426::pid2proc | ::walk thread | ::findstack
stack pointer for thread ffffff4327d0b080: ffffff01ed0ffae0
[ ffffff01ed0ffae0 _resume_from_idle+0xf4() ]
  ffffff01ed0ffb10 swtch+0x141()
  ffffff01ed0ffb80 cv_wait_sig+0x185()
  ffffff01ed0ffbb0 str_cv_wait+0x27()
  ffffff01ed0ffc60 strwaitq+0x2c3()
  ffffff01ed0ffd70 strputmsg+0x233()
  ffffff01ed0ffe60 msgio32+0x19e()
  ffffff01ed0ffeb0 putmsg32+0x6a()
  ffffff01ed0fff10 _sys_sysenter_post_swapgs+0x153()

Live investigation shows that this putmsg() is hanging more or less indefinitely. For this particular process, fd 10 is /dev/igb3:

2426:   /usr/sbin/lldpd
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:544,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:27523984
   1: S_IFCHR mode:0666 dev:544,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:27523984
   2: S_IFCHR mode:0666 dev:544,0 ino:19922952 uid:0 gid:3 rdev:38,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:27523984
   3: S_IFSOCK mode:0666 dev:553,0 ino:39858 uid:0 gid:0 rdev:0,0
      O_RDWR
        SOCK_STREAM
        SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX /var/run/lldpd.sock
   6: S_IFCHR mode:0000 dev:545,0 ino:11953 uid:0 gid:0 rdev:185,1011
      O_RDWR
      /dev/net/igb0
      offset:0
   7: S_IFCHR mode:0000 dev:545,0 ino:25230 uid:0 gid:0 rdev:185,1012
      O_RDWR
      /dev/net/igb1
      offset:0
   8: S_IFCHR mode:0000 dev:545,0 ino:13433 uid:0 gid:0 rdev:185,1013
      O_RDWR
      /dev/net/igb2
      offset:0
   9: S_IFCHR mode:0000 dev:545,0 ino:13432 uid:0 gid:0 rdev:192,1014
      O_RDWR
      /dev/net/ixgbe0
      offset:0
  10: S_IFCHR mode:0000 dev:545,0 ino:13431 uid:0 gid:0 rdev:185,1015
      O_RDWR
      /dev/net/igb3
      offset:0
  11: S_IFCHR mode:0000 dev:545,0 ino:13430 uid:0 gid:0 rdev:192,1016
      O_RDWR
      /dev/net/ixgbe1
      offset:0

On this host at least, igb3 is down. It's possible that something has changed that causes us to hang in dlpi_send() when the datalink we're trying to send on is down.

Whatever the cause, this gets even worse; if we try to kill the process, we get here:

[root@headnode (sf-3) ~]# pstack 2426
2426:   /usr/sbin/lldpd
 fee49e15 close    (a)
 fef635dc dlpi_close (80743c8, 0, 0, 0) + 47
 0805a1d0 socketCleanupLLDP (8074318, 805ae1e, 806ff80, 5) + 39
 08054628 cleanupLLDP (8074318, 0, 804790c, feeb9000, fed02a40, feeb9000) + e2
 fee45d85 __sighndlr (f, 0, 804790c, 8054546, 0, 0) + 15
 fee3a7c3 call_user_handler (f, 0, 804790c, 0, 0, 0) + 292
 fee3a858 sigacthandler (f, 0, 804790c) + 77
 --- called from signal handler with signal 15 (SIGTERM) ---
 fee4a495 __putmsg (a, 0, 8047b58, 0) + 15
 fef62023 i_dlpi_strputmsg (80743c8, 0, 8074a68, 52, 0, fedd474e) + 57
 fef62b25 dlpi_send (80743c8, 8047c06, 6, 8074a68, 52, 0) + 45
 08059eb5 lldp_write (8074318, 805c03b, 806ff80, 52) + 48
 08057ec6 txFrame  (8074318, 3, 0, 1) + 2a
 080585f2 tx_do_tx_info_frame (8074318, 805c054, 806ff80, 805c09e) + 1f
 0805841b txStatemachineRun (8074318, 0, 80, 806fee0) + a0
 08053e15 main     (8047dec, feec3668, 8047e24, 80535af, 1, 8047e30) + 5da
 080535af _start   (1, 8047ee0, 0, 8047ef0, 8047f05, 8047f16) + 83
[root@headnode (sf-3) ~]# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> 0t2426::pid2proc | ::walk thread | ::findstack
stack pointer for thread ffffff4327d0b080: ffffff01ed0ffab0
[ ffffff01ed0ffab0 _resume_from_idle+0xf4() ]
  ffffff01ed0ffae0 swtch+0x141()
  ffffff01ed0ffb80 cv_timedwait_sig_hires+0x39d()
  ffffff01ed0ffbb0 cv_reltimedwait_sig+0x4f()
  ffffff01ed0ffbe0 str_cv_wait+0x94()
  ffffff01ed0ffc80 strclose+0x25e()
  ffffff01ed0ffcd0 device_close+0xb9()
  ffffff01ed0ffd60 spec_close+0x17b()
  ffffff01ed0ffde0 fop_close+0x61()
  ffffff01ed0ffe20 closef+0x5e()
  ffffff01ed0ffe90 closeandsetf+0x398()
  ffffff01ed0ffeb0 close+0x13()
  ffffff01ed0fff10 _sys_sysenter_post_swapgs+0x153()

It does, however, eventually seem to give up and the process goes away. Also, when I start lldpd from scratch, it seems to succeed at putmsg() even on file descriptors referring to datalinks that are down:

71514:  putmsg(10, 0x00000000, 0x080478E8, 0)           = 0
sysinfo.machine: i86pc
sysinfo.sysname: SunOS
sysinfo.release: 5.11
lldp_systemdesc: i86pc/SunOS 5.11
lldp_systemname: headnode
71514:  forkx(0)                                        = 71517
71517:  forkx()         (returning as child ...)        = 71514
71517:  forkx(0)                                        = 71519
71519:  forkx()         (returning as child ...)        = 71517
71519:  open("/dev/null", O_RDWR)                       = 4
71519:  putmsg(10, 0x00000000, 0x08047A18, 0)           = 0
71519:  putmsg(9, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(8, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(7, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(6, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(10, 0x00000000, 0x08047A18, 0)           = 0
71519:  putmsg(9, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(8, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(7, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(6, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(10, 0x00000000, 0x08047A18, 0)           = 0
71519:  putmsg(9, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(8, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(7, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(6, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(10, 0x00000000, 0x08047A18, 0)           = 0
71519:  putmsg(9, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(8, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(7, 0x00000000, 0x08047A18, 0)            = 0
71519:  putmsg(6, 0x00000000, 0x08047A18, 0)            = 0

So it's not hanging at all here; instead, it's at:

[root@headnode (sf-3) ~]# pstack 71519
71519:  /usr/sbin/lldpd
 fee4a3c5 pollsys  (8047a20, 5, 8047b18, 0)
 fede48cf pselect  (b, 8047be0, feec2320, feec2320, 8047b18, 0) + 1bf
 fede4bd8 select   (b, 8047be0, 0, 0, 8047c70, 0) + 8e
 08053cbd main     (8047cac, feec3668, 8047ce4, 80535af, 1, 8047cf0) + 482
 080535af _start   (1, 8047dc0, 0, 8047dd0, 8047e14, 8047e28) + 83

That's what we would expect to see, and lldpneighbors works as well. So it doesn't seem like the problem is related to datalinks being down, either. New hypothesis: the problem is related to the environment in which lldpd starts up – perhaps some of the fds related to its datalinks are bad because of datalink creation/destruction during startup (but after lldpd started)?

--

This does not appear to be a regression, or not a recent one anyway. The problem is present as far back as 2013. What it does appear to be is a problem with igb (and possibly ixgbe as well, although a different mechanism may be saving us there) wherein igb_tx_ring_send() does not check the link state prior to calling igb_tx(). The latter then induces flow control, which in the single-ring configuration, MAC will proceed to hang waiting for an available slot in the ring. The ring has enough slots, and the transmit rate infrequent enough, that it takes some hours for the ring to fill (the datalink is down, so there is no one else sending). ixgbe has 8 tx rings, and the packet is being dropped by the MAC layer at

 11  35643               freemsgchain:entry
              mac`mac_pkt_drop+0x40
              mac`mac_tx_sring_enqueue+0x16e
              mac`mac_tx_soft_ring_process+0x9b
              mac`mac_tx_fanout_mode+0x14a
              mac`mac_tx+0xda
              dld`str_mdata_raw_put+0x111
              dld`dld_wput+0x1eb
              unix`putnext+0x225
              genunix`strput+0x29d
              genunix`strputmsg+0x2c6
              genunix`msgio32+0x19e
              genunix`putmsg32+0x6a
              unix`_sys_sysenter_post_swapgs+0x149

which prevents the hang. In the case of igb, nothing drops the packet and we wait forever for a ring slot. If the datalink were to come up, the ring would be emptied and we would resume making progress, but this constraint seems unreasonable; the expectation is that if the datalink is down, the packet is silently dropped in the same manner it would be for any other kind of network partition.

A survey of other drivers: e1000g, bge, and nxge check link state. ixgbe does not, but see above. Presumably bnx does, or provides multiple rings and thus ends up in the same place as ixgbe; however, bnx is closed-source and obsolete so it's more difficult to assess how we avoid the problem on those links.

We have determined that ixgbe is using the SRS_TX_FANOUT mechanism instead of the default, because it uses multiple rings by default; igb does not. This mechanism triggers the alternate behaviour in mac that causes the packets to be dropped and flow control not to be asserted. In the case of igb, flow control is asserted and we hang.

The fix is, at minimum, to make igb drop packets on tx if the link state is down (or, possibly, is not up). This is likely advisable with ixgbe as well, which though not affected in its default configuration, could conceivably be told to use a single ring, and in any case is surely relying on undocumented behavioural differences in MAC.

#1

Updated by Electric Monk about 6 years ago

  • Status changed from New to Closed

git commit b607c8a3bdb27d4fde6e3fc4bb6617a1d91bdca7

commit  b607c8a3bdb27d4fde6e3fc4bb6617a1d91bdca7
Author: Keith M Wesolowski <wesolows@foobazco.org>
Date:   2014-11-19T21:54:55.000Z

    5337 igb/ixgbe mishandle raw packets if cable problem
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Dan McDonald <danmcd@nexenta.com>
    Reviewed by: Jason King <jason.brian.king@gmail.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Also available in: Atom PDF