Bug #12434

multiple vnics sharing the same VLAN ID don't receive interface down events

Added by Chip Schweiss 5 months ago. Updated 4 months ago.

Status: Feedback
Priority: Low
Assignee:
Category: networking
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

When an interface is down, IPMP does not always go to a degraded state, leaving the IPMP interface in a broken state:

# ipmpstat -gn
GROUP       GROUPNAME   STATE     FDT       INTERFACES
vlan2342i0  vlan2342i0  ok        --        vlan2342v0 vlan2342v1
vlan2307i0  vlan2307i0  degraded  --        vlan2307v1 [vlan2307v0]
vlan326i0   vlan326i0   degraded  --        vlan326v1 [vlan326v0]
vlan2308i0  vlan2308i0  degraded  --        vlan2308v1 [vlan2308v0]

# dladm show-vlan
LINK            VID      OVER         FLAGS
vlan326v0       326      ixgbe0       -----
vlan326v1       326      ixgbe1       -----
vlan2308v0      2308     ixgbe0       -----
vlan2308v1      2308     ixgbe1       -----
vlan2307v1      2307     ixgbe1       -----
vlan2307v0      2307     ixgbe0       -----
vlan2342v1      2342     ixgbe1       -----
vlan2342v0      2342     ixgbe0       -----

# dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
igb0         Ethernet             up         1000   full      igb0
igb1         Ethernet             unknown    0      half      igb1
igb6         Ethernet             unknown    0      half      igb6
igb4         Ethernet             unknown    0      half      igb4
igb2         Ethernet             unknown    0      half      igb2
igb3         Ethernet             unknown    0      half      igb3
ixgbe0       Ethernet             down       0      unknown   ixgbe0
igb7         Ethernet             unknown    0      half      igb7
igb5         Ethernet             unknown    0      half      igb5
ixgbe1       Ethernet             up         10000  full      ixgbe1

All IPMP interfaces should be degraded because ixgbe0 is down. However, vlan2342i0 is in an 'ok' state and will not function.

I'm not sure when this problem started. I haven't had a down network connection in several years. This is on OmniOS r151030ap.

History

#1

Updated by Chip Schweiss 5 months ago

It looks like this bug is in the VLAN layer, not IPMP:

# dladm
LINK        CLASS     MTU    STATE    BRIDGE     OVER
igb0        phys      1500   up       --         --
igb1        phys      1500   unknown  --         --
igb6        phys      1500   unknown  --         --
igb4        phys      1500   unknown  --         --
igb2        phys      1500   unknown  --         --
igb3        phys      1500   unknown  --         --
ixgbe0      phys      1500   down     --         --
igb7        phys      1500   unknown  --         --
igb5        phys      1500   unknown  --         --
ixgbe1      phys      1500   up       --         --
vlan326v0   vlan      1500   down     --         ixgbe0
vlan326v1   vlan      1500   up       --         ixgbe1
vlan2308v0  vlan      1500   down     --         ixgbe0
vlan2308v1  vlan      1500   up       --         ixgbe1
vlan2342vnic0 vnic    1500   up       --         ixgbe0
vlan2342vnic1 vnic    1500   up       --         ixgbe1
vlan2307v1  vlan      1500   up       --         ixgbe1
vlan2307v0  vlan      1500   down     --         ixgbe0
vlan2342v1  vlan      1500   up       --         ixgbe1
vlan2342v0  vlan      1500   up       --         ixgbe0

#2

Updated by Dan McDonald 5 months ago

This is weird.

There are two ways to create a "VLAN Nic". One is via `dladm create-vlan -v vid -l physN linkN`, the other is via `dladm create-vnic -v vid -l physN linkN`. The one that isn't failing properly for you was created by `create-vnic`. So the question is: What's different about that such that it doesn't propagate the failure?

If you still have your machine in this state, a `savecore -L` should capture the state one would need to dive into the internal structures and see what's going on, and it won't force a reboot. Also, for people reproducing this bug (including me some time this week), the experiment worth trying is:

1.) Have two physical same-link NICs on your test machine.
2.) Create one vlan over each using `dladm create-vlan...`
3.) Create one VNIC with a vlan id over each using `dladm create-vnic`
4.) Have an off-link peer where you can send packets over each vlan.
5.) Start moving bits.
6.) Unplug (virtually or otherwise) one of the nics.
7.) See if you end up in the state Chip describes.
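
Steps 2 and 3 above could be scripted roughly as follows. This is only a sketch: the `run` and `setup_links` helpers, the NIC names, the VLAN IDs, and the `vh<vid>l<n>` link names are placeholders, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless RUN=1 is set, then execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# setup_links NIC0 NIC1 VLAN_VID VNIC_VID
# All four arguments are placeholders chosen by the person testing.
setup_links() {
    nic0=$1 nic1=$2 vlan_vid=$3 vnic_vid=$4
    # Step 2: one vlan link over each physical NIC.
    run dladm create-vlan -v "$vlan_vid" -l "$nic0" "vh${vlan_vid}l0"
    run dladm create-vlan -v "$vlan_vid" -l "$nic1" "vh${vlan_vid}l1"
    # Step 3: one vnic carrying a VLAN ID over each physical NIC.
    run dladm create-vnic -v "$vnic_vid" -l "$nic0" "vh${vnic_vid}l0"
    run dladm create-vnic -v "$vnic_vid" -l "$nic1" "vh${vnic_vid}l1"
}
```

For example, `setup_links e1000g0 e1000g1 84 86` prints the four `dladm` commands; export `RUN=1` first to actually create the links.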

#3

Updated by Dan McDonald 5 months ago

ALSO NOTE: This is on OmniOSce r151030ap. It may be necessary to reproduce this on that release or older alongside a current one, just in case it's a bug that's been fixed (e.g. #12355 maybe?).

#4

Updated by Dan McDonald 5 months ago

I configured a VMware VM using e1000g nics. I cannot, on 030ap, reproduce this. I'm not sure if it's e1000g vs. ixgbe, or dumb luck, or some other problem. I did mention 12355 earlier, but DTrace suggests THAT isn't tickled during NIC removal/restart.

# dladm show-link ; ipmpstat -g
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g1     phys      1500   up       --         --
vh84l0      vlan      1500   up       --         e1000g0
vh84l1      vlan      1500   up       --         e1000g1
vh86l0      vnic      1500   up       --         e1000g0
vh86l1      vnic      1500   up       --         e1000g1
GROUP       GROUPNAME   STATE     FDT       INTERFACES
sammy0      sammy0      ok        --        vh86l1 vh86l0
dave0       dave0       ok        --        vh84l1 vh84l0
<XXX REMOVE e1000g0, delay because this is over ssh over same link>
# sleep 10 ; dladm show-link ; ipmpstat -g
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   down     --         --
e1000g1     phys      1500   up       --         --
vh84l0      vlan      1500   down     --         e1000g0
vh84l1      vlan      1500   up       --         e1000g1
vh86l0      vnic      1500   down     --         e1000g0
vh86l1      vnic      1500   up       --         e1000g1
GROUP       GROUPNAME   STATE     FDT       INTERFACES
sammy0      sammy0      degraded  --        vh86l1 [vh86l0]
dave0       dave0       degraded  --        vh84l1 [vh84l0]
<XXX RE-ADD e1000g0>
# dladm show-link ; ipmpstat -g
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g1     phys      1500   up       --         --
vh84l0      vlan      1500   up       --         e1000g0
vh84l1      vlan      1500   up       --         e1000g1
vh86l0      vnic      1500   up       --         e1000g0
vh86l1      vnic      1500   up       --         e1000g1
GROUP       GROUPNAME   STATE     FDT       INTERFACES
sammy0      sammy0      ok        --        vh86l1 vh86l0
dave0       dave0       ok        --        vh84l1 vh84l0
<XXX REMOVE e1000g1>
# dladm show-link ; ipmpstat -g
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g1     phys      1500   down     --         --
vh84l0      vlan      1500   up       --         e1000g0
vh84l1      vlan      1500   down     --         e1000g1
vh86l0      vnic      1500   up       --         e1000g0
vh86l1      vnic      1500   down     --         e1000g1
GROUP       GROUPNAME   STATE     FDT       INTERFACES
sammy0      sammy0      degraded  --        vh86l0 [vh86l1]
dave0       dave0       degraded  --        vh84l0 [vh84l1]
<XXX RE-ADD e1000g1>
# dladm show-link ; ipmpstat -g
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g1     phys      1500   up       --         --
vh84l0      vlan      1500   up       --         e1000g0
vh84l1      vlan      1500   up       --         e1000g1
vh86l0      vnic      1500   up       --         e1000g0
vh86l1      vnic      1500   up       --         e1000g1
GROUP       GROUPNAME   STATE     FDT       INTERFACES
sammy0      sammy0      ok        --        vh86l1 vh86l0
dave0       dave0       ok        --        vh84l1 vh84l0
# 

#5

Updated by Chip Schweiss 5 months ago

I figured out how to reproduce this. The presence of vnics on the same interfaces will cause the creation of VLANs in an up state on down interfaces.

I'm collecting a savecore now and will update this ticket with a place to download it.

#6

Updated by Chip Schweiss 5 months ago

The savecore file can be downloaded at:

https://repo.nrg.wustl.edu/pub/zfs/vmdump_ipmp_down.0

#7

Updated by Dan McDonald 5 months ago

  • Subject changed from IPMP not detecting down interface to multiple vnics sharing the same VLAN ID don't receive interface down events

Chip discovered this via IPMP weirdness, but it turns out the first-order problem is what is in the new subject line.

A simple reproduction:

1.) Create a vnic with VLAN ID x. This is the control group.

2.) Create another vnic with VLAN ID y. This is the experimental group.

3.) Unplug/replug the underlying NIC. Note how the vnics' state changes with the underlying physical NIC.

4.) Create a third vnic, ALSO WITH VLAN ID y.

5.) Unplug/replug the underlying NIC. Note how the vnics with VLAN ID y do NOT indicate down like the physical NIC does, but the single vnic with VLAN ID x does.

This also affects explicit vlan objects (`dladm create-vlan`) if a same-NIC vnic uses the same VLAN ID. (One is not allowed to create multiple vlan objects sharing the same VLAN ID.)

Reproducible on OmniOSce r151030ap & r151033-latest (i.e. bloody), plus a SmartOS build from February 2020. Transcript from SmartOS below, showing the problem, then showing that removing the duplicate-VLAN-ID vnic makes things function properly:

[root@smartos ~]# dladm show-vnic
LINK         OVER       SPEED MACADDRESS        MACADDRTYPE VID  ZONE
net0         e1000g0    0     b2:90:e2:c7:64:98 fixed       0    300a0c6a-d8aa-e9d1-f586-c0f3b9afedb8
vh86l0       e1000g0    1000  2:8:20:d5:e2:b7   random      86   --
[root@smartos ~]# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
net0        vnic      1500   ?        --         e1000g0
vh86l0      vnic      1500   up       --         e1000g0
vh84l0      vlan      1500   up       --         e1000g0
[root@smartos ~]# dladm create-vnic -v 86 -l e1000g0 vh5150l0
    XXX KEBE SAYS first sleep is unplug the cable, then 15 seconds later, plug it back in.
[root@smartos ~]# sleep 10 ; dladm show-link ; sleep 10 ; dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   down     --         --
net0        vnic      1500   ?        --         e1000g0
vh86l0      vnic      1500   up       --         e1000g0
vh84l0      vlan      1500   down     --         e1000g0
vh5150l0    vnic      1500   up       --         e1000g0
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
net0        vnic      1500   ?        --         e1000g0
vh86l0      vnic      1500   up       --         e1000g0
vh84l0      vlan      1500   up       --         e1000g0
vh5150l0    vnic      1500   up       --         e1000g0
[root@smartos ~]# dladm delete-vnic vh5150l0
    XXX KEBE SAYS first sleep is unplug the cable, then 15 seconds later, plug it back in.
[root@smartos ~]# sleep 10 ; dladm show-link ; sleep 10 ; dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   down     --         --
net0        vnic      1500   ?        --         e1000g0
vh86l0      vnic      1500   down     --         e1000g0
vh84l0      vlan      1500   down     --         e1000g0
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
net0        vnic      1500   ?        --         e1000g0
vh86l0      vnic      1500   up       --         e1000g0
vh84l0      vlan      1500   up       --         e1000g0
[root@smartos ~]# 
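
The mismatch in the transcript above (a vnic reported `up` over a physical link that is `down`) can be spotted mechanically. A minimal sketch, assuming the six-column `dladm show-link` layout shown in this ticket and a single link name in the OVER column; `stale_up_links` is a made-up helper name.

```shell
#!/bin/sh
# Read "dladm show-link" text on stdin and print every link whose STATE is
# "up" while the link named in its OVER column is "down".
# Assumed columns: LINK CLASS MTU STATE BRIDGE OVER.
stale_up_links() {
    awk '
        $1 == "LINK" { next }    # skip header rows
        NF >= 6 {
            state[$1] = $4
            over[$1] = $6
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
        }
        END {
            for (i = 1; i <= n; i++) {
                l = order[i]
                if (over[l] != "--" && state[l] == "up" && state[over[l]] == "down")
                    print l
            }
        }
    '
}
```

Usage would be `dladm show-link | stale_up_links`, run while the physical NIC is unplugged.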

#8

Updated by Dan McDonald 5 months ago

  • Category set to networking
  • Status changed from New to Feedback
  • Assignee set to Dan McDonald

#9

Updated by Dan McDonald 5 months ago

Found the problem:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/mac/mac_client.c#547

And I quote:

547/*
548 * Return the link state of the specified client. If here are more
549 * than one clients of the underying mac_impl_t, the link state
550 * will always be UP regardless of the link state of the underlying
551 * mac_impl_t. This is needed to allow the MAC clients to continue
552 * to communicate with each other even when the physical link of
553 * their mac_impl_t is down.
554 */

and

563    /*
564     * Returns LINK_STATE_UP if there are other MAC clients defined on
565     * mac_impl_t which share same VLAN ID as that of mcip. Note that
566     * if 'mcip' has more than one VID's then we match ANY one of the
567     * VID's with other MAC client's VID's and return LINK_STATE_UP.
568     */

For an unshared-VLAN-ID vnic, the call stack to alter the link state is as follows:

  0   | vnic_notify_cb:entry                  
  0    -> mac_client_stat_get                 
  0      -> mac_client_link_state             
  0        -> mac_stat_get                    
  0        <- mac_stat_get                    Returns 0x0
  0      <- mac_client_link_state             Returns 0x0
  0    <- mac_client_stat_get                 Returns 0x0 <==== KEBE SAYS 0 is DOWN

For a shared one, it's different:

  0   | vnic_notify_cb:entry                  
  0    -> mac_client_stat_get                 
  0      -> mac_client_link_state             
  0      <- mac_client_link_state             Returns 0x1
  0    <- mac_client_stat_get                 Returns 0x1 <==== KEBE SAYS 1 is UP
  0    -> mac_link_update                     

For an IPMP group, this is bad, per the original report.

For every other use of VNICs or VLANs, including and especially communicating with each other (e.g. two zones with two vnics on the same VLAN ID shouldn't get cut off from each other because the underlying physical NIC got unplugged), this is actually good, as it does not prevent inter-zone/intra-machine communication.

I'm lowering the priority of this bug for now, as the filer indicated the duplicate VLAN ID was a mistake/typo. I won't close this yet, but I'm leaning toward a "works as designed/not a bug" designation. I'd like the filer, and anyone else following this bug, to comment.

#10

Updated by Dan McDonald 5 months ago

  • Priority changed from Normal to Low

#11

Updated by Stephan Althaus 5 months ago

Hi!
I have an issue on my host with 2 zones/vnics that might be related.

If the physical port goes down for some reason
and reconnects after a while,
I think the vnics will not be available again via the physical link.

I always stick to VLAN ID 0, as I don't have/need VLANs.

I have to verify my assumption when I get physical access to the machine at some point.

#12

Updated by Chip Schweiss 5 months ago

Actually, using the same VLAN was not a mistake. In this case, I had previously stopped the LX zone, but didn't remove the vnic.

Any time I run LX zones, I have to use vnics. These will always share a vlan with other existing NFS services which are shared via IPs attached to IPMP over vlans, not vnics. Both network stacks need to function simultaneously.

#13

Updated by Dan McDonald 5 months ago

Stephan Althaus wrote:

Hi!
I have an issue on my host with 2 zones/vnics that might be related.

If the physical port goes down for some reason
and reconnects after a while,
I think the vnics will not be available again via the physical link.

I always stick to VLAN ID 0, as I don't have/need VLANs.

I have to verify my assumption when I get physical access to the machine at some point.

Outputs of:

dladm show-link
dladm show-ether
dladm show-vnic

will be helpful.

#14

Updated by Dan McDonald 4 months ago

Chip Schweiss wrote:

Any time I run LX zones, I have to use vnics. These will always share a vlan with other existing NFS services which are shared via IPs attached to IPMP over vlans, not vnics. Both network stacks need to function simultaneously.

Without serious rearchitecture, the tradeoff stands as follows:

1.) (today's state) Multiple-same-VLAN-vnics won't go down if the lower-link does, so inter-vnic/intra-machine communication can continue.

2.) (an easy candidate for a tunable) Multiple-same-VLAN-vnics WILL go down if the lower-link does, breaking inter-vnic/intra-machine communication, BUT enabling failover/detection mechanisms (like IPMP) to do the right thing.
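
To see which links on a given machine fall under case 1 today, one could scan for physical-link/VLAN-ID pairs used by more than one link. A rough sketch, assuming the column layouts of `dladm show-vnic` and `dladm show-vlan` match the transcripts in this ticket and that every column is a single word; the three function names are made up for illustration, and whether VID 0 should be counted is left open.

```shell
#!/bin/sh
# Normalize "dladm show-vnic" text to "LINK OVER VID" triples.
# Assumed columns: LINK OVER SPEED MACADDRESS MACADDRTYPE VID [ZONE].
vnic_triples() { awk '$1 != "LINK" && NF >= 6 { print $1, $2, $6 }'; }

# Normalize "dladm show-vlan" text to "LINK OVER VID" triples.
# Assumed columns: LINK VID OVER FLAGS.
vlan_triples() { awk '$1 != "LINK" && NF >= 3 { print $1, $3, $2 }'; }

# Read triples on stdin; print each OVER/VID pair used by more than one link,
# followed by the links that share it.
shared_vids() {
    awk '
        {
            key = $2 " " $3
            if (!(key in count)) order[++n] = key
            count[key]++
            members[key] = members[key] " " $1
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    print order[i] ":" members[order[i]]
        }
    '
}
```

Usage would be `{ dladm show-vnic | vnic_triples; dladm show-vlan | vlan_triples; } | shared_vids`; any line it prints names links that would stay `up` when the lower link goes down.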

#15

Updated by Chip Schweiss 4 months ago

So if you want to run any type of zone attached to a vlan, you don't get to have any sort of bonded network on the same vlan.

This would break LACP just the same.

My vote is for #2, which should be the default behavior when network bonding is in place.
