Bug #12434
multiple vnics sharing the same VLAN ID don't receive interface down events
Description
When an interface is down IPMP will not always go to a degraded state, leaving the IPMP interface in a broken state:
# ipmpstat -gn
GROUP GROUPNAME STATE FDT INTERFACES
vlan2342i0 vlan2342i0 ok -- vlan2342v0 vlan2342v1
vlan2307i0 vlan2307i0 degraded -- vlan2307v1 [vlan2307v0]
vlan326i0 vlan326i0 degraded -- vlan326v1 [vlan326v0]
vlan2308i0 vlan2308i0 degraded -- vlan2308v1 [vlan2308v0]
# dladm show-vlan
LINK VID OVER FLAGS
vlan326v0 326 ixgbe0 -----
vlan326v1 326 ixgbe1 -----
vlan2308v0 2308 ixgbe0 -----
vlan2308v1 2308 ixgbe1 -----
vlan2307v1 2307 ixgbe1 -----
vlan2307v0 2307 ixgbe0 -----
vlan2342v1 2342 ixgbe1 -----
vlan2342v0 2342 ixgbe0 -----
# dladm show-phys
LINK MEDIA STATE SPEED DUPLEX DEVICE
igb0 Ethernet up 1000 full igb0
igb1 Ethernet unknown 0 half igb1
igb6 Ethernet unknown 0 half igb6
igb4 Ethernet unknown 0 half igb4
igb2 Ethernet unknown 0 half igb2
igb3 Ethernet unknown 0 half igb3
ixgbe0 Ethernet down 0 unknown ixgbe0
igb7 Ethernet unknown 0 half igb7
igb5 Ethernet unknown 0 half igb5
ixgbe1 Ethernet up 10000 full ixgbe1
All IPMP interfaces should be degraded because ixgbe0 is down. However, vlan2342i0 is in an 'ok' state and will not function.
I'm not sure when this problem started. I haven't had a down network connection in several years. This is on OmniOS r151030ap.
Updated by Chip Schweiss about 2 years ago
It looks like this bug is in the VLAN layer, not IPMP:
# dladm
LINK CLASS MTU STATE BRIDGE OVER
igb0 phys 1500 up -- --
igb1 phys 1500 unknown -- --
igb6 phys 1500 unknown -- --
igb4 phys 1500 unknown -- --
igb2 phys 1500 unknown -- --
igb3 phys 1500 unknown -- --
ixgbe0 phys 1500 down -- --
igb7 phys 1500 unknown -- --
igb5 phys 1500 unknown -- --
ixgbe1 phys 1500 up -- --
vlan326v0 vlan 1500 down -- ixgbe0
vlan326v1 vlan 1500 up -- ixgbe1
vlan2308v0 vlan 1500 down -- ixgbe0
vlan2308v1 vlan 1500 up -- ixgbe1
vlan2342vnic0 vnic 1500 up -- ixgbe0
vlan2342vnic1 vnic 1500 up -- ixgbe1
vlan2307v1 vlan 1500 up -- ixgbe1
vlan2307v0 vlan 1500 down -- ixgbe0
vlan2342v1 vlan 1500 up -- ixgbe1
vlan2342v0 vlan 1500 up -- ixgbe0
Updated by Dan McDonald about 2 years ago
This is weird.
There are two ways to create a "VLAN Nic". One is via `dladm create-vlan -v vid -l physN linkN`, the other is via `dladm create-vnic -v vid -l physN linkN`. The one that isn't failing properly for you was created by `create-vnic`. So the question is: What's different about that such that it doesn't propagate the failure?
If you still have your machine in this state, a `savecore -L` should capture the state one would need to dive into the internal structures and see what's going on, and it won't force a reboot. Also, for people reproducing this bug (including me some time this week), the experiment worth trying is:
1.) Have two physical same-link NICs on your test machine.
2.) Create one vlan over each using `dladm create-vlan...`
3.) Create one VNIC with a vlan id over each using `dladm create-vnic`
4.) Have an off-link peer where you can send packets over each vlan.
5.) Start moving bits.
6.) Unplug (virtually or otherwise) one of the nics.
7.) See if you end up in the state Chip describes.
Updated by Dan McDonald about 2 years ago
ALSO NOTE: This is on OmniOSce r151030ap. It may be necessary to reproduce this on that release or older alongside a current one, just in case it's a bug that's been fixed (e.g. #12355 maybe?).
Updated by Dan McDonald about 2 years ago
I configured a VMware VM using e1000g nics. I cannot, on 030ap, reproduce this. I'm not sure if it's e1000g vs. ixgbe, or dumb luck, or some other problem. I did mention 12355 earlier, but DTrace suggests THAT isn't tickled during NIC removal/restart.
# dladm show-link ; ipmpstat -g
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
e1000g1 phys 1500 up -- --
vh84l0 vlan 1500 up -- e1000g0
vh84l1 vlan 1500 up -- e1000g1
vh86l0 vnic 1500 up -- e1000g0
vh86l1 vnic 1500 up -- e1000g1
GROUP GROUPNAME STATE FDT INTERFACES
sammy0 sammy0 ok -- vh86l1 vh86l0
dave0 dave0 ok -- vh84l1 vh84l0
<XXX REMOVE e1000g0, delay because this is over ssh over same link>
# sleep 10 ; dladm show-link ; ipmpstat -g
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 down -- --
e1000g1 phys 1500 up -- --
vh84l0 vlan 1500 down -- e1000g0
vh84l1 vlan 1500 up -- e1000g1
vh86l0 vnic 1500 down -- e1000g0
vh86l1 vnic 1500 up -- e1000g1
GROUP GROUPNAME STATE FDT INTERFACES
sammy0 sammy0 degraded -- vh86l1 [vh86l0]
dave0 dave0 degraded -- vh84l1 [vh84l0]
<XXX RE-ADD e1000g0>
# dladm show-link ; ipmpstat -g
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
e1000g1 phys 1500 up -- --
vh84l0 vlan 1500 up -- e1000g0
vh84l1 vlan 1500 up -- e1000g1
vh86l0 vnic 1500 up -- e1000g0
vh86l1 vnic 1500 up -- e1000g1
GROUP GROUPNAME STATE FDT INTERFACES
sammy0 sammy0 ok -- vh86l1 vh86l0
dave0 dave0 ok -- vh84l1 vh84l0
<XXX REMOVE e1000g1>
# dladm show-link ; ipmpstat -g
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
e1000g1 phys 1500 down -- --
vh84l0 vlan 1500 up -- e1000g0
vh84l1 vlan 1500 down -- e1000g1
vh86l0 vnic 1500 up -- e1000g0
vh86l1 vnic 1500 down -- e1000g1
GROUP GROUPNAME STATE FDT INTERFACES
sammy0 sammy0 degraded -- vh86l0 [vh86l1]
dave0 dave0 degraded -- vh84l0 [vh84l1]
<XXX RE-ADD e1000g1>
# dladm show-link ; ipmpstat -g
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
e1000g1 phys 1500 up -- --
vh84l0 vlan 1500 up -- e1000g0
vh84l1 vlan 1500 up -- e1000g1
vh86l0 vnic 1500 up -- e1000g0
vh86l1 vnic 1500 up -- e1000g1
GROUP GROUPNAME STATE FDT INTERFACES
sammy0 sammy0 ok -- vh86l1 vh86l0
dave0 dave0 ok -- vh84l1 vh84l0
#
Updated by Chip Schweiss about 2 years ago
I figured out how to reproduce this. The presence of vnics on the same interfaces will cause the creation of VLANs in an up state on down interfaces.
I'm collecting a savecore now and will update this ticket with a place to download it.
Updated by Chip Schweiss about 2 years ago
The savecore file can be downloaded at:
Updated by Dan McDonald about 2 years ago
- Subject changed from IPMP not detecting down interface to multiple vnics sharing the same VLAN ID don't receive interface down events
Chip's discovered this via IPMP weirdness, but it turns out the first-order problem is what is in the new subject line.
A simple reproduction:
1.) Create a vnic with VLAN ID x. This is the control group.
2.) Create another vnic with VLAN ID y. This is the experimental group.
3.) Unplug/replug the underlying NIC. Note how the vnics' state changes with the underlying physical NIC.
4.) Create a third vnic, ALSO WITH VLAN ID y.
5.) Unplug/replug the underlying NIC. Note how the vnics with VLAN ID y do NOT indicate down like the physical NIC does, but the single vnic with VLAN ID x does.
This also affects explicit vlan objects (`dladm create-vlan`) if a vnic on the same NIC uses the same VLAN ID. (One is not allowed to create multiple vlan objects sharing the same VLAN ID.)
Reproducible on OmniOSce r151030ap and r151033-latest (i.e. bloody), plus a SmartOS build from February 2020. Transcript from SmartOS below, showing the problem, then showing that removing the duplicate-VLAN-ID vnic makes things function properly:
[root@smartos ~]# dladm show-vnic
LINK OVER SPEED MACADDRESS MACADDRTYPE VID ZONE
net0 e1000g0 0 b2:90:e2:c7:64:98 fixed 0 300a0c6a-d8aa-e9d1-f586-c0f3b9afedb8
vh86l0 e1000g0 1000 2:8:20:d5:e2:b7 random 86 --
[root@smartos ~]# dladm show-link
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
net0 vnic 1500 ? -- e1000g0
vh86l0 vnic 1500 up -- e1000g0
vh84l0 vlan 1500 up -- e1000g0
[root@smartos ~]# dladm create-vnic -v 86 -l e1000g0 vh5150l0
XXX KEBE SAYS first sleep is unplug the cable, then 15 seconds later, plug it back in.
[root@smartos ~]# sleep 10 ; dladm show-link ; sleep 10 ; dladm show-link
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 down -- --
net0 vnic 1500 ? -- e1000g0
vh86l0 vnic 1500 up -- e1000g0
vh84l0 vlan 1500 down -- e1000g0
vh5150l0 vnic 1500 up -- e1000g0
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
net0 vnic 1500 ? -- e1000g0
vh86l0 vnic 1500 up -- e1000g0
vh84l0 vlan 1500 up -- e1000g0
vh5150l0 vnic 1500 up -- e1000g0
[root@smartos ~]# dladm delete-vnic vh5150l0
XXX KEBE SAYS first sleep is unplug the cable, then 15 seconds later, plug it back in.
[root@smartos ~]# sleep 10 ; dladm show-link ; sleep 10 ; dladm show-link
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 down -- --
net0 vnic 1500 ? -- e1000g0
vh86l0 vnic 1500 down -- e1000g0
vh84l0 vlan 1500 down -- e1000g0
LINK CLASS MTU STATE BRIDGE OVER
e1000g0 phys 1500 up -- --
net0 vnic 1500 ? -- e1000g0
vh86l0 vnic 1500 up -- e1000g0
vh84l0 vlan 1500 up -- e1000g0
[root@smartos ~]#
Updated by Dan McDonald about 2 years ago
- Category set to networking
- Status changed from New to Feedback
- Assignee set to Dan McDonald
Updated by Dan McDonald about 2 years ago
Found the problem:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/mac/mac_client.c#547
And I quote:
547 /*
548  * Return the link state of the specified client. If here are more
549  * than one clients of the underying mac_impl_t, the link state
550  * will always be UP regardless of the link state of the underlying
551  * mac_impl_t. This is needed to allow the MAC clients to continue
552  * to communicate with each other even when the physical link of
553  * their mac_impl_t is down.
554  */
and
563 /*
564  * Returns LINK_STATE_UP if there are other MAC clients defined on
565  * mac_impl_t which share same VLAN ID as that of mcip. Note that
566  * if 'mcip' has more than one VID's then we match ANY one of the
567  * VID's with other MAC client's VID's and return LINK_STATE_UP.
568  */
For an unshared-VLAN-ID vnic, the call stack to alter the link state is as follows:
0 | vnic_notify_cb:entry
0 -> mac_client_stat_get
0 -> mac_client_link_state
0 -> mac_stat_get
0 <- mac_stat_get Returns 0x0
0 <- mac_client_link_state Returns 0x0
0 <- mac_client_stat_get Returns 0x0 <==== KEBE SAYS 0 is DOWN
For a shared one, it's different:
0 | vnic_notify_cb:entry
0 -> mac_client_stat_get
0 -> mac_client_link_state
0 <- mac_client_link_state Returns 0x1
0 <- mac_client_stat_get Returns 0x1 <==== KEBE SAYS 1 is UP
0 -> mac_link_update
For an IPMP group, this is bad, per the original report.
For every other use of VNICs or VLANs, including and especially communicating with each other (e.g. two zones with two vnics on the same VLAN ID shouldn't get cut off from each other because the underlying physical NIC got unplugged), this is actually good, as it does not prevent inter-zone/intra-machine communication.
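To make the behavior described in the quoted comments concrete, here is a minimal sketch of the decision as a standalone model. It is not the illumos implementation: the `model_*` names and types are hypothetical stand-ins, and it assumes each client carries exactly one VLAN ID, whereas the quoted comment notes a real client may have several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not the illumos types. */
typedef enum { MODEL_LINK_DOWN = 0, MODEL_LINK_UP = 1 } model_link_state_t;

typedef struct {
	const char *name;	/* link name, e.g. "vh86l0" */
	uint16_t vid;		/* assumption: a single VLAN ID per client */
} model_client_t;

typedef struct {
	model_link_state_t phys_state;	/* state of the underlying NIC */
	const model_client_t *clients;	/* clients layered on that NIC */
	size_t nclients;
} model_mac_t;

/*
 * Report the link state seen by client 'idx'. If any other client on the
 * same NIC shares its VLAN ID, report UP regardless of the physical state;
 * otherwise pass the physical state through.
 */
model_link_state_t
model_client_link_state(const model_mac_t *mac, size_t idx)
{
	assert(mac != NULL && idx < mac->nclients);

	for (size_t i = 0; i < mac->nclients; i++) {
		if (i != idx && mac->clients[i].vid == mac->clients[idx].vid)
			return (MODEL_LINK_UP);
	}
	return (mac->phys_state);
}
```

Fed the SmartOS configuration from the transcript (vh86l0 and vh5150l0 on VLAN 86, vh84l0 on VLAN 84, physical NIC down), the model reports the two VLAN 86 links as up and vh84l0 as down, matching the observed `dladm show-link` output.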
I'm lowering the priority of this bug for now, as the filer indicated the duplicate VLAN ID was a mistake/typo. I won't close this yet, but I'm leaning toward a "works as designed/not a bug" designation. I'd like the filer, and anyone else following this bug, to comment.
Updated by Stephan Althaus about 2 years ago
Hi!
I have an issue on my host with 2 zones/vnics that might be related.
If the physical port is down for some reason,
and reconnects after a while,
i think the vnics will not be available again via the physical link.
i stick to vlan id 0 always as i don't have/need vlan's.
i have to verify my assumption when i get physical access to the machine somewhen..
Updated by Chip Schweiss about 2 years ago
Actually using the same VLAN was not a mistake. In this case, I had previously stopped the LX zone, but didn't remove the vnic.
Any time I run LX zones, I have to use vnics. These will always share a vlan with other existing NFS services which are shared via IPs attached to IPMP over vlans, not vnics. Both network stacks need to function simultaneously.
Updated by Dan McDonald about 2 years ago
Stephan Althaus wrote:
Hi!
I have an issue on my host with 2 zones/vnics that might be related.
If the physical port is down for some reason,
and reconnects after a while,
i think the vnics will not be available again via the physical link.
i stick to vlan id 0 always as i don't have/need vlan's.
i have to verify my assumption when i get physical access to the machine somewhen..
Outputs of:
dladm show-link
dladm show-ether
dladm show-vnic
will be helpful.
Updated by Dan McDonald about 2 years ago
Chip Schweiss wrote:
Any time I run LX zones, I have to use vnics. These will always share a vlan with other existing NFS services which are shared via IPs attached to IPMP over vlans, not vnics. Both network stacks need to function simultaneously.
Without serious rearchitecture, the tradeoff stands as follows:
1.) (today's state) Multiple-same-VLAN-vnics won't go down if the lower-link does, so inter-vnic/intra-machine communication can continue.
2.) (an easy candidate for a tunable) Multiple-same-VLAN-vnics WILL go down if the lower-link does, breaking inter-vnic/intra-machine communication, BUT enabling failover/detection mechanisms (like IPMP) to do the right thing.
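The two options can be compared with a small model. This is only a sketch under stated assumptions: the `propagate_down` flag is a hypothetical tunable (nothing with that name exists in the source), and the group-state rule (ok when all members are up, failed when none are, degraded otherwise) is a simplification that ignores IPMP's probe-based failure detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { GROUP_OK, GROUP_DEGRADED, GROUP_FAILED } group_state_t;

/*
 * Link state reported to a client. 'propagate_down' is a hypothetical
 * tunable: false models option 1 (today's state), true models option 2.
 */
bool
model_reported_up(bool phys_up, bool shares_vid, bool propagate_down)
{
	if (phys_up)
		return (true);
	/* Physical link is down: only option 1 masks it for shared VIDs. */
	return (shares_vid && !propagate_down);
}

/* Simplified group state derived only from the members' link states. */
group_state_t
model_group_state(const bool *member_up, size_t n)
{
	size_t up = 0;

	assert(member_up != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		if (member_up[i])
			up++;
	}
	if (up == n)
		return (GROUP_OK);
	return (up == 0 ? GROUP_FAILED : GROUP_DEGRADED);
}
```

Applied to the vlan2342i0 group from the original report (one member over the down ixgbe0 and sharing its VLAN ID with a vnic, the other over the up ixgbe1), option 1 yields `GROUP_OK`, as reported, while option 2 yields `GROUP_DEGRADED`.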
Updated by Chip Schweiss about 2 years ago
So if you want to run any type of zone attached to a vlan, you don't get to have any sort of bonded network on the same vlan.
This would break LACP just the same.
My vote is for #2, which should be the default behavior when network bonding is in place.