Bug #10001
open
creating duplicate vnics is possible, and breaks dladm
0%
Description
A user on IRC reports:
<jlinnosa> Hi all. I just managed to create vnics with same name as existing vnics and now datalink stuff won't work... Any advice?
<richlowe> assuming it's not Joyent's zone stuff, I think your only option is probably editing the cfg's by hand, and filing a bug to say how you managed it
<richlowe> /etc/dladmm/*.conf I think, but you may want to ask someone like danmcd who's more regularly thinking about the network stuff
<richlowe> I think, were I editting the files even though it said not to, I'd probably try my hand at just renaming the duplicate so I could delete it/edit it properly
<jlinnosa> it's omnios r151026, and i just basically copy-pasted 'dladm create-vnic -l aggr0 netns20; dladm create-vnic -l igb4 netns21' twice...
<richlowe> Wow
<richlowe> See, I thought the tools were like... at least that not broken
<richlowe> :\
<jlinnosa> right after 'dladm show-vnic' listed everything nicely, but everything stopped working after 'dladm delete-vnic netns20'... [15:39]
(interspersed with my responses).
Basically, dladm allowed a duplicate vnic to be created (it shouldn't), and then failed
Related issues
Updated by Rich Lowe about 5 years ago
<jlinnosa> yeah, 'svc:/network/datalink-management:default (data-link management daemon)' is in maintenance
<jlinnosa> 'Reason: Start method exited with $SMF_EXIT_ERR_FATAL.'
<jlinnosa> '[ Nov 21 15:49:37 Stopping because process dumped core. ]
Updated by Jaakko Linnosaari about 5 years ago
From syslog:
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('unix', 0, 'netns20'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_misc_stat'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_rx_swlane0'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_tx_swlane0'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_tx_hwlane1'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_tx_hwlane0'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns20', 0, 'mac_rx_swlane0_fanout0'): namespace collision
Nov 21 15:48:20 brunhes mac: [ID 469746 kern.info] NOTICE: vnic1056 registered
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('link', 0, 'netns20'): namespace collision
Nov 21 15:48:20 brunhes mac: [ID 435574 kern.info] NOTICE: vnic1056 link up, 1000 Mbps, unknown duplex
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('unix', 0, 'netns21'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns21', 0, 'mac_misc_stat'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns21', 0, 'mac_rx_swlane0'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns21', 0, 'mac_tx_hwlane0'): namespace collision
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('netns21', 0, 'mac_rx_swlane0_fanout0'): namespace collision
Nov 21 15:48:20 brunhes mac: [ID 469746 kern.info] NOTICE: vnic1057 registered
Nov 21 15:48:20 brunhes unix: [ID 665567 kern.warning] WARNING: kstat_create('link', 0, 'netns21'): namespace collision
Nov 21 15:48:20 brunhes mac: [ID 435574 kern.info] NOTICE: vnic1057 link up, 1000 Mbps, unknown duplex
Nov 21 15:49:00 brunhes mac: [ID 469746 kern.info] NOTICE: vnic1058 registered
Nov 21 15:49:00 brunhes mac: [ID 435574 kern.info] NOTICE: vnic1058 link up, 1000 Mbps, unknown duplex
Nov 21 15:49:00 brunhes mac: [ID 469746 kern.info] NOTICE: vnic1059 registered
Nov 21 15:49:00 brunhes mac: [ID 435574 kern.info] NOTICE: vnic1059 link up, 1000 Mbps, unknown duplex
Nov 21 15:49:37 brunhes zoneadmd[7399]: [ID 702911 daemon.error] [zone 'ns2'] datalinks remain in zone after shutdown
Nov 21 15:49:37 brunhes zoneadmd[7399]: [ID 702911 daemon.error] [zone 'ns2'] unable to unconfigure network interfaces in zone
Nov 21 15:49:37 brunhes zoneadmd[7399]: [ID 702911 daemon.error] [zone 'ns2'] unable to destroy zone
Nov 21 15:49:37 brunhes dlmgmtd[17590]: [ID 597718 daemon.warning] dlmgmt_process_db_onereq() read operation on persistent configuration failed: Invalid argument
Nov 21 15:49:37 brunhes dlmgmtd[17590]: [ID 807205 daemon.error] unable to initialize daemon: Invalid argument
Nov 21 15:49:37 brunhes svc.startd[9]: [ID 652011 daemon.warning] svc:/network/datalink-management:default: Method "/lib/svc/method/svc-dlmgmtd" failed with exit status 95.
Nov 21 15:49:37 brunhes svc.startd[9]: [ID 748625 daemon.error] network/datalink-management:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Nov 21 15:49:37 brunhes fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
Nov 21 15:49:37 brunhes EVENT-TIME: Wed Nov 21 15:49:37 EET 2018
Nov 21 15:49:37 brunhes PLATFORM: ProLiant-DL360e-Gen8, CSN: xxxxxxxxxx, HOSTNAME: brunhes
Nov 21 15:49:37 brunhes SOURCE: software-diagnosis, REV: 0.1
Nov 21 15:49:37 brunhes EVENT-ID: 6069c745-b588-caeb-eee3-df15e7f01b2f
Nov 21 15:49:37 brunhes DESC: A service failed - a start, stop or refresh method failed.
Nov 21 15:49:37 brunhes Refer to http://illumos.org/msg/SMF-8000-YX for more information.
Nov 21 15:49:37 brunhes AUTO-RESPONSE: The service has been placed into the maintenance state.
Nov 21 15:49:37 brunhes IMPACT: svc:/network/datalink-management:default is unavailable.
Nov 21 15:49:37 brunhes REC-ACTION: Run 'svcs -xv svc:/network/datalink-management:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.
Updated by Jaakko Linnosaari about 5 years ago
# uname -a
SunOS brunhes 5.11 omnios-r151026-d9b45886bd i86pc i386 i86pc
Updated by Peter Tribble about 5 years ago
Tested in tribblix m20 and m20.5, and omnitribblix m20.5 - all fail, so this confirms it's a problem with vanilla illumos-gate
To reproduce:
create an exclusive-ip zone with a vnic
then, with the zone running, attempt to create a vnic of the same name in the global zone
Updated by Carlos Neira almost 5 years ago
Peter Tribble wrote:
Tested in tribblix m20 and m20.5, and omnitribblix m20.5 - all fail, so this confirms it's a problem with vanilla illumos-gate
To reproduce:
create an exclusive-ip zone with a vnic
then, with the zone running, attempt to create a vnic of the same name in the global zone
This change only addresses creating duplicate vnics using dladm: https://github.com/omniosorg/illumos-omnios/pull/367. I don't know if this is enough or whether some other checks should also be in place.
Updated by Andy Fiddaman almost 5 years ago
- Status changed from New to In Progress
- Assignee set to Andy Fiddaman
- Tags deleted (needs-triage)
Updated by Andy Fiddaman almost 5 years ago
The problem occurs because when the duplicate link is returned to the global zone, dlmgmtd crashes with this stack:
--------------------- thread# 2 / lwp# 2 ---------------------
 feeec827 _lwp_kill (fed1e974, fed1e974, b0, feed2ee4) + 7
 feed30d8 _assfail (fed6167c, fed61634, 289) + 205
 feed31ed assfail (fed6167c, fed61634, 289) + 21
 fed6141e avl_add (806bb9c, 806d930) + 48
 080558a4 dlmgmt_setzoneid (fed1edf4, fed1ed80, fed1eda8, 0, 806cd58) + 175
 0805548a dlmgmt_handler (0, fed1edf4, c, 0, 0) + 96
 feeecebd __door_return () + 3d
There is a block of code in dlmgmt_setzoneid() that is supposed to detect duplicates in the destination zone:
	if (zoneid != GLOBAL_ZONEID &&
	    link_by_name(linkp->ll_link, newzoneid) != NULL) {
		err = EEXIST;
		goto done;
	}
but this never triggers because there is an earlier check for zoneid != GLOBAL_ZONEID that returns EACCES. I think the original author actually intended if (newzoneid != GLOBAL_ZONEID) here, but that is still not enough because if the new zone is the GZ then link_by_name() also consults the on-loan-link list.
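To make the control-flow problem concrete, here is a minimal stand-alone sketch. It is not the dlmgmtd source: model_setzoneid_original() and its dup_in_newzone flag are hypothetical stand-ins for dlmgmt_setzoneid() and the link_by_name() lookup, and only the two checks described above are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define	GLOBAL_ZONEID	0

/*
 * Hypothetical model of the two checks described above.
 * zoneid is the caller's zone; dup_in_newzone stands in for
 * "link_by_name(linkp->ll_link, newzoneid) != NULL".
 */
static int
model_setzoneid_original(int zoneid, bool dup_in_newzone)
{
	/* Earlier check: only the global zone may move links. */
	if (zoneid != GLOBAL_ZONEID)
		return (EACCES);

	/*
	 * zoneid is known to be GLOBAL_ZONEID here, so the first half
	 * of this condition is always false and EEXIST is unreachable.
	 */
	if (zoneid != GLOBAL_ZONEID && dup_in_newzone)
		return (EEXIST);

	/* The move proceeds even though a duplicate exists. */
	return (0);
}

/* Small self-check showing the guard never fires. */
static void
model_original_selfcheck(void)
{
	assert(model_setzoneid_original(1, true) == EACCES);
	assert(model_setzoneid_original(GLOBAL_ZONEID, true) == 0);
	assert(model_setzoneid_original(GLOBAL_ZONEID, false) == 0);
}
```

In this model no input ever yields EEXIST, which matches the observation that the duplicate reaches avl_add() and trips its assertion.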
There are two ways to fix this:
1. Prevent allocation of a duplicate-named VNIC. This is already effectively enforced when zones are not involved as the backend call to zone_add_datalink() already returns EEXIST for that case.
2. Fix the code block above to prevent a link changing zone if that would result in duplicates in the destination zone.
I'm going for 2. Either way, for illumos-gate*, VNICs should all be given unique names. With this fix in place, shutting down a zone that has a VNIC with the same name as one already in the GZ will result in the zone hanging in the 'down' state, requiring administrator intervention; I think this is appropriate.
- * illumos-joyent has 20 additional commits in this area which restructure the way in which VNICs work, including (effectively) support for VNICs in zones having the same names. I did port these to omnios and experiment, but there are other enhancements here such as automatically removing VNICs during zone shutdown. Either way, Joyenteurs, you'll find that the fix in illumos-joyent for this precise problem (OS-1457 dladm won't show or create vnics) is different, and that's because so much else is different there, including the data structures and lookup functions.
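As an illustration of option 2 only, the sketch below refuses to move a link into a zone that already holds a link of the same name. It is a simplified model under stated assumptions, not the change under review: model_link_t, model_link_by_name() and model_setzoneid_fixed() are hypothetical names, a flat array replaces dlmgmtd's AVL trees, and the on-loan-link handling is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	GLOBAL_ZONEID	0

/* Hypothetical, simplified link record. */
typedef struct model_link {
	const char *ml_name;	/* link name, e.g. "test0" */
	int ml_zoneid;		/* zone currently holding the link */
} model_link_t;

/* Return the link called name in zoneid, or NULL if there is none. */
static model_link_t *
model_link_by_name(model_link_t *links, size_t n, const char *name,
    int zoneid)
{
	for (size_t i = 0; i < n; i++) {
		if (links[i].ml_zoneid == zoneid &&
		    strcmp(links[i].ml_name, name) == 0)
			return (&links[i]);
	}
	return (NULL);
}

/* Move linkp to newzoneid unless that would create a duplicate name. */
static int
model_setzoneid_fixed(model_link_t *links, size_t n, model_link_t *linkp,
    int newzoneid)
{
	model_link_t *found;

	found = model_link_by_name(links, n, linkp->ml_name, newzoneid);
	if (found != NULL && found != linkp)
		return (EEXIST);

	linkp->ml_zoneid = newzoneid;
	return (0);
}

/* Self-check mirroring the scenario in this bug. */
static void
model_fixed_selfcheck(void)
{
	model_link_t links[] = {
		{ "test0", GLOBAL_ZONEID },	/* duplicate made in the GZ */
		{ "test0", 1 },			/* original, held by zone 1 */
		{ "test1", 1 },			/* unrelated link in zone 1 */
	};
	size_t n = sizeof (links) / sizeof (links[0]);

	/* Returning the duplicate to the GZ is refused and it stays put. */
	assert(model_setzoneid_fixed(links, n, &links[1],
	    GLOBAL_ZONEID) == EEXIST);
	assert(links[1].ml_zoneid == 1);

	/* A uniquely named link moves normally. */
	assert(model_setzoneid_fixed(links, n, &links[2],
	    GLOBAL_ZONEID) == 0);
	assert(links[2].ml_zoneid == GLOBAL_ZONEID);
}
```

The refused move is what leaves the link in the zone in this model, consistent with the zone ending up in the 'down' state until an administrator resolves the name clash.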
Updated by Andy Fiddaman almost 5 years ago
Review at https://illumos.org/rb/r/1482/
Testing notes, with the new dlmgmtd in place:
bloody# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
test0        vioif0       1000   2:8:20:67:d4:82   random              0
Normal case, zone startup/shutdown
bloody# zoneadm -z test boot
bloody# zlogin test dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
test0       vnic      1500   up       --         ?
bloody# zoneadm -z test halt
Error case:
bloody# dladm create-vnic -l vioif0 test0
dladm: vnic creation over vioif0 failed: object already exists
bloody# zoneadm -z test boot
bloody# dladm create-vnic -l vioif0 test0
bloody# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
test0        vioif0       1000   2:8:20:67:d4:82   random              0
test0        vioif0       1000   2:8:20:68:da:d6   random              0
bloody# zoneadm -z test halt
zone 'test': datalinks remain in zone after shutdown
zone 'test': unable to unconfigure network interfaces in zone
zone 'test': unable to destroy zone
bloody#
bloody# svcs -x
bloody# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              ipkg     shared
   1 test             down       /zones/test                    pkgsrc   excl
Updated by Andy Fiddaman almost 5 years ago
- Related to Bug #7464: crossbow allows to create a vnic with existing name added