Bug #15359

closed

Failed create of IPv6 addrconf address made it undeletable

Added by Ryan Goodfellow over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: networking
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Seemingly the twin of #14724, failing to create an addrconf address results in an observable yet uncreatable address.

$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        dhcp     ok           192.168.1.12/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::4a4d:7eff:fef7:eeb/10
e1000g0/v6        addrconf ok           2603:3024:b0b:3400::1135/128
cxgbe0/v6         addrconf ok           fe80::207:43ff:fe29:a910/10
cxgbe1/v6         addrconf ok           fe80::207:43ff:fe29:a918/10

$ ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Permission denied

$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        dhcp     ok           192.168.1.12/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::4a4d:7eff:fef7:eeb/10
e1000g0/v6        addrconf ok           2603:3024:b0b:3400::1135/128
cxgbe0/v6         addrconf ok           fe80::207:43ff:fe29:a910/10
cxgbe1/v6         addrconf ok           fe80::207:43ff:fe29:a918/10

$ pfexec ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Address object already exists

$ pfexec ipadm delete-addr cxgbe0/foo
$ pfexec ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Addrconf already in progress
$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        dhcp     ok           192.168.1.12/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::4a4d:7eff:fef7:eeb/10
e1000g0/v6        addrconf ok           2603:3024:b0b:3400::1135/128
cxgbe0/v6         addrconf ok           fe80::207:43ff:fe29:a910/10
cxgbe1/v6         addrconf ok           fe80::207:43ff:fe29:a918/10
$ pfexec ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Addrconf already in progress

Related issues

Related to illumos gate - Bug #13316: ipmgmtd inconsistent with kernel on failure (Closed, Dan Cross)

Related to illumos gate - Bug #15648: Attempting to create multiple link-local IPv6 addresses leaves subinterfaces behind (Closed, Andy Fiddaman)

Actions #1

Updated by Robert Mustacchi over 1 year ago

  • Project changed from site to illumos gate
Actions #2

Updated by Dan Cross about 1 year ago

I can't seem to be able to reproduce this behavior on a system built from recent bits (in particular, with the fix for #13316 in place). Ry, is this still an issue?

Actions #3

Updated by Ryan Goodfellow about 1 year ago

I believe so.

$ ssh vandor1
The illumos Project     stlouis-0-g759d140cf3   May. 10, 2023
illumos development build: ry 2023-May-10 [illumos]

ry@vandor1:~$ cat /etc/versions/build
heads/stlouis-0-g759d140cf3 # <-- after 2a295025ab352ac2f6469a947d5b5e2f9379f943

ry@vandor1:~$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        dhcp     ok           192.168.1.10/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::4a4d:7eff:fef5:f002%e1000g0/10
e1000g0/v6        addrconf ok           2603:3024:b0b:3400::1f79/128
cxgbe0/v6         addrconf ok           fe80::207:43ff:fe2c:2a0%cxgbe0/10

ry@vandor1:~$ ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Permission denied

ry@vandor1:~$ ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        dhcp     ok           192.168.1.10/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::4a4d:7eff:fef5:f002%e1000g0/10
e1000g0/v6        addrconf ok           2603:3024:b0b:3400::1f79/128
cxgbe0/v6         addrconf ok           fe80::207:43ff:fe2c:2a0%cxgbe0/10

ry@vandor1:~$ pfexec ipadm delete-addr cxgbe0/foo ## <-- there should be nothing to delete here, yet there is

ry@vandor1:~$ pfexec ipadm create-addr -T addrconf cxgbe0/foo ## <-- this should succeed

ipadm: Could not create address: Addrconf already in progress ## <-- zombies!

ry@vandor1:~$ date
Wed May 10 07:58:22 UTC 2023

ry@vandor1:~$ pfexec ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Addrconf already in progress

# [several more attempts ...]

ry@vandor1:~$ date
Wed May 10 08:05:13 UTC 2023 # <-- 5+ minutes later

ry@vandor1:~$ pfexec ipadm create-addr -T addrconf cxgbe0/foo
ipadm: Could not create address: Addrconf already in progress
Actions #4

Updated by Dan Cross about 1 year ago

So I was hopeful that the fix for #13316 would also address this, as the symptoms seem so similar. But Ry's reproduction was disheartening.

However, in reading that, I noticed an important detail: cxgbe0/v6 is an addrconf address and already exists on the interface, thus the attempt to create a second addrconf address fails (the error message there is, admittedly, a bit misleading). The upshot is that there are still bugs here, but the situation isn't as dire as I had feared: we can manually clean everything up.

One remaining bug is that, despite address creation failure due to lack of permissions, deletion of the address object succeeds without error: this implies that ipmgmtd is still holding on to some state from the creation failure. Indeed, without an intervening delete-addr, a second attempt to create that address fails, even with the necessary permissions:

$ pfexec ipadm create-addr -T addrconf igb0/foo
ipadm: Could not create address: Address object already exists
$ 

This is precisely what #13316 was meant to address. What's going on here?

We're back to our good friend lnum. After a fair bit of inspection with dtrace, I can see that a failing address creation results in two calls to ipmgmt_aobjmap_op: once with ADDROBJ_LOOKUPADD and lnum -1 (before the permission failure), and again with ADDROBJ_DELETE and lnum 0.

Note that addrconf'd addresses are special-cased as we search through the list of address objects for deletion. From ipmgmt_persist.c:

                        if (strcmp(head->am_aobjname,
                            nodep->am_aobjname) == 0) {
                                nodep->am_atype = head->am_atype;
                                /*
                                 * There could be multiple IPV6_ADDRCONF nodes,
                                 * with same address object name, so check for
                                 * logical number also.
                                 */
                                if (head->am_atype !=
                                    IPADM_ADDR_IPV6_ADDRCONF ||
                                    nodep->am_lnum == head->am_lnum)
                                        break;
                        }

The successful deletion request does a lookup before it requests deletion, and caches the lnum that's already in the list, passing that in the delete request.
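
To make the mismatch concrete, here is a minimal standalone model of the matching rule quoted above. It is only a sketch: the `model_*` names, the struct layout, and the sample list are hypothetical stand-ins, not the real ipmgmtd definitions. It assumes the placeholder entry carries lnum -1, as the dtrace observations suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the daemon's types; not the real definitions. */
enum model_atype { MODEL_ADDR_STATIC, MODEL_ADDR_IPV6_ADDRCONF };

struct model_node {
	const char *aobjname;
	enum model_atype atype;
	int lnum;
};

/* Illustrative list: an existing addrconf object plus a leftover placeholder. */
static const struct model_node model_list[] = {
	{ "cxgbe0/v6", MODEL_ADDR_IPV6_ADDRCONF, 0 },
	{ "cxgbe0/foo", MODEL_ADDR_IPV6_ADDRCONF, -1 },	/* placeholder */
	{ "e1000g0/v4", MODEL_ADDR_STATIC, 0 },
};

/*
 * Same rule as the quoted loop: the names must match, and for addrconf
 * entries the logical interface number must match as well.
 * Returns the index of the matching entry, or -1 if there is none.
 */
int
model_find(const struct model_node *list, size_t n, const char *name, int lnum)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].aobjname, name) != 0)
			continue;
		if (list[i].atype != MODEL_ADDR_IPV6_ADDRCONF ||
		    list[i].lnum == lnum)
			return ((int)i);
	}
	return (-1);
}

void
model_selfcheck(void)
{
	size_t n = sizeof (model_list) / sizeof (model_list[0]);

	/* Failure-path delete asks for lnum 0; the placeholder has -1. */
	assert(model_find(model_list, n, "cxgbe0/foo", 0) == -1);
	/* Explicit delete-addr looks up first and reuses the cached -1. */
	assert(model_find(model_list, n, "cxgbe0/foo", -1) == 1);
	/* Non-addrconf entries match on name alone. */
	assert(model_find(model_list, n, "e1000g0/v4", 7) == 2);
}
```

In this model the delete issued on the failure path (lnum 0) finds nothing and leaves the placeholder in the list, while a later delete that first looks up the entry and passes its cached lnum removes it.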

Actions #5

Updated by Andy Fiddaman about 1 year ago

  • Related to Bug #13316: ipmgmtd inconsistent with kernel on failure added
Actions #6

Updated by Andy Fiddaman about 1 year ago

  • Category set to networking
  • Status changed from New to In Progress
  • Assignee changed from Ryan Goodfellow to Dan Cross
Actions #7

Updated by Andy Fiddaman about 1 year ago

Looks like you found the fundamental problem here.
When we call into i_ipadm_create_ipv6addrs(), the entry in ipmgmtd is still a placeholder, so has its ifnum set to -1.
When i_ipadm_create_ipv6addrs() fails, it returns having set addr->ipadm_lifnum to 0 and so the subsequent cleanup fails due to the mismatch when the daemon tries to delete the placeholder.

% dtrace -n 'pid$target::i_ipadm_create_ipv6addrs:entry{self->addr = args[1]; printf("ifnum: %d", self->addr->ipadm_lifnum)}' \
    -n 'pid$target::i_ipadm_create_ipv6addrs:return{printf("ifnum: %d", self->addr->ipadm_lifnum)}' \
    -c 'ipadm create-addr -T addrconf cxgbe0/xx'
dtrace: description 'pid$target::i_ipadm_create_ipv6addrs:entry' matched 1 probe
dtrace: description 'pid$target::i_ipadm_create_ipv6addrs:return' matched 1 probe
ipadm: Could not create address: Permission denied
dtrace: pid 101350 has exited
CPU     ID                    FUNCTION:NAME
 69  60306   i_ipadm_create_ipv6addrs:entry ifnum: -1
 69  60307  i_ipadm_create_ipv6addrs:return ifnum: 0
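
The entry and return values above can be modelled in a few lines of C. This is a sketch under stated assumptions: the `model_*` names are hypothetical, `model_create_fails` merely imitates a create step that overwrites the lifnum before failing, and the save-and-restore variant is only one illustrative way to keep the cleanup key intact. It is not a description of the fix that eventually landed.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_PLACEHOLDER_LNUM	(-1)

/* Hypothetical stand-in for the library's address object handle. */
struct model_addrobj {
	int lifnum;
};

/* Imitates a create step that overwrites lifnum and then fails. */
static int
model_create_fails(struct model_addrobj *addr)
{
	assert(addr != NULL);
	addr->lifnum = 0;	/* side effect survives the failure */
	return (-1);		/* e.g. permission denied */
}

/* Cleanup keyed on whatever lifnum the failed call left behind. */
int
model_cleanup_lnum_naive(void)
{
	struct model_addrobj addr = { MODEL_PLACEHOLDER_LNUM };

	(void) model_create_fails(&addr);
	return (addr.lifnum);
}

/* Cleanup keyed on the value saved before the call. */
int
model_cleanup_lnum_saved(void)
{
	struct model_addrobj addr = { MODEL_PLACEHOLDER_LNUM };
	int saved = addr.lifnum;

	if (model_create_fails(&addr) != 0)
		addr.lifnum = saved;
	return (addr.lifnum);
}
```

Here `model_cleanup_lnum_naive()` yields 0, which would not match a placeholder stored with -1, whereas `model_cleanup_lnum_saved()` yields -1 and would.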
Actions #8

Updated by Dan Cross about 1 year ago

We've got a patch we're testing, but another interesting failure mode got kicked up in this general area, and I'm preserving it here for posterity:

Suppose I have created an addrconf address object called nic/v6 that is associated with multiple addresses (say, a link-local address and a SLAAC address). Suppose further that an attempt to create another addrconf address, nic/foo, fails due to a permissions issue, leaving behind an address object in ipmgmtd. Then, if I delete nic/foo, that also takes out the non-LL address for nic/v6. For example:

: bride; ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        static   ok           192.168.2.20/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::3eec:efff:fe31:a22a%e1000g0/10
e1000g0/v6        addrconf ok           2603:3005:b04:8102:3eec:efff:fe31:a22a/64
: bride; ipadm create-addr -T addrconf e1000g0/foo
ipadm: Could not create address: Permission denied
: bride; pfexec ipadm delete-addr e1000g0/foo
: bride; ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        static   ok           192.168.2.20/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::3eec:efff:fe31:a22a%e1000g0/10
: bride; pfexec ipadm create-addr -T addrconf e1000g0/v6
ipadm: Could not create address: Address object already exists
: bride; ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        static   ok           192.168.2.20/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::3eec:efff:fe31:a22a%e1000g0/10
: bride; pfexec ipadm delete-addr e1000g0/v6
: bride; ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        static   ok           192.168.2.20/24
lo0/v6            static   ok           ::1/128
: bride; pfexec ipadm create-addr -T addrconf e1000g0/v6
: bride; ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
e1000g0/v4        static   ok           192.168.2.20/24
lo0/v6            static   ok           ::1/128
e1000g0/v6        addrconf ok           fe80::3eec:efff:fe31:a22a%e1000g0/10
e1000g0/v6        addrconf ok           2603:3005:b04:8102:3eec:efff:fe31:a22a/64
: bride;
Actions #9

Updated by Electric Monk about 1 year ago

  • Gerrit CR set to 2846
Actions #10

Updated by Andy Fiddaman about 1 year ago

  • Related to Bug #15648: Attempting to create multiple link-local IPv6 addresses leaves subinterfaces behind added
Actions #11

Updated by Dan Cross about 1 year ago

With the current fix in CR 2846 that Andy's put together, we don't have the issue I referred to in comment 8.

We have tested this change pretty extensively on our own hardware, and I'll put it through its paces on local test hardware, making sure we have no regressions for IPv6 LL and SLAAC addresses. Static configuration of IPv4 and IPv6 addresses works as expected, and I've tested DHCP on our hardware, as well.

Actions #12

Updated by Electric Monk about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit c8152f8f417d34bc129af68ca0b76cfa9ca764db

commit  c8152f8f417d34bc129af68ca0b76cfa9ca764db
Author: Andy Fiddaman <illumos@fiddaman.net>
Date:   2023-05-15T22:31:58.000Z

    15648 Attempting to create multiple link-local IPv6 addresses leaves subinterfaces behind
    15359 Failed create of IPv6 addrconf address made it undeletable
    Reviewed by: Dan Cross <cross@oxidecomputer.com>
    Reviewed by: Bill Sommerfeld <sommerfeld@alum.mit.edu>
    Reviewed by: Dan McDonald <danmcd@mnx.io>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
