Bug #15732

closed

Cannot bring down zone due to lingering IP references

Added by Andy Fiddaman about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: networking
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug: oxide:stlouis#414

Description

We have been seeing zones occasionally get stuck in the shutting_down state. At that point the zone is empty, but it never finishes shutting down. Here is an example:

BRM42220018 # zoneadm list -ivc
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              ipkg     shared
  38 oxz_crucible_pantry shutting_down /zone/oxz_crucible_pantry      omicron1 excl

BRM42220018 # pgrep -lf zoneadm
  811 zoneadmd -z oxz_crucible_pantry
 1031 /usr/sbin/zoneadm -z oxz_crucible_pantry halt

BRM42220018 # pstack 1031
1031:   /usr/sbin/zoneadm -z oxz_crucible_pantry halt
 fee8c04a door     (8, 8045a04, 0, 0, 0, 3)
 fee77406 door_call (8, 8045a04) + c9
 fe79565c zonecfg_call_zoneadmd (8046599, 8045e5c, fefc1a20, 1) + 17d
 0805c0e7 halt_func (0, 80464ec, 0, 80464dc) + 121
 080562fd parse_and_run (1, 80464e8, 100, f8f62e80) + 2f
 08060960 main     (fee8d657, fef03484, 80464b8, 8055d1b) + 2d5
 08055d1b _start_crt (4, 80464dc, f8f366e7, 0, 0, 0) + 9a
 08055bea _start   (4, 8046584, 8046596, 8046599, 80465ad, 0) + 1a

BRM42220018 # pstack 811
811:    zoneadmd -z oxz_crucible_pantry
------------------------- thread# 1 --------------------------
 feedb807 pollsys  (8046d5c, 4, 0, 0)
 fee6712e poll     (8046d5c, 4, ffffffff) + 68
 0805caec do_console_io (807b968, 6, 2) + 1e6
 0805cd6b serve_console (807b968) + 87
 0805b930 main     (feedd657, fef53484) + 96d
 080580cb _start_crt (3, 8047ed8, fefd06e7, 0, 0, 0) + 9a
 08057f9a _start   (3, 8047f88, 8047f91, 8047f94, 0, 8047fa8) + 1a
------------------------- thread# 2 --------------------------
 feedc1c8 door     (0, 0, 0, 0, 0, 8)
 feec1d19 door_unref_func (32b) + 45
 feed72ef _thrp_setup (fece0a40) + 7f
 feed75a0 _lwp_start (fece0a40, 0, 0, 0, 0, 0)
------------------------- thread# 3 --------------------------
------------------------- thread# 4 --------------------------
 feedc200 door     (0, 0, 0, fe5ffe00, f5f00, a)
 feec2927 door_create_func (0) + 4c
 feed72ef _thrp_setup (fece1a40) + 7f
 feed75a0 _lwp_start (fece1a40, 0, 0, 0, 0, 0)

BRM42220018 # pfexec mdb -ke '0t811::pid2proc | ::walk thread | ::findstack -v'
stack pointer for thread fffffcfa69494c20 (zoneadmd/1): fffff78825384c40
[ fffff78825384c40 _resume_from_idle+0x12b() ]
  fffff78825384c70 swtch+0x139()
  fffff78825384cf0 cv_wait_sig_swap_core+0x160(fffffcfb0ed958d2, fffffcfb0ed95898, 0)
  fffff78825384d10 cv_wait_sig_swap+0x17(fffffcfb0ed958d2, fffffcfb0ed95898)
  fffff78825384d40 cv_timedwait_sig_hrtime+0x35(fffffcfb0ed958d2, fffffcfb0ed95898, ffffffffffffffff)
  fffff78825384e10 poll_common+0x59a(8046d5c, 4, 0, 0)
  fffff78825384ec0 pollsys+0xfa(8046d5c, 4, 0, 0)
  fffff78825384f10 sys_syscall32+0x105()
stack pointer for thread fffffcfa6fd57100 (zoneadmd/2): fffff78828157d80
[ fffff78828157d80 _resume_from_idle+0x12b() ]
  fffff78828157db0 swtch+0x139()
  fffff78828157e20 cv_wait_sig+0x154(fffffcfab24264a8, fffffffffc4465e0)
  fffff78828157e50 door_unref+0x9f()
  fffff78828157ec0 doorfs32+0x1d5(0, 0, 0, 0, 0, 8)
  fffff78828157f10 sys_syscall32+0x105()
stack pointer for thread fffffcfa3ad37c40 (zoneadmd/3): fffff7882815d3d0
[ fffff7882815d3d0 _resume_from_idle+0x12b() ]
  fffff7882815d400 swtch+0x139()
  fffff7882815d430 cv_wait+0x70(fffffcfabef93940, fffffcfabef938d0)
  fffff7882815d490 ip_modclose+0xd3(fffffcfabef936e8)
  fffff7882815d4d0 ip_close+0x26(fffffcfb5b7977b0, 3, fffffcfa92d0e850)
  fffff7882815d550 qdetach+0xa1(fffffcfb5b7977b0, 1, 3, fffffcfa92d0e850, 0)
  fffff7882815d5f0 strclose+0x2a8(fffffcfbbfedeb00, 3, fffffcfa92d0e850)
  fffff7882815d640 device_close+0x5c(fffffcfbbc640b80, 3, fffffcfa92d0e850)
  fffff7882815d6d0 spec_close+0x129(fffffcfbbc640b80, 3, 1, 0, fffffcfa92d0e850, 0)
  fffff7882815d750 fop_close+0x66(fffffcfbbc640b80, 3, 1, 0, fffffcfa92d0e850, 0)
  fffff7882815d790 closef+0x63(fffffcfb605bb3a8)
  fffff7882815d860 munlink+0x270(fffffcfa9ade0818, fffffcfb3ddeaa90, 2, fffffcfaf1ac58f8, fffff7882815dadc, fffffcfb088ec7d0)
  fffff7882815d8d0 munlinkall+0x4c(fffffcfa9ade0818, 2, fffffcfaf1ac58f8, fffff7882815dadc, fffffcfb088ec7d0)
  fffff7882815da60 strioctl+0x3270(fffffcfa9f6d5240, 5317, ffffffffffffffff, 80200000, 2, fffffcfaf1ac58f8, fffff7882815dadc)
  fffff7882815dac0 ldi_ioctl+0x84(fffffcfa9b49ca60, 5317, ffffffffffffffff, 80000000, fffffcfaf1ac58f8, fffff7882815dadc)
  fffff7882815db40 str_stack_shutdown+0xb0(26, fffffcfb088ec7d0)
  fffff7882815dbc0 netstack_apply_shutdown+0xd6(fffffffffc25a348, fffffcfbbb448400, 2)
  fffff7882815dc00 apply_all_modules_reverse+0x39(fffffcfbbb448400, fffffffffc0cc420)
  fffff7882815dc40 netstack_zone_shutdown+0xdb(26, fffffcfbbb448400)
  fffff7882815dcb0 zsd_apply_shutdown+0x1d6(0, 1, fffffcfa2c8e3680, 2)
  fffff7882815dd00 zsd_apply_all_keys+0x53(fffffffffc1bc5e0, fffffcfa2c8e3680)
  fffff7882815dd50 zone_zsd_callbacks+0xc7(fffffcfa2c8e3680, 1)
  fffff7882815dd80 zone_shutdown+0x15a(26)
  fffff7882815dec0 zone+0x337(5, 26, 0, fe6fd808, 26)
  fffff7882815df10 sys_syscall32+0x105()
stack pointer for thread fffffcfb0e715be0 (zoneadmd/4): fffff78828114d50
[ fffff78828114d50 _resume_from_idle+0x12b() ]
  fffff78828114d80 swtch+0x139()
  fffff78828114db0 shuttle_swtch+0x21e(fffffffffc4465e0)
  fffff78828114e50 door_return+0x224(0, 0, 0, 0, fe5ffe00, f5f00)
  fffff78828114ec0 doorfs32+0x1f4(0, 0, 0, fe5ffe00, f5f00, a)
  fffff78828114f10 sys_syscall32+0x105()

Inspecting the zone from mdb, we can see that there are a number of references on the ill corresponding to one of the zone's network interfaces:

> ::zone
            ADDR     ID STATUS        NAME                 PATH
fffffffffc2a06c0      0 running       global               /
fffffcfa2c8e3680     38 empty         oxz_crucible_pantry  /zone/oxz_crucible_pantry/root/
> ::ill -s oxz_crucible_pantry -v
ADDR             NAME     VER TYPE       WQ               IPST             FLAGS
PHYINT            CNT     GROUP

fffffcfabef936e8 oxControlService16 v6  ETHER      fffffcfb5b7978a8 fffffcfaccab9000 RUNNING,MULTICAST
fffffcfbb29841c8   73     0
                    |
                    +-->  73 ill_refcnt         references from active threads
                           0 ill_ire_cnt        ires referencing this ill
                           0 ill_nce_cnt        nces referencing this ill
                           0 ill_ncec_cnt       ncecs referencing this ill
                           0 ill_ilm_cnt        ilms referencing this ill

I left the following DTrace script running to track ill_refcnt holds and releases until the problem manifested again.

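#!/usr/sbin/dtrace -Cs

/*
 * Count each hold as +1, keyed by (thread, ill), and record the kernel
 * stack that took the hold.
 */
ill_refhold:entry,ill_refhold_locked:entry
{
        @refs[curthread, arg0] = sum(1);
        @stacks[curthread, arg0, stack()] = count();
}

/* Count each release as -1 against the same (thread, ill) key. */
ill_refrele:entry
{
        @refs[curthread, arg0] = sum(-1);
}

/* Once a minute, print the 20 keys with the largest net hold count. */
profile:::tick-60sec
{
        trunc(@refs, 20);
        printf("REFS:");
        printa("\n%a, %p = %@x", @refs);
}

/* On exit, print every recorded hold stack and the final top 20 keys. */
END
{
        trunc(@refs, 20);
        printf("\nSTACKS:\n");
        printa("%a, %p, %k = %@x\n", @stacks);
        printf("REFS:\n");
        printa("%a, %p = %@x\n", @refs);
}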

The resulting data showed that references were being leaked via this stack:

ip`ill_lookup_on_ifindex+0x8a
ip`ip_recv_attr_from_mblk+0x1a4
ip`squeue_try_drain_one+0xed
ip`squeue_synch_exit+0xa3
ip`tcp_connect+0x128
sockfs`so_connect+0xce
sockfs`socket_connect+0x42
sockfs`connect+0x91
unix`sys_syscall+0x17d

and indeed there is a missing call to ira_release() in squeue_try_drain_one().
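
To illustrate why a single missing release is enough to wedge zone shutdown, here is a minimal toy model in C. It is not the illumos code: every `toy_*` name is hypothetical, the structures model only a reference count, and the `release` flag simply stands in for whether the drain path calls something like ira_release() once it has finished with the receive attributes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ill_t: only the reference count is modelled. */
typedef struct toy_ill {
	int refcnt;
} toy_ill_t;

/* Hypothetical stand-in for receive attributes that may hold an ill. */
typedef struct toy_attrs {
	toy_ill_t *ill;
} toy_attrs_t;

/* Models rebuilding the attributes from a packet: the lookup takes a hold. */
static void
toy_attrs_from_mblk(toy_attrs_t *attrs, toy_ill_t *ill)
{
	ill->refcnt++;
	attrs->ill = ill;
}

/* Models the release step: drop whatever the attributes are holding. */
static void
toy_attrs_release(toy_attrs_t *attrs)
{
	if (attrs->ill != NULL) {
		assert(attrs->ill->refcnt > 0);
		attrs->ill->refcnt--;
		attrs->ill = NULL;
	}
}

/* Models one drain pass; 'release' selects the fixed behaviour. */
static void
toy_drain_one(toy_ill_t *ill, int release)
{
	toy_attrs_t attrs = { NULL };

	toy_attrs_from_mblk(&attrs, ill);
	/* ... the queued packet would be processed here ... */
	if (release)
		toy_attrs_release(&attrs);
}

/* Run 'passes' drain passes and return the references left on the ill. */
int
toy_leaked_refs(int passes, int release)
{
	toy_ill_t ill = { 0 };

	for (int i = 0; i < passes; i++)
		toy_drain_one(&ill, release);
	return (ill.refcnt);
}
```

In this model, `toy_leaked_refs(73, 0)` leaves 73 references behind while `toy_leaked_refs(73, 1)` leaves none. The count of 73 is chosen only to echo the ill_refcnt shown above. A closer that waits for the count to reach zero, as ip_modclose() does in the kernel stack above, would never be woken in the first case.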

Having found this and tested a fix, I checked SmartOS and found that the same fix was made there a while ago, although according to the summary they were seeing a credential leak rather than an ill reference leak. Either way, the cause and fix are the same, namely https://github.com/TritonDataCenter/illumos-joyent/commit/7354012d871a98cfeba6ab962af30b16d0455e5f


Related issues

Related to illumos gate - Bug #13149: elide squeue wake-ups when prudent (Closed, Patrick Mooney)

Actions #1

Updated by Andy Fiddaman about 1 year ago

  • Related to Bug #13149: elide squeue wake-ups when prudent added
Actions #2

Updated by Andy Fiddaman about 1 year ago

The SmartOS bug that led to the same fix for this is https://smartos.org/bugview/OS-6435
This is a followup to #13149 that was not upstreamed.

Actions #3

Updated by Electric Monk about 1 year ago

  • Gerrit CR set to 2912
Actions #4

Updated by Andy Fiddaman about 1 year ago

I tested this on the same set of systems that previously reliably reproduced the fault. I was able to confirm that it was now possible to halt zones even when there was a trip through squeue_try_drain_one() - observed with dtrace - and that listing all ill instances from mdb would sometimes show 0 references against each, where it didn't before.

# mdb -k
> ::ill -v ! grep ill_refcnt | sort | uniq -c
218                     +-->   0 ill_refcnt         references from active threads
Actions #5

Updated by Electric Monk about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 3a7e2f8dc0c13ba74ff90394a857c2a174a243a0

commit  3a7e2f8dc0c13ba74ff90394a857c2a174a243a0
Author: Andy Fiddaman <illumos@fiddaman.net>
Date:   2023-06-12T12:01:17.000Z

    15732 Cannot bring down zone due to lingering IP references
    Reviewed by: Robert Mustacchi <rm+illumos@fingolfin.org>
    Reviewed by: Dan Cross <cross@oxidecomputer.com>
    Reviewed by: Dan McDonald <danmcd@mnx.io>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
