Project

General

Profile

Bug #1918

stack overflow from mac_promisc_dispatch()

Added by Bryan Cantrill over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
kernel
Start date:
2011-12-21
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:

Description

We saw a double fault with the following stack trace:

ffffff004133d130 ip_input+0x2e(ffffff091efb8c28, 0, ffffff0963c4e360, ffffff004133d160)
ffffff004133d1f0 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff0963c4e360, 1)
ffffff004133d240 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff0963c4e6e0, 1)
ffffff004133d2b0 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff0963c4e6e0, ffffff0915366898)
ffffff004133d3a0 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff0963c4e6e0, ffffff004133d3b0)
ffffff004133d430 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff0963c4e6e0, 0, 1, 0)
ffffff004133d4c0 mac_tx+0x35d(ffffff0915366898, ffffff0963c4e6e0, 0, 1, 0)
ffffff004133d520 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff0963c4e6e0, 0, 1)
ffffff004133d670 ip_xmit+0x7eb(ffffff0963c4e6e0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff004133d770 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff0963c4e6e0, ffffff091cf355f8, ffffff004133d880)
ffffff004133d850 ill_input_short_v4+0x6ce(ffffff0963c4e6e0, ffffff091cf355f8, ffffff091cf35608, ffffff004133d880, ffffff004133da10)
ffffff004133da80 ip_input+0x23b(ffffff091efb8c28, 0, ffffff0963c4e6e0, ffffff004133dab0)
ffffff004133db40 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff0963c4e6e0, 1)
ffffff004133db90 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff096477fa80, 1)
ffffff004133dc00 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff096477fa80, ffffff0915366898)
ffffff004133dcf0 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff096477fa80, ffffff004133dd00)
ffffff004133dd80 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff096477fa80, 0, 1, 0)
ffffff004133de10 mac_tx+0x35d(ffffff0915366898, ffffff096477fa80, 0, 1, 0)
ffffff004133de70 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff096477fa80, 0, 1)
ffffff004133dfc0 ip_xmit+0x7eb(ffffff096477fa80, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff004133e0c0 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff096477fa80, ffffff09101703b8, ffffff004133e1d0)
ffffff004133e1a0 ill_input_short_v4+0x6ce(ffffff096477fa80, ffffff09101703b8, ffffff09101703c8, ffffff004133e1d0, ffffff004133e360)
ffffff004133e3d0 ip_input+0x23b(ffffff091efb8c28, 0, ffffff096477fa80, ffffff004133e400)
ffffff004133e490 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff096477fa80, 1)
ffffff004133e4e0 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff0964d044a0, 1)
ffffff004133e550 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff0964d044a0, ffffff0915366898)
ffffff004133e640 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff0964d044a0, ffffff004133e650)
ffffff004133e6d0 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff0964d044a0, 0, 1, 0)
ffffff004133e760 mac_tx+0x35d(ffffff0915366898, ffffff0964d044a0, 0, 1, 0)
ffffff004133e7c0 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff0964d044a0, 0, 1)
ffffff004133e910 ip_xmit+0x7eb(ffffff0964d044a0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff004133ea10 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff0964d044a0, ffffff0909bdb778, ffffff004133eb20)
ffffff004133eaf0 ill_input_short_v4+0x6ce(ffffff0964d044a0, ffffff0909bdb778, ffffff0909bdb788, ffffff004133eb20, ffffff004133ecb0)
ffffff004133ed20 ip_input+0x23b(ffffff091efb8c28, 0, ffffff0964d044a0, ffffff004133ed50)
ffffff004133ede0 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff0964d044a0, 1)
ffffff004133ee30 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff091cf01ce0, 1)
ffffff004133eea0 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff091cf01ce0, ffffff0915366898)
ffffff004133ef90 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff091cf01ce0, ffffff004133efa0)
ffffff004133f020 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff091cf01ce0, 0, 1, 0)
ffffff004133f0b0 mac_tx+0x35d(ffffff0915366898, ffffff091cf01ce0, 0, 1, 0)
ffffff004133f110 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff091cf01ce0, 0, 1)
ffffff004133f260 ip_xmit+0x7eb(ffffff091cf01ce0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff004133f360 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff091cf01ce0, ffffff0909d53db8, ffffff004133f470)
ffffff004133f440 ill_input_short_v4+0x6ce(ffffff091cf01ce0, ffffff0909d53db8, ffffff0909d53dc8, ffffff004133f470, ffffff004133f600)
ffffff004133f670 ip_input+0x23b(ffffff091efb8c28, 0, ffffff091cf01ce0, ffffff004133f6a0)
ffffff004133f730 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff091cf01ce0, 1)
ffffff004133f780 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff09263ddcc0, 1)
ffffff004133f7f0 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff09263ddcc0, ffffff0915366898)
ffffff004133f8e0 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff09263ddcc0, ffffff004133f8f0)
ffffff004133f970 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff09263ddcc0, 0, 1, 0)
ffffff004133fa00 mac_tx+0x35d(ffffff0915366898, ffffff09263ddcc0, 0, 1, 0)
ffffff004133fa60 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff09263ddcc0, 0, 1)
ffffff004133fbb0 ip_xmit+0x7eb(ffffff09263ddcc0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff004133fcb0 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff09263ddcc0, ffffff0930019238, ffffff004133fdc0)
ffffff004133fd90 ill_input_short_v4+0x6ce(ffffff09263ddcc0, ffffff0930019238, ffffff0930019248, ffffff004133fdc0, ffffff004133ff50)
ffffff004133ffc0 ip_input+0x23b(ffffff091efb8c28, 0, ffffff09263ddcc0, ffffff004133fff0)
ffffff0041340080 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff09263ddcc0, 1)
ffffff00413400d0 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff0961205aa0, 1)
ffffff0041340140 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff0961205aa0, ffffff0915366898)
ffffff0041340230 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff0961205aa0, ffffff0041340240)
ffffff00413402c0 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff0961205aa0, 0, 1, 0)
ffffff0041340350 mac_tx+0x35d(ffffff0915366898, ffffff0961205aa0, 0, 1, 0)
ffffff00413403b0 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff0961205aa0, 0, 1)
ffffff0041340500 ip_xmit+0x7eb(ffffff0961205aa0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff0041340600 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff0961205aa0, ffffff0925b6e9f8, ffffff0041340710)
ffffff00413406e0 ill_input_short_v4+0x6ce(ffffff0961205aa0, ffffff0925b6e9f8, ffffff0925b6ea08, ffffff0041340710, ffffff00413408a0)
ffffff0041340910 ip_input+0x23b(ffffff091efb8c28, 0, ffffff0961205aa0, ffffff0041340940)
ffffff00413409d0 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff0961205aa0, 1)
ffffff0041340a20 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff09610523c0, 1)
ffffff0041340a90 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff09610523c0, ffffff0915366898)
ffffff0041340b80 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff09610523c0, ffffff0041340b90)
ffffff0041340c10 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff09610523c0, 0, 1, 0)
ffffff0041340ca0 mac_tx+0x35d(ffffff0915366898, ffffff09610523c0, 0, 1, 0)
ffffff0041340d00 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff09610523c0, 0, 1)
ffffff0041340e50 ip_xmit+0x7eb(ffffff09610523c0, ffffff090aaa5c68, 80020000, 5c, 0, 0, 0, 0)
ffffff0041340f50 ire_recv_forward_v4+0x837(ffffff091d6f0858, ffffff09610523c0, ffffff090a3534b8, ffffff0041341060)
ffffff0041341030 ill_input_short_v4+0x6ce(ffffff09610523c0, ffffff090a3534b8, ffffff090a3534c8, ffffff0041341060, ffffff00413411f0)
ffffff0041341260 ip_input+0x23b(ffffff091efb8c28, 0, ffffff09610523c0, ffffff0041341290)
ffffff0041341320 dls_rx_promisc+0x181(ffffff092609cbe8, 0, ffffff09610523c0, 1)
ffffff0041341370 mac_promisc_dispatch_one+0x94(ffffff093fce24d8, ffffff09685025c0, 1)
ffffff00413413e0 mac_promisc_dispatch+0x110(ffffff091cf15510, ffffff09685025c0, ffffff0915366898)
ffffff00413414d0 mac_tx_send+0x483(ffffff0915366898, ffffff091cf38848, ffffff09685025c0, ffffff00413414e0)
ffffff0041341560 mac_tx_single_ring_mode+0x91(ffffff0926100300, ffffff09685025c0, e12a951f, 1, 0)
ffffff00413415f0 mac_tx+0x35d(ffffff0915366898, ffffff09685025c0, e12a951f, 1, 0)
ffffff0041341650 str_mdata_fastpath_put+0xa4(ffffff092609cbe8, ffffff09685025c0, e12a951f, 1)
ffffff00413417a0 ip_xmit+0x7eb(ffffff09685025c0, ffffff090aaa5c68, 180036060, 5c, e12a951f, 0, 0, ffffff0954a07468)
ffffff00413419d0 ire_send_wire_v4+0x345(ffffff091d6f0858, ffffff09685025c0, ffffff0910170c78, ffffff0954a07300, ffffff095cf3b9a8)
ffffff0041341a50 conn_ip_output+0x250(ffffff09685025c0, ffffff0954a07300)
ffffff0041341ae0 tcp_output+0x86d(ffffff0954a8f840, ffffff09677df620, ffffff092ff0bec0, 0)
ffffff0041341b70 squeue_enter+0x440(ffffff092ff0bec0, ffffff09677df620, ffffff09677df620, 1, 0, 4, 7)
ffffff0041341bf0 tcp_sendmsg+0x104(ffffff0954a8f840, ffffff09677df620, ffffff0041341ce0, ffffff09080480c0)
ffffff0041341c80 so_sendmsg+0x21b(ffffff095e5d8300, ffffff0041341ce0, ffffff0041341e20, ffffff09080480c0)
ffffff0041341cd0 socket_sendmsg+0x61(ffffff095e5d8300, ffffff0041341ce0, ffffff0041341e20, ffffff09080480c0)
ffffff0041341d50 socket_vop_write+0x63(ffffff095cf40180, ffffff0041341e20, 0, ffffff09080480c0, 0)
ffffff0041341de0 fop_write+0xd3(ffffff095cf40180, ffffff0041341e20, 0, ffffff09080480c0, 0)
ffffff0041341e90 write+0x2e2(4, 80e8880, 34)
ffffff0041341ec0 write32+0x22(4, 80e8880, 34)
ffffff0041341f10 _sys_sysenter_post_swapgs+0x149()

Note that a "snoop I external0" was running - with external0 being a VNIC on ixgbe0. Moreover, there was an additional promiscuous listener on ixgbe0 (a virtual machine) -- which may or may not be a critical ingredient.

History

#1

Updated by Dan McDonald over 7 years ago

Okay, I know what's wrong, but I'm not sure how to get into this state:

- ipnet snooping is supposed to never see looped-back unicast
transmissions, because dls_accept_promisc/dls_accept_common is
supposed to return FALSE (i.e. don't loop the transmitted packet back).
- Problem is, on the core dump:
1.) The dld_str_t's ds_promisc is set to 0!  It's supposed to
be set to something non-zero.
2.) Because of this, dls_accept_common() actually returns
TRUE, and we then go up to ip_input, where forwarding now
happens... again and again.

Again, I'm not at all sure how to get a node's dld_str_t into this state during promiscuous on/off.

#2

Updated by Bryan Cantrill over 7 years ago

Dan made the key breakthrough: ds_promisc is 0, but it's still on the mi_promisc_list. Looking at the code paths whereby this is possible leads to the smoking gun:


> ::stacks -c mac_callback_remove_wait
THREAD           STATE    SOBJ                COUNT
ffffff0041b7bc40 SLEEP    CV                      1
                 swtch+0x145
                 cv_wait+0x61
                 mac_callback_remove_wait+0x36
                 mac_promisc_remove+0xa5
                 dls_promisc+0x7f
                 proto_promiscoff_req+0xdb
                 dld_proto+0x51
                 dld_wput_nondata_task+0x8b
                 taskq_d_thread+0xb1
                 thread_start+8

proto_promiscoff_req() clears ds_promisc, but if walkers are active (as they are in this case), the code will fall into mac_callback_remove_wait() and leave it to the walker to clean things up via MAC_PROMISC_WALKER_DCR(). The fix is that dls_accept_promisc() must recognize this window: is ds_promisc is 0, it should always return B_FALSE.

#3

Updated by Dan McDonald over 7 years ago

I think I've been able to reproduce the bug.

Here's the scenario:

1.) Enable ip_forwarding.

2.) Consider this segment of code in proto_promiscoff_req:

    665 
    666     promisc_saved = dsp->ds_promisc;
    667     switch (dlp->dl_level) {
    668     case DL_PROMISC_SAP:
    669         if (!(dsp->ds_promisc & DLS_PROMISC_SAP)) {
    670             dl_err = DL_NOTENAB;
    671             goto failed;
    672         }
    673         dsp->ds_promisc &= ~DLS_PROMISC_SAP;
    674         break;
    675 
    676     case DL_PROMISC_MULTI:
    677         if (!(dsp->ds_promisc & DLS_PROMISC_MULTI)) {
    678             dl_err = DL_NOTENAB;
    679             goto failed;
    680         }
    681         dsp->ds_promisc &= ~DLS_PROMISC_MULTI;
    682         break;
    683 
    684     case DL_PROMISC_PHYS:
    685         if (!(dsp->ds_promisc & DLS_PROMISC_PHYS)) {
    686             dl_err = DL_NOTENAB;
    687             goto failed;
    688         }
    689         dsp->ds_promisc &= ~DLS_PROMISC_PHYS;
    690         break;
    691 
    692     default:
    693         dl_err = DL_NOTSUPPORTED;
    694         goto failed;
    695     }
    696 

     /* XXX KEBE SAYS here's a window. */

    697     mac_perim_enter_by_mh(dsp->ds_mh, &mph);
Notice how the mac perimeter isn't entered until AFTER ds_promisc is zeroed out?

Anytime between the switch and the mac_perim_enter_by_mh(), a packet could be promiscuously looped back and hit the scenario described in the last comment.

3.) Use DTrace's chill (with an increased limit) to keep that window wide-open.

#!/usr/sbin/dtrace -FCws

proto_promiscoff_req:entry
{
        self->trace = 1;
        printf("Hello\n");
}

dls_promisc:entry
/self->trace == 1/
{
        printf("Good...\n");
        chill(4000000000); /* Needs kernel help to be this big. */
        printf("\t...bye\n");
}

3.) Get a good source of off-link traffic flowing (assuming you have a route for it). I used ssh to an offlink host, and then had it "od -tx4 /dev/urandom". Lots of bits flowing.

4.) Run "snoop -I <interface with offlink route> -o >somewhere".

5.) Let a few packets get captured, then kill the snoop. If it doesn't panic, go back to step 4; it will happen.

6.) BOOM, say goodbye to your stack.

I believe the proper solution is to move the mac_perim_enter_by_mh() to the beginning of the code snippet I showed above. It's 2am, though, I'll test this tomorrow.

#4

Updated by Bryan Cantrill over 7 years ago

FWIW, here are the diffs for the dls_accept_promisc() solution:


--- a/usr/src/uts/common/io/dls/dls.c
+++ b/usr/src/uts/common/io/dls/dls.c
@@ -24,6 +24,10 @@
  */

 /*
+ * Copyright 2011 Joyent, Inc.  All rights reserved.
+ */
+
+/*
  * Data-Link Services Module
  */

@@ -596,6 +600,22 @@ boolean_t
 dls_accept_promisc(dld_str_t *dsp, mac_header_info_t *mhip, dls_rx_t *ds_rx,
     void **ds_rx_arg, boolean_t loopback)
 {
+       if (dsp->ds_promisc == 0) {
+               /*
+                * If there are active walkers of the mi_promisc_list when
+                * promiscuousness is disabled, ds_promisc will be cleared,
+                * but the DLS will remain on the mi_promisc_list until the
+                * walk is completed.  If we do not recognize this case here,
+                * we won't properly execute the ds_promisc case in the common
+                * accept routine -- and we will potentially accept a packet
+                * that has originated with this DLS (which in turn can
+                * induce recursion and death by stack overflow).  If
+                * ds_promisc is zero, we know that we are in this window --
+                * and we refuse to accept the packet.
+                */
+               return (B_FALSE);
+       }
+
        return (dls_accept_common(dsp, mhip, ds_rx, ds_rx_arg, B_TRUE,
            loopback));
 }

Will discuss with Dan the relative merits/risk of this solution versus his proposal of moving the mac_perim_enter_by_mh()...

#5

Updated by Dan McDonald over 7 years ago

Simply moving the mac_perim_enter_by_mh() doesn't help. The aforementioned DTrace script still makes the machine stack-overflow.

The problem is that regardless, there is still a window where promiscuous processing can take place while the ds_promisc value is possibly 0 (which ignores any loopback semantics, causing a forwarding loop when ip_forwarding is enabled).

The solution is to guarantee that before adding promiscuous processing hooks, ds_promisc has its bits turned on, and that ONLY AFTER REMOVING promiscuous processing hooks, ds_promisc has its bits turned off.

The solution centers around a change to the semantics and calling setup of dls_promisc() function. Its current semantics are that it expects the caller to change ds_promisc prior to calling, at which time it compares the new ds_promisc to the passed-in "old_flags". The new semantics are that "new_flags" are passed in, and dls_promisc() sets ds_promisc per the needed guarantees in the previous paragraph. The existing code is a bit hairy in its error cases, but in the enabling case, the caller covers this, and in the disabling case, errors only occur when VLAN failures come into play.

Some code gets cleaner, other code merely changes. A webrev is here:

http://www.kebe.com/~danmcd/webrevs/1918/

for perusal. Modules build and are modlintlib-clean, and a paired dls and dld replacement shows that the DTrace script can no longer keep the window open and cause a stack overflow.

#6

Updated by Dan McDonald over 7 years ago

Also, can someone with permission move this out of "site" and into "illumos kernel" or whereever this bug really belongs?

#7

Updated by Bryan Horstmann-Allen over 7 years ago

  • Project changed from site to illumos gate
#8

Updated by Dan McDonald over 7 years ago

  • Category set to kernel
  • Assignee set to Dan McDonald
  • % Done changed from 0 to 70
  • Difficulty changed from Medium to Hard
  • Tags deleted (needs-triage)
#9

Updated by Rich Lowe over 7 years ago

  • Status changed from New to Resolved
  • % Done changed from 70 to 100

Resolved in r13556 commit:6d80fb09c7b8

Also available in: Atom PDF