Bug #11490

SRS ring polling disabled for VLANs

Added by Robert Mustacchi 5 months ago.

Status: New
Priority: Normal
Assignee:
Category: networking
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

Through investigation we discovered there are a large number of zones
with a large number of MAC rxsdrops. This is surprising, because in
theory there should be rings in play. In fact, here we are on QP01C02H01
which is running joyent_20160204T171016Z.

If we look at all of its ixgbe_t's:

> ::prtconf ! grep ixgbe | awk '{ print $1}' > /var/tmp/rm/ixgbe
> ::cat /var/tmp/rm/ixgbe | ::print struct dev_info devi_driver_data | ::print ixgbe_t num_rx_groups num_rx_rings
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80

If we look at this, we see that we have plenty of rings and groups, far
more than this box has vnics.

[root@QP01C02H01 (jupiter) /var/tmp/rm]# dladm show-vnic -p -o link | wc -l
      19
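
As a quick sanity check on the numbers above, the hex values can be converted and compared. This is only an illustrative sketch in Python, not part of the original debugging session; the variable names are made up, and the instance count of four is taken from the four pairs of values printed above.

```python
# Values copied from the ::print ixgbe_t output and the dladm count above.
num_rx_groups = 0x20    # per ixgbe instance
num_rx_rings = 0x80     # per ixgbe instance
ixgbe_instances = 4     # four pairs of values were printed
vnic_count = 19         # from `dladm show-vnic -p -o link | wc -l`

rings_per_group = num_rx_rings // num_rx_groups
total_groups = num_rx_groups * ixgbe_instances

print(f"{num_rx_groups} groups of {rings_per_group} rings per instance")
print(f"{total_groups} groups in total for {vnic_count} VNICs")
```

So each group should carry four rings, which matches the four hardware lanes seen on each ixgbe instance later on.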

However, if we look at these mac srs drops stats, we see something interesting:

igb0:0:mac_misc_stat:rxsdrops   0
igb0:0:mac_rx_hwlane0:rxsdrops  0
igb0:0:mac_rx_swlane0:rxsdrops  0
igb1:0:mac_misc_stat:rxsdrops   0
igb1:0:mac_rx_hwlane0:rxsdrops  0
igb1:0:mac_rx_swlane0:rxsdrops  0
ixgbe0:0:mac_misc_stat:rxsdrops 0
ixgbe0:0:mac_rx_hwlane4:rxsdrops        0
ixgbe0:0:mac_rx_hwlane5:rxsdrops        0
ixgbe0:0:mac_rx_hwlane6:rxsdrops        0
ixgbe0:0:mac_rx_hwlane7:rxsdrops        0
ixgbe0:0:mac_rx_swlane0:rxsdrops        0
ixgbe1:0:mac_misc_stat:rxsdrops 0
ixgbe1:0:mac_rx_hwlane4:rxsdrops        0
ixgbe1:0:mac_rx_hwlane5:rxsdrops        0
ixgbe1:0:mac_rx_hwlane6:rxsdrops        0
ixgbe1:0:mac_rx_hwlane7:rxsdrops        0
ixgbe1:0:mac_rx_swlane0:rxsdrops        0
ixgbe2:0:mac_misc_stat:rxsdrops 0
ixgbe2:0:mac_rx_hwlane4:rxsdrops        0
ixgbe2:0:mac_rx_hwlane5:rxsdrops        0
ixgbe2:0:mac_rx_hwlane6:rxsdrops        0
ixgbe2:0:mac_rx_hwlane7:rxsdrops        0
ixgbe2:0:mac_rx_swlane0:rxsdrops        0
ixgbe3:0:mac_misc_stat:rxsdrops 0
ixgbe3:0:mac_rx_hwlane4:rxsdrops        0
ixgbe3:0:mac_rx_hwlane5:rxsdrops        0
ixgbe3:0:mac_rx_hwlane6:rxsdrops        0
ixgbe3:0:mac_rx_hwlane7:rxsdrops        0
ixgbe3:0:mac_rx_swlane0:rxsdrops        0
z13_eth0:0:mac_misc_stat:rxsdrops       0
z13_eth0:0:mac_rx_swlane0:rxsdrops      4953776
z14_eth0:0:mac_misc_stat:rxsdrops       0
z14_eth0:0:mac_rx_swlane0:rxsdrops      1079036
z15_eth0:0:mac_misc_stat:rxsdrops       0
z15_eth0:0:mac_rx_swlane0:rxsdrops      3453358
z16_eth0:0:mac_misc_stat:rxsdrops       0
z16_eth0:0:mac_rx_swlane0:rxsdrops      4766797
z17_eth0:0:mac_misc_stat:rxsdrops       0
z17_eth0:0:mac_rx_swlane0:rxsdrops      4373173
z18_eth0:0:mac_misc_stat:rxsdrops       0
z18_eth0:0:mac_rx_swlane0:rxsdrops      299795
z19_eth0:0:mac_misc_stat:rxsdrops       0
z19_eth0:0:mac_rx_swlane0:rxsdrops      7299191
z20_eth0:0:mac_misc_stat:rxsdrops       0
z20_eth0:0:mac_rx_swlane0:rxsdrops      5496324
z21_eth0:0:mac_misc_stat:rxsdrops       0
z21_eth0:0:mac_rx_swlane0:rxsdrops      1701288
z22_eth0:0:mac_misc_stat:rxsdrops       0
z22_eth0:0:mac_rx_swlane0:rxsdrops      346209
z23_eth0:0:mac_misc_stat:rxsdrops       0
z23_eth0:0:mac_rx_swlane0:rxsdrops      8937596
z24_eth0:0:mac_misc_stat:rxsdrops       0
z24_eth0:0:mac_rx_swlane0:rxsdrops      4499318
z25_eth0:0:mac_misc_stat:rxsdrops       0
z25_eth0:0:mac_rx_swlane0:rxsdrops      253240
z26_eth0:0:mac_misc_stat:rxsdrops       0
z26_eth0:0:mac_rx_swlane0:rxsdrops      4503777
z27_eth0:0:mac_misc_stat:rxsdrops       0
z27_eth0:0:mac_rx_swlane0:rxsdrops      4788049
z28_eth0:0:mac_misc_stat:rxsdrops       0
z28_eth0:0:mac_rx_swlane0:rxsdrops      2351624
z29_eth0:0:mac_misc_stat:rxsdrops       0
z29_eth0:0:mac_rx_swlane0:rxsdrops      320165
z30_eth0:0:mac_misc_stat:rxsdrops       0
z30_eth0:0:mac_rx_swlane0:rxsdrops      8184349
z31_eth0:0:mac_misc_stat:rxsdrops       0
z31_eth0:0:mac_rx_swlane0:rxsdrops      225596

Note how all these zones only have software lane stats while the ixgbe
instances have multiple hardware lanes, the expected four, in fact.
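
The same observation can be made mechanically. The following is a rough sketch that groups the lane statistics by link; it assumes the `link:instance:name:statistic value` layout shown above, and the function name `summarize` is made up for illustration.

```python
from collections import defaultdict

# A few lines copied from the output above.
SAMPLE = """\
ixgbe0:0:mac_rx_hwlane4:rxsdrops        0
ixgbe0:0:mac_rx_swlane0:rxsdrops        0
z13_eth0:0:mac_rx_swlane0:rxsdrops      4953776
z19_eth0:0:mac_rx_swlane0:rxsdrops      7299191
"""

def summarize(text):
    """Return {link: {"hw": lanes, "sw": lanes, "drops": total}}."""
    links = defaultdict(lambda: {"hw": 0, "sw": 0, "drops": 0})
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        parts = fields[0].split(":")
        if len(parts) != 4 or parts[3] != "rxsdrops":
            continue
        link, name = parts[0], parts[2]
        if name.startswith("mac_rx_hwlane"):
            links[link]["hw"] += 1
        elif name.startswith("mac_rx_swlane"):
            links[link]["sw"] += 1
        else:
            continue  # skip mac_misc_stat and anything else
        links[link]["drops"] += int(fields[1])
    return dict(links)

for link, info in sorted(summarize(SAMPLE).items()):
    print(link, info)
```

Any link that reports zero hardware lanes together with a non-zero drop count is one of the suspicious ones.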

Okay, so what does MAC actually think we have for all these clients...

> ::walk mac_client_impl_cache | ::printf "%p %s\n" mac_client_impl_t . mci_name ! grep z | awk '{ print $1 }' > /var/tmp/rm/vnic.mci
> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_ring_group
mci_flent->fe_rx_ring_group = 0xffffff32936a76c0
mci_flent->fe_rx_ring_group = 0xffffff32936a7b40
mci_flent->fe_rx_ring_group = 0xffffff32936a7ab0
mci_flent->fe_rx_ring_group = 0xffffff32936a7a20
mci_flent->fe_rx_ring_group = 0xffffff32936a7900
mci_flent->fe_rx_ring_group = 0xffffff32936a7870
mci_flent->fe_rx_ring_group = 0xffffff32936a77e0
mci_flent->fe_rx_ring_group = 0xffffff32936a7990
mci_flent->fe_rx_ring_group = 0xffffff32936a7750
mci_flent->fe_rx_ring_group = 0xffffff32936a7630
mci_flent->fe_rx_ring_group = 0xffffff32936a75a0
mci_flent->fe_rx_ring_group = 0xffffff32936a7510
mci_flent->fe_rx_ring_group = 0xffffff32936a73f0
mci_flent->fe_rx_ring_group = 0xffffff32936a7360
mci_flent->fe_rx_ring_group = 0xffffff32936a72d0
mci_flent->fe_rx_ring_group = 0xffffff32936a7240
mci_flent->fe_rx_ring_group = 0xffffff32936a71b0
mci_flent->fe_rx_ring_group = 0xffffff32936a7120
mci_flent->fe_rx_ring_group = 0xffffff32936a7480
> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs_cnt
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_cur_count
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4
mrg_cur_count = 0x4

So based on this, in theory each of these should have four rings. So, why
don't they seem to be using them, and why do they only show software lanes?
That's a bit odd. So, the real question is whether the SRSes in the system can
support polling. In theory, they should be able to if we have these rings. So
let's see.

> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs[0] | ::print mac_soft_ring_set_t srs_state
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002

Well, the srs polling capable flag is 0x00000008, so nope. Well, that's not
what we expect.
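
To spell out the bit test: the sketch below applies the polling capable mask quoted above to the two distinct srs_state values in the output. The constant name `SRS_POLLING_CAPABLE` is assumed here; only its value, 0x00000008, comes from the text.

```python
SRS_POLLING_CAPABLE = 0x00000008  # value quoted above; name is assumed

def polling_capable(srs_state):
    """True if the polling capable bit is set in srs_state."""
    return bool(srs_state & SRS_POLLING_CAPABLE)

# The two distinct values seen in the ::print output above.
for state in (0x24000002, 0x24000812):
    print(f"{state:#010x} polling capable: {polling_capable(state)}")
```

Neither value has bit 0x8 set (0x812 is 0x800 + 0x10 + 0x2), so none of these SRSes is marked as able to poll.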

> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs[0] | ::print mac_soft_ring_set_t srs_soft_ring_count
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18

Okay, they have software rings at least (and quite a lot). But those don't let us poll. In fact, here's a representative srs:

srs_rx = {
    srs_rx.sr_func = mac_rx_deliver
    srs_rx.sr_arg1 = 0xffffff67b10de560
    srs_rx.sr_arg2 = 0
    srs_rx.sr_lower_proc = mac_rx_srs_process
    srs_rx.sr_poll_pkt_cnt = 0
    srs_rx.sr_poll_thres = 0x10
    srs_rx.sr_hiwat = 0x400
    srs_rx.sr_lowat = 0x100
    srs_rx.sr_stat = {
        mrs_lclbytes = 0x227f1a6
        mrs_lclcnt = 0x834b1
        mrs_pollcnt = 0
        mrs_pollbytes = 0
        mrs_intrcnt = 0x515963262
        mrs_intrbytes = 0x10b7f97e27fb
        mrs_sdrops = 0x6f6b89
        mrs_chaincntundr10 = 0
        mrs_chaincnt10to50 = 0
        mrs_chaincntover50 = 0
        mrs_ierrors = 0
    }
    srs_rx.sr_poll_on = 0
    srs_rx.sr_worker_poll_on = 0
    srs_rx.sr_poll_off = 0
    srs_rx.sr_poll_thr_sig = 0x1e928ddc
    srs_rx.sr_poll_thr_busy = 0x1e928ddc
    srs_rx.sr_poll_drain_no_poll = 0
    srs_rx.sr_poll_no_poll = 0
    srs_rx.sr_below_hiwat = 0
    srs_rx.sr_poll_again = 0
    srs_rx.sr_poll_sig_worker = 0
    srs_rx.sr_poll_intr_enable = 0
    srs_rx.sr_poll_goto_sleep = 0
    srs_rx.sr_drain_again = 0x1467148
    srs_rx.sr_drain_poll_sig = 0x1e928ddc
    srs_rx.sr_drain_worker_sig = 0x1e0a87af
    srs_rx.sr_drain_poll_running = 0
    srs_rx.sr_drain_keep_polling = 0
    srs_rx.sr_drain_finish_intr = 0xd5d787ab
    srs_rx.sr_poll_worker_wakeup = 0
}
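
The counters in this dump are easier to compare in decimal. This is a small sketch using values copied from the dump above; the helper `polled_fraction` is made up for illustration, and it assumes mrs_pollcnt and mrs_intrcnt count packets received via polling and via interrupts respectively.

```python
# Counters copied from the srs_rx.sr_stat dump above.
stats = {
    "mrs_pollcnt": 0x0,
    "mrs_intrcnt": 0x515963262,
    "mrs_sdrops": 0x6f6b89,
}

def polled_fraction(s):
    """Share of received packets that came in through polling."""
    total = s["mrs_pollcnt"] + s["mrs_intrcnt"]
    return s["mrs_pollcnt"] / total if total else 0.0

for name, value in stats.items():
    print(f"{name:12} = {value}")
print(f"polled fraction = {polled_fraction(stats):.3f}")
```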

Well. That's awkward. It has never had polling turned on, ever. Which fits with
it not being polling capable. So that's odd. Now, let's try and approach this
from another angle. If we look at the mac_srs output on the box:

> ::mac_srs ! grep eth
            ADDR LINK_NAME                       FLENT          HW RING DIR
...
ffffff719a5a0340 z31_eth0             ffffff719a6801c8                0 TX  
ffffff67d5002000 z31_eth0             ffffff719a6801c8                0 RX  
ffffff67d5002cc0 z30_eth0             ffffff7142bddb38                0 TX  
ffffff702d6c3040 z28_eth0             ffffff34b0eba020                0 TX  
ffffff67d5003980 z30_eth0             ffffff7142bddb38                0 RX  
ffffff702d6c3d00 z28_eth0             ffffff34b0eba020                0 RX  
ffffff67d5004640 z29_eth0             ffffff7142be4620                0 TX  
ffffff702d6c49c0 z27_eth0             ffffff7142bf1bf0                0 TX  
ffffff67d5005300 z29_eth0             ffffff7142be4620                0 RX  
ffffff702d6c5680 z27_eth0             ffffff7142bf1bf0                0 RX  
ffffff702d6c6340 z26_eth0             ffffff7142bff1c0                0 TX  
ffffff67c5888000 z19_eth0             ffffff67c6a4c6c0                0 TX  
ffffff67c5888cc0 z13_eth0             ffffff67c5bd50e8                0 RX  
ffffff67c5889980 z14_eth0             ffffff67c6a45bd8                0 TX  
ffffff67c588a640 z15_eth0             ffffff67c6a38608                0 RX  
ffffff70ab60c040 z22_eth0             ffffff70dd7571b8                0 TX  
ffffff703ca8c000 z20_eth0             ffffff702c8550f8                0 RX  
ffffff70ab60cd00 z22_eth0             ffffff70dd7571b8                0 RX  
ffffff703ca8ccc0 z18_eth0             ffffff702c8626c8                0 TX  
ffffff70ab60d9c0 z21_eth0             ffffff702c847b28                0 TX  
ffffff703ca8d980 z18_eth0             ffffff702c8626c8                0 RX  
ffffff70ab60e680 z21_eth0             ffffff702c847b28                0 RX  
ffffff703ca8e640 z16_eth0             ffffff67c6a2b038                0 TX  
ffffff70ab60f340 z20_eth0             ffffff702c8550f8                0 TX  
ffffff703ca8f300 z16_eth0             ffffff67c6a2b038                0 RX  
ffffff329ceaf300 z17_eth0             ffffff67c5bdbbd0                0 RX  
ffffff329ceb0040 z17_eth0             ffffff67c5bdbbd0                0 TX  
ffffff67b18d1d00 z23_eth0             ffffff6792741b10                0 TX  
ffffff67b18d29c0 z23_eth0             ffffff6792741b10                0 RX  
ffffff710a913000 z26_eth0             ffffff7142bff1c0                0 RX  
ffffff67b18d3680 z14_eth0             ffffff67c6a45bd8                0 RX  
ffffff710a913cc0 z25_eth0             ffffff70dd735b30                0 TX  
ffffff67b18d4340 z13_eth0             ffffff67c5bd50e8                0 TX  
ffffff710a914980 z25_eth0             ffffff70dd735b30                0 RX  
ffffff710a915640 z24_eth0             ffffff70dd743100                0 TX  
ffffff710a916300 z24_eth0             ffffff70dd743100                0 RX  
ffffff67f8396040 z15_eth0             ffffff67c6a38608                0 TX  
ffffff67f83979c0 z19_eth0             ffffff67c6a4c6c0                0 RX  

We'll see no vnic has hardware rings. And if we look at another box in the metadata tier:

> ::mac_srs ! grep net | grep RX
            ADDR LINK_NAME                       FLENT          HW RING DIR
...
ffffff60b26c0300 z5_net2              ffffff60b27356b8 ffffff32936f2230 RX  
ffffff60b26c1040 z5_net2              ffffff60b27356b8 ffffff32936f2170 RX  
ffffff60b26c1d00 z5_net2              ffffff60b27356b8 ffffff32936f20b0 RX  
ffffff60b26c29c0 z5_net2              ffffff60b27356b8                0 RX  
ffffff5e79583040 z4_net1              ffffff5d7e32e5f8 ffffff32936f2470 RX  
ffffff5e79583d00 z4_net1              ffffff5d7e32e5f8 ffffff32936f23b0 RX  
ffffff60b26c4340 z5_net1              ffffff60b273c1a0                0 RX  
ffffff5e795849c0 z4_net1              ffffff5d7e32e5f8                0 RX  
ffffff5e79586340 z4_net0              ffffff5d7e3350e0                0 RX  
ffffff34c2e08040 z2_net0              ffffff34b696f020                0 RX  
ffffff34c2e099c0 z1_net1              ffffff34b6975b08 ffffff32936f2ef0 RX  
ffffff34c2e0a680 z1_net1              ffffff34b6975b08 ffffff32936f2e30 RX  
ffffff34c2e0b340 z1_net1              ffffff34b6975b08 ffffff32936f2d70 RX  
ffffff5d7e36d040 z3_net1              ffffff5d7e33bbc8                0 RX  
ffffff332ecee000 z1_net1              ffffff34b6975b08 ffffff32936f2cb0 RX  
ffffff5d7e36e9c0 z3_net0              ffffff5d7e3426b0                0 RX  
ffffff332eceecc0 z1_net1              ffffff34b6975b08                0 RX  
ffffff5d7e370340 z2_net1              ffffff5d7e349198 ffffff32936f2bf0 RX  
ffffff3320e94000 z1_net0              ffffff32bfd2d180                0 RX  
ffffff34cc555000 z2_net1              ffffff5d7e349198 ffffff32936f2b30 RX  
ffffff34cc555cc0 z2_net1              ffffff5d7e349198 ffffff32936f2a70 RX  
ffffff34cc556980 z2_net1              ffffff5d7e349198 ffffff32936f29b0 RX  
ffffff34cc557640 z2_net1              ffffff5d7e349198                0 RX  
ffffff5e79577cc0 z5_net0              ffffff5d7e327b10                0 RX  
ffffff5e79579640 z4_net1              ffffff5d7e32e5f8 ffffff32936f25f0 RX  
ffffff5d7e9b9cc0 z3_net1              ffffff5d7e33bbc8 ffffff32936f28f0 RX  
ffffff5e7957a300 z4_net1              ffffff5d7e32e5f8 ffffff32936f2530 RX  
ffffff5d7e9ba980 z3_net1              ffffff5d7e33bbc8 ffffff32936f2830 RX  
ffffff5d7e9bb640 z3_net1              ffffff5d7e33bbc8 ffffff32936f2770 RX  
ffffff5d7e9bc300 z3_net1              ffffff5d7e33bbc8 ffffff32936f26b0 RX  
ffffff60b26bf640 z5_net2              ffffff60b27356b8 ffffff32936f22f0 RX
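
To compare the two boxes without eyeballing the listings, the RX entries can be counted per link. This is a rough sketch that assumes the five-column layout shown above, with a HW RING value of 0 meaning a software-only SRS; the function name is made up for illustration.

```python
from collections import Counter

# Lines copied from the two ::mac_srs listings above.
SAMPLE = """\
ffffff67d5002000 z31_eth0             ffffff719a6801c8                0 RX
ffffff719a5a0340 z31_eth0             ffffff719a6801c8                0 TX
ffffff60b26c0300 z5_net2              ffffff60b27356b8 ffffff32936f2230 RX
ffffff60b26c1040 z5_net2              ffffff60b27356b8 ffffff32936f2170 RX
ffffff60b26c29c0 z5_net2              ffffff60b27356b8                0 RX
"""

def rx_srs_by_link(text):
    """Return {link: (hardware_srs_count, software_srs_count)} for RX."""
    hw, sw = Counter(), Counter()
    for line in text.splitlines():
        fields = line.split()
        # The header has six words and "..." has one, so both are skipped.
        if len(fields) != 5 or fields[4] != "RX":
            continue
        _addr, link, _flent, hw_ring, _direction = fields
        if hw_ring == "0":
            sw[link] += 1
        else:
            hw[link] += 1
    return {link: (hw[link], sw[link]) for link in set(hw) | set(sw)}

for link, counts in sorted(rx_srs_by_link(SAMPLE).items()):
    print(link, counts)
```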

Okay, so given all this, what's going on? It appears that we think the groups
are allocated. But this is even weirder. If we look at ::mac_group on the
original box:

> ::mac_group
ADDR             TYPE     STATE      NRINGS NCLIENTS RINGS           
ffffff32936cb040 RECEIVE  RESERVED        1        1 ffffff329367e5b0
ffffff32936a7000 RECEIVE  SHARED          4        0 ffffff32933b9038
ffffff32936a7090 RECEIVE  RESERVED        4        1 ffffff3293684c40
ffffff32936a7120 RECEIVE  RESERVED        4        1 ffffff3293684940
ffffff32936a71b0 RECEIVE  RESERVED        4        1 ffffff3293684640
ffffff32936a7240 RECEIVE  RESERVED        4        1 ffffff3293684340
ffffff32936a72d0 RECEIVE  RESERVED        4        1 ffffff3293684040
ffffff32936a7360 RECEIVE  RESERVED        4        1 ffffff3293683c48
ffffff32936a73f0 RECEIVE  RESERVED        4        1 ffffff3293683948
ffffff32936a7480 RECEIVE  RESERVED        4        1 ffffff3293683648
ffffff32936a7510 RECEIVE  RESERVED        4        1 ffffff3293683348
ffffff32936a75a0 RECEIVE  RESERVED        4        1 ffffff3293683048
ffffff32936a7630 RECEIVE  RESERVED        4        1 ffffff3293682c50
ffffff32936a76c0 RECEIVE  RESERVED        4        1 ffffff3293682950
ffffff32936a7750 RECEIVE  RESERVED        4        1 ffffff3293682650
ffffff32936a77e0 RECEIVE  RESERVED        4        1 ffffff3293682350
ffffff32936a7870 RECEIVE  RESERVED        4        1 ffffff3293682050
ffffff32936a7900 RECEIVE  RESERVED        4        1 ffffff3293681c58
ffffff32936a7990 RECEIVE  RESERVED        4        1 ffffff3293681958
ffffff32936a7a20 RECEIVE  RESERVED        4        1 ffffff3293681658
ffffff32936a7ab0 RECEIVE  RESERVED        4        1 ffffff3293681358
ffffff32936a7b40 RECEIVE  RESERVED        4        1 ffffff3293681058
ffffff32936a7bd0 RECEIVE  REGISTERED      4        0 ffffff3293680c60
ffffff32936a7c60 RECEIVE  REGISTERED      4        0 ffffff3293680960
ffffff32936a7cf0 RECEIVE  REGISTERED      4        0 ffffff3293680660
ffffff32936a7d80 RECEIVE  REGISTERED      4        0 ffffff3293680360
ffffff32936a7e10 RECEIVE  REGISTERED      4        0 ffffff3293680060
ffffff32936a7ea0 RECEIVE  REGISTERED      4        0 ffffff329367fc68
ffffff32936a7f30 RECEIVE  REGISTERED      4        0 ffffff329367f968
ffffff32936a7fc0 RECEIVE  REGISTERED      4        0 ffffff329367f668
ffffff32936a8050 RECEIVE  REGISTERED      4        0 ffffff329367f368
ffffff32936a80e0 RECEIVE  REGISTERED      4        0 ffffff329367f068
ffffff32936a8170 RECEIVE  REGISTERED      4        0 ffffff329367ec70
ffffff3290504000 RECEIVE  SHARED          4        0 ffffff32933bfc08
ffffff3290504090 RECEIVE  RESERVED        4        1 ffffff32933bf908
ffffff3290504120 RECEIVE  REGISTERED      4        0 ffffff32933bf608
ffffff32905041b0 RECEIVE  REGISTERED      4        0 ffffff32933bf308
ffffff3290504240 RECEIVE  REGISTERED      4        0 ffffff32933bf008
ffffff32905042d0 RECEIVE  REGISTERED      4        0 ffffff32933bec10
ffffff3290504360 RECEIVE  REGISTERED      4        0 ffffff32933be910
ffffff32905043f0 RECEIVE  REGISTERED      4        0 ffffff32933be610
ffffff3290504480 RECEIVE  REGISTERED      4        0 ffffff32933be310
ffffff3290504510 RECEIVE  REGISTERED      4        0 ffffff32933be010
ffffff32905045a0 RECEIVE  REGISTERED      4        0 ffffff32933bdc18
ffffff3290504630 RECEIVE  REGISTERED      4        0 ffffff32933bd918
ffffff32905046c0 RECEIVE  REGISTERED      4        0 ffffff32933bd618
ffffff3290504750 RECEIVE  REGISTERED      4        0 ffffff32933bd318
ffffff32905047e0 RECEIVE  REGISTERED      4        0 ffffff32933bd018
ffffff3290504870 RECEIVE  REGISTERED      4        0 ffffff32933bcc20
ffffff3290504900 RECEIVE  REGISTERED      4        0 ffffff32933bc920
ffffff3290504990 RECEIVE  REGISTERED      4        0 ffffff32933bc620
ffffff3290504a20 RECEIVE  REGISTERED      4        0 ffffff32933bc320
ffffff3290504ab0 RECEIVE  REGISTERED      4        0 ffffff32933bc020
ffffff3290504b40 RECEIVE  REGISTERED      4        0 ffffff32933bbc28
ffffff3290504bd0 RECEIVE  REGISTERED      4        0 ffffff32933bb928
ffffff3290504c60 RECEIVE  REGISTERED      4        0 ffffff32933bb628
ffffff3290504cf0 RECEIVE  REGISTERED      4        0 ffffff32933bb328
ffffff3290504d80 RECEIVE  REGISTERED      4        0 ffffff32933bb028
ffffff3290504e10 RECEIVE  REGISTERED      4        0 ffffff32933bac30
ffffff3290504ea0 RECEIVE  REGISTERED      4        0 ffffff32933ba930
ffffff3290504f30 RECEIVE  REGISTERED      4        0 ffffff32933ba630
ffffff3290504fc0 RECEIVE  REGISTERED      4        0 ffffff32933ba330
ffffff3290505050 RECEIVE  REGISTERED      4        0 ffffff32933ba030
ffffff32905050e0 RECEIVE  REGISTERED      4        0 ffffff32933b9c38
ffffff3290505170 RECEIVE  REGISTERED      4        0 ffffff32933b9938
ffffff329369f000 RECEIVE  SHARED          4        0 ffffff32937703a8
ffffff329369f090 RECEIVE  RESERVED        4        1 ffffff32937700a8
ffffff329369f120 RECEIVE  REGISTERED      4        0 ffffff32937e0cb0
ffffff329369f1b0 RECEIVE  REGISTERED      4        0 ffffff32937e09b0
ffffff329369f240 RECEIVE  REGISTERED      4        0 ffffff32937e06b0
ffffff329369f2d0 RECEIVE  REGISTERED      4        0 ffffff32937e03b0
ffffff329369f360 RECEIVE  REGISTERED      4        0 ffffff32937e00b0
ffffff329369f3f0 RECEIVE  REGISTERED      4        0 ffffff32937dfcb8
ffffff329369f480 RECEIVE  REGISTERED      4        0 ffffff32937df9b8
ffffff329369f510 RECEIVE  REGISTERED      4        0 ffffff32937df6b8
ffffff329369f5a0 RECEIVE  REGISTERED      4        0 ffffff32937df3b8
ffffff329369f630 RECEIVE  REGISTERED      4        0 ffffff32937df0b8
ffffff329369f6c0 RECEIVE  REGISTERED      4        0 ffffff32937dec00
ffffff329369f750 RECEIVE  REGISTERED      4        0 ffffff32937de900
ffffff329369f7e0 RECEIVE  REGISTERED      4        0 ffffff32937de600
ffffff329369f870 RECEIVE  REGISTERED      4        0 ffffff32937de300
ffffff329369f900 RECEIVE  REGISTERED      4        0 ffffff32937de000
ffffff329369f990 RECEIVE  REGISTERED      4        0 ffffff32937ddc08
ffffff329369fa20 RECEIVE  REGISTERED      4        0 ffffff32937dd908
ffffff329369fab0 RECEIVE  REGISTERED      4        0 ffffff32937dd608
ffffff329369fb40 RECEIVE  REGISTERED      4        0 ffffff32937dd308
ffffff329369fbd0 RECEIVE  REGISTERED      4        0 ffffff32937dd008
ffffff329369fc60 RECEIVE  REGISTERED      4        0 ffffff32937dcc10
ffffff329369fcf0 RECEIVE  REGISTERED      4        0 ffffff32937dc910
ffffff329369fd80 RECEIVE  REGISTERED      4        0 ffffff32937dc610
ffffff329369fe10 RECEIVE  REGISTERED      4        0 ffffff32937dc310
ffffff329369fea0 RECEIVE  REGISTERED      4        0 ffffff32937dc010
ffffff329369ff30 RECEIVE  REGISTERED      4        0 ffffff32937dbc18
ffffff329369ffc0 RECEIVE  REGISTERED      4        0 ffffff32937db918
ffffff32936a0050 RECEIVE  REGISTERED      4        0 ffffff32937db618
ffffff32936a00e0 RECEIVE  REGISTERED      4        0 ffffff32937db318
ffffff32936a0170 RECEIVE  REGISTERED      4        0 ffffff32937db018
ffffff3290824000 RECEIVE  SHARED          4        0 ffffff329367e070
ffffff3290824090 RECEIVE  RESERVED        4        1 ffffff3293776c78
ffffff3290824120 RECEIVE  REGISTERED      4        0 ffffff3293776978
ffffff32908241b0 RECEIVE  REGISTERED      4        0 ffffff3293776678
ffffff3290824240 RECEIVE  REGISTERED      4        0 ffffff3293776378
ffffff32908242d0 RECEIVE  REGISTERED      4        0 ffffff3293776078
ffffff3290824360 RECEIVE  REGISTERED      4        0 ffffff3293775c80
ffffff32908243f0 RECEIVE  REGISTERED      4        0 ffffff3293775980
ffffff3290824480 RECEIVE  REGISTERED      4        0 ffffff3293775680
ffffff3290824510 RECEIVE  REGISTERED      4        0 ffffff3293775380
ffffff32908245a0 RECEIVE  REGISTERED      4        0 ffffff3293775080
ffffff3290824630 RECEIVE  REGISTERED      4        0 ffffff3293774c88
ffffff32908246c0 RECEIVE  REGISTERED      4        0 ffffff3293774988
ffffff3290824750 RECEIVE  REGISTERED      4        0 ffffff3293774688
ffffff32908247e0 RECEIVE  REGISTERED      4        0 ffffff3293774388
ffffff3290824870 RECEIVE  REGISTERED      4        0 ffffff3293774088
ffffff3290824900 RECEIVE  REGISTERED      4        0 ffffff3293773c90
ffffff3290824990 RECEIVE  REGISTERED      4        0 ffffff3293773990
ffffff3290824a20 RECEIVE  REGISTERED      4        0 ffffff3293773690
ffffff3290824ab0 RECEIVE  REGISTERED      4        0 ffffff3293773390
ffffff3290824b40 RECEIVE  REGISTERED      4        0 ffffff3293773090
ffffff3290824bd0 RECEIVE  REGISTERED      4        0 ffffff3293772c98
ffffff3290824c60 RECEIVE  REGISTERED      4        0 ffffff3293772998
ffffff3290824cf0 RECEIVE  REGISTERED      4        0 ffffff3293772698
ffffff3290824d80 RECEIVE  REGISTERED      4        0 ffffff3293772398
ffffff3290824e10 RECEIVE  REGISTERED      4        0 ffffff3293772098
ffffff3290824ea0 RECEIVE  REGISTERED      4        0 ffffff3293771ca0
ffffff3290824f30 RECEIVE  REGISTERED      4        0 ffffff32937719a0
ffffff3290824fc0 RECEIVE  REGISTERED      4        0 ffffff32937716a0
ffffff3290825050 RECEIVE  REGISTERED      4        0 ffffff32937713a0
ffffff32908250e0 RECEIVE  REGISTERED      4        0 ffffff32937710a0
ffffff3290825170 RECEIVE  REGISTERED      4        0 ffffff3293770ca8
ffffff329377b6c0 RECEIVE  RESERVED        1        1 ffffff329367e430
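
With this many rows it helps to tally the group states. The sketch below assumes the six-column ::mac_group layout shown above and uses a handful of rows copied from it; `tally_states` is a made-up helper name.

```python
from collections import Counter

# Header plus a few rows copied from the ::mac_group output above.
SAMPLE = """\
ADDR             TYPE     STATE      NRINGS NCLIENTS RINGS
ffffff32936cb040 RECEIVE  RESERVED        1        1 ffffff329367e5b0
ffffff32936a7000 RECEIVE  SHARED          4        0 ffffff32933b9038
ffffff32936a7090 RECEIVE  RESERVED        4        1 ffffff3293684c40
ffffff32936a7bd0 RECEIVE  REGISTERED      4        0 ffffff3293680c60
"""

def tally_states(text):
    """Count groups by the STATE column."""
    states = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] == "ADDR":
            continue
        states[fields[2]] += 1
    return states

print(dict(tally_states(SAMPLE)))
```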

So who actually has reserved rings?

> ::mac_group -r ! grep RESERVED | awk '{ print $1 }' > /var/tmp/rm/groups
> ::cat /var/tmp/rm/groups | ::print mac_group_t mrg_clients->mgc_client->mci_name
mrg_clients->mgc_client->mci_name = [ "igb0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe1" ]
mrg_clients->mgc_client->mci_name = [ "z13_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z14_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z15_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z16_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z17_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z18_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z19_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z20_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z21_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z22_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z23_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z24_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z25_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z26_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z27_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z28_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z29_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z30_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z31_eth0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe3" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe2" ]
mrg_clients->mgc_client->mci_name = [ "igb1" ]

Okay. So we have assigned these VNICs hardware rings! Which actually matches
what dladm thinks. So why didn't we activate the SRS on them and why don't we
think they're polling capable?!

So, if we look at another machine for a moment, it's worth comparing and
looking at the fact that some nics have rings and others don't. For example on
the host we mentioned above (QP01C02H06), there are several things that prove
we have hardware rings and polling capable srs entries, even for our vnics!

> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_srs_cnt
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x2
mci_flent->fe_rx_srs_cnt = 0x2
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1

So, let's review the conditions under which we can get these additional srs
entries. Why is it that some have only a single SRS (an fe_rx_srs_cnt of 0x1)
and others have more? If we look at mac_rx_srs_group_setup(), we'll see that an
important criterion is whether or not the flent has a group assigned to it.

On this machine, they all have a group:

> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group
mci_flent->fe_rx_ring_group = 0xffffff3291bb1120
mci_flent->fe_rx_ring_group = 0xffffff325fce11b0
mci_flent->fe_rx_ring_group = 0xffffff325fce1120
mci_flent->fe_rx_ring_group = 0xffffff3291bb1090
mci_flent->fe_rx_ring_group = 0xffffff325fce1090
mci_flent->fe_rx_ring_group = 0xffffff32937001c0
mci_flent->fe_rx_ring_group = 0xffffff3293700a80
mci_flent->fe_rx_ring_group = 0xffffff325f92c090
mci_flent->fe_rx_ring_group = 0xffffff329052c090
mci_flent->fe_rx_ring_group = 0xffffff3291bb1360
mci_flent->fe_rx_ring_group = 0xffffff325fce13f0
mci_flent->fe_rx_ring_group = 0xffffff325f92c120
mci_flent->fe_rx_ring_group = 0xffffff3291bb12d0
mci_flent->fe_rx_ring_group = 0xffffff325fce1360
mci_flent->fe_rx_ring_group = 0xffffff3291bb1240
mci_flent->fe_rx_ring_group = 0xffffff325fce12d0
mci_flent->fe_rx_ring_group = 0xffffff3291bb11b0
mci_flent->fe_rx_ring_group = 0xffffff325fce1240
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_state
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)

This at least matches what we expect. They are all there and they're all
reserved. So, this is what we'd ultimately expect to end up happening here to
make them eligible for creating an srs and being in polling mode. So, for the
ones that aren't, the question is what's the state of the ring.

> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_rings | ::list mac_ring_t mr_next 
 | ::print mac_ring_t mr_state ! sort | uniq -c
  66 mr_state = 0x2 (MR_INUSE)

So, all these rings are in use. Now, if we continue going through the function
that would assign the additional srs instances, mac_rx_srs_group_setup,
we'll see that the next check is the client state flags and the vid.

> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_state_flags | ::eval '. & 0x2000 = D'
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               
                0               

So, 0x2000 is MCIS_NO_UNICAST_ADDR, which is not set on any client, so that
condition isn't triggering. Which leaves the unfortunate, but likely and
expected case: the VLAN tag.

> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_flow_desc.fd_mask
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5

Note how several of these have an fd_mask of 0x5. That means that they have a
VLAN. And of course, per the comment in the source code, that means we need to
do software classification to validate the VLAN tag.
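
To make the 0x5 reading concrete, here is a minimal Python sketch that tallies VLAN and non-VLAN clients from the ::print output. The bit values (0x1 for the destination MAC match, 0x4 for the VLAN ID match) are my assumption about the FLOW_LINK_* definitions and should be checked against the header; count_vlan_clients is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed bit values; verify against the FLOW_LINK_* definitions
# in the source before relying on them.
FLOW_LINK_DST = 0x1  # flow matches on destination MAC address
FLOW_LINK_VID = 0x4  # flow matches on VLAN ID


def count_vlan_clients(mdb_output: str) -> Counter:
    """Tally clients by whether their fd_mask has the VID bit set."""
    counts = Counter()
    for match in re.finditer(r"fd_mask = (0x[0-9a-fA-F]+)", mdb_output):
        mask = int(match.group(1), 16)
        key = "vlan" if mask & FLOW_LINK_VID else "no_vlan"
        counts[key] += 1
    return counts


# A three-line excerpt in the same format as the output above.
sample = """\
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x5
"""
print(count_vlan_clients(sample))
```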

From here, there are a couple of important next steps. It'll be useful first to
verify the impact of this. After we've verified that, we can figure out the
priority and what to do from there. This likely falls into other ideas around
having the GLDv3 ring support do VLAN classification when possible.

From an initial glance we should be able to add this to rings on i40e and
ixgbe.


I'm picking this up as I've spent the last several weeks implementing VLAN HW classification support for VLAN switching capable ixgbe parts. This work will also include some of the general scaffolding needed in GLDv3 to support negotiation of this capability. Other classes of parts, say i40e, will need additional modifications to support this feature and that work should be filed under a separate ticket. This ticket will act as the staging area for initial HW VLAN classification support.


I tested various scenarios with a script I wrote (to make sure my tests were repeatable and less prone to human error). I won't claim these tests cover everything, but they should cover the majority of what we typically see in a production environment. The aggr tests were run on several scenarios: 1) aggr of two ixgbe ports, 2) aggr of mixed ixgbe/igb, 3) aggr with 9000 MTU.

  • Verify I can create a VNIC on ixgbe.
  • Verify I can create a VLAN VNIC on ixgbe.
  • Verify that a VNIC on ixgbe gets HW resources and traffic received on this VNIC uses those HW resources.
  • Verify that a VLAN VNIC on ixgbe gets HW resources and traffic received on this VLAN VNIC uses those HW resources.
  • Verify that two VLAN VNICs sharing the same unicast MAC address are placed on the default group and their Rx traffic is SW classified.
  • Create more VLAN VNICs than there are HW groups so that I know some of the VLAN VNICs are on the default group, delete all the VLAN VNICs on HW groups, delete all but one VLAN VNIC on the default group, verify I can still receive traffic on that remaining VLAN VNIC. This tests the VLAN reference counting and some internal ixgbe logic related to the VFTA/VLVF.
  • Create one VLAN VNIC with rxrings=sw to put it on the default group, create another VLAN VNIC with the same vid on a non-default group, verify Rx traffic flows over VLAN VNIC #1, delete VLAN VNIC #2, reverify traffic flows over VLAN VNIC #1. This tests yet another internal edge case in ixgbe related to VLVF/VFTA.
  • Put two VLAN VNICs with the same vid on the default group (via rxrings=sw), then delete one VLAN VNIC and verify traffic is still received on the other. This is another test of VLAN reference counting.
  • Create enough VLAN VNICs to use all HW groups on the MAC. Verify they all receive traffic over HW.
  • Create enough VLAN VNICs to use all HW groups. Then create another VLAN VNIC but specify rxrings=hw to explicitly ask for a HW group. This will cause the new VLAN VNIC to steal one of the HW groups. This tests the client group-switching logic.
  • Verify that Rx data travels over HW when using the aggr primary client. (If you modify the aggr MTU then this breaks, but that bug existed before this wad).
  • Verify that Rx data travels over HW when placing a VLAN (dladm create-vlan) on the aggr primary client.
  • Verify that two VNICs can exist on an aggr and receive traffic over SW.
  • Verify that a single VLAN VNIC on an aggr receives traffic over HW.
  • Verify that two VLAN VNICs can exist on an aggr and receive traffic over SW.
  • Verify that two VLAN VNICs sharing the same unicast MAC address can exist on an aggr and receive traffic over SW.
  • Create two VNICs on an aggr, delete one, verify the remaining VNIC still receives traffic, and that it's over HW.
  • Verify that a VNIC and VLAN VNIC can coexist on an aggr and receive traffic, over SW.
  • Verify that a VLAN VNIC on an etherstub can receive traffic.
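
As a rough illustration of how one of these scenarios could be scripted, the sketch below builds the dladm command lines for the "two VLAN VNICs with the same vid on the default group" case. It is not the test script mentioned above: the link name ixgbe0, the VNIC names, and vid 100 are made up, and the helpers only construct argument lists without running anything.

```python
from typing import List, Optional


def create_vnic_argv(link: str, vnic: str, vid: Optional[int] = None,
                     rxrings: Optional[str] = None) -> List[str]:
    """Build a 'dladm create-vnic' command line (does not execute it)."""
    argv = ["dladm", "create-vnic", "-l", link]
    if vid is not None:
        argv += ["-v", str(vid)]            # VLAN ID for a VLAN VNIC
    if rxrings is not None:
        argv += ["-p", f"rxrings={rxrings}"]  # e.g. "sw" or "hw"
    argv.append(vnic)
    return argv


def delete_vnic_argv(vnic: str) -> List[str]:
    """Build a 'dladm delete-vnic' command line (does not execute it)."""
    return ["dladm", "delete-vnic", vnic]


# Hypothetical names: two VLAN VNICs with the same vid forced onto the
# default group, then one is deleted. Traffic on the survivor would be
# checked separately.
scenario = [
    create_vnic_argv("ixgbe0", "vtest0", vid=100, rxrings="sw"),
    create_vnic_argv("ixgbe0", "vtest1", vid=100, rxrings="sw"),
    delete_vnic_argv("vtest1"),
]
for argv in scenario:
    print(" ".join(argv))
```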

Related issues

Related to illumos gate - Feature #11492: add VLVF bypass to ixgbe core (New)

Related to illumos gate - Feature #11491: Want DLS bypass for VLAN traffic (New)


History

#1

Updated by Robert Mustacchi 5 months ago

#2

Updated by Robert Mustacchi 5 months ago
