Bug #11490
SRS ring polling disabled for VLANs
Description
Through investigation we discovered that a large number of zones have a large
number of MAC rxsdrops. This is surprising, because in theory hardware rings
should be in play. For example, here we are on QP01C02H01, which is running
joyent_20160204T171016Z.
If we look at all of its ixgbe_t structures:
> ::prtconf ! grep ixgbe | awk '{ print $1}' > /var/tmp/rm/ixgbe
> ::cat /var/tmp/rm/ixgbe | ::print struct dev_info devi_driver_data | ::print ixgbe_t num_rx_groups num_rx_rings
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80
num_rx_groups = 0x20
num_rx_rings = 0x80
So we have plenty of rings and groups (0x20 groups and 0x80 rings per ixgbe
instance), far more than this box has VNICs:
[root@QP01C02H01 (jupiter) /var/tmp/rm]# dladm show-vnic -p -o link | wc -l
19
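The arithmetic behind that claim is easy to write down. A minimal C sketch using only the values printed above: rings per group is num_rx_rings / num_rx_groups, and, assuming one group per port is the default (shared) group that a VNIC can't reserve (our reading of the ::mac_group output later on, not something ixgbe_t reports), a port can hand out num_rx_groups - 1 exclusive groups. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Rings available in each Rx group of one ixgbe instance. */
static unsigned
rings_per_group(unsigned num_rx_rings, unsigned num_rx_groups)
{
	assert(num_rx_groups != 0);
	return (num_rx_rings / num_rx_groups);
}

/*
 * Could every VNIC get an exclusive group on a single port?
 * Assumption: one group per port is the default (shared) group,
 * so only num_rx_groups - 1 groups can be reserved.
 */
static bool
groups_suffice(unsigned num_rx_groups, unsigned nvnics)
{
	return (num_rx_groups > 0 && nvnics <= num_rx_groups - 1);
}
```

With these inputs, rings_per_group(0x80, 0x20) is 4, which matches the mrg_cur_count of 0x4 reported for each VNIC group, and groups_suffice(0x20, 19) holds even if all 19 VNICs sat on a single port.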
However, the MAC SRS drop stats show something interesting:
igb0:0:mac_misc_stat:rxsdrops 0
igb0:0:mac_rx_hwlane0:rxsdrops 0
igb0:0:mac_rx_swlane0:rxsdrops 0
igb1:0:mac_misc_stat:rxsdrops 0
igb1:0:mac_rx_hwlane0:rxsdrops 0
igb1:0:mac_rx_swlane0:rxsdrops 0
ixgbe0:0:mac_misc_stat:rxsdrops 0
ixgbe0:0:mac_rx_hwlane4:rxsdrops 0
ixgbe0:0:mac_rx_hwlane5:rxsdrops 0
ixgbe0:0:mac_rx_hwlane6:rxsdrops 0
ixgbe0:0:mac_rx_hwlane7:rxsdrops 0
ixgbe0:0:mac_rx_swlane0:rxsdrops 0
ixgbe1:0:mac_misc_stat:rxsdrops 0
ixgbe1:0:mac_rx_hwlane4:rxsdrops 0
ixgbe1:0:mac_rx_hwlane5:rxsdrops 0
ixgbe1:0:mac_rx_hwlane6:rxsdrops 0
ixgbe1:0:mac_rx_hwlane7:rxsdrops 0
ixgbe1:0:mac_rx_swlane0:rxsdrops 0
ixgbe2:0:mac_misc_stat:rxsdrops 0
ixgbe2:0:mac_rx_hwlane4:rxsdrops 0
ixgbe2:0:mac_rx_hwlane5:rxsdrops 0
ixgbe2:0:mac_rx_hwlane6:rxsdrops 0
ixgbe2:0:mac_rx_hwlane7:rxsdrops 0
ixgbe2:0:mac_rx_swlane0:rxsdrops 0
ixgbe3:0:mac_misc_stat:rxsdrops 0
ixgbe3:0:mac_rx_hwlane4:rxsdrops 0
ixgbe3:0:mac_rx_hwlane5:rxsdrops 0
ixgbe3:0:mac_rx_hwlane6:rxsdrops 0
ixgbe3:0:mac_rx_hwlane7:rxsdrops 0
ixgbe3:0:mac_rx_swlane0:rxsdrops 0
z13_eth0:0:mac_misc_stat:rxsdrops 0
z13_eth0:0:mac_rx_swlane0:rxsdrops 4953776
z14_eth0:0:mac_misc_stat:rxsdrops 0
z14_eth0:0:mac_rx_swlane0:rxsdrops 1079036
z15_eth0:0:mac_misc_stat:rxsdrops 0
z15_eth0:0:mac_rx_swlane0:rxsdrops 3453358
z16_eth0:0:mac_misc_stat:rxsdrops 0
z16_eth0:0:mac_rx_swlane0:rxsdrops 4766797
z17_eth0:0:mac_misc_stat:rxsdrops 0
z17_eth0:0:mac_rx_swlane0:rxsdrops 4373173
z18_eth0:0:mac_misc_stat:rxsdrops 0
z18_eth0:0:mac_rx_swlane0:rxsdrops 299795
z19_eth0:0:mac_misc_stat:rxsdrops 0
z19_eth0:0:mac_rx_swlane0:rxsdrops 7299191
z20_eth0:0:mac_misc_stat:rxsdrops 0
z20_eth0:0:mac_rx_swlane0:rxsdrops 5496324
z21_eth0:0:mac_misc_stat:rxsdrops 0
z21_eth0:0:mac_rx_swlane0:rxsdrops 1701288
z22_eth0:0:mac_misc_stat:rxsdrops 0
z22_eth0:0:mac_rx_swlane0:rxsdrops 346209
z23_eth0:0:mac_misc_stat:rxsdrops 0
z23_eth0:0:mac_rx_swlane0:rxsdrops 8937596
z24_eth0:0:mac_misc_stat:rxsdrops 0
z24_eth0:0:mac_rx_swlane0:rxsdrops 4499318
z25_eth0:0:mac_misc_stat:rxsdrops 0
z25_eth0:0:mac_rx_swlane0:rxsdrops 253240
z26_eth0:0:mac_misc_stat:rxsdrops 0
z26_eth0:0:mac_rx_swlane0:rxsdrops 4503777
z27_eth0:0:mac_misc_stat:rxsdrops 0
z27_eth0:0:mac_rx_swlane0:rxsdrops 4788049
z28_eth0:0:mac_misc_stat:rxsdrops 0
z28_eth0:0:mac_rx_swlane0:rxsdrops 2351624
z29_eth0:0:mac_misc_stat:rxsdrops 0
z29_eth0:0:mac_rx_swlane0:rxsdrops 320165
z30_eth0:0:mac_misc_stat:rxsdrops 0
z30_eth0:0:mac_rx_swlane0:rxsdrops 8184349
z31_eth0:0:mac_misc_stat:rxsdrops 0
z31_eth0:0:mac_rx_swlane0:rxsdrops 225596
Note that all of these zones have only software lane stats, while each ixgbe
instance has multiple hardware lanes (the expected four, in fact).
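The same observation can be checked mechanically rather than by eye. The C sketch below assumes the link:instance:name:rxsdrops value text format printed above (it parses text rather than calling libkstat), and its type and function names are hypothetical. For one link, it counts mac_rx_hwlaneN and mac_rx_swlaneN entries and sums the drops.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct lane_summary {
	int hw_lanes;			/* number of mac_rx_hwlaneN entries */
	int sw_lanes;			/* number of mac_rx_swlaneN entries */
	unsigned long long drops;	/* rxsdrops summed over all entries */
};

/*
 * Summarize one link from lines shaped like
 *   z13_eth0:0:mac_rx_swlane0:rxsdrops 4953776
 * Lines that don't match that shape are skipped.
 */
static struct lane_summary
summarize_link(const char *const lines[], size_t n, const char *link)
{
	struct lane_summary s = { 0, 0, 0 };

	assert(link != NULL);
	for (size_t i = 0; i < n; i++) {
		char lnk[64], name[64];
		int inst;
		unsigned long long val;

		if (sscanf(lines[i], "%63[^:]:%d:%63[^:]:rxsdrops %llu",
		    lnk, &inst, name, &val) != 4)
			continue;
		if (strcmp(lnk, link) != 0)
			continue;
		if (strncmp(name, "mac_rx_hwlane", 13) == 0)
			s.hw_lanes++;
		else if (strncmp(name, "mac_rx_swlane", 13) == 0)
			s.sw_lanes++;
		s.drops += val;
	}
	return (s);
}
```

Given the lines above, it would report four hardware lanes and no drops for ixgbe0, and no hardware lanes with 4953776 drops for z13_eth0.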
Okay, so what does MAC actually think we have for all these clients?
> ::walk mac_client_impl_cache | ::printf "%p %s\n" mac_client_impl_t . mci_name ! grep z | awk '{ print $1 }' > /var/tmp/rm/vnic.mci > ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_ring_group mci_flent->fe_rx_ring_group = 0xffffff32936a76c0 mci_flent->fe_rx_ring_group = 0xffffff32936a7b40 mci_flent->fe_rx_ring_group = 0xffffff32936a7ab0 mci_flent->fe_rx_ring_group = 0xffffff32936a7a20 mci_flent->fe_rx_ring_group = 0xffffff32936a7900 mci_flent->fe_rx_ring_group = 0xffffff32936a7870 mci_flent->fe_rx_ring_group = 0xffffff32936a77e0 mci_flent->fe_rx_ring_group = 0xffffff32936a7990 mci_flent->fe_rx_ring_group = 0xffffff32936a7750 mci_flent->fe_rx_ring_group = 0xffffff32936a7630 mci_flent->fe_rx_ring_group = 0xffffff32936a75a0 mci_flent->fe_rx_ring_group = 0xffffff32936a7510 mci_flent->fe_rx_ring_group = 0xffffff32936a73f0 mci_flent->fe_rx_ring_group = 0xffffff32936a7360 mci_flent->fe_rx_ring_group = 0xffffff32936a72d0 mci_flent->fe_rx_ring_group = 0xffffff32936a7240 mci_flent->fe_rx_ring_group = 0xffffff32936a71b0 mci_flent->fe_rx_ring_group = 0xffffff32936a7120 mci_flent->fe_rx_ring_group = 0xffffff32936a7480 > ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs_cnt mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 mci_flent->fe_rx_srs_cnt = 0x1 > ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_cur_count mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 
0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4 mrg_cur_count = 0x4
So based on this, each of these clients should have four rings. Why, then,
don't they seem to be using them, showing only software lanes instead? The real
question is whether the SRSes in the system support polling. In theory they
should, if we have these rings. Let's see.
> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs[0] | ::print mac_soft_ring_set_t srs_state
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000002
srs_state = 0x24000812
srs_state = 0x24000002
srs_state = 0x24000002
The SRS polling-capable flag is 0x00000008, and it isn't set in any of these
states. That's not what we expect.
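Spelled out: both observed states, 0x24000002 and 0x24000812, end in the nibble 0x2, so bit 3 (0x8) is clear in every one of them. A minimal C check follows; the 0x8 value comes from the text above, while the macro name SRS_POLLING_CAPAB is our assumption about what illumos calls it, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Polling-capable bit in srs_state (name assumed, value from the text). */
#define	SRS_POLLING_CAPAB	0x00000008u

static bool
srs_polling_capable(uint32_t srs_state)
{
	return ((srs_state & SRS_POLLING_CAPAB) != 0);
}

/* Count how many srs_state values in a list are polling capable. */
static size_t
count_polling_capable(const uint32_t *states, size_t n)
{
	size_t cnt = 0;

	assert(states != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (srs_polling_capable(states[i]))
			cnt++;
	}
	return (cnt);
}
```

Run over the 19 states above, count_polling_capable() returns 0.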
> ::cat /var/tmp/rm/vnic.mci | ::print mac_client_impl_t mci_flent->fe_rx_srs[0] | ::print mac_soft_ring_set_t srs_soft_ring_count
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
srs_soft_ring_count = 0x18
Okay, at least they have software rings (0x18, i.e. 24, each), but soft rings don't let us poll. Here's a representative SRS:
srs_rx = {
    srs_rx.sr_func = mac_rx_deliver
    srs_rx.sr_arg1 = 0xffffff67b10de560
    srs_rx.sr_arg2 = 0
    srs_rx.sr_lower_proc = mac_rx_srs_process
    srs_rx.sr_poll_pkt_cnt = 0
    srs_rx.sr_poll_thres = 0x10
    srs_rx.sr_hiwat = 0x400
    srs_rx.sr_lowat = 0x100
    srs_rx.sr_stat = {
        mrs_lclbytes = 0x227f1a6
        mrs_lclcnt = 0x834b1
        mrs_pollcnt = 0
        mrs_pollbytes = 0
        mrs_intrcnt = 0x515963262
        mrs_intrbytes = 0x10b7f97e27fb
        mrs_sdrops = 0x6f6b89
        mrs_chaincntundr10 = 0
        mrs_chaincnt10to50 = 0
        mrs_chaincntover50 = 0
        mrs_ierrors = 0
    }
    srs_rx.sr_poll_on = 0
    srs_rx.sr_worker_poll_on = 0
    srs_rx.sr_poll_off = 0
    srs_rx.sr_poll_thr_sig = 0x1e928ddc
    srs_rx.sr_poll_thr_busy = 0x1e928ddc
    srs_rx.sr_poll_drain_no_poll = 0
    srs_rx.sr_poll_no_poll = 0
    srs_rx.sr_below_hiwat = 0
    srs_rx.sr_poll_again = 0
    srs_rx.sr_poll_sig_worker = 0
    srs_rx.sr_poll_intr_enable = 0
    srs_rx.sr_poll_goto_sleep = 0
    srs_rx.sr_drain_again = 0x1467148
    srs_rx.sr_drain_poll_sig = 0x1e928ddc
    srs_rx.sr_drain_worker_sig = 0x1e0a87af
    srs_rx.sr_drain_poll_running = 0
    srs_rx.sr_drain_keep_polling = 0
    srs_rx.sr_drain_finish_intr = 0xd5d787ab
    srs_rx.sr_poll_worker_wakeup = 0
}
Well, that's awkward: it has never had polling turned on (mrs_pollcnt and
sr_poll_on are both 0), which fits with it not being polling capable. Now,
let's approach this from another angle and look at the mac_srs output on the box:
> ::mac_srs ! grep eth
ADDR             LINK_NAME FLENT            HW RING          DIR ...
ffffff719a5a0340 z31_eth0  ffffff719a6801c8 0                TX
ffffff67d5002000 z31_eth0  ffffff719a6801c8 0                RX
ffffff67d5002cc0 z30_eth0  ffffff7142bddb38 0                TX
ffffff702d6c3040 z28_eth0  ffffff34b0eba020 0                TX
ffffff67d5003980 z30_eth0  ffffff7142bddb38 0                RX
ffffff702d6c3d00 z28_eth0  ffffff34b0eba020 0                RX
ffffff67d5004640 z29_eth0  ffffff7142be4620 0                TX
ffffff702d6c49c0 z27_eth0  ffffff7142bf1bf0 0                TX
ffffff67d5005300 z29_eth0  ffffff7142be4620 0                RX
ffffff702d6c5680 z27_eth0  ffffff7142bf1bf0 0                RX
ffffff702d6c6340 z26_eth0  ffffff7142bff1c0 0                TX
ffffff67c5888000 z19_eth0  ffffff67c6a4c6c0 0                TX
ffffff67c5888cc0 z13_eth0  ffffff67c5bd50e8 0                RX
ffffff67c5889980 z14_eth0  ffffff67c6a45bd8 0                TX
ffffff67c588a640 z15_eth0  ffffff67c6a38608 0                RX
ffffff70ab60c040 z22_eth0  ffffff70dd7571b8 0                TX
ffffff703ca8c000 z20_eth0  ffffff702c8550f8 0                RX
ffffff70ab60cd00 z22_eth0  ffffff70dd7571b8 0                RX
ffffff703ca8ccc0 z18_eth0  ffffff702c8626c8 0                TX
ffffff70ab60d9c0 z21_eth0  ffffff702c847b28 0                TX
ffffff703ca8d980 z18_eth0  ffffff702c8626c8 0                RX
ffffff70ab60e680 z21_eth0  ffffff702c847b28 0                RX
ffffff703ca8e640 z16_eth0  ffffff67c6a2b038 0                TX
ffffff70ab60f340 z20_eth0  ffffff702c8550f8 0                TX
ffffff703ca8f300 z16_eth0  ffffff67c6a2b038 0                RX
ffffff329ceaf300 z17_eth0  ffffff67c5bdbbd0 0                RX
ffffff329ceb0040 z17_eth0  ffffff67c5bdbbd0 0                TX
ffffff67b18d1d00 z23_eth0  ffffff6792741b10 0                TX
ffffff67b18d29c0 z23_eth0  ffffff6792741b10 0                RX
ffffff710a913000 z26_eth0  ffffff7142bff1c0 0                RX
ffffff67b18d3680 z14_eth0  ffffff67c6a45bd8 0                RX
ffffff710a913cc0 z25_eth0  ffffff70dd735b30 0                TX
ffffff67b18d4340 z13_eth0  ffffff67c5bd50e8 0                TX
ffffff710a914980 z25_eth0  ffffff70dd735b30 0                RX
ffffff710a915640 z24_eth0  ffffff70dd743100 0                TX
ffffff710a916300 z24_eth0  ffffff70dd743100 0                RX
ffffff67f8396040 z15_eth0  ffffff67c6a38608 0                TX
ffffff67f83979c0 z19_eth0  ffffff67c6a4c6c0 0                RX
No VNIC has a hardware ring: the HW RING column is 0 for every entry. Compare that with another box in the metadata tier:
> ::mac_srs ! grep net | grep RX
ADDR             LINK_NAME FLENT            HW RING          DIR ...
ffffff60b26c0300 z5_net2   ffffff60b27356b8 ffffff32936f2230 RX
ffffff60b26c1040 z5_net2   ffffff60b27356b8 ffffff32936f2170 RX
ffffff60b26c1d00 z5_net2   ffffff60b27356b8 ffffff32936f20b0 RX
ffffff60b26c29c0 z5_net2   ffffff60b27356b8 0                RX
ffffff5e79583040 z4_net1   ffffff5d7e32e5f8 ffffff32936f2470 RX
ffffff5e79583d00 z4_net1   ffffff5d7e32e5f8 ffffff32936f23b0 RX
ffffff60b26c4340 z5_net1   ffffff60b273c1a0 0                RX
ffffff5e795849c0 z4_net1   ffffff5d7e32e5f8 0                RX
ffffff5e79586340 z4_net0   ffffff5d7e3350e0 0                RX
ffffff34c2e08040 z2_net0   ffffff34b696f020 0                RX
ffffff34c2e099c0 z1_net1   ffffff34b6975b08 ffffff32936f2ef0 RX
ffffff34c2e0a680 z1_net1   ffffff34b6975b08 ffffff32936f2e30 RX
ffffff34c2e0b340 z1_net1   ffffff34b6975b08 ffffff32936f2d70 RX
ffffff5d7e36d040 z3_net1   ffffff5d7e33bbc8 0                RX
ffffff332ecee000 z1_net1   ffffff34b6975b08 ffffff32936f2cb0 RX
ffffff5d7e36e9c0 z3_net0   ffffff5d7e3426b0 0                RX
ffffff332eceecc0 z1_net1   ffffff34b6975b08 0                RX
ffffff5d7e370340 z2_net1   ffffff5d7e349198 ffffff32936f2bf0 RX
ffffff3320e94000 z1_net0   ffffff32bfd2d180 0                RX
ffffff34cc555000 z2_net1   ffffff5d7e349198 ffffff32936f2b30 RX
ffffff34cc555cc0 z2_net1   ffffff5d7e349198 ffffff32936f2a70 RX
ffffff34cc556980 z2_net1   ffffff5d7e349198 ffffff32936f29b0 RX
ffffff34cc557640 z2_net1   ffffff5d7e349198 0                RX
ffffff5e79577cc0 z5_net0   ffffff5d7e327b10 0                RX
ffffff5e79579640 z4_net1   ffffff5d7e32e5f8 ffffff32936f25f0 RX
ffffff5d7e9b9cc0 z3_net1   ffffff5d7e33bbc8 ffffff32936f28f0 RX
ffffff5e7957a300 z4_net1   ffffff5d7e32e5f8 ffffff32936f2530 RX
ffffff5d7e9ba980 z3_net1   ffffff5d7e33bbc8 ffffff32936f2830 RX
ffffff5d7e9bb640 z3_net1   ffffff5d7e33bbc8 ffffff32936f2770 RX
ffffff5d7e9bc300 z3_net1   ffffff5d7e33bbc8 ffffff32936f26b0 RX
ffffff60b26bf640 z5_net2   ffffff60b27356b8 ffffff32936f22f0 RX
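Note the shape of that second listing: each link that has hardware-ring SRSes (z1_net1, z2_net1, z3_net1, z4_net1, z5_net2) shows four RX SRSes with a non-zero HW RING plus one with HW RING 0, which appears to line up with the fe_rx_srs_cnt values of 5 seen on that host further down. Below is a hypothetical C sketch that tallies this per link from the text output. It assumes the five-column ADDR, LINK_NAME, FLENT, HW RING, DIR row layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct srs_count {
	int rx_total;	/* RX SRS rows for the link */
	int rx_hw;	/* of those, rows with a non-zero HW RING */
};

static struct srs_count
count_rx_srs(const char *const rows[], size_t n, const char *link)
{
	struct srs_count c = { 0, 0 };

	assert(link != NULL);
	for (size_t i = 0; i < n; i++) {
		char name[64], dir[8];
		unsigned long long ring;

		/* Skip ADDR and FLENT; keep LINK_NAME, HW RING and DIR. */
		if (sscanf(rows[i], "%*llx %63s %*llx %llx %7s",
		    name, &ring, dir) != 3)
			continue;	/* header or malformed row */
		if (strcmp(name, link) != 0 || strcmp(dir, "RX") != 0)
			continue;
		c.rx_total++;
		if (ring != 0)
			c.rx_hw++;
	}
	return (c);
}
```

Over the listing above, z1_net1 comes out as five RX SRSes, four of them on hardware rings, while z1_net0 has a single software-only SRS.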
So, given all this, what's going on? It appears that we think the groups are
allocated, which makes this even weirder. Here is ::mac_group on the original
box:
> ::mac_group
ADDR             TYPE    STATE      NRINGS NCLIENTS RINGS
ffffff32936cb040 RECEIVE RESERVED   1      1        ffffff329367e5b0
ffffff32936a7000 RECEIVE SHARED     4      0        ffffff32933b9038
ffffff32936a7090 RECEIVE RESERVED   4      1        ffffff3293684c40
ffffff32936a7120 RECEIVE RESERVED   4      1        ffffff3293684940
ffffff32936a71b0 RECEIVE RESERVED   4      1        ffffff3293684640
ffffff32936a7240 RECEIVE RESERVED   4      1        ffffff3293684340
ffffff32936a72d0 RECEIVE RESERVED   4      1        ffffff3293684040
ffffff32936a7360 RECEIVE RESERVED   4      1        ffffff3293683c48
ffffff32936a73f0 RECEIVE RESERVED   4      1        ffffff3293683948
ffffff32936a7480 RECEIVE RESERVED   4      1        ffffff3293683648
ffffff32936a7510 RECEIVE RESERVED   4      1        ffffff3293683348
ffffff32936a75a0 RECEIVE RESERVED   4      1        ffffff3293683048
ffffff32936a7630 RECEIVE RESERVED   4      1        ffffff3293682c50
ffffff32936a76c0 RECEIVE RESERVED   4      1        ffffff3293682950
ffffff32936a7750 RECEIVE RESERVED   4      1        ffffff3293682650
ffffff32936a77e0 RECEIVE RESERVED   4      1        ffffff3293682350
ffffff32936a7870 RECEIVE RESERVED   4      1        ffffff3293682050
ffffff32936a7900 RECEIVE RESERVED   4      1        ffffff3293681c58
ffffff32936a7990 RECEIVE RESERVED   4      1        ffffff3293681958
ffffff32936a7a20 RECEIVE RESERVED   4      1        ffffff3293681658
ffffff32936a7ab0 RECEIVE RESERVED   4      1        ffffff3293681358
ffffff32936a7b40 RECEIVE RESERVED   4      1        ffffff3293681058
ffffff32936a7bd0 RECEIVE REGISTERED 4      0        ffffff3293680c60
ffffff32936a7c60 RECEIVE REGISTERED 4      0        ffffff3293680960
ffffff32936a7cf0 RECEIVE REGISTERED 4      0        ffffff3293680660
ffffff32936a7d80 RECEIVE REGISTERED 4      0        ffffff3293680360
ffffff32936a7e10 RECEIVE REGISTERED 4      0        ffffff3293680060
ffffff32936a7ea0 RECEIVE REGISTERED 4      0        ffffff329367fc68
ffffff32936a7f30 RECEIVE REGISTERED 4      0        ffffff329367f968
ffffff32936a7fc0 RECEIVE REGISTERED 4      0        ffffff329367f668
ffffff32936a8050 RECEIVE REGISTERED 4      0        ffffff329367f368
ffffff32936a80e0 RECEIVE REGISTERED 4      0        ffffff329367f068
ffffff32936a8170 RECEIVE REGISTERED 4      0        ffffff329367ec70
ffffff3290504000 RECEIVE SHARED     4      0        ffffff32933bfc08
ffffff3290504090 RECEIVE RESERVED   4      1        ffffff32933bf908
ffffff3290504120 RECEIVE REGISTERED 4      0        ffffff32933bf608
ffffff32905041b0 RECEIVE REGISTERED 4      0        ffffff32933bf308
ffffff3290504240 RECEIVE REGISTERED 4      0        ffffff32933bf008
ffffff32905042d0 RECEIVE REGISTERED 4      0        ffffff32933bec10
ffffff3290504360 RECEIVE REGISTERED 4      0        ffffff32933be910
ffffff32905043f0 RECEIVE REGISTERED 4      0        ffffff32933be610
ffffff3290504480 RECEIVE REGISTERED 4      0        ffffff32933be310
ffffff3290504510 RECEIVE REGISTERED 4      0        ffffff32933be010
ffffff32905045a0 RECEIVE REGISTERED 4      0        ffffff32933bdc18
ffffff3290504630 RECEIVE REGISTERED 4      0        ffffff32933bd918
ffffff32905046c0 RECEIVE REGISTERED 4      0        ffffff32933bd618
ffffff3290504750 RECEIVE REGISTERED 4      0        ffffff32933bd318
ffffff32905047e0 RECEIVE REGISTERED 4      0        ffffff32933bd018
ffffff3290504870 RECEIVE REGISTERED 4      0        ffffff32933bcc20
ffffff3290504900 RECEIVE REGISTERED 4      0        ffffff32933bc920
ffffff3290504990 RECEIVE REGISTERED 4      0        ffffff32933bc620
ffffff3290504a20 RECEIVE REGISTERED 4      0        ffffff32933bc320
ffffff3290504ab0 RECEIVE REGISTERED 4      0        ffffff32933bc020
ffffff3290504b40 RECEIVE REGISTERED 4      0        ffffff32933bbc28
ffffff3290504bd0 RECEIVE REGISTERED 4      0        ffffff32933bb928
ffffff3290504c60 RECEIVE REGISTERED 4      0        ffffff32933bb628
ffffff3290504cf0 RECEIVE REGISTERED 4      0        ffffff32933bb328
ffffff3290504d80 RECEIVE REGISTERED 4      0        ffffff32933bb028
ffffff3290504e10 RECEIVE REGISTERED 4      0        ffffff32933bac30
ffffff3290504ea0 RECEIVE REGISTERED 4      0        ffffff32933ba930
ffffff3290504f30 RECEIVE REGISTERED 4      0        ffffff32933ba630
ffffff3290504fc0 RECEIVE REGISTERED 4      0        ffffff32933ba330
ffffff3290505050 RECEIVE REGISTERED 4      0        ffffff32933ba030
ffffff32905050e0 RECEIVE REGISTERED 4      0        ffffff32933b9c38
ffffff3290505170 RECEIVE REGISTERED 4      0        ffffff32933b9938
ffffff329369f000 RECEIVE SHARED     4      0        ffffff32937703a8
ffffff329369f090 RECEIVE RESERVED   4      1        ffffff32937700a8
ffffff329369f120 RECEIVE REGISTERED 4      0        ffffff32937e0cb0
ffffff329369f1b0 RECEIVE REGISTERED 4      0        ffffff32937e09b0
ffffff329369f240 RECEIVE REGISTERED 4      0        ffffff32937e06b0
ffffff329369f2d0 RECEIVE REGISTERED 4      0        ffffff32937e03b0
ffffff329369f360 RECEIVE REGISTERED 4      0        ffffff32937e00b0
ffffff329369f3f0 RECEIVE REGISTERED 4      0        ffffff32937dfcb8
ffffff329369f480 RECEIVE REGISTERED 4      0        ffffff32937df9b8
ffffff329369f510 RECEIVE REGISTERED 4      0        ffffff32937df6b8
ffffff329369f5a0 RECEIVE REGISTERED 4      0        ffffff32937df3b8
ffffff329369f630 RECEIVE REGISTERED 4      0        ffffff32937df0b8
ffffff329369f6c0 RECEIVE REGISTERED 4      0        ffffff32937dec00
ffffff329369f750 RECEIVE REGISTERED 4      0        ffffff32937de900
ffffff329369f7e0 RECEIVE REGISTERED 4      0        ffffff32937de600
ffffff329369f870 RECEIVE REGISTERED 4      0        ffffff32937de300
ffffff329369f900 RECEIVE REGISTERED 4      0        ffffff32937de000
ffffff329369f990 RECEIVE REGISTERED 4      0        ffffff32937ddc08
ffffff329369fa20 RECEIVE REGISTERED 4      0        ffffff32937dd908
ffffff329369fab0 RECEIVE REGISTERED 4      0        ffffff32937dd608
ffffff329369fb40 RECEIVE REGISTERED 4      0        ffffff32937dd308
ffffff329369fbd0 RECEIVE REGISTERED 4      0        ffffff32937dd008
ffffff329369fc60 RECEIVE REGISTERED 4      0        ffffff32937dcc10
ffffff329369fcf0 RECEIVE REGISTERED 4      0        ffffff32937dc910
ffffff329369fd80 RECEIVE REGISTERED 4      0        ffffff32937dc610
ffffff329369fe10 RECEIVE REGISTERED 4      0        ffffff32937dc310
ffffff329369fea0 RECEIVE REGISTERED 4      0        ffffff32937dc010
ffffff329369ff30 RECEIVE REGISTERED 4      0        ffffff32937dbc18
ffffff329369ffc0 RECEIVE REGISTERED 4      0        ffffff32937db918
ffffff32936a0050 RECEIVE REGISTERED 4      0        ffffff32937db618
ffffff32936a00e0 RECEIVE REGISTERED 4      0        ffffff32937db318
ffffff32936a0170 RECEIVE REGISTERED 4      0        ffffff32937db018
ffffff3290824000 RECEIVE SHARED     4      0        ffffff329367e070
ffffff3290824090 RECEIVE RESERVED   4      1        ffffff3293776c78
ffffff3290824120 RECEIVE REGISTERED 4      0        ffffff3293776978
ffffff32908241b0 RECEIVE REGISTERED 4      0        ffffff3293776678
ffffff3290824240 RECEIVE REGISTERED 4      0        ffffff3293776378
ffffff32908242d0 RECEIVE REGISTERED 4      0        ffffff3293776078
ffffff3290824360 RECEIVE REGISTERED 4      0        ffffff3293775c80
ffffff32908243f0 RECEIVE REGISTERED 4      0        ffffff3293775980
ffffff3290824480 RECEIVE REGISTERED 4      0        ffffff3293775680
ffffff3290824510 RECEIVE REGISTERED 4      0        ffffff3293775380
ffffff32908245a0 RECEIVE REGISTERED 4      0        ffffff3293775080
ffffff3290824630 RECEIVE REGISTERED 4      0        ffffff3293774c88
ffffff32908246c0 RECEIVE REGISTERED 4      0        ffffff3293774988
ffffff3290824750 RECEIVE REGISTERED 4      0        ffffff3293774688
ffffff32908247e0 RECEIVE REGISTERED 4      0        ffffff3293774388
ffffff3290824870 RECEIVE REGISTERED 4      0        ffffff3293774088
ffffff3290824900 RECEIVE REGISTERED 4      0        ffffff3293773c90
ffffff3290824990 RECEIVE REGISTERED 4      0        ffffff3293773990
ffffff3290824a20 RECEIVE REGISTERED 4      0        ffffff3293773690
ffffff3290824ab0 RECEIVE REGISTERED 4      0        ffffff3293773390
ffffff3290824b40 RECEIVE REGISTERED 4      0        ffffff3293773090
ffffff3290824bd0 RECEIVE REGISTERED 4      0        ffffff3293772c98
ffffff3290824c60 RECEIVE REGISTERED 4      0        ffffff3293772998
ffffff3290824cf0 RECEIVE REGISTERED 4      0        ffffff3293772698
ffffff3290824d80 RECEIVE REGISTERED 4      0        ffffff3293772398
ffffff3290824e10 RECEIVE REGISTERED 4      0        ffffff3293772098
ffffff3290824ea0 RECEIVE REGISTERED 4      0        ffffff3293771ca0
ffffff3290824f30 RECEIVE REGISTERED 4      0        ffffff32937719a0
ffffff3290824fc0 RECEIVE REGISTERED 4      0        ffffff32937716a0
ffffff3290825050 RECEIVE REGISTERED 4      0        ffffff32937713a0
ffffff32908250e0 RECEIVE REGISTERED 4      0        ffffff32937710a0
ffffff3290825170 RECEIVE REGISTERED 4      0        ffffff3293770ca8
ffffff329377b6c0 RECEIVE RESERVED   1      1        ffffff329367e430
So who actually has reserved rings?
> ::mac_group -r ! grep RESERVED | awk '{ print $1 }' > /var/tmp/rm/groups
> ::cat /var/tmp/rm/groups | ::print mac_group_t mrg_clients->mgc_client->mci_name
mrg_clients->mgc_client->mci_name = [ "igb0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe1" ]
mrg_clients->mgc_client->mci_name = [ "z13_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z14_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z15_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z16_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z17_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z18_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z19_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z20_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z21_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z22_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z23_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z24_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z25_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z26_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z27_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z28_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z29_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z30_eth0" ]
mrg_clients->mgc_client->mci_name = [ "z31_eth0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe0" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe3" ]
mrg_clients->mgc_client->mci_name = [ "ixgbe2" ]
mrg_clients->mgc_client->mci_name = [ "igb1" ]
Okay. So we have assigned these VNICs hardware rings! That actually matches
what dladm thinks. So why didn't we activate per-ring SRSes for them, and why
don't we think they're polling capable?!
Let's look at another machine for a moment; it's worth comparing, because there
some NICs have rings and others don't. For example, on the metadata-tier host
we looked at above (QP01C02H06), several things show that we have hardware
rings and polling-capable SRS entries, even for our VNICs!
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_srs_cnt
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x2
mci_flent->fe_rx_srs_cnt = 0x2
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
mci_flent->fe_rx_srs_cnt = 0x5
mci_flent->fe_rx_srs_cnt = 0x1
So, let's review the conditions under which we get these additional SRS
entries. Why do some clients have only a single SRS (fe_rx_srs_cnt of 1) while
others have more? Looking at mac_rx_srs_group_setup(), an important criterion
is whether or not the flent has a group assigned to it.
On this machine, they all have a group:
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group
mci_flent->fe_rx_ring_group = 0xffffff3291bb1120
mci_flent->fe_rx_ring_group = 0xffffff325fce11b0
mci_flent->fe_rx_ring_group = 0xffffff325fce1120
mci_flent->fe_rx_ring_group = 0xffffff3291bb1090
mci_flent->fe_rx_ring_group = 0xffffff325fce1090
mci_flent->fe_rx_ring_group = 0xffffff32937001c0
mci_flent->fe_rx_ring_group = 0xffffff3293700a80
mci_flent->fe_rx_ring_group = 0xffffff325f92c090
mci_flent->fe_rx_ring_group = 0xffffff329052c090
mci_flent->fe_rx_ring_group = 0xffffff3291bb1360
mci_flent->fe_rx_ring_group = 0xffffff325fce13f0
mci_flent->fe_rx_ring_group = 0xffffff325f92c120
mci_flent->fe_rx_ring_group = 0xffffff3291bb12d0
mci_flent->fe_rx_ring_group = 0xffffff325fce1360
mci_flent->fe_rx_ring_group = 0xffffff3291bb1240
mci_flent->fe_rx_ring_group = 0xffffff325fce12d0
mci_flent->fe_rx_ring_group = 0xffffff3291bb11b0
mci_flent->fe_rx_ring_group = 0xffffff325fce1240
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_state
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
mrg_state = 2 (MAC_GROUP_STATE_RESERVED)
This at least matches what we expect: the groups are all there and they're all
reserved. That is what should ultimately make these clients eligible for
creating per-ring SRSes and being in polling mode. So, for the ones that didn't
get them, the question is what state the rings are in.
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_rx_ring_group | ::print mac_group_t mrg_rings | ::list mac_ring_t mr_next | ::print mac_ring_t mr_state ! sort | uniq -c
66 mr_state = 0x2 (MR_INUSE)
So all of these rings are in use. Continuing through mac_rx_srs_group_setup(),
the function that would assign the additional SRS instances, we see that the
next check is on the client state flags and the VID.
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_state_flags | ::eval '. & 0x2000 = D'
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0x2000 is MCIS_NO_UNICAST_ADDR, and it isn't set on any client, so that
condition isn't triggering. That leaves the unfortunate, but likely and
expected, case: the VLAN tag.
> ::walk mac_client_impl_cache | ::print mac_client_impl_t mci_flent->fe_flow_desc.fd_mask
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
mci_flent->fe_flow_desc.fd_mask = 0x1
mci_flent->fe_flow_desc.fd_mask = 0x5
Note that several of these have an fd_mask of 0x5, which means that they have a
VLAN. And, as the comment in the source code explains, when there is a VLAN we
need to do software classification to validate the VLAN tag, so these clients
don't get hardware-ring SRSes.
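Lining this up against the fe_rx_srs_cnt output from the same host makes the pattern explicit. The sketch below assumes both ::walk mac_client_impl_cache runs visited the clients in the same order, and that bit 0x4 of fd_mask is the VLAN bit (the text only establishes that 0x5 means a VLAN is present). The type, function, and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed VLAN-ID bit in fd_mask (0x5 = 0x1 | 0x4). */
#define	MODEL_FD_VID	0x4u

struct srs_vlan_corr {
	size_t vlan_single;	/* VLAN, exactly one Rx SRS */
	size_t vlan_multi;	/* VLAN, more than one Rx SRS */
	size_t plain_single;	/* no VLAN, exactly one Rx SRS */
	size_t plain_multi;	/* no VLAN, more than one Rx SRS */
};

/* Cross-tabulate VLAN presence against Rx SRS count, client by client. */
static struct srs_vlan_corr
correlate(const uint32_t *fd_mask, const uint32_t *srs_cnt, size_t n)
{
	struct srs_vlan_corr c = { 0, 0, 0, 0 };

	assert((fd_mask != NULL && srs_cnt != NULL) || n == 0);
	for (size_t i = 0; i < n; i++) {
		int vlan = (fd_mask[i] & MODEL_FD_VID) != 0;
		int multi = srs_cnt[i] > 1;

		if (vlan && multi)
			c.vlan_multi++;
		else if (vlan)
			c.vlan_single++;
		else if (multi)
			c.plain_multi++;
		else
			c.plain_single++;
	}
	return (c);
}
```

Fed the 18 values from each listing in walker order, it puts all seven clients with fd_mask 0x5 in vlan_single and all eleven clients with fd_mask 0x1 in plain_multi: every VLAN client has exactly one Rx SRS, and every untagged client has more than one.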
From here, there are a couple of important next steps. First, it'll be useful
to verify the impact of this. Once we've verified that, we can figure out the
priority and what to do from there. This likely falls into other ideas around
having the GLDv3 ring support do VLAN classification when possible.
At first glance, we should be able to add this to rings on i40e and
ixgbe.
I'm picking this up, as I've spent the last several weeks implementing VLAN HW classification support for VLAN-switching-capable ixgbe parts. This work will also include some of the general scaffolding needed in GLDv3 to support negotiation of this capability. Other classes of parts, such as i40e, will need additional modifications to support this feature, and that work should be filed under a separate ticket. This ticket will act as the staging area for initial HW VLAN classification support.
I tested various scenarios with a script I wrote, to make sure my tests were repeatable and less prone to human error. I won't claim these tests cover everything, but they should cover the majority of what we typically see in a production environment. The aggr tests were run in three scenarios: 1) an aggr of two ixgbe ports, 2) an aggr of mixed ixgbe/igb ports, and 3) an aggr with a 9000 MTU.
- Verify I can create a VNIC on ixgbe.
- Verify I can create a VLAN VNIC on ixgbe.
- Verify that a VNIC on ixgbe gets HW resources and traffic received on this VNIC uses those HW resources.
- Verify that a VLAN VNIC on ixgbe gets HW resources and traffic received on this VLAN VNIC uses those HW resources.
- Verify that two VLAN VNICs sharing the same unicast MAC address are placed on the default group and their Rx traffic is SW classified.
- Create more VLAN VNICs than there are HW groups so that I know some of the VLAN VNICs are on the default group, delete all the VLAN VNICs on HW groups, delete all but one VLAN VNIC on the default group, verify I can still receive traffic on that remaining VLAN VNIC. This tests the VLAN reference counting and some internal ixgbe logic related to the VFTA/VLVF.
- Create one VLAN VNIC with rxrings=sw to put it on the default group, create another VLAN VNIC with the same vid on a non-default group, verify Rx traffic flows over VLAN VNIC #1, delete VLAN VNIC #2, and reverify traffic flows over VLAN VNIC #1. This tests yet another internal edge case in ixgbe related to the VLVF/VFTA.
- Put two VLAN VNICs with the same vid on the default group (via rxrings=sw), then delete one VLAN VNIC and verify traffic is still received on the other. This is another test of VLAN reference counting.
- Create enough VLAN VNICs to use all HW groups on the MAC. Verify they all receive traffic over HW.
- Create enough VLAN VNICs to use all HW groups. Then create another VLAN VNIC but specify rxrings=hw to explicitly ask for a HW group. This will cause the new VLAN VNIC to steal one of the HW groups. This tests the client group-switching logic.
- Verify that Rx data travels over HW when using the aggr primary client. (If you modify the aggr MTU then this breaks, but that bug existed before this wad).
- Verify that Rx data travels over HW when placing a VLAN (dladm create-vlan) on the aggr primary client.
- Verify that two VNICs can exist on an aggr and receive traffic over SW.
- Verify that a single VLAN VNIC on an aggr receives traffic over HW.
- Verify that two VLAN VNICs can exist on an aggr and receive traffic over SW.
- Verify that two VLAN VNICs sharing the same unicast MAC address can exist on an aggr and receive traffic over SW.
- Create two VNICs on an aggr, delete one, verify the remaining VNIC still receives traffic, and that it's over HW.
- Verify that a VNIC and a VLAN VNIC can coexist on an aggr and receive traffic over SW.
- Verify that a VLAN VNIC on an etherstub can receive traffic.