Project

General

Profile

Bug #6273

obytes64 drops despite vnic not being destroyed

Added by Robert Mustacchi about 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Category:
networking
Start date:
2015-09-30
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:

Description

We encountered several cases where the obytes64 value for a vnic would drop to zero while the vnic was still active. The various kstat identifiers also didn't change, which for a counter is a serious bug.

I've installed a D script to run overnight to try and capture the theory that I originally established. This script should at least help us narrow in on whether or not this theory is viable:

[root@RM08219 (us-east-2) ~]# cat /var/tmp/rm/OS-2481.d
fbt::vnic_dev_create:entry
{
        self->t = 1;
}

fbt::mac_tx_srs_stat_recreate:entry
/self->t != 1/
{
        trace(timestamp);
        trace(walltimestamp);
        printf("\n%x %Y.%09d\n", timestamp, walltimestamp, walltimestamp % 1000000000);
        stack();
}

fbt::vnic_dev_create:return
/self->t/
{
        self->t = 0;
}

Past me had a theory that this was due to a stat being clobbered and in fact, the output of the above OS-2481.d was valuable and still sitting on the shrimp. Here's a few lines from it, the stack is what's important:

CPU     ID                    FUNCTION:NAME
  5  37137   mac_tx_srs_stat_recreate:entry  3633581624715199 1397097051623999321
ce8b92600e7bf 2014 Apr 10 02:30:51.623999321

              mac`i_mac_group_add_ring+0x291
              mac`mac_group_mov_ring+0x5b
              mac`mac_release_tx_group+0xfb
              mac`mac_datapath_teardown+0x3ca
              mac`mac_client_datapath_teardown+0x66
              mac`mac_unicast_remove+0x27a
              vnic`vnic_dev_delete+0x14b
              vnic`vnic_ioc_delete+0x28
              dld`drv_ioctl+0x1e4
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall32+0xff

  5  37137   mac_tx_srs_stat_recreate:entry  3633581625230065 1397097051624999498
ce8b92608c2f1 2014 Apr 10 02:30:51.624999498

              mac`i_mac_group_add_ring+0x291
              mac`mac_group_mov_ring+0x5b
              mac`mac_release_tx_group+0xfb
              mac`mac_datapath_teardown+0x3ca
              mac`mac_client_datapath_teardown+0x66
              mac`mac_unicast_remove+0x27a
              vnic`vnic_dev_delete+0x14b
              vnic`vnic_ioc_delete+0x28
              dld`drv_ioctl+0x1e4
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall32+0xff

There are various different tx groups that exist over a physical device based upon the way mac has decided to hand out transmit rings and their availability. In many places where we have ixgbe on the scene, it ends up working out that we assign on the order of 8-16 things their own group with one ring and then leave everyone else using the default ring. The following bit of mdb from emy10 shows the rough clustering of soft ring sets (eg. mac clients basically) to theses tx groups:

> ::walk mac_client_impl_cache | ::printf "%p\n" mac_client_impl_t mci_flent->fe_tx_ring_g
roup ! sort | uniq -c
   4 0
   1 ffffff0d2c4c7b80
   1 ffffff0d2c4c7c10
   1 ffffff0d2c4c7ca0
   1 ffffff0d2c4c7d30
   1 ffffff0d2c4c7dc0
   1 ffffff0d2c4c7e50
   1 ffffff0d2c4c7ee0
  59 ffffff0d2c4c7f70
   1 ffffff0d34b3c980
   1 ffffff0d34b3ca10
   1 ffffff0d34b3caa0
   1 ffffff0d34b3cb30
   1 ffffff0d34b3cbc0
   1 ffffff0d34b3cc50
   1 ffffff0d34b3cce0
  52 ffffff0d34b3cd70
   1 ffffff0d39505010
   1 ffffff0d47e2ddc0

In the cases where we only have one mac client on the group, the group correspondingly only has a single client. This is important as if a non-default group has its last client removed then we change the state transition that we take while going through mac_datapath_teardown and end up releasing the transmit group entirely.

When the transmit group is removed, its ring is restored to the default group. However, that triggers us to go through and recreate the srs tx stat. Now, for the link kstat for a vnic goes through the following path to get the answer to that number of bytes that's been transmitted being the obytes member and the MAC_STAT_OBYTES:

dls_devnet_stat_update()
 +-> dls_stat_update()
   +-> mac_stat_get()
     +-> driver specific stat, vnic_m_stat()
       +-> mac_client_stat_get()

The mac_client_stat_get() function just reads from the srs tx stat, which happens to be what we just clobbered for everything in the default group. It's pretty easy to see this by doing the following commands before and after halting a zone that has one of the singleton groups identified above:

# kstat -n mac_misc_stat -s obytes -p | sort > misc.post
# kstat -m link -s obytes -p | sort > link.pre

If we do this, let's look at a few of the stats for some zones. We'll note that we added to the defunct tx stats, but the overall link stat was truncated:

pre:
  z103_net1:0:mac_misc_stat:obytes        0                              
post:
  z103_net1:0:mac_misc_stat:obytes        261775867                      

And the corresponding change in the link stat:

pre:
  link:0:z103_net1:obytes 261773435                                      
post:
  link:0:z103_net1:obytes 2852                                           

The solution here is that when someone is asking for the internal mac client stat, we need to consult anything that's in the defunct group when answering it. This allows mac to still break them apart into different categories, but allows consumers to obtain the logical number of bytes actually used by the GLDv3 device despite any changes to ring assignments.

History

#1

Updated by Electric Monk about 4 years ago

  • Status changed from New to Closed

git commit d47ced1f1801c1c1ca309ff7c5997bf0c8cb4092

commit  d47ced1f1801c1c1ca309ff7c5997bf0c8cb4092
Author: Robert Mustacchi <rm@joyent.com>
Date:   2015-10-08T05:34:49.000Z

    6273 obytes64 drops despite vnic not being destroyed
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Gordon Ross <gordon.ross@nexenta.com>

#2

Updated by Richard PALO about 4 years ago

NULL pointer constant issues with these files:

diff --git a/usr/src/cmd/mdb/common/modules/mac/mac.c b/usr/src/cmd/mdb/common/modules/mac/mac.c
index 9c1bcf4..067fe93 100644
--- a/usr/src/cmd/mdb/common/modules/mac/mac.c
+++ b/usr/src/cmd/mdb/common/modules/mac/mac.c
@@ -452,7 +452,7 @@ mac_common_walk_step(mdb_walk_state_t *wsp)
 {
     int status;

-    if (wsp->walk_addr == NULL)
+    if (wsp->walk_addr == (uintptr_t)NULL)
         return (WALK_DONE);

     status = wsp->walk_callback(wsp->walk_addr, wsp->walk_data,
@@ -1023,7 +1023,7 @@ mac_group_walk_init(mdb_walk_state_t *wsp)
 {
     int ret;

-    if (wsp->walk_addr != NULL) {
+    if (wsp->walk_addr != (uintptr_t)NULL) {
         mdb_warn("non-global walks are not supported\n");
         return (WALK_ERR);
     }
@@ -1049,7 +1049,7 @@ mac_group_walk_step(mdb_walk_state_t *wsp)
      * walkers are a bit unsporting, they don't actually read in the data
      * for us.
      */
-    if (wsp->walk_addr == NULL)
+    if (wsp->walk_addr == (uintptr_t)NULL)
         return (WALK_DONE);

     if (mdb_vread(&mi, sizeof (mac_impl_t), wsp->walk_addr) == -1) {
@@ -1061,7 +1061,7 @@ mac_group_walk_step(mdb_walk_state_t *wsp)
      * First go for rx groups, then tx groups.
      */
     mgp = (uintptr_t)mi.mi_rx_groups;
-    while (mgp != NULL) {
+    while (mgp != (uintptr_t)NULL) {
         if (mdb_vread(&mg, sizeof (mac_group_t), mgp) == -1) {
             mdb_warn("failed to read mac_group_t at %p", mgp);
             return (WALK_ERR);
@@ -1074,7 +1074,7 @@ mac_group_walk_step(mdb_walk_state_t *wsp)
     }

     mgp = (uintptr_t)mi.mi_tx_groups;
-    while (mgp != NULL) {
+    while (mgp != (uintptr_t)NULL) {
         if (mdb_vread(&mg, sizeof (mac_group_t), mgp) == -1) {
             mdb_warn("failed to read mac_group_t at %p", mgp);
             return (WALK_ERR);
@@ -1095,7 +1095,7 @@ mac_group_count_clients(mac_group_t *mgp)
     int clients = 0;
     uintptr_t mcp = (uintptr_t)mgp->mrg_clients;

-    while (mcp != NULL) {
+    while (mcp != (uintptr_t)NULL) {
         mac_grp_client_t c;

         if (mdb_vread(&c, sizeof (c), mcp) == -1) {
diff --git a/usr/src/uts/common/io/mac/mac_client.c b/usr/src/uts/common/io/mac/mac_client.c
index 85e9293..b6ede73 100644
--- a/usr/src/uts/common/io/mac/mac_client.c
+++ b/usr/src/uts/common/io/mac/mac_client.c
@@ -1446,7 +1446,7 @@ mac_client_open(mac_handle_t mh, mac_client_handle_t *mchp, char *name,
      */
     mac_client_add(mcip);

-    mcip->mci_share = NULL;
+    mcip->mci_share = (uintptr_t)NULL;
     if (share_desired)
         i_mac_share_alloc(mcip);

@@ -1701,7 +1701,7 @@ mac_client_set_rings_prop(mac_client_impl_t *mcip, mac_resource_props_t *mrp,
     uint_t            ringcnt;
     boolean_t        unspec;

-    if (mcip->mci_share != NULL)
+    if (mcip->mci_share != (uintptr_t)NULL)
         return (EINVAL);

     if (mrp->mrp_mask & MRP_RX_RINGS) {
@@ -3432,7 +3432,7 @@ mac_tx_cookie_t
 mac_tx(mac_client_handle_t mch, mblk_t *mp_chain, uintptr_t hint,
     uint16_t flag, mblk_t **ret_mp)
 {
-    mac_tx_cookie_t        cookie = NULL;
+    mac_tx_cookie_t        cookie = (uintptr_t)NULL;
     int            error;
     mac_tx_percpu_t        *mytx;
     mac_soft_ring_set_t    *srs;
@@ -3449,7 +3449,7 @@ mac_tx(mac_client_handle_t mch, mblk_t *mp_chain, uintptr_t hint,
         MAC_TX_TRY_HOLD(mcip, mytx, error);
         if (error != 0) {
             freemsgchain(mp_chain);
-            return (NULL);
+            return ((uintptr_t)NULL);
         }
     }

@@ -3530,7 +3530,7 @@ mac_tx(mac_client_handle_t mch, mblk_t *mp_chain, uintptr_t hint,

         MAC_TX(mip, srs_tx->st_arg2, mp_chain, mcip);
         if (mp_chain == NULL) {
-            cookie = NULL;
+            cookie = (uintptr_t)NULL;
             SRS_TX_STAT_UPDATE(srs, opackets, 1);
             SRS_TX_STAT_UPDATE(srs, obytes, obytes);
         } else {
@@ -3600,7 +3600,7 @@ mac_tx_is_flow_blocked(mac_client_handle_t mch, mac_tx_cookie_t cookie)
      */
     if (mac_srs->srs_tx.st_mode == SRS_TX_FANOUT ||
         mac_srs->srs_tx.st_mode == SRS_TX_AGGR) {
-        if (cookie != NULL) {
+        if (cookie != (uintptr_t)NULL) {
             sringp = (mac_soft_ring_t *)cookie;
             mutex_enter(&sringp->s_ring_lock);
             if (sringp->s_ring_state & S_RING_TX_HIWAT)

Also available in: Atom PDF