Project

General

Profile

Bug #10283

bufmod sends corrupted LSO packets

Added by Robert Mustacchi 11 months ago. Updated 10 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
networking
Start date:
2019-01-24
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

I first discovered this during my LSO work, but I've also produced this behavior on 20180621T003454Z. In fact, this problem has existed for as long as it's been possible to send LSO packets.

The problem presents itself when you try to snoop a link sending LSO packets. The traffic itself is fine (e.g. iperf doens't complain), but data sent to snoop is being corrupted in some way.


<GZ> root@gaia [~]
# snoop -r -vvvv -d igb0 host 192.168.2.60
...
(warning) bad packet header in buffer offset 15080: length=314881378
(warning) bad packet header in buffer offset 15084: length=-975845868
(warning) bad packet header in buffer offset 15088: length=503537050
(warning) bad packet header in buffer offset 15092: length=-730215749
(warning) bad packet header in buffer offset 15096: length=-411264108
(warning) bad packet header in buffer offset 15100: length=-1127292324
(warning) bad packet header in buffer offset 15104: length=1283861262
(warning) bad packet header in buffer offset 15108: length=82423970
(warning) bad packet header in buffer offset 15112: length=2048372326
(warning) bad packet header in buffer offset 15116: length=239070324
(warning) bad packet header in buffer offset 15120: length=86852170

When you snoop a device the packets are delivered through a STREAMS stream. The bottom is the interface itself, the top is essentially DLPI (snoop uses dlpi_recv() to passively read packets), and the middle is made up of the packet filter and buffer modules.

> 0xfffffe5b7a18bbf8::stream

+-----------------------+-----------------------+
| 0xfffffe5b722d0900    | 0xfffffe5b722d0808    | 
| strwhead              | strrhead              | 
|                       |                       |
| cnt = 0t0             | cnt = 0t288106        | 
| flg = 0x00004022      | flg = 0x0004403c      | 
+-----------------------+-----------------------+
            |                       ^
            v                       |
+-----------------------+-----------------------+
| 0xffffff3a36f27b70    | 0xffffff3a36f27a78    | 
| bufmod                | bufmod                | 
|                       |                       |
| cnt = 0t0             | cnt = 0t0             | 
| flg = 0x00000822      | flg = 0x00000832      | 
+-----------------------+-----------------------+
            |                       ^
            v                       |
+-----------------------+-----------------------+
| 0xfffffe5d33e36920    | 0xfffffe5d33e36828    | 
| pfmod                 | pfmod                 | 
|                       |                       |
| cnt = 0t0             | cnt = 0t0             | 
| flg = 0x00000822      | flg = 0x00000832      | 
+-----------------------+-----------------------+
            |                       ^
            v                       |
+-----------------------+-----------------------+
| 0xfffffe761d5a30f8    | 0xfffffe761d5a3000    | 
| igb                   | igb                   | 
|                       |                       |
| cnt = 0t0             | cnt = 0t0             | 
| flg = 0x00244062      | flg = 0x00204032      | 
+-----------------------+-----------------------+

pfmod filters packets in the kernel (based on the client filter expression) and bufmod buffers them up into chunks to send up to userspace. These "chunks" are the interface between bufmod and snoop. A chunk is just an array of bytes containing mblks/packets. In order to help snoop find the bounds of each packet bufmod adds a header to each mblk. This header is a struct sb_hdr and describes both the original length of the packet as well as the new total length after adding the header and padding out the mblk to an 8 byte boundary. You can see all of this go down in bufmod.c:sbaddmsg().

In userspace, snoop is reading these chunks and then using the sb_hdr to determine mblk boundaries. It does this in the snoop_caputure.c:scan() function. Before attempting to parse the incoming mblk and print it out, snoop first performs some sanity checks to make sure it's dealing with a legit buffer.

        if ((nhdrp->sbh_totlen == 0) ||
            (bp + nhdrp->sbh_totlen) < bp ||
            (bp + nhdrp->sbh_totlen) > bufstop ||
            (nhdrp->sbh_origlen == 0) ||
            (bp + nhdrp->sbh_origlen) < bp ||
            (nhdrp->sbh_msglen == 0) ||
            (bp + nhdrp->sbh_msglen) < bp ||
            (bp + nhdrp->sbh_msglen) > bufstop ||
            (nhdrp->sbh_msglen > nhdrp->sbh_origlen) ||
            (nhdrp->sbh_totlen < nhdrp->sbh_msglen) ||
            (nhdrp->sbh_timestamp.tv_sec == 0)) {
            if (cap) {
                (void) fprintf(stderr, "(warning) bad packet " 
                    "header in capture file");
            } else {
                (void) fprintf(stderr, "(warning) bad packet " 
                    "header in buffer");
            }
            (void) fprintf(stderr, " offset %d: length=%d\n",
                bp - buf, nhdrp->sbh_totlen);
            goto err;
        }

If any of these checks fail, snoop prints the "bad packet" message and then starts scanning the rest of the chunk bytes in hopes of finding a legit mblk. If you read the snoop messages closely you'll see the offset always increase by four – this is snoop trying to find a good packet in the chunk. So one bad packet from bufmod can cause a whole ton of messages in snoop (especially when it's near the LSO limit).

With dtrace in hand I was able to discover that the sbh_totlen (the length of the packet + sb_hdr + padding) was greater than the chunk length. The chunk length is based off the length of the read returned by dlpi_receive(). Why are they different?

If you look back at sbaddmsg() there is one case that seems wrong.

        pad = Align(hp.sbh_totlen);
        hp.sbh_totlen += sizeof (hp);
        hp.sbh_totlen += pad;
...
        /*
         * Join message to the chunk
         */
        linkb(sbp->sb_head, mp);

        sbp->sb_mcount++;
        sbp->sb_mlen += hp.sbh_totlen;

        /*
         * If the first message alone is too big for the chunk close
         * the chunk now.
         * If the next message would immediately cause the chunk to
         * overflow we may as well close the chunk now. The next
         * message is certain to be at least SMALLEST_MESSAGE size.
         *
         * RPZ: So at this point we've added the message to
         * the chunk but we haven't added the padding. This
         * feels like the bug to me.
         */
        if (hp.sbh_totlen + SMALLEST_MESSAGE > sbp->sb_chunk) {
            sbclosechunk(sbp);
            return;
        }
...
        /* RPZ: Code that follows adds the padding. */

So the packet sizes that were giving me trouble were all 65505 (without the padding, 65512 with). That is, dlpi_recv was returning 65505 as msglen, but sbh_totlen was 65512. In this case sb_chunk is 64K (65536). The difference between 64K and sbh_totlen is 24 – which also happens to be the size of sb_hdr. So what's the value of SMALLEST_MESSAGE?

#define    SMALLEST_MESSAGE    sizeof (struct sb_hdr) + _POINTER_ALIGNMENT

usr/src/uts/common/sys/isa_defs.h:260: (<global>) #define _POINTER_ALIGNMENT 8
usr/src/uts/common/sys/isa_defs.h:326: (<global>) #define _POINTER_ALIGNMENT 4
usr/src/uts/common/sys/isa_defs.h:435: (<global>) #define _POINTER_ALIGNMENT 4
usr/src/uts/common/sys/isa_defs.h:463: (<global>) #define _POINTER_ALIGNMENT 8

Bingo! This means that the if expression will be true and we will close the chunk before padding the mblk. I confirmed this theory with dtrace. Below you see that the packets of length 65481 (65505 minus the 24 byte sb_hdr) are linked to the chunk but memset() is never called (to add the padding).

# dtrace -qn '::sbaddmsg:entry /msgdsize(args[1]) > 60000/ { self->sz = msgdsize(args[1]); self->chunk = ((struct sb *)(args[0]->q_ptr))->sb_chunk; self->t = 1; } ::linkb:entry /self->t/ { self->link = 1; } ::memset:entry /self->link/ { self->memset = 1; } ::sbaddmsg:return /self->t/ { @[self->chunk, self->sz, self->link, self->memset] = count(); self->sz = 0; self->chunk = 0; self->t = 0; self->link = 0; self->memset = 0; }'
^C

    65536            64874        1        1                1
    65536            64930        1        1                1
    65536            61402        1        1               65
    65536            65481        1        0            33274

Testing notes:

Verified snoop corrupt packet errors on the current platform.


<GZ> root@gaia [~]
# uname -v
joyent_20180802T002654Z

<GZ> root@gaia [~]
# snoop -r -vvvv -d ixgbe0 host 192.168.2.60 1490 1> /dev/null 2> /var/tmp/snoop-err.txt
^C
<GZ> root@gaia [~]
# wc -l /var/tmp/snoop-err.txt 
  409426 /var/tmp/snoop-err.txt

<GZ> root@gaia [~]
# head /var/tmp/snoop-err.txt 
Using device ixgbe0 (promiscuous mode)
(warning) bad packet header in buffer offset 0: length=65512
(warning) bad packet header in buffer offset 4: length=574
(warning) bad packet header in buffer offset 8: length=1534559331
(warning) bad packet header in buffer offset 12: length=808944
(warning) bad packet header in buffer offset 16: length=-1029753184
(warning) bad packet header in buffer offset 20: length=916495931
(warning) bad packet header in buffer offset 24: length=-1673805153
(warning) bad packet header in buffer offset 28: length=4521992
(warning) bad packet header in buffer offset 32: length=1963703295

Then verified there were no corrupted packets on a patched platform.


<GZ> root@gaia [~]
# uname -v
joyent_20180815T192218Z

<GZ> root@gaia [~]
# snoop -r -vvvv -d ixgbe0 host 192.168.2.60 1490 1> /dev/null 2> /var/tmp/snoop-err.txt
^C
<GZ> root@gaia [~]
# wc -l /var/tmp/snoop-err.txt 
       1 /var/tmp/snoop-err.txt

<GZ> root@gaia [~]
# head /var/tmp/snoop-err.txt 
Using device ixgbe0 (promiscuous mode)

History

#1

Updated by Electric Monk 10 months ago

  • Status changed from New to Closed

git commit eb00302c9a73ef4cd8d5e1958b3356988094c00f

commit  eb00302c9a73ef4cd8d5e1958b3356988094c00f
Author: Ryan Zezeski <rpz@joyent.com>
Date:   2019-02-05T16:46:14.000Z

    10283 bufmod sends corrupted LSO packets
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Also available in: Atom PDF