Bug #13230

open

i40e has duplicate traffic when used with bhyve/snoop running

Added by Jorge Schrauwen almost 3 years ago. Updated 23 days ago.

Status: New
Priority: Normal
Assignee:
Category: driver - device drivers
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

After having my board replaced, my two 10G copper i40e NICs are now working... sort of.

I noticed I was getting duplicate traffic inside bhyve, which broke IPv6 and PPPoE. Duplicate traffic is bad regardless of what it breaks.

Initial tests:
- Turned off offloading in the VM (I had issues with this in the past): no change.
- Switched to a different underlying NIC, igb in this case: works fine, no dups.
- Switched back to i40e: dups are back...
- Tried a 1G link instead of a 10G link: dups remained.

I left the system running over igb (1G) instead of i40e (10G) until I had more time to debug.

Second set of tests:
- Eliminated bhyve and added a VNIC with a VLAN on top of i40e: no dups (oh shit, was it bhyve? Ha, no, it wasn't).
- Ran snoop on the VNIC: dups return (aha! So whatever viona/bhyve is doing, snoop also does).
  (I think someone else was also seeing a lot of retransmits with snoop running a few weeks ago; maybe related?)
- Eliminated the VNIC with VLAN tagging: same result! No dups unless snoop is running; then the dups are back.

Setup information:

[root@boron ~]# dladm show-phys
LINK         MEDIA                STATE      SPEED  DUPLEX    DEVICE
i40e0        Ethernet             up         1000   full      i40e0
igb0         Ethernet             up         1000   full      igb0
i40e1        Ethernet             up         1000   full      i40e1

i40e0: admin nictag (no VLAN tagging)
igb0: trunk nictag (VLAN tagging)
i40e1: what I want to use as the trunk nictag; for now it is only there for testing.

[root@boron ~]# dladm show-vnic
LINK         OVER       SPEED MACADDRESS        MACADDRTYPE VID  ZONE
net0         igb0       0     82:fc:c:c2:e4:6f  fixed       110  eceec5eb-22c7-e65b-fd6b-81ebdb36406b
net1         igb0       0     0:22:6:a:0:1      fixed       110  99e40ee7-a8f9-4b57-9225-e7bd19f64b07
net2         igb0       0     0:22:6:14:0:1     fixed       120  99e40ee7-a8f9-4b57-9225-e7bd19f64b07
net3         igb0       0     0:22:6:1e:0:1     fixed       130  99e40ee7-a8f9-4b57-9225-e7bd19f64b07
net4         igb0       0     0:22:6:28:0:1     fixed       140  99e40ee7-a8f9-4b57-9225-e7bd19f64b07
net0         igb0       0     0:22:6:5:0:a      fixed       10   99e40ee7-a8f9-4b57-9225-e7bd19f64b07
vnic110      i40e1      1000  2:8:20:43:c8:a0   random      110  --

vnic110: a VNIC on top of i40e1, tagged with VLAN 110

[root@boron ~]# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
i40e0/_a          static   ok           10.23.40.211/24
vnic110/dhcp      dhcp     ok           10.23.10.225/24
lo0/v6            static   ok           ::1/128
i40e0/?           static   ok           fe80::3eec:efff:fe0c:783a/10
i40e0/_b          static   ok           2a02:578:470f:40::211/64
i40e0/?           addrconf ok           2a02:578:470f:40:3eec:efff:fe0c:783a/64
i40e0/?           dhcp     ok           2a02:578:470f:40::246/128

10.23.40.211: global zone IP without VLAN tagging (on i40e0)
10.23.10.225: global zone IP with VLAN tagging (on vnic110)

Ping results without snoop running:

[iris :: sjorge][~]
[.]$ ping -c 10 10.23.40.211
PING 10.23.40.211 (10.23.40.211): 56 data bytes
64 bytes from 10.23.40.211: icmp_seq=0 ttl=255 time=0.745 ms
64 bytes from 10.23.40.211: icmp_seq=1 ttl=255 time=0.248 ms
64 bytes from 10.23.40.211: icmp_seq=2 ttl=255 time=0.234 ms
64 bytes from 10.23.40.211: icmp_seq=3 ttl=255 time=0.238 ms
64 bytes from 10.23.40.211: icmp_seq=4 ttl=255 time=0.349 ms
64 bytes from 10.23.40.211: icmp_seq=5 ttl=255 time=0.223 ms
64 bytes from 10.23.40.211: icmp_seq=6 ttl=255 time=0.224 ms
64 bytes from 10.23.40.211: icmp_seq=7 ttl=255 time=0.345 ms
64 bytes from 10.23.40.211: icmp_seq=8 ttl=255 time=0.339 ms
64 bytes from 10.23.40.211: icmp_seq=9 ttl=255 time=0.333 ms

--- 10.23.40.211 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.223/0.328/0.745/0.148 ms

[hyperon :: sjorge][~]
[■]$ ping -c 10 10.23.10.225
PING 10.23.10.225 (10.23.10.225): 56 data bytes
64 bytes from 10.23.10.225: icmp_seq=0 ttl=255 time=0.429 ms
64 bytes from 10.23.10.225: icmp_seq=1 ttl=255 time=0.420 ms
64 bytes from 10.23.10.225: icmp_seq=2 ttl=255 time=0.702 ms
64 bytes from 10.23.10.225: icmp_seq=3 ttl=255 time=0.523 ms
64 bytes from 10.23.10.225: icmp_seq=4 ttl=255 time=0.593 ms
64 bytes from 10.23.10.225: icmp_seq=5 ttl=255 time=0.593 ms
64 bytes from 10.23.10.225: icmp_seq=6 ttl=255 time=0.605 ms
64 bytes from 10.23.10.225: icmp_seq=7 ttl=255 time=0.575 ms
64 bytes from 10.23.10.225: icmp_seq=8 ttl=255 time=0.396 ms
64 bytes from 10.23.10.225: icmp_seq=9 ttl=255 time=0.683 ms

--- 10.23.10.225 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.396/0.552/0.702/0.102 ms

Pings were done from a macOS host on 10.23.10.0/24 and from a FreeBSD host on 10.23.40.0/24, to make sure there was no router in play.

Ping results with snoop running:

[hyperon :: sjorge][~]
[■]$ ping -c 10 10.23.10.225
PING 10.23.10.225 (10.23.10.225): 56 data bytes
64 bytes from 10.23.10.225: icmp_seq=0 ttl=255 time=0.575 ms
64 bytes from 10.23.10.225: icmp_seq=0 ttl=255 time=0.580 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=1 ttl=255 time=8.708 ms
64 bytes from 10.23.10.225: icmp_seq=1 ttl=255 time=8.711 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=2 ttl=255 time=0.447 ms
64 bytes from 10.23.10.225: icmp_seq=2 ttl=255 time=0.452 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=3 ttl=255 time=0.446 ms
64 bytes from 10.23.10.225: icmp_seq=3 ttl=255 time=0.451 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=4 ttl=255 time=0.389 ms
64 bytes from 10.23.10.225: icmp_seq=4 ttl=255 time=0.436 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=5 ttl=255 time=0.589 ms
64 bytes from 10.23.10.225: icmp_seq=5 ttl=255 time=0.597 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=6 ttl=255 time=0.601 ms
64 bytes from 10.23.10.225: icmp_seq=6 ttl=255 time=0.613 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=7 ttl=255 time=0.398 ms
64 bytes from 10.23.10.225: icmp_seq=7 ttl=255 time=0.611 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=8 ttl=255 time=0.586 ms
64 bytes from 10.23.10.225: icmp_seq=8 ttl=255 time=0.622 ms (DUP!)
64 bytes from 10.23.10.225: icmp_seq=9 ttl=255 time=8.514 ms

--- 10.23.10.225 ping statistics ---
10 packets transmitted, 10 packets received, +9 duplicates, 0.0% packet loss
round-trip min/avg/max/stddev = 0.389/1.807/8.711/2.962 ms

[root@boron ~]# snoop -d vnic110 'icmp'
Using device vnic110 (promiscuous mode)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 0)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 0)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 0)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 0)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 1)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 1)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 1)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 1)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 2)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 2)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 2)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 2)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 3)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 3)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 3)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 3)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 4)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 4)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 4)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 4)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 5)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 5)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 5)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 5)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 6)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 6)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 6)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 6)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 7)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 7)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 7)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 7)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 8)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 8)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 8)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 8)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 9)
hyperon.vlan10.acheron.be -> 10.23.10.225 ICMP Echo request (ID: 50479 Sequence number: 9)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 9)
10.23.10.225 -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 50479 Sequence number: 9)

[iris :: sjorge][~]
[.]$ ping -c 10 10.23.40.211
PING 10.23.40.211 (10.23.40.211): 56 data bytes
64 bytes from 10.23.40.211: icmp_seq=0 ttl=255 time=0.524 ms
64 bytes from 10.23.40.211: icmp_seq=0 ttl=255 time=0.633 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=1 ttl=255 time=0.402 ms
64 bytes from 10.23.40.211: icmp_seq=1 ttl=255 time=0.519 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=2 ttl=255 time=0.391 ms
64 bytes from 10.23.40.211: icmp_seq=2 ttl=255 time=0.518 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=3 ttl=255 time=0.407 ms
64 bytes from 10.23.40.211: icmp_seq=3 ttl=255 time=0.535 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=4 ttl=255 time=0.406 ms
64 bytes from 10.23.40.211: icmp_seq=4 ttl=255 time=0.427 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=5 ttl=255 time=0.400 ms
64 bytes from 10.23.40.211: icmp_seq=5 ttl=255 time=0.421 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=6 ttl=255 time=0.402 ms
64 bytes from 10.23.40.211: icmp_seq=6 ttl=255 time=0.424 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=7 ttl=255 time=0.528 ms
64 bytes from 10.23.40.211: icmp_seq=7 ttl=255 time=0.550 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=8 ttl=255 time=0.516 ms
64 bytes from 10.23.40.211: icmp_seq=8 ttl=255 time=0.536 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=9 ttl=255 time=0.515 ms

--- 10.23.40.211 ping statistics ---
10 packets transmitted, 10 packets received, +9 duplicates, 0.0% packet loss
round-trip min/avg/max/stddev = 0.391/0.477/0.633/0.069 ms

[root@boron ~]# snoop -d i40e0 'icmp'
Using device i40e0 (promiscuous mode)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 0)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 0)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 0)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 0)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 1)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 1)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 1)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 1)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 2)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 2)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 2)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 2)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 3)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 3)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 3)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 3)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 4)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 4)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 4)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 4)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 5)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 5)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 5)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 5)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 6)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 6)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 6)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 6)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 7)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 7)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 7)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 7)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 8)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 8)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 8)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 8)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 9)
iris.vlan40.acheron.be -> boron        ICMP Echo request (ID: 7992 Sequence number: 9)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 9)
       boron -> iris.vlan40.acheron.be ICMP Echo reply (ID: 7992 Sequence number: 9)

Actions #1

Updated by Jorge Schrauwen almost 3 years ago

The NICs are built into a Supermicro X11SPH-nCTF board: https://www.supermicro.com/en/products/motherboard/X11SPH-nCTF

So it's an Intel X722 with an X557 PHY.

Actions #2

Updated by Jorge Schrauwen almost 3 years ago

[17:47:28] rzezeski: sjorge: I couldn't reproduce, try running this DTrace script while doing your pings with dup traffic: https://gist.github.com/rzezeski/6e99738bddc7f3a5fd0b90f9c6c4d187
[17:47:44] rzezeski: run with `dtrace -Cs /path/to/file.d`
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   

RESERVED    1/13   b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   
DEFAULT     0/1    b8:27:eb:5f:a7:4b 3c:ec:ef:0c:78:3a 0800 0    10.23.40.42     10.23.40.211    1   

Actions #3

Updated by Jorge Schrauwen almost 3 years ago


RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1   

This one was with the VNIC on top. I killed snoop near the end, and the DEFAULT lines go away.
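For longer captures it can help to count how many lines land on each group instead of eyeballing them. The Python sketch below assumes the DTrace output was saved as plain text with the ring group type (RESERVED or DEFAULT) in the first column, as in the pastes above; it does not interpret any other column. While snoop runs, a DEFAULT count that tracks the RESERVED count for the same unicast flow is the duplicate-delivery signature; once snoop exits, the DEFAULT count should stop growing.

```python
import sys
from collections import Counter

# Excerpt of the vnic110 DTrace output above: paired RESERVED/DEFAULT lines
# while snoop was running, then RESERVED-only lines after snoop was killed.
# Only the first column (the ring group type) is interpreted; the remaining
# columns are treated as opaque text.
SAMPLE = """\
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
DEFAULT     0/0    00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
RESERVED    2/17   00:22:06:03:10:01 02:08:20:43:c8:a0 0800 110  10.23.10.150    10.23.10.225    1
"""

GROUPS = ("RESERVED", "DEFAULT")


def tally_groups(text: str) -> Counter:
    """Count lines per ring group type, keyed on the first field of each line."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in GROUPS:
            counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Read a saved capture from a file argument, or fall back to the excerpt.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    counts = tally_groups(text)
    for group in GROUPS:
        print(f"{group:<9} {counts[group]}")
```

Run it as `python3 tally.py capture.txt` (both file names are hypothetical); with no argument it tallies the embedded excerpt.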

Actions #4

Updated by Ryan Zezeski almost 3 years ago

The DTrace output indicates that the unicast traffic is hitting both the reserved group (as it should) and the default group (which it should not). We program the MAC providers so that unicast traffic flows to its reserved group (if one is available), while multicast, broadcast, and all other traffic flows to the default group so it can be software classified. When promisc mode is enabled, the reserved group's unicast traffic should remain untouched, but all other traffic on the L2 network should arrive on the default group. This allows all traffic to reach mac_rx_common() and thus be delivered to mac_promisc_dispatch(). In this case, since the unicast traffic is being delivered to both the reserved group and the default group, we get duplicate calls to mac_rx_common(). And since this traffic is destined for an existing client, we see duplicate traffic both in snoop AND at the client itself (once via direct delivery through the hardware ring/SRS, and once via the default ring and mac_rx_classify()).
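The steering rules described above can be condensed into a toy model. The Python sketch below is illustrative only and is not the illumos code: the group names mirror the DTrace output, the receive path is reduced to two counters, and `copy_matched_to_default` is a hypothetical switch standing in for whatever the X722 is doing differently. It shows how a single extra hardware copy doubles delivery both to promiscuous listeners and to the client.

```python
from dataclasses import dataclass


@dataclass
class Deliveries:
    client: int = 0   # copies handed to the MAC client (e.g. IP on vnic110)
    promisc: int = 0  # copies handed to promiscuous listeners (e.g. snoop)


def hw_steer(dst: str, client_mac: str, promisc: bool,
             copy_matched_to_default: bool) -> list[str]:
    """Ring groups the NIC places a unicast frame on, in this toy model.

    copy_matched_to_default is a hypothetical switch for the suspected
    misbehaviour: with promisc on, a frame that matched the reserved
    group's MAC filter is also copied to the default group.
    """
    if dst == client_mac:
        groups = ["RESERVED"]
        if promisc and copy_matched_to_default:
            groups.append("DEFAULT")
        return groups
    # Unicast for other hosts only reaches the default group in promisc
    # mode. Broadcast/multicast handling is left out of this sketch.
    return ["DEFAULT"] if promisc else []


def receive(dst: str, client_mac: str, promisc: bool,
            copy_matched_to_default: bool) -> Deliveries:
    """Count deliveries for one inbound unicast frame."""
    out = Deliveries()
    for group in hw_steer(dst, client_mac, promisc, copy_matched_to_default):
        # Each hardware copy is one pass through the receive path
        # (stands in for mac_rx_common()).
        if promisc:
            out.promisc += 1  # stands in for mac_promisc_dispatch()
        # RESERVED: direct delivery via the hardware ring/SRS.
        # DEFAULT: software classification (mac_rx_classify()) by dst MAC.
        if group == "RESERVED" or dst == client_mac:
            out.client += 1
    return out


if __name__ == "__main__":
    vnic = "02:08:20:43:c8:a0"  # vnic110's MAC from `dladm show-vnic`
    for promisc in (False, True):
        for bug in (False, True):
            d = receive(vnic, vnic, promisc, bug)
            print(f"promisc={promisc!s:<5} copy_matched={bug!s:<5} -> {d}")
```

Only the combination of promisc on plus the extra hardware copy yields two client deliveries per frame, which lines up with the `(DUP!)` replies and the doubled snoop lines; the X710-DA2 behaviour would correspond to `copy_matched_to_default=False`.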

I do not encounter the same issue on my i40e part (X710-DA2). This indicates either some type of L2 steering change in the firmware or perhaps a logic bug in how we program these parts that works by dumb luck in some cases. More research is needed into the hardware/firmware differences and perhaps a peek at the latest PRM.

Actions #5

Updated by Jorge Schrauwen almost 3 years ago

The firmware is the latest available from Supermicro for this board/NIC combo.

It is labeled as NVM v4.11. I'm not sure about the PXE/iSCSI blobs, but the NVM is probably the interesting one.

fw 3.1.55727 api 1.5 nvm 4.11 etid 80001da4 oem 1.265.0

Actions #6

Updated by Jorge Schrauwen almost 3 years ago

root@fermion:~ # tcpdump -s0 -i ixl0.110 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ixl0.110, link-type EN10MB (Ethernet), capture size 262144 bytes
08:38:39.390648 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 61, length 64
08:38:39.390911 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 61, length 64
08:38:40.393513 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 62, length 64
08:38:40.393790 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 62, length 64
08:38:41.395851 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 63, length 64
08:38:41.396146 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 63, length 64
08:38:42.398967 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 64, length 64
08:38:42.399234 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 64, length 64
08:38:43.401270 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 65, length 64
08:38:43.401512 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 65, length 64
08:38:44.405983 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 66, length 64
08:38:44.406258 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 66, length 64
08:38:45.411345 IP tachyon.vlan10.acheron.be > fermion.acheron.be: ICMP echo request, id 25026, seq 67, length 64
08:38:45.411634 IP fermion.acheron.be > tachyon.vlan10.acheron.be: ICMP echo reply, id 25026, seq 67, length 64
^C
14 packets captured
24 packets received by filter
0 packets dropped by kernel

With the same NIC passed through to a FreeBSD VM via bhyve, it works fine when the interface runs in promisc mode: each echo request and reply appears only once.

So it seems likely we're programming it wrong somewhere.

Edit: this was with the latest Intel-provided driver for FreeBSD (https://downloadcenter.intel.com/download/25160/Intel-Network-Adapter-Driver-for-Intel-Ethernet-Controller-700-Series-under-FreeBSD-?wapkw=x722)

Actions #7

Updated by Jorge Schrauwen almost 3 years ago

Device[104] /devices/pci@5e,0/pci8086,2030@0/pci8086,37c0@0/pci8086,37c5@3/pci15d9,37d2@0
Class [ufm]
        Vendor: Intel Corporation
        Device: Ethernet Connection X722 for 10GBASE-T
        Capabilities: Report
        Image 0: Firmware
            Slot 0 (-|-|a): 4.1

prtconf -Dv

                    pci15d9,37d2, instance #0 (driver name: i40e)
                        Driver properties:
                            name='api-version' type=string items=1 dev=none
                                value='1.9'
                            name='firmware-build' type=string items=1 dev=none
                                value='e70c'
                            name='firmware-version' type=string items=1 dev=none
                                value='4.1'
                            name='printed-board-assembly' type=string items=1 dev=none
                                value=''
                            name='fm-accchk-capable' type=boolean dev=none
                            name='fm-dmachk-capable' type=boolean dev=none
                            name='fm-errcb-capable' type=boolean dev=none
                            name='fm-ereport-capable' type=boolean dev=none
                        Hardware properties:
                            name='ddi-ufm-capable' type=boolean
                            name='pci-msix-capid-pointer' type=int items=1
                                value=00000070
                            name='pci-msi-capid-pointer' type=int items=1
                                value=00000050
                            name='pcie-link-supported-speeds' type=int64 items=1
                                value=000000009502f900
                            name='pcie-link-maximum-speed' type=int64 items=1
                                value=000000009502f900
                            name='pcie-link-current-speed' type=int64 items=1
                                value=000000009502f900
                            name='pcie-link-current-width' type=int items=1
                                value=00000001
                            name='pcie-link-maximum-width' type=int items=1
                                value=00000001
                            name='pcie-aspm-state' type=string items=1
                                value='disabled'
                            name='pcie-aspm-support' type=string items=1
                                value='l0s,l1'
                            name='pcie-serial' type=byte items=8
                                value=3c.ec.ef.ff.ff.0c.78.3a
                            name='acpi-namespace' type=string items=1
                                value='\_SB_.PC02.BR2A.EPCU.VSP3.FPK0'
                            name='assigned-addresses' type=int items=10
                                value=c3670010.00000000.df000000.00000000.01000000.c367001c.00000000.e0008000.00000000.00008000
                            name='reg' type=int items=15
                                value=00670000.00000000.00000000.00000000.00000000.43670010.00000000.00000000.00000000.01000000.4367001c.00000000.00000000.00000000.00008000
                            name='compatible' type=string items=15
                                value='pciex8086,37d2.15d9.37d2.9' + 'pciex8086,37d2.15d9.37d2' + 'pciex8086,37d2.9' + 'pciex8086,37d2' + 'pciexclass,020000' + 'pciexclass,0200' + 'pci8086,37d2.15d9.37d2.9' + 'pci8086,37d2.15d9.37d2' + 'pci15d9,37d2,s' + 'pci15d9,37d2' + 'pci8086,37d2.9' + 'pci8086,37d2,p' + 'pci8086,37d2' + 'pciclass,020000' + 'pciclass,0200'
                            name='model' type=string items=1
                                value='Ethernet controller'
                            name='power-consumption' type=int items=2
                                value=00000001.00000001
                            name='devsel-speed' type=int items=1
                                value=00000000
                            name='interrupts' type=int items=1
                                value=00000001
                            name='subsystem-vendor-id' type=int items=1
                                value=000015d9
                            name='subsystem-id' type=int items=1
                                value=000037d2
                            name='unit-address' type=string items=1
                                value='0'
                            name='class-code' type=int items=1
                                value=00020000
                            name='revision-id' type=int items=1
                                value=00000009
                            name='vendor-id' type=int items=1
                                value=00008086
                            name='device-id' type=int items=1
                                value=000037d2
                            name='vendor-name' type=string items=1
                                value='Intel Corporation'
                            name='device-name' type=string items=1
                                value='Ethernet Connection X722 for 10GBASE-T'
                            name='subsystem-name' type=string items=1
                                value='unknown subsystem'
                        Device Minor Nodes:
                            dev=(296,1)
                                dev_path=/pci@5e,0/pci8086,2030@0/pci8086,37c0@0/pci8086,37c5@3/pci15d9,37d2@0:i40e0
                                    spectype=chr type=minor
                                    dev_link=/dev/i40e0
                            dev=(296,1006)
                                dev_path=<clone>
                                Device Minor Layered Under:
                                    mod=udp6 accesstype=chr
                                        dev_path=/pseudo/udp6@0
                            dev=(296,1004)
                                dev_path=<clone>
                                Device Minor Layered Under:
                                    mod=udp accesstype=chr
                                        dev_path=/pseudo/udp@0
                            dev=(296,1003)
                                dev_path=<clone>
                                Device Minor Layered Under:
                                    mod=udp accesstype=chr
                                        dev_path=/pseudo/udp@0

Actions #8

Updated by Dan McDonald over 2 years ago

I'm curious about calls to i40e_aq_set_vsi_unicast_promiscuous(), which lives in the Intel common code.

In both FreeBSD and Linux, callers to that function always set rx_only_promisc to TRUE, whereas we set it to FALSE. I'm not sure whether that's a deliberate choice on our part. Jorge, if you could alter the i40e callers (all of which are in i40e_gld.c) to pass B_TRUE for the rx_only_promisc parameter, we can see whether the problem disappears.

Naturally, though, I'm concerned that doing this may break something else.

Actions #9

Updated by Dan McDonald over 2 years ago

Looking at the common code, I see this:


if (set) {
    flags |= I40E_AQC_SET_VSI_PROMISC_UNICAST;
    if (rx_only_promisc &&
        (((hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver >= 5)) ||
         (hw->aq.api_maj_ver > 1)))
        flags |= I40E_AQC_SET_VSI_PROMISC_TX;
}

Maybe older hardware, or sufficiently old firmware, reports an admin-queue API version older than 1.5, so the I40E_AQC_SET_VSI_PROMISC_TX flag never gets set?

Actions #10

Updated by Dan McDonald over 2 years ago

  • Category set to driver - device drivers
  • Assignee set to Dan McDonald

Actions #11

Updated by Dan McDonald over 2 years ago

Possible fix here: https://kebe.com/~danmcd/webrevs/13230/

This webrev URL will likely be updated if we discover something else is the cause. It may also be removed if it's a harder problem than first imagined and more investigation is needed.

Actions #12

Updated by Jorge Schrauwen over 2 years ago

[root@boron ~]# piadm list
PI STAMP               BOOTABLE FILESYSTEM            BOOT IMAGE NOW  NEXT
20210129T105411Z       standalone/boot                available  no   no
20210129T174735Z       standalone/boot                next       no   no
20210202T164613Z       standalone/boot                none       yes  yes

Using the PI you posted on IRC.

Not fixed :(

[hyperon :: sjorge][~]
[■]$ ping boron
PING boron.vlan40.acheron.be (10.23.40.211): 56 data bytes
64 bytes from 10.23.40.211: icmp_seq=0 ttl=254 time=0.788 ms
64 bytes from 10.23.40.211: icmp_seq=1 ttl=254 time=1.271 ms
64 bytes from 10.23.40.211: icmp_seq=2 ttl=254 time=1.124 ms
64 bytes from 10.23.40.211: icmp_seq=3 ttl=254 time=1.117 ms
64 bytes from 10.23.40.211: icmp_seq=4 ttl=254 time=1.003 ms
64 bytes from 10.23.40.211: icmp_seq=5 ttl=254 time=1.307 ms
64 bytes from 10.23.40.211: icmp_seq=5 ttl=254 time=1.316 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=6 ttl=254 time=1.210 ms
64 bytes from 10.23.40.211: icmp_seq=6 ttl=254 time=1.217 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=7 ttl=254 time=0.939 ms
64 bytes from 10.23.40.211: icmp_seq=7 ttl=254 time=0.944 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=8 ttl=254 time=1.172 ms
64 bytes from 10.23.40.211: icmp_seq=8 ttl=254 time=1.181 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=9 ttl=254 time=1.009 ms
64 bytes from 10.23.40.211: icmp_seq=9 ttl=254 time=1.016 ms (DUP!)
64 bytes from 10.23.40.211: icmp_seq=10 ttl=254 time=1.130 ms
64 bytes from 10.23.40.211: icmp_seq=11 ttl=254 time=1.250 ms
64 bytes from 10.23.40.211: icmp_seq=12 ttl=254 time=1.307 ms
64 bytes from 10.23.40.211: icmp_seq=13 ttl=254 time=0.945 ms
^C
--- boron.vlan40.acheron.be ping statistics ---
14 packets transmitted, 14 packets received, +5 duplicates, 0.0% packet loss
round-trip min/avg/max/stddev = 0.788/1.118/1.316/0.148 ms
[root@boron ~]# snoop -d i40e0 icmp
Using device i40e0 (promiscuous mode)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 5)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 5)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 5)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 5)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 6)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 6)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 6)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 6)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 7)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 7)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 7)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 7)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 8)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 8)
hyperon.vlan10.acheron.be -> boron        ICMP Echo request (ID: 4401 Sequence number: 8)
       boron -> hyperon.vlan10.acheron.be ICMP Echo reply (ID: 4401 Sequence number: 8)

Actions #13

Updated by Dan McDonald over 2 years ago

I've managed to obtain HW that reproduces this.

If I initiate pings from the X722 system while snooping, I only see duplicate packets on receipt, NOT on transmit.

What I think happens is this: when a remote system sends a ping request, the request is duplicated on receipt, and our ping handler dutifully replies to both copies.

Actions #14

Updated by Dan McDonald over 2 years ago

Some additional data:

1.) This only manifests for inbound UNICAST traffic. Both inbound multicast and broadcast do NOT generate duplicate inbound packets.

2.) This manifests regardless of whether the NIC is used natively or has a VNIC layered on top of it.

Actions #15

Updated by Dan McDonald over 2 years ago

When the two copies arrive, both come in through the receive interrupt path (i40e_intr_rx_work), but they traverse different paths through MAC. The "1st" stack is what a normal unicast-path packet looks like. The "2nd" is what the duplicate looks like when promiscuous mode is enabled on this particular i40e.

kebe(~/ws/ij-cr)[1]% diff -u /tmp/{1st,2nd}
--- /tmp/1st    2021-04-12 17:26:47.000000000 -0400
+++ /tmp/2nd    2021-04-12 17:26:28.000000000 -0400
@@ -12,7 +12,9 @@
               mac`mac_rx_srs_proto_fanout+0x334
               mac`mac_rx_srs_drain+0x3ad
               mac`mac_rx_srs_process+0x5c8
-              mac`mac_rx_common+0x291
+              mac`mac_rx_classify+0x7c
+              mac`mac_rx_flow+0x58
+              mac`mac_rx_common+0x21e
               mac`mac_rx+0xc6
               mac`mac_rx_ring+0x1f
               i40e`i40e_intr_rx_work+0x89
kebe(~/ws/ij-cr)[1]% 

Actions #16

Updated by Dan McDonald over 2 years ago

(Basically, I'm able to reproduce Ryan Z's analysis from earlier.)

Actions #17

Updated by Dan McDonald over 2 years ago

I've also reproduced Jorge's FreeBSD confirmation using the FreeBSD installer USB stick. I used FreeBSD's dtrace to confirm the lack of data interrupts. They're clearly doing something right that we're not (at attach() time?).

Actions #18

Updated by Dan McDonald over 2 years ago

An interesting wrinkle. I've managed (somehow?) to get broadcast packets received in duplicate on both rings EVEN WITHOUT PROMISCUOUS MODE. I'm not sure how reproducible this is, because 3 days ago I could not get broadcast packets received on multiple rings. (This was after some FreeBSD use, but I can't believe the chipset wouldn't be fully reset after switching back and forth...)

Actions #19

Updated by Dan McDonald over 1 year ago

A quick update.

1.) Apparently the Linux driver enables promiscuous mode using the SET_DEFAULT_VSI admin-queue primitive and disables it using UNSET_DEFAULT_VSI. A quick change in our code makes this problem disappear on X722-on-C622 motherboards, solving the reported problem.

2.) In discussions with Intel, they said the following (the line with the '> ' prefix is a follow-up question from me):


PF (main VSI) usually has MacVlan filters with VLAN -1 (meaning it allows every VLAN), so it really doesn't need promisc mode. Default VSI receives all packets which are not directed to any particular VSI (there's no MacVlan filters with dst mac address like that). But there has to be VEB between the port and the VSI, as the default VSI is configured on the 'upper' switching element. It's possible that FreeBSD uses MacVlan filters in a different way and that's why it doesn't use the default VSI method.
In the Linux driver, setting up the promisc mode on PF could result in replication of multicast packets that are directed to particular interfaces (as in this scenario PF is already a default VSI and has promisc mode enabled on top of that).

> It's almost as if the C622 chipset implementation sets defaults differently at power-up than X710/XXV710/XL710 do

Some (not many) registers may indeed have a different default value.

Our experience has been that "setting up the promisc mode on PF" on the X722-on-C622 part replicates ALL packets. :)

The possible fix mentioned in #1 will need to be regression-tested across a large number of i40e(4D) parts, with designed deployments or test cases, to ensure that bhyve or VNIC-only promiscuous mode does not accidentally trigger unwanted side effects.

Actions #20

Updated by Dan McDonald over 1 year ago

I've found a machine with onboard X722 with both BaseT and SFP PHYs. I hope to try the stock i40e(4D) and the fixed one there soon. Oh, the URL for the WIP webrev: https://kebe.com/~danmcd/webrevs/13230-newtry/

Actions #21

Updated by Dan McDonald 23 days ago

A promisc behavior test:

Confirming promisc behavior on i40e:
------------------------------------

NICs we need:

- Physical (e.g. i40e0)

- No-VLAN vnic (e.g. net0, a vnic over i40e0), use a different netstack, but
  the same IP network as Physical

- VLAN vnic (e.g. net1, a vnic with vlan!=0 over i40e0), use a different
  netstack

- REMOTE --> non-i40e Physical where snoop is on full-time.  Filter on ICMP
  and the IPs of Physical and No-VLAN vnic.

- REMOTE VLAN vnic --> non-i40e vlan != 0 vnic.

Tests - perform identically with both original (to document) and fixed i40e(4D)
-------------------------------------------------------------------------------

Actual bug -- the only test that should differ between original and fixed

- Have remote send one ping to Physical with promisc on.  Should NOT get
  amplification.  (X722 BaseT with stock i40e will amplify)

- Repeat with No-VLAN vnic (confirm X722 problem)

- Repeat with VLAN vnic (confirm X722 problem)

Regression 1

- Snoop on Physical sees packets for the No-VLAN vnic AND the VLAN vnic.

Regression 2

- Snoop on No-VLAN vnic does not see packets for Physical or VLAN vnic.

Regression 3

- Snoop on VLAN vnic does not see packets for Physical or No-VLAN vnic.
- Make sure broadcast/multicast is appropriately picked up.

Regression 4

- Run a program that joins enough multicast groups to exercise the
  promisc-on-case logic.

Regression 5

- See how inbound multicast (non-all-nodes) traffic behaves before-and-after
  with variants of the above tests.

Regression 6

- Create enough VNICs to run out of hardware resources and force the NIC into
  promiscuous mode.  Check this via mdb -k:

  <i40e_t addr>::print -ath "i40e_t" !grep filt

- One i40e NIC has 0x188 (392) filters, so it'll take a lot of VNICs, but
  maybe not that many per se.
