BNX regression with open source driver - AGGR doesn't work
Upon upgrading an OmniOS box from r151022 to r151030 I lost network access. The box (HP DL380G6) uses its four BCM5709 NICs as a four-port LACP aggr bundle to a Juniper EX4200. LACP is clean on both ends, and all links show as up and in sync. Outbound traffic from the box appears to go out fine, as tested with dlsend/dlrecv. Incoming traffic only works if it happens to land on the 'right' interface in the bundle, usually the first, if its MAC address is the one Illumos chose as the MAC address of the aggregate. If you run snoop on the aggr, all traffic is received cleanly and the aggr is usable for as long as snoop is running. Interestingly, even when snoop is run with -P, which should prevent it from enabling promiscuous mode, kstat bnx shows that promiscuous mode still gets enabled.
Further digging determined that the issue crops up when making the jump from OmniOS r151028 to r151030, hence the assumption that the trigger is likely the replacement of the prior closed source bnx driver with an open source one. Robert M. and others have assisted with basic troubleshooting. The theory that best fits the symptoms is that these particular bnx NICs are not correctly adding the aggregate MAC address to their input filter, either because of a driver bug or because of a hardware limitation that should instead cause them to flip to promiscuous mode. With the box booted into r151028, kstat bnx shows that promisc is not set on the bnx interfaces while they are in an aggr bundle, but I'm not sure that is an accurate representation of what's going on at the raw NIC level.
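To make the theory above concrete, here is a small toy model in Python. It is only a sketch of the suspected behaviour, not the bnx driver or the illumos MAC layer: the PortModel class, the helper functions and the MAC addresses (taken from the documentation range) are all made up for illustration. It assumes each port accepts a unicast frame only if the destination is in its receive filter or the port is promiscuous, and that the aggregate takes the first port's MAC address.

```python
from dataclasses import dataclass, field


@dataclass
class PortModel:
    """Toy model of one aggr member's receive filter (hypothetical, not bnx code)."""
    name: str
    factory_mac: str
    unicast_filter: set = field(default_factory=set)
    promisc: bool = False

    def __post_init__(self):
        # A port always accepts frames addressed to its own factory MAC.
        self.unicast_filter.add(self.factory_mac)

    def accepts(self, dst_mac: str) -> bool:
        # Promiscuous mode bypasses the unicast filter entirely.
        return self.promisc or dst_mac in self.unicast_filter


def make_ports(count: int = 4) -> list:
    # Placeholder addresses from the documentation MAC range.
    return [PortModel(f"bnx{i}", f"00:00:5e:00:53:{i:02x}") for i in range(count)]


def build_aggr(ports: list, program_aggr_mac: bool) -> str:
    """Return the aggr MAC (assumed to be the first port's MAC).

    program_aggr_mac=False models the suspected bug: the aggregate's
    address never reaches the other members' receive filters.
    """
    aggr_mac = ports[0].factory_mac
    if program_aggr_mac:
        for port in ports:
            port.unicast_filter.add(aggr_mac)
    return aggr_mac


def inbound_ok(ports: list, aggr_mac: str) -> dict:
    """For each member, would a frame addressed to the aggr MAC be accepted?"""
    return {port.name: port.accepts(aggr_mac) for port in ports}


if __name__ == "__main__":
    # Suspected broken case: only the port that owns the aggr MAC receives.
    broken = make_ports()
    mac = build_aggr(broken, program_aggr_mac=False)
    print("filter not programmed:", inbound_ok(broken, mac))

    # Running snoop puts the members into promiscuous mode, masking the problem.
    for port in broken:
        port.promisc = True
    print("with promisc enabled: ", inbound_ok(broken, mac))

    # Expected behaviour: the aggr MAC is added to every member's filter.
    working = make_ports()
    mac = build_aggr(working, program_aggr_mac=True)
    print("filter programmed:    ", inbound_ok(working, mac))
```

If the real hardware behaves like the first case, that would match what I see: traffic the switch hashes onto the first link gets through, traffic hashed onto the other links is dropped, and everything works while snoop is running.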
To replicate: Set up an aggr using two or more bnx devices connected to a single switch, on a box running an Illumos release new enough to have the open source bnx driver in place. VLANs and MTUs other than 1500 are not needed to observe the issue (and are not in use in my environment).
I have a test box and switch to debug against, and may have access to a box with a pair of BCM5723 to test against too.