Missing TCP/UDP checksum with hardware offloading and tagged VLANs
Spent most of my weekend debugging this. Opening an issue for future people stumbling on the same problem.
Not sure where the best place to fix it is, so this is more of an FYI and a "please fix".
- 4-link aggr between switch and host
- (using LACP, but that should be irrelevant)
- 4 tagged VLANs going over said aggr
The host is running SmartOS. A KVM VM is running on it and has 4 NICs (one in each VLAN).
Traffic from elsewhere to the VM and back flows fine (IP packets have checksums, etc.).
The GZ has a vnic hanging off the same aggr. Traffic from elsewhere to the GZ and back flows fine (IP packets have checksums, etc.).
Traffic from the GZ to the VM does not flow for TCP/UDP; ARP and ICMP are OK.
The VM was discarding the traffic because the TCP or UDP checksum was blank.
This is because the traffic flow in that case is
vnic_gz -> aggr -> vnic_vm, and a checksum is never added along the way.
My VM is running OpenBSD; other operating systems may be less picky.
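For anyone wanting to see why a strict receiver drops these packets, here is a minimal sketch of the check a receiver performs on a TCP segment over IPv4 (the standard one's-complement Internet checksum over a pseudo-header plus the segment). The function names, addresses, and ports are made up for illustration; this is not code from SmartOS or OpenBSD.

```python
import socket
import struct

TCP_PROTO = 6  # IP protocol number for TCP


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that is covered by the TCP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, TCP_PROTO, tcp_len))


def build_tcp_segment(src_ip, dst_ip, sport, dport, payload, fill_checksum):
    """Build a bare 20-byte TCP header (SYN) plus payload.

    With fill_checksum=False the checksum field stays zero, which mimics
    a sender that expected something later in the path to fill it in.
    """
    def header(csum):
        return struct.pack("!HHIIHHHH", sport, dport, 0, 0,
                           (5 << 12) | 0x002, 0xFFFF, csum, 0)

    segment = header(0) + payload
    if not fill_checksum:
        return segment
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return header(csum) + payload


def verify_tcp_segment(src_ip, dst_ip, segment) -> bool:
    """A valid segment sums (with its checksum field) to zero."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0


if __name__ == "__main__":
    args = ("10.0.0.1", "10.0.0.2", 12345, 80, b"ping")
    filled = build_tcp_segment(*args, fill_checksum=True)
    blank = build_tcp_segment(*args, fill_checksum=False)
    print("filled checksum accepted:", verify_tcp_segment("10.0.0.1", "10.0.0.2", filled))
    print("blank checksum accepted: ", verify_tcp_segment("10.0.0.1", "10.0.0.2", blank))
```

A receiver that runs this check drops the second segment, which matches what the VM was doing. ARP and ICMP are unaffected because they do not carry a TCP/UDP checksum.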
Updated by Robert Mustacchi about 5 years ago
The issue here is that the vnd driver, which was driving this, takes the outgoing traffic via a promiscuous callback, so the traffic did not pass through the normal mac_tx loopback paths that would have checksummed it properly for loopback. The solution is probably to ensure that traffic destined for the mac in vnd doesn't rely on the promisc callbacks, and potentially to simplify the code so that it consumes things from mac directly rather than via dls.
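The difference between the delivery paths can be illustrated with a toy model. This is only a sketch of the reasoning above, not illumos code: every name in it (`Packet`, `toy_checksum`, `physical_nic_tx`, `loopback_tx`, `promisc_callback_tx`) is hypothetical, and the checksum is a stand-in rather than the real Internet checksum.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    payload: bytes
    checksum: Optional[int] = None   # None: left blank by the sender
    offload_requested: bool = True   # sender expects the checksum to be filled in later


def toy_checksum(payload: bytes) -> int:
    """Stand-in checksum for the model (NOT the real TCP/UDP algorithm)."""
    return sum(payload) & 0xFFFF


def physical_nic_tx(pkt: Packet) -> Packet:
    """Packet leaves via hardware, which fills in the offloaded checksum."""
    if pkt.offload_requested:
        return replace(pkt, checksum=toy_checksum(pkt.payload), offload_requested=False)
    return pkt


def loopback_tx(pkt: Packet) -> Packet:
    """Assumed loopback behaviour: no hardware is involved, so the
    checksum is computed in software before local delivery."""
    if pkt.offload_requested:
        return replace(pkt, checksum=toy_checksum(pkt.payload), offload_requested=False)
    return pkt


def promisc_callback_tx(pkt: Packet) -> Packet:
    """Packet is handed over as-is; nothing fills in the checksum."""
    return pkt


def receiver_accepts(pkt: Packet) -> bool:
    """A strict receiver drops packets whose checksum is blank or wrong."""
    return pkt.checksum == toy_checksum(pkt.payload)


if __name__ == "__main__":
    pkt = Packet(payload=b"hello from the gz")
    for path in (physical_nic_tx, loopback_tx, promisc_callback_tx):
        print(f"{path.__name__}: accepted={receiver_accepts(path(pkt))}")
```

In this model only the promiscuous-callback path delivers a packet with a blank checksum, which is the case the report describes for GZ-to-VM traffic.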