Bug #6264
Apparent problems with 5850: tcp timestamping behavior changed mid-connection
Description
As indicated in the OmniOS thread "strangeness ssh into omnios from oi_151a9", the most likely culprit for the hangs between an older gate (oi_151a9) and a recent one is:
commit 1f183ba0b0be3e10202501aa3740753df6512804
Author: Lauri Tirkkonen <lotheac@iki.fi>
AuthorDate: Wed Apr 15 16:30:46 2015 +0300
Commit: Robert Mustacchi <rm@joyent.com>
CommitDate: Thu Jul 30 08:33:51 2015 -0700

    5850 tcp timestamping behavior changed mid-connection
The snoop logs posted there seem the most pertinent.
Testing a build with this reverted gave satisfactory results.
This should probably be reverted until the issue is resolved...
Updated by Dan McDonald over 5 years ago
Given that the original bug was filed due to middlebox behavior, perhaps you should check your middleboxes out as well.
Also, please bring over and summarize the OmniOS list thread, as it's best to keep reproduction details in the bug itself.
Updated by Dan McDonald over 5 years ago
- Related to Bug #5850: tcp timestamping behavior changed mid-connection added
Updated by Lauri Tirkkonen over 5 years ago
I'll summarize the issue as I understood it along with the analysis I came up
with based on the available data.
An illumos box pre-5850-fix (oi_151a9) opens a TCP connection to an illumos box
post-5850 (OmniOS bloody). Somewhere along the line, a middlebox injects, in
both directions, an empty ACK which reduces the window size to 1024 (neither
peer has sent such a segment to the other according to the packet captures).
These injected segments do not include timestamp options, so:
- the post-5850 box silently drops the segment
- the pre-5850 box decides that this connection suddenly should no longer be
timestamped and stops sending timestamps (this is the incorrect behavior
that 5850 fixes)
Since the pre-5850 box now omits all timestamp options from segments it sends,
the post-5850 box drops them silently upon reception.
This behavior is problematic, to be sure, but it's quite clear the middlebox is
at fault here (as well as the pre-5850 box which decides to stop generating
timestamps just because someone happened to inject an offending segment).
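The failure sequence described above can be sketched as a toy model. It assumes timestamps were negotiated at connection setup and reduces each stack to a single rule: a post-5850 endpoint silently drops any segment lacking a timestamp option (TSopt), while a pre-5850 endpoint accepts it and turns timestamping off for the rest of the connection. `Segment`, `Endpoint` and `run_scenario` are hypothetical names; this is not the illumos TCP code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    has_ts: bool          # does the segment carry a TCP timestamp option (TSopt)?
    window: int = 65535   # advertised window; the injected ACK used 1024


class Endpoint:
    """Toy model of one side's timestamp handling; not the illumos implementation."""

    def __init__(self, fixed_5850: bool):
        self.fixed_5850 = fixed_5850
        self.ts_on = True  # assume timestamps were negotiated in the handshake

    def send(self) -> Segment:
        # Segments carry TSopt only while this side still thinks timestamps are on.
        return Segment(has_ts=self.ts_on)

    def receive(self, seg: Segment) -> bool:
        """Return True if the segment is accepted, False if it is silently dropped."""
        if self.ts_on and not seg.has_ts:
            if self.fixed_5850:
                return False      # post-5850: drop segments lacking TSopt
            self.ts_on = False    # pre-5850: stop timestamping from now on
        return True


def run_scenario() -> list:
    old = Endpoint(fixed_5850=False)   # e.g. oi_151a9
    new = Endpoint(fixed_5850=True)    # e.g. OmniOS bloody
    injected = Segment(has_ts=False, window=1024)  # middlebox ACK, both directions
    results = [new.receive(injected), old.receive(injected)]
    results.append(new.receive(old.send()))  # old -> new: no TSopt any more
    results.append(old.receive(new.send()))  # new -> old: still timestamped
    return results


print(run_scenario())  # [False, True, False, True]
```

The third result is the hang. After the injected ACK, every segment from the old box arrives without TSopt and is dropped, including its ACKs for traffic flowing the other way. The new box therefore never sees progress in either direction.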
Updated by Lauri Tirkkonen over 5 years ago
- Status changed from New to Closed
Based on the mailing list discussion, disabling timestamp negotiation via the ndd options 'tcp_tstamp_always' and 'tcp_tstamp_if_wscale' makes the connection hang go away, which gives further confirmation that this indeed was the issue.
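A sketch of why the workaround helps, under stated assumptions. Per RFC 7323, timestamps are in effect only if both the SYN and the SYN,ACK carry TSopt. The model reads the two tunables (typically set with `ndd -set /dev/tcp tcp_tstamp_always 0` and `ndd -set /dev/tcp tcp_tstamp_if_wscale 0`) as "offer TSopt always" and "offer TSopt when window scaling is used". That reading, the helper names, and the choice to clear them on the connecting box are assumptions, since the thread summary does not say which box was changed.

```python
def offers_tsopt(tstamp_always: bool, tstamp_if_wscale: bool, uses_wscale: bool) -> bool:
    # Hypothetical reading of the two ndd tunables: offer TSopt in the SYN
    # either unconditionally or whenever window scaling is in use.
    return tstamp_always or (tstamp_if_wscale and uses_wscale)


def ts_negotiated(syn_has_ts: bool, synack_has_ts: bool) -> bool:
    # RFC 7323: timestamps are in effect only if both SYN and SYN,ACK carry TSopt.
    return syn_has_ts and synack_has_ts


def post_5850_accepts(negotiated: bool, seg_has_ts: bool) -> bool:
    # A post-5850 stack only insists on TSopt once timestamps were negotiated.
    return seg_has_ts or not negotiated


# Both tunables cleared on the connecting box: its SYN carries no TSopt.
syn_ts = offers_tsopt(tstamp_always=False, tstamp_if_wscale=False, uses_wscale=True)
negotiated = ts_negotiated(syn_ts, synack_has_ts=True)
print(negotiated)                                       # False
print(post_5850_accepts(negotiated, seg_has_ts=False))  # True
```

With timestamps never negotiated, the injected ACK without TSopt is no longer "missing" anything, so neither the drop on the post-5850 side nor the mid-connection switch on the pre-5850 side is triggered.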
RFC 7323 doesn't forbid us from accepting non-timestamped segments, but I don't think we should change illumos behavior at this point; this issue only manifested in the presence of an ill-behaving middlebox combined with the old (broken!) behavior. Since we can accommodate the pre-5850 stack with a workaround, I'm closing this issue, but if any further circumstances arise where dropping non-timestamped segments causes actual problems, maybe we should consider accepting them regardless of the RFC recommendation.
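RFC 7323 (section 3.2) says that once timestamps are negotiated, a non-RST segment received without TSopt SHOULD be silently dropped, and a TCP MUST NOT abort a connection because a segment lacks an expected TSopt. The sketch contrasts that strict policy, which is what post-5850 illumos follows, with the lenient alternative left open above. `Policy` and `accept_segment` are hypothetical names, and the check deliberately ignores sequence-number, PAWS and other validation a real stack performs.

```python
from enum import Enum


class Policy(Enum):
    STRICT = "drop"       # follow the RFC 7323 SHOULD (the post-5850 behavior)
    LENIENT = "accept"    # hypothetical alternative: accept segments without TSopt


def accept_segment(policy: Policy, ts_negotiated: bool, has_ts: bool,
                   is_rst: bool = False) -> bool:
    """Decide whether a segment survives the timestamp check (other checks omitted)."""
    if not ts_negotiated or has_ts:
        return True   # nothing to enforce
    if is_rst:
        return True   # RFC 7323's drop rule only covers non-RST segments
    return policy is Policy.LENIENT


print(accept_segment(Policy.STRICT, True, has_ts=False))   # False
print(accept_segment(Policy.LENIENT, True, has_ts=False))  # True
```

Either policy avoids aborting the connection. The difference is whether a peer that stops sending timestamps mid-connection gets its data dropped (strict) or processed (lenient).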