Bug #6264

seeming problems with 5850 tcp timestamping behavior changed mid-connection

Added by Richard PALO about 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: networking
Start date: 2015-09-27
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

As indicated in the OmniOS thread "strangeness ssh into omnios from oi_151a9",
the most likely culprit causing hangs between an older gate (oi_151a9) and a recent one is

commit 1f183ba0b0be3e10202501aa3740753df6512804
Author:     Lauri Tirkkonen <lotheac@iki.fi>
AuthorDate: Wed Apr 15 16:30:46 2015 +0300
Commit:     Robert Mustacchi <rm@joyent.com>
CommitDate: Thu Jul 30 08:33:51 2015 -0700

    5850 tcp timestamping behavior changed mid-connection

The snoop logs posted there seem the most pertinent.

Testing a build with this reverted gave satisfactory results.
This should probably be reverted until the issue is resolved...


Related issues

Related to illumos gate - Bug #5850: tcp timestamping behavior changed mid-connection (Closed, 2015-04-17)


History

#1

Updated by Dan McDonald about 4 years ago

Given that the original bug was filed due to middlebox behavior, perhaps you should check your middleboxes out as well.

Also, please bring over and summarize the OmniOS list thread, as it's best to keep reproduction details in the bug itself.

#2

Updated by Dan McDonald about 4 years ago

  • Related to Bug #5850: tcp timestamping behavior changed mid-connection added

#3

Updated by Lauri Tirkkonen about 4 years ago

I'll summarize the issue as I understood it along with the analysis I came up
with based on the available data.

An illumos box pre-5850-fix (oi_151a9) opens a TCP connection to an illumos box
post-5850 (OmniOS bloody). Somewhere along the line, a middlebox injects, in
both directions, an empty ACK which reduces the window size to 1024 (neither
peer has sent such a segment to the other according to the packet captures).
These injected segments do not include timestamp options, so:
- the post-5850 box silently drops the segment
- the pre-5850 box decides that this connection suddenly should no longer be
timestamped and stops sending timestamps (this is the incorrect behavior
that 5850 fixes)
Since the pre-5850 box now omits all timestamp options from segments it sends,
the post-5850 box drops them silently upon reception.

This behavior is problematic, to be sure, but it's quite clear the middlebox is
at fault here (as well as the pre-5850 box which decides to stop generating
timestamps just because someone happened to inject an offending segment).
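
The sequence described above can be sketched as a toy model. This is only an illustration of the analysis in this comment, not illumos code: the `Segment` and `Peer` classes and their fields are hypothetical names, and the model ignores everything about TCP except whether a segment carries a timestamp option (RST handling, for instance, is left out).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    has_timestamp: bool
    label: str = ""


class Peer:
    """Toy endpoint on a connection that negotiated timestamps (hypothetical model)."""

    def __init__(self, name: str, post_5850: bool):
        self.name = name
        self.post_5850 = post_5850
        self.sends_timestamps = True  # timestamps were negotiated at setup

    def receive(self, seg: Segment) -> bool:
        """Return True if the segment is accepted, False if silently dropped."""
        if seg.has_timestamp:
            return True
        if self.post_5850:
            # Post-5850: a non-timestamped segment on a timestamped
            # connection is dropped.
            return False
        # Pre-5850: the segment is accepted and the peer stops sending
        # timestamps for the rest of the connection.
        self.sends_timestamps = False
        return True

    def send(self, label: str) -> Segment:
        return Segment(has_timestamp=self.sends_timestamps, label=label)


old = Peer("oi_151a9", post_5850=False)
new = Peer("OmniOS bloody", post_5850=True)

# The middlebox injects a non-timestamped empty ACK in both directions.
injected = Segment(has_timestamp=False, label="injected ACK, win=1024")
new_took_injected = new.receive(injected)
old_took_injected = old.receive(injected)

# From here on the old peer omits timestamps, so the new peer drops its data.
old_to_new_delivered = new.receive(old.send("data from old peer"))
new_to_old_delivered = old.receive(new.send("data from new peer"))

print(new_took_injected, old_took_injected)        # False True
print(old_to_new_delivered, new_to_old_delivered)  # False True
```

In this model only the old-to-new direction stalls, which is enough to hang the connection.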

#4

Updated by Lauri Tirkkonen about 4 years ago

  • Status changed from New to Closed

Based on the mailing list discussion, disabling timestamp negotiation via the ndd options 'tcp_tstamp_always' and 'tcp_tstamp_if_wscale' makes the connection hang go away, which gives further confirmation that this indeed was the issue.
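
Why the workaround helps can be shown with a one-line acceptance rule. This is a sketch under the assumption that the settings above result in timestamps never being negotiated for the connection; the function name and parameters are hypothetical and do not correspond to identifiers in the illumos source.

```python
def post_5850_accepts(ts_negotiated: bool, segment_has_ts: bool) -> bool:
    """Toy acceptance rule for a post-5850 receiver (non-RST segments only)."""
    # A missing timestamp only matters if timestamps were negotiated
    # when the connection was set up.
    return segment_has_ts or not ts_negotiated


# Default settings: timestamps negotiated, then the pre-5850 peer stops
# sending them after the injected segment.
hang_case = post_5850_accepts(ts_negotiated=True, segment_has_ts=False)

# Workaround: timestamps never negotiated, so no segment carries them.
workaround_case = post_5850_accepts(ts_negotiated=False, segment_has_ts=False)

print(hang_case, workaround_case)  # False True
```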

RFC 7323 doesn't forbid us from accepting non-timestamped segments, but I don't think we should change illumos behavior at this point; this issue only manifested in the presence of an ill-behaving middlebox combined with the old (broken!) behavior. Since we can accommodate the pre-5850 stack with a workaround, I'm closing this issue, but if any further circumstances arise where dropping non-timestamped segments causes actual problems, maybe we should consider accepting them regardless of the RFC recommendation.
