Bug #1028

Network loses reachability on the loss of one network link when two nics share same subnet

Added by Brad Latus over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: kernel
Start date: 2011-05-13
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Configuring two network cards with IP addresses on the same subnet and then removing a network cable from one of them causes both IP addresses to be unreachable.

bge0 = 192.168.0.1/24 (unplugged)
bge1 = 192.168.0.2/24

If we ping what should be the reachable IP (192.168.0.2) from another machine (192.168.0.15):

Snoop on bge1 shows:
192.168.0.15 -> 192.168.0.1 ICMP Echo request (ID: 512 Sequence number: 47618)
Snoop on bge0 shows:
192.168.0.1 -> 192.168.0.15 ICMP Echo reply (ID: 512 Sequence number: 47618)

This shows that the ICMP replies are trying to go out via the failed interface.

Steps to Reproduce

# ifconfig bge0 plumb 192.168.0.1/24 up
# ifconfig bge1 plumb 192.168.0.2/24 up

Check both addresses are ping-able from external host
- 192.168.0.1 is alive
- 192.168.0.2 is alive

Remove network link from bge0

Expected Result

Check both addresses are ping-able from external host
- 192.168.0.1 unreachable
- 192.168.0.2 is alive

Actual Result

Check both addresses are ping-able from external host
- 192.168.0.1 unreachable
- 192.168.0.2 unreachable


Related issues

Related to illumos gate - Feature #3446: Update bge to support missing 57xx/577xx devices (Closed, 2015-04-06)

History

#1

Updated by Dan McDonald over 8 years ago

I tried reproducing this just now on OpenIndiana-latest with e1000g0 and iwp0 (both are on the same bridged LAN), but it's possible I need two wired ports (on the same physical switch?) to reproduce this problem.

One thing that would help diagnose is if you could please utter:

netstat -rna | grep 192.168.0.15

while both bges are up, while one is down, and then again while the other is down. Mine would rapidly change to be the one that's available. Also, to be fair, I was sending pings to both the iwp and e1000g addresses at the same time.

#2

Updated by Brad Latus over 8 years ago

This time I am pinging from 192.168.3.175.
The two addresses on the box are 192.168.3.22 and 192.168.3.44.

From what I can see below, comms with 3.175 is bound to bge1, even when bge1 has lost link (does not attempt to go via bge0)

(both connected)
# netstat -rna | grep 192.168.3.175
192.168.3.175 192.168.3.44 UC 2 452 bge1

# netstat -rna | grep 192.168.3.22
192.168.3.0 192.168.3.22 U 3 5 bge0
192.168.3.22 - UHL 2 314 bge0
192.168.3.44 192.168.3.22 UC 2 5 bge0

# netstat -rna | grep 192.168.3.44
192.168.3.0 192.168.3.44 U 3 646 bge1
192.168.3.44 - UHL 2 314 bge1
192.168.3.44 192.168.3.22 UC 2 5 bge0
192.168.3.175 192.168.3.44 UC 2 646 bge1

(bge0 discon) - .22 has stopped responding, .44 still ok
# netstat -rna | grep 192.168.3.175
192.168.3.175 192.168.3.44 UC 2 922 bge1

# netstat -rna | grep 192.168.3.22
192.168.3.0 192.168.3.22 U 3 5 bge0
192.168.3.22 - UHL 2 456 bge0
192.168.3.44 192.168.3.22 UC 2 5 bge0

# netstat -rna | grep 192.168.3.44
192.168.3.0 192.168.3.44 U 3 928 bge1
192.168.3.44 - UHL 2 472 bge1
192.168.3.44 192.168.3.22 UC 2 5 bge0
192.168.3.175 192.168.3.44 UC 2 928 bge1

(bge0 reconn)
netstat -rna | grep 192.168.3.175
192.168.3.175 192.168.3.44 UC 2 1038 bge1

[root@wolf]:[14]:[~]:$ netstat -rna | grep 192.168.3.22
192.168.3.0 192.168.3.22 U 3 5 bge0
192.168.3.22 - UHL 2 475 bge0
192.168.3.44 192.168.3.22 UC 2 5 bge0

[root@wolf]:[15]:[~]:$ netstat -rna | grep 192.168.3.44
192.168.3.0 192.168.3.44 U 3 1046 bge1
192.168.3.44 - UHL 2 569 bge1
192.168.3.44 192.168.3.22 UC 2 5 bge0
192.168.3.175 192.168.3.44 UC 2 1046 bge1

(bge1 disc) - (no ping response from .22 or .44)
# netstat -rna | grep 192.168.3.175
192.168.3.175 192.168.3.44 UC 2 1098 bge1

# netstat -rna | grep 192.168.3.44
192.168.3.0 192.168.3.44 U 3 1099 bge1
192.168.3.44 - UHL 2 593 bge1
192.168.3.44 192.168.3.22 UC 2 5 bge0
192.168.3.175 192.168.3.44 UC 2 1099 bge1

# netstat -rna | grep 192.168.3.22
192.168.3.0 192.168.3.22 U 3 5 bge0
192.168.3.22 - UHL 2 506 bge0
192.168.3.44 192.168.3.22 UC 2 5 bge0

Using 'svc:/network/physical:default' not NWAM (if that makes a difference)
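
For anyone repeating this check, the comparison across link states can be automated. The following is only a minimal sketch: it assumes each routing entry is six whitespace-separated columns (destination, gateway, flags, ref, use, interface) as in the captures in this comment, and `interface_for` is a hypothetical helper name, not part of any illumos tool.

```python
def interface_for(destination, netstat_output):
    """Return the interface of the first entry for `destination`, or None.

    Assumes six whitespace-separated columns per entry:
    destination gateway flags ref use interface
    """
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0] == destination:
            return fields[5]
    return None


# Entries copied from the captures in this comment.
both_connected = "192.168.3.175 192.168.3.44 UC 2 452 bge1"
bge1_unplugged = "192.168.3.175 192.168.3.44 UC 2 1098 bge1"

# The entry for the pinging host stays on bge1 in both states.
print(interface_for("192.168.3.175", both_connected))
print(interface_for("192.168.3.175", bge1_unplugged))
```

If the interface returned for the remote host does not change after a cable is pulled, replies are still being sent out of the dead link.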

#3

Updated by Dan McDonald over 8 years ago

  • Status changed from New to Closed

I made a mistake in my reproduction. I "downed" the interface, but didn't unplug the wire. If I do that (using "dladm disconnect-wifi" over iwp), I see your problem.

When using network/physical:default, there's no handler for network link events. Neither ifconfig(1M) nor ipadm(1M) changes status when a cable is unplugged; only dladm(1M) will show it. IP doesn't know about the unplugging. Under NWAM, nwamd will note the unplug and then do the internal equivalent of "ifconfig bge0 down". From IP's point of view, an unplugged cable is no different from a switch two hops away being down.

This is not a bug. Having a tinier-than-nwamd daemon that maps link-down events into IP interface down might be a good RFE, but the current behavior is not buggy.
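
As a rough illustration of what such a small daemon might do, here is a sketch in Python. It is not an existing illumos component. It assumes polling `dladm show-link -p -o link,state` (colon-separated parseable output) rather than subscribing to link events, and the function names `parse_link_states`, `plan_commands`, and `watch` are hypothetical.

```python
import subprocess
import time


def parse_link_states(dladm_output):
    """Parse `dladm show-link -p -o link,state` output into {link: state}."""
    states = {}
    for line in dladm_output.splitlines():
        link, sep, state = line.strip().partition(":")
        if sep and link and state:
            states[link] = state
    return states


def plan_commands(previous, current):
    """Return ifconfig argv lists for links whose state has changed.

    A link now reported "down" gets `ifconfig <link> down`; a link that
    went from "down" to "up" gets `ifconfig <link> up`.
    """
    commands = []
    for link, state in sorted(current.items()):
        old = previous.get(link)
        if old == state:
            continue
        if state == "down":
            commands.append(["ifconfig", link, "down"])
        elif state == "up" and old == "down":
            commands.append(["ifconfig", link, "up"])
    return commands


def watch(interval_seconds=1.0):
    """Poll link state forever and apply the planned commands (needs root)."""
    previous = {}
    while True:
        result = subprocess.run(
            ["dladm", "show-link", "-p", "-o", "link,state"],
            capture_output=True, text=True, check=True,
        )
        current = parse_link_states(result.stdout)
        for argv in plan_commands(previous, current):
            subprocess.run(argv, check=False)
        previous = current
        time.sleep(interval_seconds)


# Dry run on sample data: bge0 loses link while bge1 stays up.
before = parse_link_states("bge0:up\nbge1:up\n")
after = parse_link_states("bge0:down\nbge1:up\n")
print(plan_commands(before, after))  # -> [['ifconfig', 'bge0', 'down']]
```

Polling is the simplest choice for a sketch; a real implementation would more likely react to link-state notifications, as nwamd does. `watch()` is deliberately not called here, since it runs indefinitely and modifies interface state.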
