Bug #14447

open

routes coming back from the dead: routing daemons left dazed and confused

Added by Alex Wilson over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee:
Category: networking
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Today in illumos, when you have a route which is associated with an interface (specifically, there's an ill on the ire_t in the kernel), it gets a second copy stashed quietly on the ill_saved_ire_mp member of the ill_t (interface). If the interface goes down, all of these routes disappear from the routing table. If the interface later comes back up again, they all reappear, having been rehydrated from this backup copy on the ill_t.

Whenever you add a new route, ip_rt_add attempts to tag it with the relevant ill if it can, and there are other mechanisms which tag it later if that fails. As a result, on most systems, all routes have an ill, which means all routes are subject to this un-death.

Routing daemons written against the BSD and Linux routing table APIs do not expect this behaviour. On other OSes, routes are either deleted when the underlying interface goes away and never come back from the dead (Linux does this), or they remain in the routing table but are marked as "down" and given the lowest possible priority (FreeBSD and OpenBSD do this).

In addition, as the code stands today on illumos, we never send route socket notifications about these routes coming and going, and there is no way to view or delete these "ghost" routes that belong to a currently-down interface. So, we have routes which silently disappear without any notification to anybody, and may reappear as zombies at any possible time in the future without any notification at all, and there's no way to see them in the interim except via mdb -k.
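To make that sequence concrete, here is a toy Python model of the behaviour described above. It is a sketch, not the kernel's implementation: `ToyNetstack`, its `saved` dictionary (standing in for ill_saved_ire_mp) and its `events` list (standing in for route socket messages) are hypothetical names invented for illustration, and the model ignores address and on-link prefix handling entirely.

```python
class ToyNetstack:
    """Toy model of the stash-and-restore behaviour; not real kernel logic."""

    def __init__(self):
        self.table = set()   # visible routes: (prefix, gateway, ifname)
        self.saved = {}      # ifname -> routes stashed while it is down
        self.events = []     # stand-in for route socket notifications

    def route_add(self, prefix, gateway, ifname):
        self.table.add((prefix, gateway, ifname))
        self.events.append(("RTM_ADD", prefix))

    def if_down(self, ifname):
        gone = {r for r in self.table if r[2] == ifname}
        self.table -= gone
        self.saved[ifname] = gone
        # Only the interface change is announced; the routes vanish silently.
        self.events.append(("RTM_IFINFO", ifname))

    def if_up(self, ifname):
        self.events.append(("RTM_IFINFO", ifname))
        # The stashed routes reappear, again with no per-route notification.
        self.table |= self.saved.pop(ifname, set())


ns = ToyNetstack()
ns.route_add("10.51.50.0/24", "10.21.12.18", "net1")
ns.if_down("net1")
visible_while_down = set(ns.table)   # empty: the route cannot be seen or deleted
ns.if_up("net1")                     # the route is back, with no RTM_ADD emitted
```

A listener that only consumes `events` sees one RTM_ADD and two RTM_IFINFO entries, and has no way to learn that the route left and re-entered the table.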

Even worse, they reappear as zombies some time after the routing socket notification about the interface status change, in a racy manner with no timing you can depend upon. So you can't just look for the interface status change and then poll the routing table -- you may or may not win the race.

This is a terrible situation for a routing daemon to deal with: the only way it can possibly try to ensure FIB/RIB consistency in the face of this is to periodically poll out the entire routing table and diff it against its internal view, obtaining some kind of eventual consistency. This is not a "feature" which any other OS requires routing daemons to implement, and it adds significantly to the litany of reasons why projects like FRR don't want to deal with illumos.
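The poll-and-diff workaround amounts to a set difference in both directions. The following is a minimal Python sketch of that reconciliation step only; `diff_routes` and the route tuples are hypothetical, and how a daemon actually dumps the kernel table is deliberately left out.

```python
def diff_routes(rib, fib):
    """Compare the daemon's view (RIB) with a dump of the kernel table (FIB).

    Both arguments are iterables of hashable route tuples, for example
    (prefix, gateway, interface). Returns (missing, unexpected).
    """
    rib, fib = set(rib), set(fib)
    missing = rib - fib      # daemon installed these; kernel no longer shows them
    unexpected = fib - rib   # kernel shows these; daemon does not know about them
    return missing, unexpected


routes = {("10.51.50.0/24", "10.21.12.18", "net1"),
          ("10.91.25.0/24", "10.21.12.18", "net1")}

# Interface down: everything the daemon installed has silently vanished.
missing, unexpected = diff_routes(rib=routes, fib=set())

# Daemon gave up on the routes, then the interface came back: zombies.
missing_after_up, zombies = diff_routes(rib=set(), fib=routes)
```

Because nothing prompts the daemon to run this at the right moment, it has to run on a timer, which is where the "eventual" in eventual consistency comes from.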

At the very minimum, we should be producing routing socket messages about these stashed routes entering and leaving the routing table, we should provide a mechanism to enumerate and delete stashed routes, and we should provide an opt-out flag for routing daemons so their routes are exempt from this behaviour.

In the long term, I think we should get rid of this behaviour altogether. My preference is for the BSD-style "keep and flag" approach.

Actions #1

Updated by Alex Wilson over 1 year ago

  • Description updated (diff)
Actions #2

Updated by Alex Wilson over 1 year ago

  • Subject changed from routes coming back from the dead: routing daemons left startled and confused to routes coming back from the dead: routing daemons left dazed and confused
Actions #3

Updated by Dan McDonald over 1 year ago

So let's see this behavior in action. Consider this netstack with a non-default NIC with a couple of extra static routes thrown in:

[root@smartos-build ~]# ifconfig net1
net1: flags=40201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
    inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
    ether 90:b8:d0:6b:15:59 
[root@smartos-build ~]# netstat -rnc | grep net1
10.21.12.0/24        10.21.12.26          U         2          0 net1      
10.51.50.0/24        10.21.12.18          UG        1          0 net1      
10.91.25.0/24        10.21.12.18          UG        1          0 net1      
[root@smartos-build ~]# 

Now let's see what `route -n monitor` says when I take net1 down with "ifconfig net1 down". I'm placing line breaks between the messages; route(1M) does not.

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
locks:  inits: 
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0

got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

Hmm, apart from two RTM_IFINFO messages, the second of which might be redundant, we seem to have only indicators of the interface prefix route disappearing. But let's look at netstat again:

[root@smartos-build ~]# ifconfig net1
net1: flags=40201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
    inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
    ether 90:b8:d0:6b:15:59 
[root@smartos-build ~]# netstat -rnc | grep net1
[root@smartos-build ~]# 

So yes, all of net1's routing table entries are gone, including the two static routes that never got an RTM_DELETE. Now let's bring net1 back up, and see what `route -n monitor` says (with extra linebreaks inserted), and let's see the state of the world afterwards:

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
locks:  inits: 
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0

got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<UP,BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 320
RTM_NEWADDR: address being brought up on iface: len 320, metric 0, flags:<UP>
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 108
RTM_ADD: Add Route: len 108, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks:  inits: 
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0

[root@smartos-build ~]# ifconfig net1
net1: flags=40201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
    inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
    ether 90:b8:d0:6b:15:59 
[root@smartos-build ~]# netstat -rnc | grep net1
10.21.12.0/24        10.21.12.26          U         2          0 net1      
10.51.50.0/24        10.21.12.18          UG        1          0 net1      
10.91.25.0/24        10.21.12.18          UG        1          0 net1      
[root@smartos-build ~]# 

The "up" seems to generate all of the messages one saw in the "down" (IFINFO, DELETE, DELADDR, IFINFO), but the difference is that the second IFINFO indicates "up", AND THEN it also emits two more (NEWADDR, ADD), which represent net1's address/prefix and the on-link-prefix-route.
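For comparing traces like these across systems, it can help to reduce each one to its sequence of message types. A small Python sketch, assuming only that each message's summary line starts with its RTM_* name followed by a colon, as in the output above; `message_types` is a hypothetical helper, and the sample is the "down" trace with its sockaddrs lines omitted.

```python
import re


def message_types(monitor_output):
    """Return the RTM_* message names in the order they appear."""
    return re.findall(r"^(RTM_[A-Z]+):", monitor_output, flags=re.MULTILINE)


down_log = """\
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
"""

sequence = message_types(down_log)
print(sequence)  # ['RTM_IFINFO', 'RTM_DELETE', 'RTM_DELADDR', 'RTM_IFINFO']
```

Three routes left the table, but the sequence contains a single RTM_DELETE, for the interface prefix route.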

Actions #4

Updated by Dan McDonald over 1 year ago

So I would love to see what *BSD does in a similar situation from the POV of `route monitor` and ifconfig/netstat. Clearly what needs to happen is that "netstat -rnc" shouldn't show the routes disappearing or reappearing.

I'm also curious about *BSD behavior when:

  • route add PREFIX/len <down-nic-neighbor>
  • route delete PREFIX/len <down-nic-neighbor>

occurs. Both will fail in illumos today because, well, the routes aren't there!

Actions #5

Updated by Alex Wilson over 1 year ago

OpenBSD 6.9:

obsd# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        1     -     8 em0  
224/4              127.0.0.1          URS        0        0 32768     8 lo0  
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0  
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0  
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0  
192.168.193.1      00:50:56:c0:00:01  UHLch      2       74     -     3 em0  
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        6     -     1 em0  
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0  

obsd# route add 192.168.1.0/24 192.168.193.1
add net 192.168.1.0/24: gateway 192.168.193.1

obsd# netstat -rn  
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        1     -     8 em0  
224/4              127.0.0.1          URS        0        0 32768     8 lo0  
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0  
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0  
192.168.1/24       192.168.193.1      UGS        0        0     -     8 em0  
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0  
192.168.193.1      00:50:56:c0:00:01  UHLch      3       97     -     3 em0  
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        7     -     1 em0  
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0  

obsd# ifconfig em0 down

obsd# netstat -rn                                                               
Routing tables                                                                  

Internet:                                                                     
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      GS         0        1     -     8 em0  
224/4              127.0.0.1          URS        0        0 32768     8 lo0  
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0  
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0  
192.168.1/24       192.168.193.1      GS         0        0     -     8 em0     
192.168.193/24     192.168.193.131    Cn         1        0     -     4 em0  
192.168.193.1      00:50:56:c0:00:01  HLch       3      131     -     3 em0     
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        8     -     1 em0  
192.168.193.255    192.168.193.131    Hb         0        2     -     1 em0

obsd# ifconfig em0 up

obsd# netstat -rn                                                               
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        4     -     8 em0  
224/4              127.0.0.1          URS        0        0 32768     8 lo0  
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0  
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0  
192.168.1/24       192.168.193.1      UGS        0        0     -     8 em0  
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0  
192.168.193.1      00:50:56:c0:00:01  UHLch      3      446     -     3 em0  
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0       11     -     1 em0  
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0  

obsd# route monitor                                                     
got message of size 200 on Wed Feb  2 08:44:08 2022
RTM_IFINFO: iface status change: len 200, if# 1, name em0, link: active, mtu: 1500, flags:<BROADCAST,SIMPLEX,MULTICAST>
sockaddrs: <IFP>
 00:0c:29:15:f4:0f

got message of size 200 on Wed Feb  2 08:45:22 2022
RTM_IFINFO: iface status change: len 200, if# 1, name em0, link: active, mtu: 1500, flags:<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
sockaddrs: <IFP>
 00:0c:29:15:f4:0f

As I explained above, the routes remain in the table and can be manipulated as per normal (deleted, changed, etc.): they're just flagged as "down". Since no routes enter or leave the table, there are no events other than the RTM_IFINFO on the route socket. "netstat" isn't showing it, but the high bit in the priority on each of those routes is also set while they're down, to make sure they're the last resort for their destinations (since attempting to use them will result in EHOSTUNREACH).
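The effect of setting the high priority bit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenBSD's code: the value 0x80 for `RTP_DOWN` is an assumption about the kernel's constant, and `effective_prio` and `best_route` are hypothetical helpers. Lower numbers are taken to win, matching the Prio column above, where the static routes show 8.

```python
RTP_DOWN = 0x80  # assumed value of the "interface is down" priority bit


def effective_prio(base_prio, iface_up):
    """Lower value wins; a down route keeps its base priority plus the high bit."""
    return base_prio if iface_up else base_prio | RTP_DOWN


def best_route(candidates):
    """candidates: list of (name, base_prio, iface_up); return the preferred name."""
    return min(candidates, key=lambda c: effective_prio(c[1], c[2]))[0]


routes = [
    ("static via em0", 8, False),   # better base priority, but em0 is down
    ("learned via em1", 32, True),  # worse base priority, interface is up
]
print(best_route(routes))  # learned via em1
```

The down route stays in the table and remains visible and deletable; it simply loses to any usable alternative for the same destination.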

Actions #6

Updated by Alex Wilson over 1 year ago

FreeBSD 13.0-release:

root@freebsd:~ # netstat -rn

Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0

root@freebsd:~ # route add 192.168.0.0/24 192.168.193.1
add net 192.168.1: gateway 192.168.193.1

root@freebsd:~ # netstat -rn

Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0

root@freebsd:~ # ifconfig em0 down

root@freebsd:~ # netstat -rn

Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.134    link#1             UHS         lo0

root@freebsd:~ # ifconfig em0 up

root@freebsd:~ # netstat -rn

Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0

root@freebsd:~ # route monitor

got message of size 124 on Wed Feb  2 09:02:09 2022
RTM_DELADDR: address being removed from iface: len 124, metric 0, flags:<UP>
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 em0:0.c.29.b4.bc.d7 192.168.193.134 192.168.193.255

got message of size 200 on Wed Feb  2 09:02:09 2022
RTM_DELETE: Delete Route: len 200, pid: 0, seq 0, errno 0, flags:<DONE,PINNED>
locks:  inits: 
sockaddrs: <DST,GATEWAY,NETMASK>
 192.168.193.0 link#1 255.255.255.0

got message of size 168 on Wed Feb  2 09:02:09 2022
RTM_IFINFO: iface status change: len 168, if# 1, link: up, flags:<BROADCAST,b6,RUNNING,SIMPLEX,MULTICAST>

got message of size 124 on Wed Feb  2 09:02:16 2022
RTM_NEWADDR: address being added to iface: len 124, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 em0:0.c.29.b4.bc.d7 192.168.193.134 192.168.193.255

got message of size 200 on Wed Feb  2 09:02:16 2022
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits: 
sockaddrs: <DST,GATEWAY,NETMASK>
 192.168.193.0 link#1 255.255.255.0

got message of size 168 on Wed Feb  2 09:02:16 2022
RTM_IFINFO: iface status change: len 168, if# 1, link: up, flags:<UP,BROADCAST,b6,RUNNING,SIMPLEX,MULTICAST>
^C

On FreeBSD, the link-local-subnet route leaves and re-enters the table, and we get route socket events for that. Other routes which go via this interface remain in the table while the interface is down, with a flag, like OpenBSD.
