Bug #14447
open
routes coming back from the dead: routing daemons left dazed and confused
0%
Description
Today in illumos, when you have a route which is associated with an interface (specifically, there's an ill on the ire_t in the kernel), it gets a second copy stashed quietly on the ill_saved_ire_mp member of the ill_t (interface). If the interface goes down, all of these routes disappear from the routing table. If the interface later comes back up again, they all reappear, having been rehydrated from this backup copy on the ill_t.
Whenever you add a new route, ip_rt_add attempts to tag it with the relevant ill if it can, and there are other mechanisms which tag it later if it fails. As a result of this, on most systems, all routes have an ill. Which means all routes are subject to this un-death.
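As a rough illustration of the stash-and-rehydrate behaviour described above, here is a toy Python model. It is only a sketch and not the illumos implementation: the names ToyStack, if_down and if_up are made up for this example, and the `saved` dictionary merely stands in for ill_saved_ire_mp.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    dest: str      # destination prefix, e.g. "10.51.50.0/24"
    gateway: str
    ifname: str    # interface the route is tagged with (its "ill")


@dataclass
class ToyStack:
    """Toy model only; names are hypothetical, not kernel symbols."""
    table: set = field(default_factory=set)    # the visible routing table
    saved: dict = field(default_factory=dict)  # per-interface stash (stand-in for ill_saved_ire_mp)

    def add(self, route):
        self.table.add(route)

    def if_down(self, ifname):
        # Every route tagged with this interface leaves the table and is stashed.
        gone = {r for r in self.table if r.ifname == ifname}
        self.table -= gone
        self.saved.setdefault(ifname, set()).update(gone)

    def if_up(self, ifname):
        # Stashed routes reappear in the table without anyone asking for them.
        self.table |= self.saved.pop(ifname, set())


stack = ToyStack()
static = Route("10.51.50.0/24", "10.21.12.18", "net1")
stack.add(static)
stack.if_down("net1")
print(static in stack.table)   # False: the route is gone from the table
stack.if_up("net1")
print(static in stack.table)   # True: back from the dead
```

Note that nothing in this model tells an observer that a route left or re-entered `table`, which is the gap the rest of this report is about.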
Routing daemons written against the BSD and Linux routing table APIs do not expect this behaviour: on other OSes, routes are either deleted when the underlying interface goes away and never come back from the dead (Linux does this), or they remain in the routing table but are marked as "down" and given the lowest possible priority (FreeBSD and OpenBSD do this).
In addition, as the code stands today on illumos, we never send route socket notifications about these routes coming and going, and there is no way to view or delete these "ghost" routes that belong to a currently-down interface. So, we have routes which silently disappear without any notification to anybody, and may reappear as zombies at any possible time in the future without any notification at all, and there's no way to see them in the interim except via mdb -k.
Even worse, they reappear as zombies some time after the routing socket notification about the interface status change, in a racy manner with no timing you can depend upon. So you can't just look for the interface status change and then poll the routing table -- you may or may not win the race.
This is a terrible situation for a routing daemon to deal with: the only way it can possibly try to ensure FIB/RIB consistency in the face of this is to periodically poll out the entire routing table and diff it against its internal view, obtaining some kind of eventual consistency. This is not a "feature" which any other OS requires routing daemons to implement, and adds significantly to the litany of reasons why projects like FRR don't want to deal with illumos.
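To make the cost of that workaround concrete, here is a minimal Python sketch of the reconciliation step such a daemon would have to run on every poll. It assumes both views have already been reduced to dictionaries mapping destination prefix to gateway; how the kernel snapshot is obtained is left out, and the function name diff_routes is made up for this example.

```python
def diff_routes(rib, fib):
    """Compare the daemon's view (rib) with a kernel snapshot (fib).

    Both arguments map destination prefix -> gateway.
    Returns (missing, unexpected, changed):
      missing    - routes the daemon expects but the kernel lacks
      unexpected - routes in the kernel the daemon does not know about
      changed    - routes present in both with different gateways
    """
    missing = {d: gw for d, gw in rib.items() if d not in fib}
    unexpected = {d: gw for d, gw in fib.items() if d not in rib}
    changed = {
        d: (rib[d], fib[d])
        for d in rib.keys() & fib.keys()
        if rib[d] != fib[d]
    }
    return missing, unexpected, changed


# Daemon's view: two routes it installed via net1.
rib = {"10.51.50.0/24": "10.21.12.18", "10.91.25.0/24": "10.21.12.18"}

# Kernel snapshot taken while net1 is down: both routes have vanished.
missing, unexpected, changed = diff_routes(rib, {})
print(sorted(missing))   # ['10.51.50.0/24', '10.91.25.0/24']
```

Even with this in place, the daemon only learns about a vanished or resurrected route at the next poll, so the window of inconsistency is bounded by the polling interval rather than closed.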
At the very minimum, we should be producing routing socket messages about these stashed routes entering and leaving the routing table, we should provide a mechanism to enumerate and delete stashed routes, and we should provide an opt-out flag for routing daemons so their routes are exempt from this behaviour.
In the long term, I think we should get rid of this behaviour altogether. My preference is for the BSD-style "keep and flag" approach.
Updated by Alex Wilson over 1 year ago
- Subject changed from routes coming back from the dead: routing daemons left startled and confused to routes coming back from the dead: routing daemons left dazed and confused
Updated by Dan McDonald over 1 year ago
So let's see this behavior in action. Consider this netstack with a non-default NIC with a couple of extra static routes thrown in:
[root@smartos-build ~]# ifconfig net1
net1: flags=40201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
        inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
        ether 90:b8:d0:6b:15:59
[root@smartos-build ~]# netstat -rnc | grep net1
10.21.12.0/24        10.21.12.26          U         2          0 net1
10.51.50.0/24        10.21.12.18          UG        1          0 net1
10.91.25.0/24        10.21.12.18          UG        1          0 net1
[root@smartos-build ~]#
Now let's see what `route -n monitor` says when I take net1 down with "ifconfig net1 down". I'm placing line breaks between the messages; route(1M) does not.
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
locks:  inits:
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0

got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59
Hmm, apart from two RTM_IFINFO messages, the second of which might be redundant, we seem to have only indicators of the interface prefix route disappearing. But let's look at netstat again:
[root@smartos-build ~]# ifconfig net1
net1: flags=40201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
        inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
        ether 90:b8:d0:6b:15:59
[root@smartos-build ~]# netstat -rnc | grep net1
[root@smartos-build ~]#
So yes, none of net1's routes are in the routing table any more. Now let's bring net1 back up, and see what `route -n monitor` says (with extra linebreaks inserted), and let's see the state of the world afterwards:
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
locks:  inits:
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0

got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<UP,BROADCAST,RUNNING,MULTICAST,IPv4>
sockaddrs: <IFP>
 net1:90.b8.d0.6b.15.59

got message of size 320
RTM_NEWADDR: address being brought up on iface: len 320, metric 0, flags:<UP>
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 net1:90.b8.d0.6b.15.59 10.21.12.26 default

got message of size 108
RTM_ADD: Add Route: len 108, pid: 0, seq 0, errno 0, flags:<UP,DONE>
locks:  inits:
sockaddrs: <DST,NETMASK>
 10.21.12.26 255.255.255.0
[root@smartos-build ~]# ifconfig net1
net1: flags=40201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,L3PROTECT> mtu 8500 index 3
        inet 10.21.12.26 netmask ffffff00 broadcast 10.21.12.255
        ether 90:b8:d0:6b:15:59
[root@smartos-build ~]# netstat -rnc | grep net1
10.21.12.0/24        10.21.12.26          U         2          0 net1
10.51.50.0/24        10.21.12.18          UG        1          0 net1
10.91.25.0/24        10.21.12.18          UG        1          0 net1
[root@smartos-build ~]#
The "up" seems to generate all of the messages one saw in the "down" (IFINFO, DELETE, DELADDR, IFINFO), but the difference is that the second IFINFO indicates "up", AND THEN it also emits two more (NEWADDR, ADD), which represent net1's address/prefix and the on-link prefix route.
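For anyone who wants to compare traces like these mechanically, here is a small Python sketch that pulls the message types out of captured `route -n monitor` text. The helper name message_types is made up for this example, and the sample string is an abbreviated copy of the "down" trace above.

```python
import re

# Each message in the monitor output starts with its type followed by a colon,
# e.g. "RTM_DELETE: Delete Route: ...".
MSG_TYPE = re.compile(r"\b(RTM_[A-Z_]+):")


def message_types(monitor_output):
    """Return the RTM_* message types in the order they were logged."""
    return MSG_TYPE.findall(monitor_output)


down_trace = """
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
got message of size 108
RTM_DELETE: Delete Route: len 108, pid: 0, seq 0, errno 0, flags:<DONE>
got message of size 320
RTM_DELADDR: address being brought down on iface: len 320, metric 0, flags:
got message of size 336
RTM_IFINFO: iface status change: len 336, if# 3, flags:<BROADCAST,RUNNING,MULTICAST,IPv4>
"""

types = message_types(down_trace)
print(types)                       # ['RTM_IFINFO', 'RTM_DELETE', 'RTM_DELADDR', 'RTM_IFINFO']
print(types.count("RTM_DELETE"))   # 1
```

Three routes disappeared from netstat when net1 went down, but only one RTM_DELETE was logged; the two static routes left and returned without any message.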
Updated by Dan McDonald over 1 year ago
So I would love to see what *BSD does in a similar situation from the POV of `route monitor` and ifconfig/netstat. Clearly what needs to happen is that "netstat -rnc" shouldn't show the routes disappearing or reappearing.
I'm also curious about *BSD behavior when:
- route add PREFIX/len <down-nic-neighbor>
- route delete PREFIX/len <down-nic-neighbor>
occurs. Both will fail in illumos today because, well, the routes aren't there!
Updated by Alex Wilson over 1 year ago
OpenBSD 6.9:
obsd# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        1     -     8 em0
224/4              127.0.0.1          URS        0        0 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0
192.168.193.1      00:50:56:c0:00:01  UHLch      2       74     -     3 em0
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        6     -     1 em0
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0
obsd# route add 192.168.1.0/24 192.168.193.1
add net 192.168.1.0/24: gateway 192.168.193.1
obsd# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        1     -     8 em0
224/4              127.0.0.1          URS        0        0 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0
192.168.1/24       192.168.193.1      UGS        0        0     -     8 em0
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0
192.168.193.1      00:50:56:c0:00:01  UHLch      3       97     -     3 em0
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        7     -     1 em0
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0
obsd# ifconfig em0 down
obsd# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      GS         0        1     -     8 em0
224/4              127.0.0.1          URS        0        0 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0
192.168.1/24       192.168.193.1      GS         0        0     -     8 em0
192.168.193/24     192.168.193.131    Cn         1        0     -     4 em0
192.168.193.1      00:50:56:c0:00:01  HLch       3      131     -     3 em0
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0        8     -     1 em0
192.168.193.255    192.168.193.131    Hb         0        2     -     1 em0
obsd# ifconfig em0 up
obsd# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags   Refs      Use   Mtu  Prio Iface
default            192.168.193.1      UGS        0        4     -     8 em0
224/4              127.0.0.1          URS        0        0 32768     8 lo0
127/8              127.0.0.1          UGRS       0        0 32768     8 lo0
127.0.0.1          127.0.0.1          UHhl       1        2 32768     1 lo0
192.168.1/24       192.168.193.1      UGS        0        0     -     8 em0
192.168.193/24     192.168.193.131    UCn        1        0     -     4 em0
192.168.193.1      00:50:56:c0:00:01  UHLch      3      446     -     3 em0
192.168.193.131    00:0c:29:15:f4:0f  UHLl       0       11     -     1 em0
192.168.193.255    192.168.193.131    UHb        0        2     -     1 em0
obsd# route monitor
got message of size 200 on Wed Feb 2 08:44:08 2022
RTM_IFINFO: iface status change: len 200, if# 1, name em0, link: active, mtu: 1500, flags:<BROADCAST,SIMPLEX,MULTICAST>
sockaddrs: <IFP>
 00:0c:29:15:f4:0f

got message of size 200 on Wed Feb 2 08:45:22 2022
RTM_IFINFO: iface status change: len 200, if# 1, name em0, link: active, mtu: 1500, flags:<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
sockaddrs: <IFP>
 00:0c:29:15:f4:0f
As I explained above, the routes remain in the table and can be manipulated as per normal (deleted, changed etc): they're just flagged as "down", which shows up in the netstat output as the U flag being cleared. Since no routes enter or leave the table, there are no events other than the RTM_IFINFO on the route socket. "netstat" isn't showing it, but the high bit in the priority on each of those routes is also set while they're down, to make sure they're the last resort for their destinations (since attempting to use them will result in EHOSTUNREACH).
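The effect of setting the high priority bit can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenBSD code: it assumes an 8-bit priority where lower values win, and the names DOWN_BIT, effective_priority and best_route, as well as the second gateway on em1, are invented for the example.

```python
DOWN_BIT = 0x80    # assumed: high bit of an 8-bit priority marks "down"
PRIO_MASK = 0x7F   # assumed: remaining bits carry the configured priority


def effective_priority(prio, if_is_up):
    """Priority used for route selection; lower values are preferred."""
    base = prio & PRIO_MASK
    return base if if_is_up else base | DOWN_BIT


def best_route(candidates, if_state):
    """Pick the preferred route among candidates for one destination.

    candidates: list of (gateway, ifname, prio) tuples
    if_state:   dict mapping ifname -> True if the interface is up
    """
    return min(
        candidates,
        key=lambda c: effective_priority(c[2], if_state[c[1]]),
    )


candidates = [
    ("192.168.193.1", "em0", 8),   # preferred while em0 is up
    ("10.0.0.1", "em1", 48),       # hypothetical backup path
]

print(best_route(candidates, {"em0": True, "em1": True}))    # ('192.168.193.1', 'em0', 8)
print(best_route(candidates, {"em0": False, "em1": True}))   # ('10.0.0.1', 'em1', 48)
```

The route via em0 never leaves the candidate list; it just sorts behind every route on an interface that is up, and returns to first place as soon as em0 comes back.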
Updated by Alex Wilson over 1 year ago
FreeBSD 13.0-release:
root@freebsd:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0
root@freebsd:~ # route add 192.168.0.0/24 192.168.193.1
add net 192.168.1: gateway 192.168.193.1
root@freebsd:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0
root@freebsd:~ # ifconfig em0 down
root@freebsd:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.134    link#1             UHS         lo0
root@freebsd:~ # ifconfig em0 up
root@freebsd:~ # netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.168.193.1      UGS         em0
127.0.0.1          link#2             UH          lo0
192.168.0.0/24     192.168.193.1      UGS         em0
192.168.193.0/24   link#1             U           em0
192.168.193.134    link#1             UHS         lo0
root@freebsd:~ # route monitor

got message of size 124 on Wed Feb 2 09:02:09 2022
RTM_DELADDR: address being removed from iface: len 124, metric 0, flags:<UP>
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 em0:0.c.29.b4.bc.d7 192.168.193.134 192.168.193.255

got message of size 200 on Wed Feb 2 09:02:09 2022
RTM_DELETE: Delete Route: len 200, pid: 0, seq 0, errno 0, flags:<DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 192.168.193.0 link#1 255.255.255.0

got message of size 168 on Wed Feb 2 09:02:09 2022
RTM_IFINFO: iface status change: len 168, if# 1, link: up, flags:<BROADCAST,b6,RUNNING,SIMPLEX,MULTICAST>

got message of size 124 on Wed Feb 2 09:02:16 2022
RTM_NEWADDR: address being added to iface: len 124, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
 255.255.255.0 em0:0.c.29.b4.bc.d7 192.168.193.134 192.168.193.255

got message of size 200 on Wed Feb 2 09:02:16 2022
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 192.168.193.0 link#1 255.255.255.0

got message of size 168 on Wed Feb 2 09:02:16 2022
RTM_IFINFO: iface status change: len 168, if# 1, link: up, flags:<UP,BROADCAST,b6,RUNNING,SIMPLEX,MULTICAST>
^C
On FreeBSD, the link-local-subnet route leaves and re-enters the table, and we get route socket events for that. Other routes which go via this interface remain in the table while the interface is down, with a flag, like OpenBSD.