Bug #7185
IP DCEs leak from halted non-global zones
Description
Steps to reproduce:
1.) Boot a zone.
2.) Establish a connection over localhost (my first leak was over 127.0.0.1, but this one appears to have been over IPv6).
3.) While the zone's connection to itself is still established, halt the zone.
4.) Run "reboot -d"; the resulting crash dump will show a leak of at least one DCE.
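Steps 2 and 3 need a process inside the zone that holds a TCP connection to itself. The following is a minimal C sketch assuming a POSIX sockets environment. The helper name open_loopback_pair() is hypothetical and does not come from this report. It uses IPv4 loopback (127.0.0.1). For the IPv6 case, you would use AF_INET6 and in6addr_loopback instead.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close *fd if it is open, then mark it closed. */
static void
close_fd(int *fd)
{
	if (*fd >= 0) {
		(void) close(*fd);
		*fd = -1;
	}
}

/*
 * Hypothetical helper: connect this process to itself over 127.0.0.1.
 * On success returns 0 and leaves the listener in *lfd, the connecting
 * socket in *cfd and the accepted socket in *sfd, all still open.
 * On failure returns -1 with nothing left open.
 */
int
open_loopback_pair(int *lfd, int *cfd, int *sfd)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);

	assert(lfd != NULL && cfd != NULL && sfd != NULL);
	*lfd = *cfd = *sfd = -1;

	(void) memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;		/* let the kernel pick an ephemeral port */

	/* Listen on loopback and learn which port was assigned. */
	if ((*lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
	    bind(*lfd, (struct sockaddr *)&sin, sizeof (sin)) < 0 ||
	    listen(*lfd, 1) < 0 ||
	    getsockname(*lfd, (struct sockaddr *)&sin, &len) < 0)
		goto fail;

	/* This connect() is the entry point at the bottom of the leak stacks. */
	if ((*cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
	    connect(*cfd, (struct sockaddr *)&sin, sizeof (sin)) < 0)
		goto fail;
	if ((*sfd = accept(*lfd, NULL, NULL)) < 0)
		goto fail;
	return (0);

fail:
	close_fd(sfd);
	close_fd(cfd);
	close_fd(lfd);
	return (-1);
}
```

To drive steps 3 and 4, one approach is to call this from a program running in the zone and keep the descriptors open (for example, by calling pause()). Then halt the zone from the global zone with "zoneadm -z <zone> halt" and run "reboot -d".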
Stacks look like this:
dce_cache leak: 1 buffer, 152 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffff025a8da898 ffffff025a83ebb8 49f5eeb4e8 ffffff027d6b1440
ffffff025a8ae348 ffffff024bdab2c0 0
kmem_cache_alloc_debug+0x2e0
kmem_cache_alloc+0x320
dce_lookup_and_add_v4+0xe9
ip_set_destination_v4+0x392
ip_attr_connect+0x109
conn_connect+0x122
tcp_set_destination+0x70
tcp_connect_ipv4+0x11f
tcp_do_connect+0x505
tcp_connect+0xc9
so_connect+0xfe
socket_connect+0x3c
connect+0xb1
dce_cache leak: 1 buffer, 152 bytes
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffff025a8da7c0 ffffff025a83ec68 42da3c1fb9 ffffff027cc6c3c0
ffffff025a8ae348 ffffff024e7d0080 0
kmem_cache_alloc_debug+0x2e0
kmem_cache_alloc+0x320
dce_lookup_and_add_v6+0x171
ip_set_destination_v6+0x549
ip_attr_connect+0x172
conn_connect+0x122
tcp_set_destination+0x70
tcp_connect_ipv6+0x1a9
tcp_do_connect+0x542
tcp_connect+0xc9
so_connect+0xfe
socket_connect+0x3c
connect+0xb1
It's not yet clear whether CLOSED connections also leak, but this is likely a bug in the netstack-teardown code: it forgets to release the DCE holds kept in ip_xmit_attr.
Updated by Dan McDonald almost 6 years ago
This also happens with off-link connections initiated by the zone. Here's a leaked IPv4 DCE:
dce_u = {
dceu_v6addr = a08:3e5:0:0::
dceu_v4addr = 10.8.3.229
}
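The odd-looking dceu_v6addr value above is the IPv4 address read back through the other member of the union. The address 10.8.3.229 is the byte sequence 0a 08 03 e5, and those bytes print as the first two IPv6 groups, a08:3e5. The C sketch below shows that overlay. Here, union dce_addr is a hypothetical stand-in for dce_u, assumed to overlay an in6_addr with a 4-byte IPv4 address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for dce_u: both members share the same storage. */
union dce_addr {
	struct in6_addr	v6addr;
	struct in_addr	v4addr;
};

/*
 * Store a dotted-quad IPv4 address in the union, then format the same
 * bytes as an IPv6 address.  Returns buf, or NULL if v4 is not a valid
 * IPv4 address or buf is too small.
 */
const char *
v4_as_v6_text(const char *v4, char *buf, socklen_t buflen)
{
	union dce_addr u;

	assert(v4 != NULL && buf != NULL);
	(void) memset(&u, 0, sizeof (u));	/* zero the 12 bytes v4 doesn't cover */
	if (inet_pton(AF_INET, v4, &u.v4addr) != 1)
		return (NULL);
	return (inet_ntop(AF_INET6, &u.v6addr, buf, buflen));
}
```

Standard inet_ntop() compresses the trailing zero groups, so this prints "a08:3e5::" rather than mdb's "a08:3e5:0:0::". Both forms describe the same bytes.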
Updated by Dan McDonald almost 6 years ago
One possible fix is to have dce_stack_destroy() check for stragglers still hanging off the stack's DCE hash buckets. That is defensive programming, and it does not address the CAUSE of the leak.
void
dce_stack_destroy(ip_stack_t *ipst)
{
	int i;
	for (i = 0; i < ipst->ips_dce_hashsize; i++) {
		/* XXX KEBE SAYS clean v4 & v6 hash buckets here. */
		rw_destroy(&ipst->ips_dce_hash_v4[i].dcb_lock);
		rw_destroy(&ipst->ips_dce_hash_v6[i].dcb_lock);
	}
	kmem_free(ipst->ips_dce_hash_v4,
	    ipst->ips_dce_hashsize * sizeof (dcb_t));
	ipst->ips_dce_hash_v4 = NULL;
	kmem_free(ipst->ips_dce_hash_v6,
	    ipst->ips_dce_hashsize * sizeof (dcb_t));
	ipst->ips_dce_hash_v6 = NULL;
	ipst->ips_dce_hashsize = 0;

	ASSERT(ipst->ips_dce_default->dce_refcnt == 1);
	kmem_cache_free(dce_cache, ipst->ips_dce_default);
	ipst->ips_dce_default = NULL;
}
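Below is a userland sketch of the defensive straggler check proposed above. It uses simplified, hypothetical dce_t/dcb_t stand-ins (one singly linked chain plus a count per bucket), not the real ip_dce.c types, and it does not model locking. In dce_stack_destroy(), such a walk would run over both ips_dce_hash_v4 and ips_dce_hash_v6 where the XXX comment sits, before the rw_destroy() calls. As the note says, it hides the symptom rather than fixing the cause.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for dce_t and dcb_t. */
typedef struct dce {
	struct dce	*dce_next;
	unsigned int	dce_refcnt;	/* 1 == only the bucket holds it */
} dce_t;

typedef struct dcb {
	dce_t		*dcb_dce;	/* head of this bucket's chain */
	unsigned int	dcb_cnt;	/* number of entries in the chain */
} dcb_t;

/*
 * Unlink and free every DCE still hanging off one hash table and return
 * how many stragglers were found.  In the kernel, each bucket would be
 * walked with its dcb_lock held as writer, before rw_destroy().
 */
size_t
dce_free_stragglers(dcb_t *hash, size_t hashsize)
{
	size_t freed = 0;

	for (size_t i = 0; i < hashsize; i++) {
		dce_t *dce;

		while ((dce = hash[i].dcb_dce) != NULL) {
			/* At stack teardown, nothing but the bucket should hold it. */
			assert(dce->dce_refcnt == 1);
			hash[i].dcb_dce = dce->dce_next;
			hash[i].dcb_cnt--;
			free(dce);
			freed++;
		}
		assert(hash[i].dcb_cnt == 0);
	}
	return (freed);
}

/* Test helper: link a new DCE, referenced only by its bucket. */
dce_t *
dce_add(dcb_t *dcb)
{
	dce_t *dce = calloc(1, sizeof (*dce));

	if (dce == NULL)
		return (NULL);
	dce->dce_refcnt = 1;
	dce->dce_next = dcb->dcb_dce;
	dcb->dcb_dce = dce;
	dcb->dcb_cnt++;
	return (dce);
}
```

Returning a count instead of silently freeing lets a DEBUG build report stragglers, which is useful while the underlying cause is still unknown.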
Updated by Dan McDonald almost 6 years ago
Updated by Dan McDonald about 5 years ago
The DCE netstack-teardown code incorrectly assumes that interface teardowns (ill_t teardowns, to be precise) have already unlinked every DCE from the hash buckets. The dce_cleanup() function applies only to IPv6 DCEs, not IPv4 ones. (In fact, the leaks will not show any IPv6 DCEs.)
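The assumption described above can be modeled in a few lines. In this hypothetical sketch, an interface teardown runs a per-interface cleanup on the IPv6 table only. As a result, IPv4 entries survive every teardown and are still linked when the stack is destroyed. The per-entry dce_ifindex filter, the function names and the types are all illustrative assumptions, not the ip_dce.c definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the ip_dce.c definitions. */
typedef struct dce {
	struct dce	*dce_next;
	unsigned int	dce_ifindex;	/* interface this entry is tied to (assumed) */
} dce_t;

typedef struct dcb {
	dce_t		*dcb_dce;	/* head of this bucket's chain */
} dcb_t;

/* Count the entries still linked into one hash table. */
size_t
dce_count(const dcb_t *hash, size_t hashsize)
{
	size_t n = 0;

	assert(hash != NULL || hashsize == 0);
	for (size_t i = 0; i < hashsize; i++)
		for (const dce_t *d = hash[i].dcb_dce; d != NULL; d = d->dce_next)
			n++;
	return (n);
}

/* Unlink and free every entry in one table that is tied to ifindex. */
void
dce_cleanup_ifindex(dcb_t *hash, size_t hashsize, unsigned int ifindex)
{
	for (size_t i = 0; i < hashsize; i++) {
		dce_t **pp = &hash[i].dcb_dce;

		while (*pp != NULL) {
			dce_t *d = *pp;

			if (d->dce_ifindex == ifindex) {
				*pp = d->dce_next;	/* unlink, then free */
				free(d);
			} else {
				pp = &d->dce_next;
			}
		}
	}
}

/*
 * Model of an interface teardown: only the IPv6 table is cleaned.
 * hash_v4 is deliberately ignored, mirroring the gap described above.
 */
void
ill_teardown_model(dcb_t *hash_v4, dcb_t *hash_v6, size_t hashsize,
    unsigned int ifindex)
{
	(void) hash_v4;
	dce_cleanup_ifindex(hash_v6, hashsize, ifindex);
}

/* Test helper: push a new entry tied to ifindex onto a bucket. */
dce_t *
dce_push(dcb_t *dcb, unsigned int ifindex)
{
	dce_t *d = calloc(1, sizeof (*d));

	if (d == NULL)
		return (NULL);
	d->dce_ifindex = ifindex;
	d->dce_next = dcb->dcb_dce;
	dcb->dcb_dce = d;
	return (d);
}
```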
Per the code in ip_dce.c, DCEs are freed only in two cases: when a netstack responds to memory pressure, or when it detects that a hash bucket is too deep. The depth check is done by check-and-set logic in the dce_lookup_and_add_v4() and dce_lookup_and_add_v6() functions.
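Below is a sketch of the depth-triggered half of that reclaim policy. It uses hypothetical names and an assumed fixed depth limit (DCE_BUCKET_MAX); the real threshold, locking and memory-pressure path in ip_dce.c are not modeled. Entries that their callers have released stay linked until an insertion pushes the bucket past the limit. This is why a DCE that is never looked up again can outlive its netstack unless teardown frees it.

```c
#include <assert.h>
#include <stdlib.h>

#define	DCE_BUCKET_MAX	4	/* hypothetical depth limit, not the real tunable */

/* Hypothetical, simplified stand-ins for dce_t and dcb_t. */
typedef struct dce {
	struct dce	*dce_next;
	unsigned int	dce_refcnt;	/* 1 == only the bucket holds it */
} dce_t;

typedef struct dcb {
	dce_t		*dcb_dce;
	unsigned int	dcb_cnt;
} dcb_t;

/* Free every entry in the bucket that only the bucket references. */
void
dcb_reclaim(dcb_t *dcb)
{
	dce_t **pp = &dcb->dcb_dce;

	while (*pp != NULL) {
		dce_t *d = *pp;

		if (d->dce_refcnt == 1) {
			*pp = d->dce_next;
			dcb->dcb_cnt--;
			free(d);
		} else {
			pp = &d->dce_next;
		}
	}
}

/*
 * Link a new entry held by the caller (bucket + caller == 2 refs).  If
 * the bucket is now deeper than DCE_BUCKET_MAX, reclaim unreferenced
 * entries.  Memory-pressure reclaim, the other trigger, is not modeled.
 */
dce_t *
dce_add_checked(dcb_t *dcb)
{
	dce_t *d = calloc(1, sizeof (*d));

	if (d == NULL)
		return (NULL);
	d->dce_refcnt = 2;
	d->dce_next = dcb->dcb_dce;
	dcb->dcb_dce = d;
	if (++dcb->dcb_cnt > DCE_BUCKET_MAX)
		dcb_reclaim(dcb);
	return (d);
}

/* Caller drops its hold; the bucket's reference keeps the entry linked. */
void
dce_release(dce_t *d)
{
	assert(d->dce_refcnt > 1);
	d->dce_refcnt--;
}
```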
Updated by Electric Monk about 5 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 4510c7eb59fd7173e2f25391b94e238c416d4f2e
commit 4510c7eb59fd7173e2f25391b94e238c416d4f2e
Author: Dan McDonald <danmcd@omniti.com>
Date: 2017-04-10T17:43:31.000Z

7185 IP DCEs leak from halted non-global zones
Reviewed by: Jason King <jason.brian.king@gmail.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>