ILB NAT has some serious problems with vmem arena
During ILB tests, ILB full NAT was found to corrupt its vmem arena, causing random panics (observed more frequently on a DEBUG kernel).
How to reproduce:
1. Boot a DEBUG kernel.
2. Create a full-NAT ILB rule.
3. Pass traffic through the rule (about 1-2 kpps is enough to trigger the panic).
4. Wait for the panic, which occurs after some time.
The cause is an incorrect narrowing cast of the vmem_alloc return value (file usr/src/uts/common/inet/ilb/ilb_nat.c, lines 385-386).
The call vmem_alloc(VM_NOSLEEP) can return -1, which the current code does not check for.
Compared as full-width values (65535 is the maximum in_port_t value), the two are distinct:
(uintptr_t)-1 = 0xffffffffffffffff
(uintptr_t)65535 = 0xffff
But after truncation to in_port_t (as the current code does), they are identical:
(in_port_t)(uintptr_t)-1 = 65535
(in_port_t)(uintptr_t)65535 = 65535
The current code only checks for 0.
IMO the uintptr_t value should be verified before it is converted any further.
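The collision described above, and one possible shape of the check, can be sketched in plain user-space C. This is only an illustration, not the actual ilb_nat.c code: port_t is a stand-in for the 16-bit in_port_t, and port_from_alloc(), narrow_unchecked() and check_examples() are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t port_t;	/* stand-in for the 16-bit in_port_t */

/* What the narrowing cast alone does: the upper bits are discarded. */
static port_t
narrow_unchecked(uintptr_t v)
{
	return ((port_t)v);
}

/*
 * Hypothetical helper: validate the full-width value before narrowing.
 * Returns 0 and stores the port on success, -1 if the value is 0 or
 * does not fit in 16 bits (which covers (uintptr_t)-1).
 */
static int
port_from_alloc(uintptr_t v, port_t *portp)
{
	if (v == 0 || v > UINT16_MAX)
		return (-1);
	*portp = (port_t)v;
	return (0);
}

/* Self-check of the two cases discussed in this report. */
static void
check_examples(void)
{
	port_t p = 0;

	/* Unchecked: the failure value and port 65535 are indistinguishable. */
	assert(narrow_unchecked((uintptr_t)-1) == narrow_unchecked(65535));

	/* Checked: the failure value is rejected, port 65535 is accepted. */
	assert(port_from_alloc((uintptr_t)-1, &p) == -1);
	assert(port_from_alloc(65535, &p) == 0 && p == 65535);
}
```

The point of the sketch is only the ordering: test the uintptr_t first, cast second. The exact failure test in the kernel code may need to differ.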
DTrace runs show that ILB NAT's vmem_alloc indeed sometimes returns -1:
dtrace -s ./vmem_alloc_print_t1.d
dtrace: script './vmem_alloc_print_t1.d' matched 3 probes
CPU ID FUNCTION:NAME
0 16206 vmem_alloc:return 2014 Apr 30 11:43:39 vmem_alloc+15c (ilb_ns_32,16c1) returns errno 2
The ILB code later calls vmem_free() on such unallocated areas, thus corrupting the stack.
Please note that the current code can never return 65535 as a valid allocation, because of an incorrect size argument to vmem_create() (see line 240): the constant NAT_PORT_SIZE == 65535 - 4096, but it should be 65535 - 4096 + 1, since 65535 is a legal ULP port value.
This should be corrected too.
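The off-by-one in the size can be checked with a little arithmetic. The sketch below assumes the arena starts at port 4096 (inferred from the "65535 - 4096" expression, not confirmed from the source) and that an arena covers the half-open range [base, base + size). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	PORT_START	4096u	/* assumed first port of the arena */
#define	PORT_MAX	65535u	/* largest in_port_t value */

#define	SIZE_CURRENT	(PORT_MAX - PORT_START)		/* 61439 */
#define	SIZE_FIXED	(PORT_MAX - PORT_START + 1)	/* 61440 */

/* Highest value that a range [base, base + size) can hand out. */
static uint32_t
last_in_range(uint32_t base, uint32_t size)
{
	return (base + size - 1);
}
```

With the current size the highest allocatable port is 65534. With the corrected size it is 65535.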