Bug #2799

open

panic removing port from aggr

Added by Sebastien Roy about 9 years ago.

Status: New
Priority: Normal
Assignee:
Category: kernel
Start date: 2012-05-29
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

1. Create a one-port aggregation using dladm:
   # dladm create-aggr -l vmxnet3s0 aggr0
2. Add a flow over the aggregation using flowadm:
   # flowadm add-flow -l aggr0 -a transport=icmp icmp
3. Add a second port to the aggregation:
   # dladm add-aggr -l vmxnet3s1 aggr0
4. Remove the original port from the aggregation:
   # dladm remove-aggr -l vmxnet3s0 aggr0
5. Send a packet that matches the flow attributes for the flow added in step 2:
   # ping something
<at this point, the system panics>

::status

debugging crash dump vmcore.3 (64-bit) from seb-dev1
operating system: 5.11 2012-05-23 (i86pc)
image uuid: 25afb505-32d5-48f3-d6a0-f65064a78b4b
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff0007d0e2e0 addr=30 occurred in module "aggr" due to a NULL pointer dereference
dump content: kernel pages only

$c

aggr_ring_tx+0x14(ffffff01cf71bf40, ffffff01d89ed500)
mac_hwring_tx+0x20(ffffff01d0bd5e48, ffffff01d89ed500)
mac_tx+0x2b1(ffffff01d09e9008, ffffff01d89ed500, 0, 0, 0)
proto_unitdata_req+0x1ca(ffffff01d0c32e48, ffffff01d0a8c0e0)
dld_wput+0x14d(ffffff01cf80e638, ffffff01d0a8c0e0)
putnext+0x21e(ffffff01d0c31df8, ffffff01d0a8c0e0)
ip_xmit+0x759(ffffff01df58de40, ffffff01d09998a0, 80400000, 54, 0, ffffffff)
...

The fault address (0x30) is 0 + offsetof (aggr_port_t, lp_mch):

::offsetof aggr_port_t lp_mch

offsetof (aggr_port_t, lp_mch) = 0x30, sizeof (...->lp_mch) = 8
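
The arithmetic behind that fault address can be reproduced with a small user-space sketch. The struct below is a hypothetical stand-in, not the real aggr_port_t layout: it only pads lp_mch out to offset 0x30 so that the computation matches the ::offsetof output above. All names prefixed with mock_ are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for aggr_port_t. This is NOT the real layout:
 * six 8-byte placeholder fields simply put lp_mch at offset 0x30.
 */
typedef struct mock_aggr_port {
	uint64_t	lp_pad[6];	/* placeholder for the preceding members */
	void		*lp_mch;	/* the member aggr_ring_tx() loads */
} mock_aggr_port_t;

/* Sanity check that the mock reproduces the offset reported by mdb. */
void
check_mock_layout(void)
{
	assert(offsetof(mock_aggr_port_t, lp_mch) == 0x30);
}

/* Address that gets read when loading port->lp_mch through a given base. */
uintptr_t
lp_mch_fault_addr(uintptr_t port_base)
{
	return (port_base + offsetof(mock_aggr_port_t, lp_mch));
}
```

With a port base of 0 (a NULL aggr_port_t pointer), lp_mch_fault_addr() yields 0x30, which is the addr value in the panic message.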

Looking at the code:

mblk_t *
aggr_ring_tx(void *arg, mblk_t *mp)
{
    aggr_pseudo_tx_ring_t *pseudo_ring = (aggr_pseudo_tx_ring_t *)arg;
    aggr_port_t *port = pseudo_ring->atr_port;

    return (mac_hwring_send_priv(port->lp_mch, pseudo_ring->atr_hw_rh, mp));
}

This is consistent with the contents of our aggr_pseudo_tx_ring_t (the 1st argument to aggr_ring_tx()), which has a NULL atr_port field:

ffffff01cf71bf40::print aggr_pseudo_tx_ring_t

{
    atr_rh = 0xffffff01d0bd5e48
    atr_port = 0
    atr_hw_rh = 0
    atr_flags = 0
}

This entire pseudo ring looks hosed (it doesn't have the MAC_PSEUDO_RING_INUSE flag set). What ring is atr_rh pointing to?

0xffffff01d0bd5e48::print mac_ring_t mr_state mr_flag

mr_state = 0 (0)
mr_flag = 0

This ring has been freed (state 0 is MR_FREE). It's the mac_ring_t associated with the pseudo ring corresponding to the port that was removed from the aggregation.

The root cause is that i_mac_group_rem_ring() does not iterate over subflows to remove the ring from any associated soft ring sets that may be using it. At the time the flow in the example is created, the soft ring set associated with the subflow (our icmp flow above) is in single-ring mode and holds a direct pointer to the mac_ring_t corresponding to the 1st port in the aggregation (srs_tx.st_arg2 is that pointer). When the port is removed from the aggregation and the ring is freed, srs_tx.st_arg2 still points at the freed ring, so the next packet matching the flow is transmitted through it.

The fix is to have i_mac_group_rem_ring() iterate over subflows in order to remove the ring from associated soft ring sets.
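
The shape of that fix can be illustrated with a simplified user-space model. This is only a sketch under stated assumptions, not the actual illumos change: the mock_ types, the srs_next linkage, and the idea of retargeting a stale pointer to a surviving "fallback" ring are all hypothetical and chosen for illustration. Only the st_arg2 name and the meaning of state 0 (MR_FREE) come from the analysis above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical models; not the illumos structures. */
typedef struct mock_ring {
	int	mr_state;		/* 0 models MR_FREE, 1 models in use */
} mock_ring_t;

typedef struct mock_srs {
	mock_ring_t	*st_arg2;	/* direct Tx ring pointer (single-ring mode) */
	struct mock_srs	*srs_next;	/* next subflow SRS (assumed linkage) */
} mock_srs_t;

/*
 * Walk every subflow SRS and retarget any that still points at the ring
 * being removed. 'fallback' models a ring that survives the removal and
 * may be NULL if none is available. Returns the number of SRSes updated.
 */
int
mock_group_rem_ring(mock_srs_t *subflows, mock_ring_t *ring,
    mock_ring_t *fallback)
{
	mock_srs_t *srs;
	int updated = 0;

	assert(ring != NULL);
	for (srs = subflows; srs != NULL; srs = srs->srs_next) {
		if (srs->st_arg2 == ring) {
			srs->st_arg2 = fallback;
			updated++;
		}
	}
	/* Only after no SRS references the ring is it marked free. */
	ring->mr_state = 0;
	return (updated);
}
```

In this model, the icmp subflow's SRS would be found during the walk and stop referencing the removed port's ring before that ring is freed, so a later transmit could not reach a stale mac_ring_t.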
