Bug #1923

kernel panic in ire_round_robin due to divide by zero

Added by John Sonnenschein over 2 years ago. Updated over 2 years ago.

Status: Resolved
Start date: 2011-12-22
Priority: Normal
Due date:
Assignee: John Sonnenschein
% Done:
Category: kernel
Spent time: -
Target version: -
Difficulty: Medium
Tags:


Updated by Robert Mustacchi over 2 years ago

Here's some additional analysis from Dave Pacheco:

It's not clear that the state is invalid when we crashed. Regarding the irb refcounts: one of the two threads operating on the irb is holding the lock and about to bump the refcount, but panicked on the divide-by-zero just before doing so:

        rw_enter(&irb_ptr->irb_lock, RW_WRITER);
        maxwalk = irb_ptr->irb_ire_cnt; /* Excludes condemned */
        hash %= maxwalk;
        irb_refhold_locked(irb_ptr);
        rw_exit(&irb_ptr->irb_lock);

so I don't think that indicates a refcount bug.

Now, ire_round_robin normally skips CONDEMNED ire objects and also can return NULL. If that happens, the caller will just use the IRE it already has. In this case, we would have returned the IRE but failed the ire_clone_verify() check as Rob pointed out.

So it seems like ire_round_robin just forgot to handle the edge case where there are no non-CONDEMNED IREs. By the way, this was seen at Oracle and fixed in build 153: http://wesunsolve.net/bugid/id/6991060. The comments in the bug just say "If an ire_bucket has no live ire's (everything is condemned), then ire_round_robin could end up doing a div-by-zero." There's no further explanation in the public comments.


Summing up: as expected, in each of the four dumps we panicked at ire_round_robin+0x5e on the divide-by-zero trap. In each of these cases, the IRB and IRE state appears sane:

vmcore.1 from SRVJ-18: irb ffffff1b5dff4ab8 has refcnt = 1, ire_cnt = 0, nire = 1. There's 1 ire on the irb's list, and it's condemned (generation == 0).
vmcore.0 from SRVJ-18: irb ffffff245524b858 has refcnt = 0, ire_cnt = 0, nire = 2. The two IREs referencing the IRB are condemned (and can be seen with "::ire"), but they're not linked on the IRB's list. I don't know the code well enough to say whether that's a problem, but I'm guessing not. ire_inactive() decrements the refcount. ire_delete() removes it from the list. ire_delete() only calls ire_inactive() if the IRE refcount is 0.
vmcore.0 from SRVJ-04: irb ffffff19bcd94548 has refcnt = 3, ire_cnt = 0, nire = 2. There are two IREs on the IRB's list, and one other unlisted IRE that references the IRB.
vmcore.0 from SRVJ-46: irb ffffff23451a57c0 has refcnt = 1, ire_cnt = 0, nire = 2. There are two IREs on the IRB's list, both condemned.

Now, onto the refcounts. In each case, we would not expect to see a reference for the panic thread because it currently holds the IRB lock and is about to bump the refcount when it panics. (This addresses Bryan's earlier comment.) Looking at the others, we'd expect the IRB refcount to be at least as high as the number of threads in ire_round_robin() that would still have a reference:

vmdump.1 from SRVJ-18: Besides the panic thread, there's only one thread in ire_round_robin, and it's trying to take the lock in order to decrement the refcount. So we should see at least 1 reference, and we have refcnt = 1. Check.
vmdump.0 from SRVJ-18: Besides the panic thread, there are two threads blocked trying to pick up the radix node head rwlock at ire_round_robin+0x33f. At this point, these threads have dropped their IRB references. Then there are two threads blocked on the IRB rwlock at the start of the function (ire_round_robin+0x48). So we don't expect to see any references, and we have refcnt == 0. Check.
vmdump.0 from SRVJ-04: There are 7 non-panic threads in ire_round_robin(), but 1 is being called with a different IRB than the panic thread. Four are at ire_round_robin+0x48 and so haven't taken a reference yet. One is waiting to pick up the radix node head rwlock and so has dropped its reference. One is waiting to pick up the IRB rwlock (via irb_refrele) and so should still have a reference. So we should expect to see at least 1 reference, and we have 3. Check.
vmdump.0 from SRVJ-46: Besides the panic thread, there are 6 threads waiting at ire_round_robin+0x48 and one waiting to pick up the IRB lock in irb_refrele. So we expect to see 1 reference, and we have refcnt = 1. Check.
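
The checks above all have the same shape: the observed IRB refcnt must be at least the number of non-panic threads that still hold a reference. A minimal sketch of that comparison follows, using the numbers quoted above. The struct, the function name, and the dump labels are hypothetical and exist only for illustration; nothing here is illumos code.

```c
#include <assert.h>
#include <stddef.h>

/* One row per crash dump; values are copied from the analysis above. */
typedef struct dump_obs {
	const char *name;		/* illustrative label only */
	unsigned int refcnt;		/* observed IRB refcnt */
	unsigned int expected_min;	/* non-panic threads still holding a ref */
} dump_obs_t;

static const dump_obs_t observations[] = {
	{ "SRVJ-18 dump 1", 1, 1 },
	{ "SRVJ-18 dump 0", 0, 0 },
	{ "SRVJ-04 dump 0", 3, 1 },
	{ "SRVJ-46 dump 0", 1, 1 },
};

/* Returns the number of dumps whose refcnt is below the expected minimum. */
size_t
count_refcnt_violations(const dump_obs_t *obs, size_t n)
{
	size_t i, bad = 0;

	assert(obs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (obs[i].refcnt < obs[i].expected_min)
			bad++;
	}
	return (bad);
}
```

With the four rows above the function reports no violations, which matches the "Check" at the end of each entry. A refcnt higher than the minimum (as on SRVJ-04) is not flagged, since other holders outside ire_round_robin() may exist.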

I can't say with certainty that there's nothing else funny going on, but nothing we've seen so far suggests there's anything else wrong.

I mentioned the bug (6991060) to Dan McDonald, who agreed that this is likely an edge case that was not covered in testing.

We're testing this suggested fix right now:

$ diff -u ip_ftable.c.old ip_ftable.c.new
--- ip_ftable.c.old     2011-11-28 15:20:22.017722465 -0800
+++ ip_ftable.c.new     2011-11-28 15:23:11.686603914 -0800
@@ -815,6 +815,12 @@

       rw_enter(&irb_ptr->irb_lock, RW_WRITER);
       maxwalk = irb_ptr->irb_ire_cnt; /* Excludes condemned */
+       if (maxwalk == 0) {
+               rw_exit(&irb_ptr->irb_lock);
+               return (NULL);
+       }
       hash %= maxwalk;

Since we can't reproduce the problem, we're just sanity checking it. Based on analysis of the surrounding code, we believe this will fix the problem and have very high confidence that it won't make things worse.
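
For readers who want to see the edge case in isolation, here is a user-space sketch of the guarded selection. It is a model under stated assumptions, not the ip_ftable.c code: model_ire_t, model_irb_t, model_round_robin() and model_pick_or_keep() are hypothetical names, and locking and refcounting are left out entirely. It only shows that a bucket whose entries are all condemned has a live count of zero, that the guard returns NULL instead of computing hash % 0, and that a caller can then keep the entry it already has.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the illumos ire_t / irb_t. */
typedef struct model_ire {
	struct model_ire *next;
	int condemned;		/* nonzero once marked for deletion */
	int id;
} model_ire_t;

typedef struct model_irb {
	model_ire_t *head;
	unsigned int live_cnt;	/* role of irb_ire_cnt: excludes condemned */
} model_irb_t;

/*
 * Pick the (hash % live_cnt)-th non-condemned entry, or NULL when the
 * bucket has no live entries. Locking and refcounting are omitted.
 */
model_ire_t *
model_round_robin(const model_irb_t *irb, unsigned int hash)
{
	unsigned int maxwalk;
	model_ire_t *ire;

	assert(irb != NULL);
	maxwalk = irb->live_cnt;
	if (maxwalk == 0)
		return (NULL);	/* the guard: avoids hash % 0 */
	hash %= maxwalk;

	for (ire = irb->head; ire != NULL; ire = ire->next) {
		if (ire->condemned)
			continue;
		if (hash == 0)
			return (ire);
		hash--;
	}
	return (NULL);
}

/* Caller-side fallback: keep the current entry if nothing was picked. */
model_ire_t *
model_pick_or_keep(const model_irb_t *irb, unsigned int hash,
    model_ire_t *current)
{
	model_ire_t *picked = model_round_robin(irb, hash);

	return (picked != NULL ? picked : current);
}
```

Returning NULL in the empty case reuses a path the caller already handles, which is why the change is expected to be low risk.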

Updated by Rich Lowe over 2 years ago

  • Project changed from site to illumos gate

Updated by Rich Lowe over 2 years ago

  • Category set to kernel
  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13560 61e2c882ccf8
