igmp timeout logic induces panic
We have at least two dumps in thoth in which igmp_slowtimo() was invoked after the corresponding IP stack had been torn down: 3ffcfe07a4a96187 and 52ad2dea93adbc10.
A representative stack looks like:
ffffff01e94279a0 vpanic()
ffffff01e94279d0 rw_panic+0x6f(fffffffffb92acfa, ffffff566d3c6e58)
ffffff01e9427a40 rw_enter_sleep+0x358(ffffff566d3c6e58, 1)
ffffff01e9427ab0 igmp_slowtimo+0x3b(ffffff566d3c6000)
ffffff01e9427b00 callout_list_expire+0x98(ffffff42a9c913c0, ffffff43f02c7140)
ffffff01e9427b30 callout_expire+0x3b(ffffff42a9c913c0)
ffffff01e9427b60 callout_execute+0x20(ffffff42a9c913c0)
ffffff01e9427c20 taskq_thread+0x2d0(ffffff430d4f6490)
ffffff01e9427c30 thread_start+8()
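It appears that the untimeout() in ip_stack_fini() is not sufficient here: there is no synchronization with the callout itself, and that callout (igmp_slowtimo) can schedule another callout for itself at the end of its execution. If teardown runs while the callout is executing, the callout can re-arm itself after untimeout() has returned, and the new callout then fires against a stack that no longer exists. It seems like the callout needs to check whether the stack is being torn down and, if so, not re-arm itself.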
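The interleaving described above can be sketched as a small single-threaded simulation in C. This is only an illustration under stated assumptions, not the kernel code or the committed fix: sim_stack_t, sim_timeout(), sim_untimeout(), sim_slowtimo_begin(), sim_slowtimo_end() and sim_stack_fini() are all hypothetical names, and the handler is split into two halves so that teardown can be placed "in the middle" of a firing. In a real kernel the flag check and the re-arm would also need to happen under a lock shared with the teardown path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-stack state; not the real ip_stack_t. */
typedef struct sim_stack {
	bool ss_quiesce;	/* set by teardown: handler must not re-arm */
	bool ss_pending;	/* true while a callout is scheduled */
	int ss_fired;		/* number of times the handler has run */
} sim_stack_t;

/* Toy timeout(): records that a callout is scheduled. */
static void
sim_timeout(sim_stack_t *ss)
{
	ss->ss_pending = true;
}

/* Toy untimeout(): cancels a scheduled callout, if there is one. */
static void
sim_untimeout(sim_stack_t *ss)
{
	ss->ss_pending = false;
}

/* First half of the handler: the callout fires and does its work. */
static void
sim_slowtimo_begin(sim_stack_t *ss)
{
	assert(ss->ss_pending);	/* a handler only runs if it was scheduled */
	ss->ss_pending = false;	/* this firing consumes the callout */
	ss->ss_fired++;
}

/*
 * Second half of the handler: the re-arm decision. With honor_quiesce
 * false it re-arms unconditionally, which models the reported problem;
 * with it true it skips the re-arm once teardown has begun.
 */
static void
sim_slowtimo_end(sim_stack_t *ss, bool honor_quiesce)
{
	if (honor_quiesce && ss->ss_quiesce)
		return;
	sim_timeout(ss);
}

/* Teardown sketch: raise the flag first, then cancel what is pending. */
static void
sim_stack_fini(sim_stack_t *ss)
{
	ss->ss_quiesce = true;
	sim_untimeout(ss);
}
```

Calling sim_slowtimo_begin(), then sim_stack_fini(), then sim_slowtimo_end() reproduces the ordering in question. With honor_quiesce set, nothing is left pending after teardown; without it, a callout stays scheduled against the torn-down stack.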
Updated by Electric Monk about 6 years ago
- Status changed from New to Closed
commit f5db8fb084e8d3d9f551ce34defa3c80d56edebc
Author: Robert Mustacchi <email@example.com>
Date: 2015-05-15T23:36:42.000Z

    5893 igmp timout logic induces panic
    Reviewed by: Jerry Jelinek <firstname.lastname@example.org>
    Reviewed by: Dan McDonald <email@example.com>
    Approved by: Richard Lowe <firstname.lastname@example.org>