Feature #12476
cyclic reprogramming can race with removal
Status: Closed
% Done: 100%
Description
Upstreaming OS-6684 for bhyve:
While running vmadm start / stop -f in a tight loop, I hit the following:
> ::status
debugging crash dump vmcore.22 (64-bit) from dent
operating system: 5.11 joyent_20180216T202957Z (i86pc)
image uuid: (not set)
panic message: attempt to reprogram non-existent cyclic
dump content: kernel pages only
> $C
ffffff000f8328d0 vpanic()
ffffff000f832950 cyclic_reprogram_cyclic+0xb0(ffffff03eb144000, b, 2cc15105498)
ffffff000f832990 cyclic_reprogram+0x1d8(ffffff0437fa49d0, 2cc15105498)
ffffff000f832a00 vmm_glue_callout_reset_sbt+0x8b(ffffff03e8e0d788, c038ccb3532, e0ef1f1, fffffffff7b1fbb0, ffffff03e8e0d7b0, 200)
ffffff000f832a40 pit_timer_start_cntr0+0xab(ffffff03e8e0d740)
ffffff000f832a80 vatpit_callout_handler+0x98(ffffff03e8e0d7b0)
ffffff000f832aa0 vmm_glue_callout_handler+0x2f(ffffff03e8e0d788)
ffffff000f832b90 cyclic_softint+0x209(ffffff03eb0c5580, 0)
ffffff000f832ba0 cbe_low_level+0x14()
ffffff000f832bf0 av_dispatch_softvect+0x88(2)
ffffff000f832c20 apix_dispatch_softint+0x35(0, 0)
ffffff000f9d4a10 switch_sp_and_call+0x13()
ffffff000f9d4a60 apix_do_softint+0x44(ffffff000f9d4ae0)
ffffff000f9d4ad0 apix_do_interrupt+0x480(ffffff000f9d4ae0, ffffff03eb0daea0)
ffffff000f9d4ae0 _interrupt+0x1e9()
ffffff000f9d4bd0 mach_cpu_idle+6()
ffffff000f9d4c00 cpu_idle+0x11a()
ffffff000f9d4c20 idle+0xa7()
ffffff000f9d4c30 thread_start+8()
After looking at the vhpet-related dump and checking back with @john.levon's original, I've come up with a hypothesis regarding the issue.
In both cases, the panic occurs during an attempted timer reprogramming as part of a virtual device's running callout, at the same time that a reboot is occurring:
fffffb0410bc9980 cyclic_reprogram_cyclic+0x97(fffffdc5496f0d40, c3, 370a98496f2f04)
fffffb0410bc99b0 cyclic_reprogram+0x108(fffffdce41a295c0, 370a98496f2f04)
fffffb0410bc9a00 vmm_glue_callout_reset_sbt+0x8b(fffffdc859b8ac90, ec66a4e704be61, 3ff6900, fffffffff8494aa0, fffffdc859b8acb8, 200)
fffffb0410bc9a60 vhpet_start_timer+0x99(fffffdc859b8ac40, 0, 3e391951, ec66a4e3055561)
fffffb0410bc9ab0 vhpet_handler+0x76(fffffdc859b8acb8)
fffffb0410bc9ad0 vmm_glue_callout_handler+0x2f(fffffdc859b8ac90)
fffffb0410bc9b70 cyclic_softint+0xfd(fffffdc544958000, 0)
fffffb0410bc9b80 cbe_low_level+0x14()
fffffb0410bc9bd0 av_dispatch_softvect+0x78(2)
fffffb0410bc9c00 apix_dispatch_softint+0x35(0, 0)
fffffb0410b789f0 switch_sp_and_call+0x13()
fffffb0410b78a40 apix_do_softint+0x6c(fffffb0410b78ab0)
fffffb0410b78aa0 apix_do_interrupt+0x382(fffffb0410b78ab0, 0)
fffffb0410b78ab0 _interrupt+0xba()
fffffb0410b78ba0 i86_mwait+0x14()
fffffb0410b78be0 cpu_idle_mwait+0x109()
fffffb0410b78c00 idle+0xa7()
fffffb0410b78c10 thread_start+8()
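For context, bhyve's FreeBSD-style callout interface is glued onto illumos cyclics. Below is a minimal sketch of the re-arm side of that shim; the names, struct layout, and signature are assumptions for illustration rather than the actual vmm_glue_callout_reset_sbt() code. It shows why a device model's callout handler ends up calling cyclic_reprogram() on the very cyclic that is currently firing.

/*
 * Illustrative sketch only: a simplified stand-in for the vmm glue that
 * backs a bhyve callout with an illumos cyclic.  Names and layout are
 * assumptions, not the real glue code.
 */
#include <sys/cyclic.h>
#include <sys/time.h>

struct vmm_callout {
	cyclic_id_t	c_cyc_id;		/* cyclic backing this callout */
	void		(*c_func)(void *);	/* e.g. vhpet_handler */
	void		*c_arg;
};

/*
 * Re-arm the callout to fire at `target`.  When a device handler (such as
 * vhpet_handler) calls this from within its own callout, it is running in
 * cyclic_softint() context, so this reprograms the cyclic that is firing.
 * cyclic_reprogram() has no way to report that the cyclic no longer
 * exists; it panics instead, which is the crash shown above.
 */
static void
vmm_callout_reset_hrtime(struct vmm_callout *c, hrtime_t target)
{
	cyclic_reprogram(c->c_cyc_id, target);
}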
Meanwhile, we can see the re-init (read: reboot) thread in the process of removing that very cyclic:
stack pointer for thread fffffe55d37b8080 (bhyve/1 [vcpu 0]): fffffb04249a9560
[ fffffb04249a9560 _resume_from_idle+0x12b() ]
fffffb04249a9590 swtch+0x141()
fffffb04249a95d0 sema_p+0x1d7(fffffdc5496f0de0)
fffffb04249a9650 cyclic_remove_here+0xac(fffffdc5496f0d40, c3, 0, 0)
fffffb04249a9690 cyclic_remove+0x38(fffffdce41a295c0)
fffffb04249a96c0 vmm_glue_callout_drain+0x27(fffffdc859b8ac90)
fffffb04249a96f0 vhpet_cleanup+0x40(fffffdc859b8ac40)
fffffb04249a9730 vm_cleanup+0x66(fffffdca49aad000, 0)
fffffb04249a9760 vm_reinit+0x38(fffffdca49aad000)
fffffb04249a9c20 vmmdev_do_ioctl+0xc43(fffffdcba98b4dc0, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
fffffb04249a9cc0 vmm_ioctl+0x12f(13100000017, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
fffffb04249a9d00 cdev_ioctl+0x39(13100000017, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8)
fffffb04249a9d50 spec_ioctl+0x60(fffffdcb12aa0a00, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8, 0)
fffffb04249a9de0 fop_ioctl+0x55(fffffdcb12aa0a00, 20007605, 0, 202003, fffffdcb461ef0e0, fffffb04249a9ea8, 0)
fffffb04249a9f00 ioctl+0x9b(3, 20007605, 0)
fffffb04249a9f10 sys_syscall+0x19f()
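The teardown side of the same shim (again sketched with assumed names rather than the actual vmm_glue_callout_drain() code) funnels into cyclic_remove(), which is how the re-init thread ends up parked in sema_p() above:

/*
 * Illustrative sketch only: assumed shape of the callout drain path.
 * cyclic_remove() does not return until any in-flight invocation of the
 * handler has completed; that wait is the sema_p() call visible in the
 * re-init thread's stack.
 */
static void
vmm_callout_drain(struct vmm_callout *c)
{
	cyclic_remove(c->c_cyc_id);
	c->c_cyc_id = CYCLIC_NONE;	/* callout no longer backed by a cyclic */
}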
Now cyclic_remove will wait for the pending handler to complete before it returns to the caller, as evidenced by the blocking sema_p call. The fact that the handler is trying to reprogram that very cyclic is where it gets tricky. I've theorized a set of events that leads to this condition:
1. The cyclic for vhpet_handler triggers, resulting in an interrupt on that CPU (at CBE_LOW_PIL of 2)
2. That handler executes some logic, but has not yet reached the point at which it reprograms the cyclic
3. On a different CPU, a thread calls vm_reinit which, as part of its cleanup, goes to remove the cyclic
4. The cyclic_remove call results in a cyclic xcall to the CPU currently executing the handler, at CBE_HIGH_PIL, effectively preempting it
5. In cyclic_remove_xcall, the cyclic is removed from the cyc_cpu_t heap and marked CYF_FREE.
6. Because the handler is still running at that moment (cy_pend != 0), the cyclic_remove_here call is forced to wait for its completion via sema_p.
7. With the CBE_HIGH_PIL interrupt handling complete, the cyclic handler continues, and as part of its duty, attempts to reprogram the now-removed cyclic.
8. vpanic ensues

It seems that the cyclic_reprogram logic needs to be updated to detect when a cyclic has been removed out from under the nose of its running handler, and report the failure back up to its caller, rather than panic the box.
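As a rough illustration of that direction (an assumption about the shape such a change could take, not necessarily the fix that integrated), cyclic_reprogram() could report failure when the cyclic has already been removed, and the callout glue from the earlier sketch could tolerate the race:

/*
 * Hypothetical sketch: if cyclic_reprogram() were changed to return 0 when
 * the target cyclic has been removed out from under its running handler
 * (and nonzero on success), the glue layer could simply decline to re-arm
 * a callout that is in the middle of being drained.
 */
static void
vmm_callout_reset_hrtime(struct vmm_callout *c, hrtime_t target)
{
	if (cyclic_reprogram(c->c_cyc_id, target) == 0) {
		/*
		 * Removal raced with this handler; the callout is being
		 * torn down, so leave it disarmed rather than panicking.
		 */
		return;
	}
	/* Otherwise the cyclic is armed to fire at `target`. */
}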
Updated by Patrick Mooney over 3 years ago
After merging this into SmartOS, we stopped seeing those cyclic_reprogram() panics. Without bhyve (and a sufficient workload), it is difficult to reproduce the conditions for testing.
Updated by Electric Monk over 3 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit cb2d1b02582006a823e154df6807c505b73b1cb6
commit cb2d1b02582006a823e154df6807c505b73b1cb6
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-04-25T02:21:17.000Z

12476 cyclic reprogramming can race with removal
Reviewed by: John Levon <john.levon@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Yuri Pankov <ypankov@fastmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>