Bug #14762
openfio and FDINFO need to understand each other
Description
While stress-testing #14685 I discovered another race, thanks to a DEBUG kernel:
> $C
fffffe0021566e10 vpanic()
fffffe0021566e60 0xfffffffffbdd0fc5()
fffffe0021566ed0 closeandsetf+0x5d0(3, 0)
fffffe0021566ef0 close+0x13(3)
fffffe0021566f00 sys_syscall+0x2ae()
> ::status
debugging crash dump vmcore.5 (64-bit) from bloody
operating system: 5.11 omnios-master-79595cfc28 (i86pc)
build version: gfx-drm - heads/master-0-g77f745e heads/master-0-g79595cfc28
image uuid: b3dea7af-9f85-ce80-f254-c883abe25613
panic message: assertion failed: ufp->uf_refcnt == 0 || p->p_lwpcnt > 1, file: ../../common/os/fio.c, line: 710
dump content: kernel pages and pages from PID -1
>
The assertion failed because the process had only one thread, but uf_refcnt was above 0 because FDINFO was inspecting the process. FDINFO invalidates the assertion on line 710.
On a non-DEBUG kernel, however, the close would likely loop waiting for the refcount to drop to 0. Because it uses cv_wait_stop(), it's likely the right thing would happen, even without a cv_signal being sent from FDINFO processing.
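As a rough illustration of why the assertion trips, here is a minimal userland sketch of the predicate from the panic message. The model_uf_t and model_proc_t types and the close_assert_holds() helper are hypothetical stand-ins invented for this example; only the field names uf_refcnt and p_lwpcnt and the predicate itself come from the panic message. This is not the kernel code in fio.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields used here. */
typedef struct model_uf {
	int uf_refcnt;		/* outstanding holds on the fd entry */
} model_uf_t;

typedef struct model_proc {
	int p_lwpcnt;		/* number of LWPs (threads) in the process */
} model_proc_t;

/*
 * The predicate from the panic message: a nonzero refcount at close time
 * is only expected if another LWP in the same process could be holding it.
 */
static bool
close_assert_holds(const model_uf_t *ufp, const model_proc_t *p)
{
	return (ufp->uf_refcnt == 0 || p->p_lwpcnt > 1);
}
```

With a single-LWP process and one hold taken from outside the process (the FDINFO case), the predicate is false, which matches the panic above.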
I propose the following two parts to fix this:
1.) Remove the ASSERT from closeandsetf(), with appropriate comment updates to explain things.
2.) Have FDINFO's decrement of uf_refcnt cv_broadcast() uf_closing_cv like the fio decrements do. This MIGHT be extraneous, but it should keep things moving a bit smoother.
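The second part of the proposal can be sketched in userland with POSIX threads. This is only an analogy under stated assumptions: sketch_uf_t, sketch_hold(), sketch_rele(), sketch_close_wait(), and sketch_demo() are hypothetical names invented here, and pthread_cond_broadcast()/pthread_cond_wait() stand in for the kernel's cv_broadcast() and cv_wait_stop(). The real fio.c logic differs.

```c
#include <pthread.h>

/* Hypothetical userland model of an fd entry; not the kernel's uf_entry_t. */
typedef struct sketch_uf {
	pthread_mutex_t uf_lock;
	pthread_cond_t uf_closing_cv;
	int uf_refcnt;
} sketch_uf_t;

/* An inspector (the FDINFO role in this analogy) takes a hold. */
static void
sketch_hold(sketch_uf_t *ufp)
{
	pthread_mutex_lock(&ufp->uf_lock);
	ufp->uf_refcnt++;
	pthread_mutex_unlock(&ufp->uf_lock);
}

/* Drop the hold and wake any closer; the waiter rechecks the count itself. */
static void
sketch_rele(sketch_uf_t *ufp)
{
	pthread_mutex_lock(&ufp->uf_lock);
	ufp->uf_refcnt--;
	pthread_cond_broadcast(&ufp->uf_closing_cv);
	pthread_mutex_unlock(&ufp->uf_lock);
}

/* The closer waits, without asserting, until every hold has been dropped. */
static void
sketch_close_wait(sketch_uf_t *ufp)
{
	pthread_mutex_lock(&ufp->uf_lock);
	while (ufp->uf_refcnt > 0)
		pthread_cond_wait(&ufp->uf_closing_cv, &ufp->uf_lock);
	pthread_mutex_unlock(&ufp->uf_lock);
}

static void *
sketch_inspector(void *arg)
{
	sketch_rele(arg);
	return (NULL);
}

/* Returns the refcount seen after the close wait, or -1 on thread failure. */
static int
sketch_demo(void)
{
	sketch_uf_t uf;
	pthread_t tid;
	int refcnt;

	pthread_mutex_init(&uf.uf_lock, NULL);
	pthread_cond_init(&uf.uf_closing_cv, NULL);
	uf.uf_refcnt = 0;

	sketch_hold(&uf);	/* hold taken before the close begins */
	if (pthread_create(&tid, NULL, sketch_inspector, &uf) != 0)
		return (-1);
	sketch_close_wait(&uf);	/* returns once the inspector has released */
	pthread_join(tid, NULL);

	refcnt = uf.uf_refcnt;
	pthread_cond_destroy(&uf.uf_closing_cv);
	pthread_mutex_destroy(&uf.uf_lock);
	return (refcnt);
}
```

In this model the broadcast is what lets the closer proceed promptly; without it, the waiter would depend on some other wakeup, which is the "MIGHT be extraneous" case described in part 2.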
Related issues
Updated by Dan McDonald about 2 months ago
- Related to Bug #14685: sotpi ops need to be wary of null v_stream added
Updated by Dan McDonald about 2 months ago
- Related to Feature #12046: Provide /proc/<PID>/fdinfo/ added
Updated by Dan McDonald about 1 month ago
This bug may be closed as not-a-bug given #14788 fixes the real problem far more cleanly.