Bug #14762

open

fio and FDINFO need to understand each other

Added by Dan McDonald about 2 months ago. Updated 24 days ago.

Status: New
Priority: Normal
Assignee:
Category: kernel
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

While stress-testing #14685 I discovered another race, thanks to a DEBUG kernel:

> $C
fffffe0021566e10 vpanic()
fffffe0021566e60 0xfffffffffbdd0fc5()
fffffe0021566ed0 closeandsetf+0x5d0(3, 0)
fffffe0021566ef0 close+0x13(3)
fffffe0021566f00 sys_syscall+0x2ae()
> ::status
debugging crash dump vmcore.5 (64-bit) from bloody
operating system: 5.11 omnios-master-79595cfc28 (i86pc)
build version: gfx-drm - heads/master-0-g77f745e
heads/master-0-g79595cfc28

image uuid: b3dea7af-9f85-ce80-f254-c883abe25613
panic message: assertion failed: ufp->uf_refcnt == 0 || p->p_lwpcnt > 1, file: ../../common/os/fio.c, line: 710
dump content: kernel pages and pages from PID -1
> 

The assertion failed because the process had only one thread (p_lwpcnt == 1), but uf_refcnt was nonzero because FDINFO was inspecting the process. FDINFO therefore invalidates the assertion on line 710 of fio.c.
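To make the failing condition concrete, here is a minimal userland sketch of the predicate from the panic message. The `sketch_` types and functions are hypothetical stand-ins invented for illustration; they are not the kernel's structures or the code in fio.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fields named in the panic message. */
struct sketch_uf_entry {
	int uf_refcnt;		/* outstanding holds on the fd entry */
};

struct sketch_proc {
	int p_lwpcnt;		/* number of threads (LWPs) in the process */
};

/* The condition quoted in the panic message, as a plain predicate. */
bool
sketch_close_assert_holds(const struct sketch_uf_entry *ufp,
    const struct sketch_proc *p)
{
	return (ufp->uf_refcnt == 0 || p->p_lwpcnt > 1);
}

/* An outside observer (standing in for FDINFO) takes a hold. */
void
sketch_observer_hold(struct sketch_uf_entry *ufp)
{
	ufp->uf_refcnt++;
}

/* The observer drops its hold again. */
void
sketch_observer_rele(struct sketch_uf_entry *ufp)
{
	assert(ufp->uf_refcnt > 0);
	ufp->uf_refcnt--;
}
```

With `p_lwpcnt` equal to 1, the predicate holds only while no observer has a hold. Between `sketch_observer_hold()` and `sketch_observer_rele()` it is false, which is the window the DEBUG kernel hit.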

If this were a non-DEBUG kernel, however, the close would likely loop waiting for the refcount to drop to 0. Because it uses cv_wait_stop(), the right thing would likely happen even without a cv_signal being sent from FDINFO processing.

I propose the following two parts to fix this:

1.) Remove the ASSERT from closeandsetf(), with appropriate comment updates to explain things.

2.) Have FDINFO's decrement of uf_refcnt call cv_broadcast() on uf_closing_cv, as the fio decrements do. This might be extraneous, but it should keep things moving a bit more smoothly.
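The two parts of the proposal can be sketched in userland with POSIX threads. This is only a model under stated assumptions: the `model_` names are hypothetical, `pthread_cond_wait()` is swapped in for the kernel's cv_wait_stop(), and none of this is the actual fio.c or FDINFO code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland model of a file descriptor entry. */
struct model_uf_entry {
	pthread_mutex_t	uf_lock;
	pthread_cond_t	uf_closing_cv;
	int		uf_refcnt;
};

void
model_uf_init(struct model_uf_entry *ufp)
{
	pthread_mutex_init(&ufp->uf_lock, NULL);
	pthread_cond_init(&ufp->uf_closing_cv, NULL);
	ufp->uf_refcnt = 0;
}

/* An observer (standing in for FDINFO) takes a hold on the entry. */
void
model_uf_hold(struct model_uf_entry *ufp)
{
	pthread_mutex_lock(&ufp->uf_lock);
	ufp->uf_refcnt++;
	pthread_mutex_unlock(&ufp->uf_lock);
}

/* Part 2: drop the hold and broadcast so a waiting closer re-checks. */
void
model_uf_rele(struct model_uf_entry *ufp)
{
	pthread_mutex_lock(&ufp->uf_lock);
	assert(ufp->uf_refcnt > 0);
	ufp->uf_refcnt--;
	pthread_cond_broadcast(&ufp->uf_closing_cv);
	pthread_mutex_unlock(&ufp->uf_lock);
}

/*
 * Part 1: the closer does not assert that the count is zero. It waits
 * until every hold is gone, then returns the count it observed.
 */
int
model_uf_close_wait(struct model_uf_entry *ufp)
{
	int refcnt;

	pthread_mutex_lock(&ufp->uf_lock);
	while (ufp->uf_refcnt > 0)
		pthread_cond_wait(&ufp->uf_closing_cv, &ufp->uf_lock);
	refcnt = ufp->uf_refcnt;
	pthread_mutex_unlock(&ufp->uf_lock);
	return (refcnt);
}
```

In this model, a thread that calls `model_uf_hold()` and later `model_uf_rele()` wakes any thread blocked in `model_uf_close_wait()` as soon as the count reaches zero. Because the model uses an untimed wait, the broadcast is required here; whether it is strictly needed in the kernel is the open question the proposal raises.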


Related issues

Related to illumos gate - Bug #14685: sotpi ops need to be wary of null v_stream (Rejected, Dan McDonald)

Related to illumos gate - Feature #12046: Provide /proc/<PID>/fdinfo/ (Closed, Andy Fiddaman)
