Bug #6011
open
potential ZFS deadlock when ZFS writes take a page fault
0%
Description
An OmniOS user has a system which appeared to wedge solidly when attempting to use procfs.
Investigation finds that one of their applications is in munmap(), holding its as lock for write and waiting for a ZFS txg to open for I/O. Approximately 60 other threads are waiting on this lock, primarily as readers, including others doing ZFS I/O via zfs_read/zfs_write.
The txg quiesce thread is waiting on tc_cv, and would be woken by the zfs_write I/O threads.
This is a classic deadlock: the thread holding the as lock for write is waiting on txg open (tc_cv), and the threads that would signal the cv are waiting on the as lock.
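The circular wait can be written down as a small wait-for graph. The sketch below is only an illustrative model, not kernel code: the thread labels, the `WAITS_ON` table and the `find_cycle` helper are hypothetical names, and the edge from the munmap thread to the quiesce thread assumes the next txg cannot open until the quiesce thread makes progress.

```python
# Hypothetical model of the hang: each blocked thread maps to the thread
# it is (directly or indirectly) waiting on. Labels are illustrative only.
WAITS_ON = {
    # Holds the as lock as writer, waits for a txg to open (assumed to
    # depend on the quiesce thread making progress).
    "munmap": "txg_quiesce",
    # Waits on tc_cv, which the zfs_write I/O threads would signal.
    "txg_quiesce": "zfs_write",
    # Took a page fault during the write and waits for the as lock.
    "zfs_write": "munmap",
}


def find_cycle(waits_on, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = [start]
    seen = {start}
    cur = waits_on.get(start)
    while cur is not None:
        if cur == start:
            return path + [start]
        if cur in seen:
            # There is a cycle, but it does not pass through start.
            return None
        seen.add(cur)
        path.append(cur)
        cur = waits_on.get(cur)
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_ON, "munmap")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, removing any one edge (for example, if the zfs_write threads did not need the as lock while holding the txg open) leaves no cycle through the munmap thread.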
Triggering this appears to require memory pressure, so that the right combination of page faults occurs; it does not seem to happen otherwise.
A dump may be available from the user (esproul) upon request, but cannot be shared by me.
Updated by Rich Lowe about 7 years ago
This is OmniOS 151006, revision b281e50 in the https://github.com/omniti-labs/illumos-omnios repo, so somewhat old. But I see no indication of later fixes to this issue.
Robert pointed out #4161, which sounds similar but I think is different.