Bug #13118

Very slow reaping, possible deadlock in zfs_delmap

Added by Alex Wilson 8 months ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

See original Joyent bug: https://smartos.org/bugview/OS-732

The missing sa_bulk_update call was fixed up in illumos in 2017 (#5379, commit 80e10fd), but the call to VOP_PUTPAGE inside zfs_delmap is still around.

This causes extreme slowdowns on systems reaping many processes with shared mappings of files on ZFS. We often notice it when thousands of Samba smbd children call exit() at once: the system can process only about 1-2 exits per second, so thousands of them take hours to shut down. We have had as many as 14,000 to 18,000 smbd processes on several of our fileservers, which take 4-5 hours to shut down with that VOP_PUTPAGE in place. Without the VOP_PUTPAGE, shutdown takes minutes.
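The scenario above could be approximated with a small test program. The following is only a sketch and is not part of the original report: the function name reap_shared_mappers and its parameters are hypothetical, and it assumes the scratch file lives on a ZFS dataset on an affected kernel. On other filesystems or kernels it should simply finish quickly. Each child writes to the file through its mapping, so point it at a scratch file only.

```python
import mmap
import os
import sys
import time


def reap_shared_mappers(path, nchildren, length=mmap.PAGESIZE):
    """Fork nchildren processes that each hold a dirty MAP_SHARED mapping of
    `path`, release them together, and time how long the parent spends
    waiting for them to exit. Returns (clean_exits, elapsed_seconds)."""
    # Ensure the scratch file is large enough to map.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    if os.fstat(fd).st_size < length:
        os.ftruncate(fd, length)
    os.close(fd)

    ready_r, ready_w = os.pipe()  # children -> parent: "mapping is set up"
    gate_r, gate_w = os.pipe()    # parent -> children: EOF means "exit now"

    pids = []
    for _ in range(nchildren):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                os.close(ready_r)
                os.close(gate_w)
                cfd = os.open(path, os.O_RDWR)
                m = mmap.mmap(cfd, length, flags=mmap.MAP_SHARED,
                              prot=mmap.PROT_READ | mmap.PROT_WRITE)
                m[0:1] = b"x"        # dirty one page of the shared mapping
                os.write(ready_w, b"r")
                os.close(ready_w)
                os.read(gate_r, 1)   # blocks until the parent closes gate_w
                status = 0
            finally:
                os._exit(status)     # exit with the mapping still in place
        pids.append(pid)

    os.close(ready_w)
    os.close(gate_r)

    # Wait until every child has mapped the file (or died trying).
    seen = 0
    while seen < nchildren:
        chunk = os.read(ready_r, nchildren - seen)
        if not chunk:
            break
        seen += len(chunk)
    os.close(ready_r)

    start = time.monotonic()
    os.close(gate_w)                 # release all children at once
    clean = 0
    for pid in pids:
        _, wstatus = os.waitpid(pid, 0)
        if os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0:
            clean += 1
    return clean, time.monotonic() - start


if __name__ == "__main__":
    scratch, count = sys.argv[1], int(sys.argv[2])
    clean, elapsed = reap_shared_mappers(scratch, count)
    print(f"{clean}/{count} children exited cleanly in {elapsed:.2f}s")
```

Run it with a scratch file path and a child count as arguments. Comparing the elapsed time for the same count on a ZFS dataset and on a non-ZFS filesystem gives a rough indication of whether the exit path is the bottleneck. This does not reproduce Samba's actual mapping pattern, only the general shape of many processes with shared file mappings exiting together.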

According to the original analysis in OS-732 by Bryan Cantrill, it seems that this call can also be the source of a deadlock. I haven't personally observed that, so I'm not sure if that's still the case (other things may have changed in ZFS since then).

Quoting from the original bug:

in discussion with the engineer who originally did this work, the consensus is that these semantics are simply too expensive to implement; the fix here is to restore the semantics prior to their change to accommodate the scenario mentioned in the zfs_delmap() comment, above.


Related issues

Related to illumos gate - Bug #5379: modifying a mmap()-ed file does not update its timestamps (Closed, 2014-12-02)
