Bug #10211

UFS freebehind can induce hat_pageunload()/htable_purge_hat() race

Added by Robert Mustacchi 9 months ago. Updated 9 months ago.

Status: Closed
Priority: Normal
Category: kernel
Start date: 2019-01-10
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

We died in a stack trace that might look familiar:


> ::status
debugging crash dump /manta/thoth/stor/thoth/5718a7031b2a2fc301ba34e6d82ac5f2/vmcore.0 (64-bit) from 9R1WMS1
operating system: 5.11 joyent_20131105T084235Z (i86pc)
image uuid: (not set)
panic message: hment_remove() missing in hash table pp=ffffff0078c0f370, ht=ffffff2a8f7d1b38,entry=0x26 hash index=0x2ad1
dump content: kernel pages only
> $c
vpanic()
0xfffffffffb822e52()
hati_page_unmap+0xc8(ffffff0078c0f370, ffffff2a8f7d1b38, 26)
hati_pageunload+0x12b(ffffff0078c0f370, 0, 1)
hat_pageunload+0x19(ffffff0078c0f370, 1)
pvn_getdirty+0x14d(ffffff0078c0f370, 8400)
ufs_putpages+0x22b(ffffff19ab48db80, 76000, 1000, 8400, ffffffdc81a4e150)
ufs_putpage+0x136(ffffff19ab48db80, 78000, 1000, 8400, ffffffdc81a4e150, 0)
fop_putpage+0x4c(ffffff19ab48db80, 78000, 1000, 8400, ffffffdc81a4e150, 0)
vpm_sync_pages+0xb0(ffffff19ab48db80, 78000, 1000, 6)
rdip+0x38f(ffffff19ab48f178, ffffff00c09d0df0, 0, ffffffdc81a4e150)
ufs_read+0xbf(ffffff19ab48db80, ffffff00c09d0df0, 0, ffffffdc81a4e150, 0)
fop_read+0x8b(ffffff19ab48db80, ffffff00c09d0df0, 0, ffffffdc81a4e150, 0)
lo_read+0x29(ffffff1d27051400, ffffff00c09d0df0, 0, ffffffdc81a4e150, 0)
fop_read+0x8b(ffffff1d27051400, ffffff00c09d0df0, 0, ffffffdc81a4e150, 0)
read+0x2a7(16, 80764c0, 20000)
read32+0x1e(16, 80764c0, 20000)
_sys_sysenter_post_swapgs+0x149()

This stack trace shows us another code path that can induce this:
the freebehind code path in rdip(), which can invalidate a mapping that
it feels is the result of a sequential read.
This code is absolutely ancient (it was integrated over twenty years
ago with the comment "the pager is busted"), and its logic is entirely
busted: it looks at the i_nextr field of the inode to determine the
sequentiality of a read, but that field is filled in in ufs_getpage()
in many conditions that don't result in actual I/O (i.e., cases in which
the page is in the page cache). As a result, it will "free behind"
pages that weren't actually read off the disk – and of course, this
is only UFS, which is RAM-backed for us anyway.
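
To make the failure mode concrete, here is a minimal toy model in Python; it is not the actual UFS code. ToyInode, getpage(), read_sequentially(), and the nextr attribute are hypothetical stand-ins for the inode, ufs_getpage(), the read loop in rdip(), and i_nextr, and the real sequentiality check is considerably more involved. The model assumes only what is described above: the next-read hint is updated on every page lookup, whether or not any I/O occurred.

```python
PAGESIZE = 4096  # assumed page size for this toy model


class ToyInode:
    """Hypothetical stand-in for an inode with an i_nextr-style hint."""

    def __init__(self, cached_offsets):
        self.cached = set(cached_offsets)  # offsets already in the page cache
        self.nextr = 0                     # predicted offset of the next read
        self.disk_reads = 0                # reads that actually needed I/O

    def getpage(self, off):
        if off not in self.cached:
            self.disk_reads += 1           # only a cache miss goes to disk
            self.cached.add(off)
        # The flaw being modeled: the hint is updated even on a cache hit.
        self.nextr = off + PAGESIZE


def read_sequentially(inode, npages):
    """Read npages pages in order, freeing behind when the hint matches."""
    freed = []
    for i in range(npages):
        off = i * PAGESIZE
        looks_sequential = (inode.nextr == off)
        inode.getpage(off)
        if looks_sequential:
            inode.cached.discard(off)      # "free behind" the page just read
            freed.append(off)
    return freed


# Every page is already cached, so no disk I/O should be needed at all.
inode = ToyInode(range(0, 4 * PAGESIZE, PAGESIZE))
freed = read_sequentially(inode, 4)
print("disk reads:", inode.disk_reads, "pages freed:", len(freed))
```

In this model all four pages are freed even though none of them was read from disk, which is the behavior the description objects to: the heuristic cannot tell a cached sequential read from one that actually did I/O.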

So what's going on on this machine to induce this? First, we're doing a
(putatively) sequential read because of a recursive scp:

> *panic_thread::print kthread_t t_procp->p_user.u_psargs

t_procp->p_user.u_psargs = [ "scp r -f - /" ]

This explains why we're reading libc.so.1 (and to the credit of the flawed
mechanism for detecting sequential reads, we are reading it sequentially).
In order for this to be OS-580, we would expect to be invalidating a mapping
in a hat that was recently the victim of a relvm(). Taking the htable
argument to hati_page_unmap():

> ffffff2a8f7d1b38::print htable_t ht_hat->hat_as->a_proc->p_user.u_start

ht_hat->hat_as->a_proc->p_user.u_start = {
ht_hat->hat_as->a_proc->p_user.u_start.tv_sec = 2014 Aug 29 05:45:12
ht_hat->hat_as->a_proc->p_user.u_start.tv_nsec = 0x7840669
}

> panic_hrestime::print

{
tv_sec = 2014 Aug 29 05:45:12
tv_nsec = 0x81fbe8c
}

> 0x81fbe8c-0x7840669=E

10205219

Translation: the process is 10.2 milliseconds old, and likely exec()'d
much more recently than that. This is therefore almost certainly another
manifestation of OS-580. We either need to address that fundamental race
(possible, presumably, but painful to implement and verify) or make this
particular manifestation impossible (or at least much less likely) by
ripping out the janky UFS freebehind code (or by setting freebehind to 0).
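
The subtraction above was done in mdb; as a cross-check, the same arithmetic can be sketched in Python. The variable names are illustrative only. The values are the two tv_nsec fields printed above, and the sketch relies on both timestamps sharing the same tv_sec, so the nanosecond difference alone gives the process age.

```python
# tv_nsec of the process start time (u_start) and of panic_hrestime.
start_nsec = 0x7840669
panic_nsec = 0x81fbe8c

# Both timestamps have the same tv_sec, so only the nanoseconds differ.
age_ns = panic_nsec - start_nsec
age_ms = age_ns / 1_000_000

print(age_ns)            # 10205219
print(round(age_ms, 1))  # 10.2
```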

There are several ways to fix this (including making the i_nextr update more accurate, which would in turn reduce sporadic free-behind behavior), but given that free-behind is of highly dubious utility and of ancient origin, the safest fix seems to be the one originally suggested: removing UFS free-behind entirely.

History

#1

Updated by Electric Monk 9 months ago

  • Status changed from New to Closed

git commit 9c65d7db6997bfa2730f9cdf6167f3982c9d2db6

commit  9c65d7db6997bfa2730f9cdf6167f3982c9d2db6
Author: Bryan Cantrill <bryan@joyent.com>
Date:   2019-01-17T22:07:04.000Z

    10211 UFS freebehind can induce hat_pageunload()/htable_purge_hat() race
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: John Levon <john.levon@joyent.com>
