Bug #7119

boot should be more resilient to physical path to bootfs changing

Added by Rich Lowe about 3 years ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee:
Category: kernel
Start date: 2016-06-18
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Description

<trisk> richlowe, llua: yeah, GRUB extracts the device path from the pool
        label and passes in through $ZFS-BOOTFS, the kernel only tries to use
        this device to import

This means that while ZFS itself tries to be particularly resilient to devices moving around, boot is utterly unresilient, and requires booting from other media and re-importing the pool should the physical path to the boot device change.

That's more than a bit crap.
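For reference, on a GRUB system the menu.lst entry hands this information to the kernel via the -B $ZFS-BOOTFS option, and the expansion carries both the boot dataset and the physical device path recorded in the pool label. The shape below is only illustrative; the object number and device path are placeholders:

kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS
        ... which GRUB expands along the lines of ...
-B zfs-bootfs=rpool/<objnum>,bootpath="/pci@0,0/pci1af4,2@4/blkdev@0,0:a"

It is that cached bootpath value that the kernel then insists on opening.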


Related issues

Related to illumos gate - Bug #4793: spa_import_rootpool and vfs_mountroot rely on device path strings passed from GRUB (New, 2014-04-22)
Related to illumos gate - Feature #9955: loader: export boot pool and vdev GUID (Closed, 2018-11-04)

History

#1

Updated by Toomas Soome about 3 years ago

Rich Lowe wrote:

[...]

This means that while ZFS itself tries to be particularly resilient to devices moving around, boot is utterly unresilient, and requires booting from other media and re-importing the pool should the physical path to the boot device change.

That's more than a bit crap.

Actually the boot loader itself is the least of the issues there; it's trivial to pass the pool UUID & object id (dataset) to the kernel instead of phys_path & devid (the kernel actually does attempt to use the devid and falls back to the path). The actual problem is that the kernel rootfs mount relies entirely on devid/phys_path because of how driver attach is implemented. Since the kernel does not scan possible devices but relies on the path/devid information from the pool, it also limits us to single-vdev boot pools.

#2

Updated by Andrew Stormont almost 3 years ago

  • Related to Bug #4793: spa_import_rootpool and vfs_mountroot rely on device path strings passed from GRUB added
#3

Updated by Joshua Clulow 8 months ago

  • Related to Feature #9955: loader: export boot pool and vdev GUID added
#4

Updated by Joshua Clulow 8 months ago

  • Subject changed from boot should be more resilient to physical path to bootfs changing. to boot should be more resilient to physical path to bootfs changing
  • Category changed from bootloader to kernel
  • Assignee set to Joshua Clulow
  • Tags deleted (needs-triage)

Now that #9955 has gone back, the boot loader is exposing the boot vdev GUID and pool GUID to the OS in two new properties. I've been experimenting with a change to attempt to attach all of the modules and walk through the available block devices looking for a matching pool label in the event that the cached /devices path does not work.
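For illustration, here is a user-level sketch of that matching step. It assumes libzfs's zpool_read_label(int fd, nvlist_t **config) (link with -lzfs -lnvpair) and uses placeholder GUIDs; the kernel change itself would have to do the equivalent walk over the block device nodes before modrootloaded, rather than reading /dev/rdsk from userland.

#include <sys/types.h>
#include <sys/fs/zfs.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <libnvpair.h>
#include <libzfs.h>

/* The GUIDs the loader now exports; the values here are placeholders. */
static uint64_t want_pool_guid = 0x1111111111111111ULL;
static uint64_t want_vdev_guid = 0x2222222222222222ULL;

/* Return 1 if the device at "path" carries a label for the wanted vdev. */
static int
label_matches(const char *path)
{
        nvlist_t *config = NULL;
        uint64_t pguid = 0, vguid = 0;
        int fd, match = 0;

        if ((fd = open(path, O_RDONLY)) < 0)
                return (0);

        /* Read and unpack the vdev label, if this device has one at all. */
        if (zpool_read_label(fd, &config) == 0 && config != NULL) {
                (void) nvlist_lookup_uint64(config,
                    ZPOOL_CONFIG_POOL_GUID, &pguid);
                (void) nvlist_lookup_uint64(config,
                    ZPOOL_CONFIG_GUID, &vguid);
                match = (pguid == want_pool_guid && vguid == want_vdev_guid);
                nvlist_free(config);
        }

        (void) close(fd);
        return (match);
}

int
main(void)
{
        DIR *dp;
        struct dirent *de;
        char path[256];

        if ((dp = opendir("/dev/rdsk")) == NULL)
                return (1);

        /* Walk every block device node and check its label. */
        while ((de = readdir(dp)) != NULL) {
                if (de->d_name[0] == '.')
                        continue;
                (void) snprintf(path, sizeof (path), "/dev/rdsk/%s",
                    de->d_name);
                if (label_matches(path))
                        (void) printf("boot vdev found at %s\n", path);
        }

        (void) closedir(dp);
        return (0);
}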

#5

Updated by Joshua Clulow 8 months ago

Note that the error message on boot that you get because of this problem looks like this:

SunOS Release 5.11 Version illumos-42a3762d01 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.

NOTICE: Cannot read the pool label from '/pci@0,0/pci1af4,2@4/blkdev@0,0:b'
NOTICE: spa_import_rootpool: error 5
Cannot mount root on /pci@0,0/pci1af4,2@4/blkdev@0,0:b fstype zfs

panic[cpu0]/thread=fffffffffbc4a040: vfs_mountroot: cannot mount root

Warning - stack not written to the dump buffer
fffffffffbc8a1f0 genunix:vfs_mountroot+3ab ()
fffffffffbc8a230 genunix:main+135 ()
fffffffffbc8a240 unix:_locore_start+90 ()

skipping system dump - no dump device configured
rebooting...
#7

Updated by Joshua Clulow 8 months ago

One artefact with the prototype thus far is that while we are able to correctly open the pool even though the /devices path has changed, the /dev name (e.g., c0t0d0) of the disk is often incorrect in zpool status after boot. Running zdb rpool will fail once, complaining that it cannot open the pool -- but run a second time and it succeeds, and then zpool status shows the correct information.

The routine spa_vdev_setpath(), which is called from usermode through ZFS_IOC_VDEV_SETPATH, appears to be used in zpool_vdev_name(). There is some suggestion that this routine will correctly identify changes in the relationship between the devid and the /dev path, and that it might actually be called during zpool status. It doesn't quite seem to do the right thing yet, but I think investigating that will be important.

It was also suggested to me that I should include booting from at least mirrored pools in the test plan for this change.

#8

Updated by Joshua Clulow 8 months ago

It turns out I was running zpool status as an unprivileged user, which means it is unable to check or update the vdev_path. On the first privileged run of zpool status, the output is correct and the label appears to be updated with the new value. It's looking like we should inject a (privileged) run of zpool status on the pool with the root file system into cmd/svc/milestone/fs-root when we've determined that the root file system is ZFS.

#9

Updated by Joshua Clulow 8 months ago

As it turns out, with a mirrored rpool this is more complicated. I'll need to extend vdev_disk_open() with some logic to convert a pool+vdev GUID pair into new /devices paths for all of the disks it tries to open -- fixing spa_import_rootpool() alone is not enough there.

#10

Updated by Toomas Soome 8 months ago

Joshua Clulow wrote:

As it turns out, with a mirrored rpool this is more complicated. I'll need to extend vdev_disk_open() with some logic to convert a pool+vdev GUID pair into new /devices paths for all of the disks it tries to open -- fixing spa_import_rootpool() alone is not enough there.

mirror and raidz are similar there: you have a stripe with the vdev (mirror/raidz) configuration replicated. But since we are scanning all block devices anyhow, we should register all disks with the pool GUID so we can import multi-vdev pools too (the loader does this for the same reason). And this will remove one more limit we have had since 2005.

#11

Updated by Joshua Clulow 8 months ago

But since we are scanning all block devices anyhow, we should register all disks with the pool GUID so we can import multi-vdev pools too

Yes, I'm going to add a function that can be called in either of the places where we try to use a /devices path prior to modrootloaded, which will build the list for the pool GUID and allow a translation from (pool GUID, vdev GUID) to /devices path. The function will only get called if we "miss" (i.e., can't open a device by any of the regular means), which should mean it's unlikely to break anybody for whom ZFS root is already working.
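A minimal sketch of what that translation could look like, with illustrative names and GUID values rather than the eventual kernel interfaces; every labelled disk found during the scan gets registered under its (pool GUID, vdev GUID) pair, which also covers the mirrored case from note #9:

#include <stdio.h>
#include <string.h>
#include <inttypes.h>

#define MAX_BOOT_VDEVS  16

/* One entry per labelled disk discovered by the boot-time device scan. */
typedef struct boot_vdev_entry {
        uint64_t bve_pool_guid;
        uint64_t bve_vdev_guid;
        char     bve_devpath[256];      /* current /devices path */
} boot_vdev_entry_t;

static boot_vdev_entry_t boot_vdev_table[MAX_BOOT_VDEVS];
static int boot_vdev_count;

/* Register a disk found during the scan. */
static void
boot_vdev_register(uint64_t pool_guid, uint64_t vdev_guid, const char *path)
{
        boot_vdev_entry_t *bve;

        if (boot_vdev_count >= MAX_BOOT_VDEVS)
                return;
        bve = &boot_vdev_table[boot_vdev_count++];
        bve->bve_pool_guid = pool_guid;
        bve->bve_vdev_guid = vdev_guid;
        (void) snprintf(bve->bve_devpath, sizeof (bve->bve_devpath),
            "%s", path);
}

/* Consulted from the open paths when the cached path/devid "misses". */
static const char *
boot_vdev_lookup(uint64_t pool_guid, uint64_t vdev_guid)
{
        int i;

        for (i = 0; i < boot_vdev_count; i++) {
                if (boot_vdev_table[i].bve_pool_guid == pool_guid &&
                    boot_vdev_table[i].bve_vdev_guid == vdev_guid)
                        return (boot_vdev_table[i].bve_devpath);
        }
        return (NULL);
}

int
main(void)
{
        /* Illustrative: both sides of a mirrored root pool. */
        boot_vdev_register(0x1111, 0xaaaa,
            "/pci@0,0/pci1af4,2@4/blkdev@0,0:a");
        boot_vdev_register(0x1111, 0xbbbb,
            "/pci@0,0/pci1af4,2@5/blkdev@0,0:a");

        (void) printf("%s\n", boot_vdev_lookup(0x1111, 0xbbbb));
        return (0);
}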
