Bug #11816

closed

iscsi initiator gets confused if attached before root file system mount

Added by Joshua M. Clulow over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

The iSCSI initiator loads some data from the "persistent store"; i.e., state files stored on the root file system. It is theoretically possible to boot from an iSCSI device as well, so the initiator is willing to defer loading from the store if we are attempting an iSCSI boot.

The logic for deferring the load of the persistent store is not quite right: if we are not booting from iSCSI but the module is attached before root is mounted, the module gets stuck in a broken state. Historically this has been unlikely to occur, as drivers are usually not attached en masse until after the root file system is mounted. As part of fixing #7119, under some conditions we may load and attach all modules while attempting to locate a new /devices path for the root ZFS pool. This happens prior to mounting root and causes the iSCSI module to get stuck.


Related issues

Related to illumos gate - Bug #7119: boot should handle change in physical path to ZFS root devices (Closed, Joshua M. Clulow, 2016-06-18)

Actions #1

Updated by Joshua M. Clulow over 3 years ago

To add some more concrete detail, if we call ndi_devi_config() before vfs_mountroot(), we see the following effects:

root@oi0:~# svcs -xv
svc:/network/iscsi/initiator:default (iSCSI initiator daemon)
 State: maintenance since October 20, 2019 at 12:00:23 PM PDT
Reason: Start method failed repeatedly, last exited with status 1.
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 7D iscsi
   See: /var/svc/log/network-iscsi-initiator:default.log
Impact: This service is not running.

Looking at the log:

[ Oct 20 12:00:23 Executing start method ("/lib/svc/method/iscsi-initiator start"). ]
[ Oct 20 12:00:23 Method "start" exited with status 1. ]
[ Oct 20 12:00:23 Executing start method ("/lib/svc/method/iscsi-initiator start"). ]
[ Oct 20 12:00:23 Method "start" exited with status 1. ]
[ Oct 20 12:00:23 Executing start method ("/lib/svc/method/iscsi-initiator start"). ]
[ Oct 20 12:00:23 Method "start" exited with status 1. ]

Trying to start the daemon manually:

root@oi0:~# truss -t open,ioctl -f /lib/svc/method/iscsid
1365:   open("/var/ld/ld.config", O_RDONLY)             Err#2 ENOENT
1365:   open("/lib/libc.so.1", O_RDONLY)                = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_CTYPE/LCL_DATA", O_RDONLY) = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_NUMERIC/LCL_DATA", O_RDONLY) = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_TIME/LCL_DATA", O_RDONLY) = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_COLLATE/LCL_DATA", O_RDONLY) = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_MONETARY/LCL_DATA", O_RDONLY) = 3
1365:   open("/usr/lib/locale//en_US.UTF-8/LC_MESSAGES/LCL_DATA", O_RDONLY) = 3
1366:   open("/dev/null", O_RDWR)                       = 4
1366/1:         open("/devices/iscsi:devctl", O_RDWR)           = 6
1366/1:         ioctl(6, (('i'<<24)|('S'<<16)|('C'<<8)|34), 0x080632E0) Err#14 EFAULT
1366/1:         open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/messages.mo", O_RDONLY) Err#2 ENOENT
1366/1:         open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT
1365/2:             Received signal #18, SIGCLD, in lwp_park() [caught]
1365/2:               siginfo: SIGCLD CLD_EXITED pid=1366 status=0xFFFFFFFF

ioctl 34 is:

#define ISCSI_SMF_ONLINE                (ISCSI_IOCTL | 34)

In iscsi_ioctl(), there is a suspicious and ultimately incorrect comment:

        case ISCSI_SMF_ONLINE:
...
                /* just a theoretical case */
                if (ihp->hba_persistent_loaded == B_FALSE) {
                        rtn = EFAULT;
                        break;
                }

Indeed, we're stuck attached but without the persistent store loaded:

> *iscsi_state::walk softstate | ::print iscsi_hba_t hba_persistent_loaded
hba_persistent_loaded = 0 (0)

By forcing the module to detach, we can clear the service:

# modinfo | grep iscsi
 49 fffffffff7838000  349f8 108   1  iscsi (iSCSI Initiator v-1.55)
# modunload -i 49 
# svcadm clear iscsi/initiator
# svcs iscsi/initiator
STATE          STIME    FMRI
online         12:21:52 svc:/network/iscsi/initiator:default
Actions #2

Updated by Joshua M. Clulow over 3 years ago

Testing Notes

In order to test this change, I first replicated the symptoms using an unconditional pre-mountroot call to code which attempts to attach all driver instances:

@@ -504,6 +511,18 @@ main(void)
        process_cache = kmem_cache_create("process_cache", sizeof (proc_t),
            0, NULL, NULL, NULL, NULL, NULL, 0);

+       /*
+        * XXX Force for iSCSI testing...
+        */
+       extern void preroot_walk_block_devices(
+           int (*f)(const char *, void *), void *arg);
+       printf("FORCE preroot_walk_block_devices ...\n");
+       preroot_walk_block_devices(xxx_preroot_walker, NULL);
+       printf("... END FORCED preroot_walk_block_devices\n");
+       /*
+        * XXX
+        */
+
        vfs_mountroot();        /* Mount the root file system */
        errorq_init();          /* after vfs_mountroot() so DDI root is ready */
        cpu_kstat_init(CPU);    /* after vfs_mountroot() so TOD is valid */

The salient part of preroot_walk_block_devices() is:

+       (void) ndi_devi_config(ddi_root_node(), NDI_CONFIG | NDI_DEVI_PERSIST |
+           NDI_NO_EVENT | NDI_DRV_CONF_REPROBE);

The sequence of builds and tests I performed was as follows:

  • configure a stock OpenIndiana system as an iSCSI initiator and reboot
    • everything is working as expected (iSCSI service online, volumes visible and functional)
  • build with only the addition of the preroot_walk_block_devices() call and update the system
    • next boot, the service is not running (maintenance state), as described in the earlier comments
  • build with the addition of the change
    • next boot, the service is online and the volumes are visible and functional once more
  • final RTI build with just the change at issue
    • booted and checked that service was online and volumes were visible
Actions #3

Updated by Electric Monk over 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 7284664a1faa361af4ff33ba5435d43d3ee07bf2

commit  7284664a1faa361af4ff33ba5435d43d3ee07bf2
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2019-10-25T12:02:21.000Z

    11816 iscsi initiator gets confused if attached before root file system mount
    Reviewed by: Andrew Stormont <astormont@racktopsystems.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions #4

Updated by Joshua M. Clulow 10 months ago

  • Related to Bug #7119: boot should handle change in physical path to ZFS root devices added