Bug #10499

Multi-modifier protection (MMP)

Added by Jerry Jelinek 7 months ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: zfs - Zettabyte File System
Start date: 2019-03-04
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Port the following ZFS commits from ZoL to illumos.
379ca9cf2 Multi-modifier protection (MMP)
bbffb59ef Fix multihost stale cache file import
0d398b256 Do not initiate MMP writes while pool is suspended

History

#1

Updated by Jerry Jelinek 6 months ago

This is the final set of ZoL patches needed for MMP:
db2af93d7 Increase default zfs_multihost_fail_intervals and import_intervals
3d31aad83 MMP writes rotate over leaves
8133679ff Do not resume a pool if multihost is enabled
4f3218aed Warn user about accidentally sharing devices
107dd2b17 Use proper tag for spa config refcounts in mmp_write_uberblock()
5365b0747 Add missing MMP status code to libzfs_status
34fe773e3 Skip import activity test in more zdb code paths
520023771 MMP should not suspend pool in ztest
bc5f51c5d module param callbacks check for initialized spa (XXX part)
0ba106e75 Fix divide-by-zero in mmp_delay_update()
533ea0415 Update mmp_delay on sync or skipped, failed write
cec3a0a1b Report pool suspended due to MMP
b7eec00f9 Fix MMP write frequency for large pools
ebed90a59 Handle zio_resume and mmp => off
d2160d053 Record skipped MMP writes in multihost_history
14c240ced Detect long config lock acquisition in mmp
7088545d0 Report duration and error in mmp_history entries
0d398b256 Do not initiate MMP writes while pool is suspended
f2c0dee23 Correct count_uberblocks in mmp.kshlib
918dbe35b mmp should use a fixed tag for spa_config locks
51d1b58ef Emit an error message before MMP suspends pool
bbffb59ef Fix multihost stale cache file import
b33d668dd Fix ZTS MMP tests and ztest -M behavior
d410c6d9f Reimplement vdev_random_leaf and rename it
af0f84288 mmp_on_uberblocks: Use kstat for uberblock counts
e889f0f52 Report MMP_STATE_NO_HOSTID immediately
0582e4032 Add callback for zfs_multihost_interval
ffb195c25 Release SCL_STATE in mmp_write_done()
b6e5c4038 Use correct macro for hz in mmp.c
802ae562e Fix coverity defects: CID 165755
cc9c6bcb7 Fix vdev_probe() call outside SCL_STATE_ALL lock
ff998d804 Ignore dataset if the dds_type is DMU_OST_OTHER
379ca9cf2 Multi-modifier protection (MMP)

#2

Updated by Electric Monk 6 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e0f1c0afa46cc84d4b1e40124032a9a87310386e

commit  e0f1c0afa46cc84d4b1e40124032a9a87310386e
Author: Olaf Faaland <faaland1@llnl.gov>
Date:   2019-04-03T18:42:53.000Z

    10499 Multi-modifier protection (MMP)
    Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Portions contributed by: Tim Chase <tim@chase2k.com>
    Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com>
    Portions contributed by: John L. Hammond <john.hammond@intel.com>
    Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov>
    Portions contributed by: Prakash Surya <surya1@llnl.gov>
    Reviewed by: George Melikov <mail@gmelikov.ru>
    Reviewed by: Tom Caputi <tcaputi@datto.com>
    Reviewed by: Kash Pande <kash@tripleback.net>
    Reviewed by: loli10K <ezomori.nozomu@gmail.com>
    Reviewed by: George Melikov <mail@gmelikov.ru>
    Reviewed by: Tony Hutter <hutter2@llnl.gov>
    Reviewed by: Gu Zheng <guzheng2331314@163.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Ned Bass <bass6@llnl.gov>
    Reviewed by: Andreas Dilger <andreas.dilger@intel.com>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Kody Kantor <kody.kantor@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

#3

Updated by Jerry Jelinek 6 months ago

Summary from initial ZoL commit:

Multi-modifier protection (MMP)

Add multihost=on|off pool property to control MMP.  When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp.  Property defaults to off.

During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock.  Include the 
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".

Allow the user to control the period between MMP writes, and the 
duration of the activity test on import, via a new module parameter
zfs_multihost_interval.  The period is specified in milliseconds.  The 
activity test duration is calculated from this value, and from the 
mmp_delay in the "best" uberblock found initially.
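
To make the summary above concrete, here is a minimal sketch of the heartbeat and the import-time activity test. It is an illustration only, not the ZFS implementation: the names Uberblock, mmp_heartbeat, best_uberblock and pool_is_active are hypothetical, the labels are modeled as a plain list, and the wait between polls (which the real code derives from zfs_multihost_interval and mmp_delay) is left out.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Uberblock:
    """Hypothetical stand-in for an on-disk uberblock (illustration only)."""
    txg: int            # transaction group of the last sync
    timestamp: int      # refreshed by every MMP write
    mmp_delay: int = 0  # delay between MMP writes, as recorded by the writer


def mmp_heartbeat(last_synced: Uberblock, now: int) -> Uberblock:
    """Return the last synced uberblock with only its timestamp updated."""
    return replace(last_synced, timestamp=now)


def best_uberblock(candidates: Iterable[Uberblock]) -> Uberblock:
    """Pick the newest uberblock: highest txg, ties broken by timestamp."""
    return max(candidates, key=lambda ub: (ub.txg, ub.timestamp))


def pool_is_active(read_labels: Callable[[], List[Uberblock]],
                   polls: int) -> bool:
    """Re-read the labels `polls` times and report whether the best
    uberblock changed, which suggests another host has the pool imported.
    A real activity test would sleep between polls; this sketch does not."""
    first = best_uberblock(read_labels())
    for _ in range(polls):
        current = best_uberblock(read_labels())
        if (current.txg, current.timestamp) != (first.txg, first.timestamp):
            return True
    return False
```

With a read_labels callback that returns a fresh mmp_heartbeat() result on every call, pool_is_active() reports True after the first poll. With a callback that always returns the same uberblocks, it reports False once all polls have run.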

#5

Updated by Joshua M. Clulow 6 months ago

It would seem that this integration broke the build. From mail_msg:

==== Build errors (DEBUG) ====

dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `libzpool'
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `lib'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `ztest'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
Undefined            first referenced
 symbol                  in file
get_system_hostid                   pics/kernel.o
ld: fatal: symbol referencing errors. No output written to libzpool.so.1
Undefined            first referenced
 symbol                  in file
mmp_thread_start                    ztest.o
mmp_thread_stop                     ztest.o
mmp_signal_all_threads              ztest.o
zfs_multihost_fail_intervals        ztest.o
spa_multihost                       ztest.o
ld: fatal: symbol referencing errors. No output written to ztest

I'm going to back it out for now, and we'll figure out how this slipped through.

#6

Updated by Electric Monk 6 months ago

git commit 8cf549157d2961d6f27e3246932cbf806123de48

commit  8cf549157d2961d6f27e3246932cbf806123de48
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2019-04-04T14:37:49.000Z

    10499 Multi-modifier protection (MMP) [fix build race]

#7

Updated by Joshua M. Clulow 6 months ago

The second push fixes the build race seen on some systems -- but, of course, not all!
