Bug #5666

Crash of the HA storage cluster and entanglement of hard drives

Added by Alexander Shvayakov over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date: 2015-02-26
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
Description

Hello,

I have a ZFS storage cluster with two nodes, ad998/ad999 (Supermicro), and two Quanta M4240H JBODs.
OS: OmniOS v11 r151012 (omnios-10b9c79)
Clustering: RSF-1

This system worked for about 2-3 years without serious problems.

But in December I had to reinstall the operating system on both nodes. I did it without downtime by migrating the ZFS pools between the two nodes.
I couldn't find another way to add a network card; nothing else worked.

After the reinstall, the storage worked for several months without any problems, until Feb 24.

I got messages from Zabbix: "ZFS pools is DEGRADED".

All pools were located on ad999 at that moment.

I saw many messages in /var/adm/messages:

Feb 24 15:45:55 ad999 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c0a@3,2/pci1000,3000@0 (mpt_sas1):
Feb 24 15:45:55 ad999 Unable to allocate dma memory for extra SGL.
Feb 24 15:45:55 ad999 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c0a@3,2/pci1000,3000@0 (mpt_sas1):
Feb 24 15:45:55 ad999 MPT SGL mem alloc failed

I saw hardware errors in the output of iostat -en on all disks.
For example:

c2t5000CCA03E4B7A26d0 Soft Errors: 0 Hard Errors: 11 Transport Errors: 0
Vendor: HITACHI Product: HUS723030ALS640 Revision: A222 Serial No: YVHAHTWD
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 11 Recoverable: 0
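Pulling the affected disks out of iostat -en can be done mechanically. A minimal sketch, not from the report itself: the sample line is the error-summary line for the disk above, and the awk field positions assume the standard five-field summary (soft, hard, transport, total, device).

```shell
# Sketch: print disks with a nonzero hard-error count from `iostat -en` output.
# Field layout assumed: soft hard transport total device-name.
printf '%s\n' '    0   11    0   11 c2t5000CCA03E4B7A26d0' |
  awk '$2 > 0 { print $5, "hard errors:", $2 }'
```

In live use the same filter would be fed from `iostat -en` directly, with a guard like `NF == 5` to skip the header lines.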

Status of the ZFS pool:

  pool: GIGO1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Feb 24 17:28:54 2015
        2.93M scanned out of 6.26T at 334K/s, (scan is slow, no estimated time)
        1.41M resilvered, 0.00% done
config:

        NAME                         STATE     READ WRITE CKSUM
        GIGO1                        DEGRADED     0     4     0
          mirror-0                   DEGRADED     0    17     0
            c2t5000CCA03E4AF152d0    ONLINE       0    17    26
            c3t5000CCA03E3F024Ad0    DEGRADED     0    21    66  too many errors
          mirror-1                   DEGRADED     0    22     0
            c2t5000CCA03E4B7A0Ed0    DEGRADED     0    24   106  too many errors
            c3t5000CCA03E4B7A32d0    DEGRADED     1    34   117  too many errors
          mirror-2                   DEGRADED     0     7     0
            spare-0                  DEGRADED     0     5    78
              c2t5000CCA03E4B7A26d0  DEGRADED     0     8   103  too many errors
              c2t5000CCA03E4B7AEAd0  ONLINE       0     5   109  (resilvering)
            c3t5000CCA03E4B7AA2d0    DEGRADED     0    13    80  too many errors
          mirror-4                   DEGRADED     0    14     0
            spare-0                  DEGRADED     0     9    51
              c2t5000CCA03E441E72d0  DEGRADED     0    17   105  too many errors
              c3t5000CCA03E4B7B3Ed0  ONLINE       0     9    76  (resilvering)
            c3t5000CCA03E4B795Ad0    DEGRADED     1    16    53  too many errors
        logs
          mirror-3                   UNAVAIL      0    19     0  insufficient replicas
            c2t50011731001B157Ad0    FAULTED      0    19     0  too many errors
            c3t50011731001B1586d0    FAULTED      0    19     0  too many errors
        cache
          c1t50015178F36660FAd0      ONLINE       0     0     0
          c1t55CD2E40000942E9d0      ONLINE       0     0     0
        spares
          c2t5000CCA03E4B7AEAd0      INUSE     currently in use
          c3t5000CCA03E4B7B3Ed0      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        GIGO1/p1:<0x0>

I decided to move the ZFS pools to ad998, but I got a kernel panic during the pool import.

Unfortunately I didn't wait for the crash dump to complete, so I don't have data for debugging.
Now I understand that I should have waited, but I was in a hurry.

I saw a fresh mpt_sas driver in the updates, so I decided to update both systems and reboot the nodes.
After the reboot I tried to migrate the pool, but got a panic again.

The DEGRADED pools are running on ad999, where everything started.
I can't import (migrate) them to the second node (ad998).

The pools are resilvering now, but I see strange things:

        NAME                         STATE     READ WRITE CKSUM
        GIGO4                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c2t5000CCA01A8B9696d0    ONLINE       0     0     6
            c3t5000CCA01A8E08C2d0    DEGRADED     0     0    13  too many errors  (resilvering)
          mirror-1                   ONLINE       0     0     0
            spare-0                  ONLINE       0     0     3
              c2t5000CCA01A8D3116d0  ONLINE       0     0     3  (resilvering)
              c2t5000CCA03E461C52d0  ONLINE       0     0     4  (resilvering)
              c3t5000CCA03E4B7B8Ed0  ONLINE       0     0     3  (resilvering)
            c3t5000CCA01A8E5EFEd0    ONLINE       0     0     1
          mirror-2                   ONLINE       0     0     0
            c2t5000CCA01A8E60A2d0    ONLINE       0     0     3
            spare-1                  ONLINE       0     0    11
              c3t5000CCA01A8E5FE6d0  ONLINE       0     0    11  (resilvering)
              c3t5000CCA03E2427B2d0  ONLINE       0     0    11  (resilvering)
        logs
          mirror-3                   ONLINE       0     0     0
            c2t50011731001B159Ed0    ONLINE       0     0     0
            c3t50011731001B15A6d0    ONLINE       0     0     0
        cache
          c1t5001517BB29F6ABAd0      ONLINE       0     0     0
        spares
          c2t5000CCA03E461C51d0      INUSE     currently in use
          c3t5000CCA03E2427B1d0      INUSE     currently in use
          c3t5000CCA03E4B7B8Ed0      INUSE     currently in use

For example, the spares list shows c2t5000CCA03E461C51d0 (currently in use), but mirror-1/spare-0 shows c2t5000CCA03E461C52d0.

I tried:
smartctl -i -d scsi -T permissive /dev/rdsk/c2t5000CCA03E461C51d0
And I got: Smartctl open device: /dev/rdsk/c2t5000CCA03E461C51d0 failed: No such device

I can't understand why pool GIGO4 on ad999 shows c2t5000CCA03E461C51d0 as a spare. The correct name of this disk on ad999 is c2t5000CCA03E461C52d0.
This disk has the name c2t5000CCA03E461C51d0 on ad998.

There is more weirdness.
For example, I saw c13t5000CCA01A8D3115d0s0 in the output of fmdump -eV, with the timestamp Feb 24 2015 17:37:47.839312364.

Drive names changed after the OS reinstall from "c13..." to "c2...".
This disk now has the name c2t5000CCA01A8D3115d0.
I reinstalled this system on December 18.
That old name is from another dimension; that system no longer exists.

Where did that name come from?
Maybe out of computer hell?
I can't understand it.

Could you recommend something?

I have two tarballs (1-2 MB), one from each node, with the output of various diagnostic tools.

But I can't attach them; I get error 413 Request Entity Too Large.

#1

Updated by Alexander Shvayakov over 5 years ago

From the /var/adm/messages.0 file on ad998:

Feb 24 17:40:55 ad998 unix: [ID 836849 kern.notice]
Feb 24 17:40:55 ad998 ^Mpanic[cpu3]/thread=ffffff19edf38740:
Feb 24 17:40:55 ad998 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff00ba64f4b0 addr=40 occurred in module "zfs" due to a NULL pointer dereference

fmdump -V -e -u edb53f3e-d2ed-ed0c-b835-dba3cfd5f4ee

TIME                           CLASS
Feb 24 2015 18:08:29.329960085 ireport.os.sunos.panic.dump_available
nvlist version: 0
    version = 0x0
    class = ireport.os.sunos.panic.dump_available
    uuid = 9b70ebe3-fe45-c5ed-eeb3-bd4b28ab07c2
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = sw
        object = (embedded nvlist)
        nvlist version: 0
            path = /usr/bin/amd64/savecore
        (end object)

        site = (embedded nvlist)
        nvlist version: 0
            file = ../savecore.c
            func = raise_event
            line = 1634
        (end site)

        context = (embedded nvlist)
        nvlist version: 0
            origin = userland
            execname = /usr/bin/amd64/savecore
            pid = 959
        (end context)

    (end detector)

    pri = high
    attr = (embedded nvlist)
    nvlist version: 0
        dumpdir = /var/crash/unknown
        instance = 0
        compressed = 1
        os-instance-uuid = edb53f3e-d2ed-ed0c-b835-dba3cfd5f4ee
        crashtime = 1424796057
        panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff00ba64f4b0 addr=40 occurred in module "zfs" due to a NULL pointer dereference
        panicstack = unix:die+df () | unix:trap+db3 () | unix:cmntrap+e6 () | zfs:vdev_is_dead+c () | zfs:vdev_readable+18 () | zfs:vdev_mirror_child_select+7a () | zfs:vdev_mirror_io_start+1b5 () | zfs:zio_vdev_io_start+1e0 () | zfs:zio_execute+78 () | zfs:zio_wait+37 () | zfs:arc_read+5d0 () | zfs:dmu_objset_open_impl+f2 () | zfs:dsl_pool_init+40 () | zfs:spa_load_impl+4fc () | zfs:spa_load+14e () | zfs:spa_tryimport+aa () | zfs:zfs_ioc_pool_tryimport+51 () | zfs:zfsdev_ioctl+4a7 () | genunix:cdev_ioctl+39 () | specfs:spec_ioctl+60 () | genunix:fop_ioctl+55 () | genunix:ioctl+9b () | unix:brand_sys_sysenter+1c9 () | 
        dump-incomplete = 0
        fm-panic = 0
        panic-time = Tue Feb 24 17:40:57 2015 CET
    (end attr)

    __ttl = 0x1
    __tod = 0x54ecb00d 0x13aaca95

#2

Updated by Alexander Shvayakov over 5 years ago

Could the ZFS cache files cause a panic during pool import?

GIGO4  cachefile    /opt/HAC/RSF-1/etc/volume-cache/GIGO4.cache-live

I just copied them from the old system, so they could contain the old disk names.
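If a stale cachefile with old device paths is the suspect, one way to test it is to set the file aside and let zpool import scan the device tree itself. A hypothetical dry-run sketch, not something from the report: the cachefile path is this cluster's RSF-1 path from above, and the function only prints the commands so the sequence can be reviewed before running it on the importing node.

```shell
# Hypothetical sketch: print (not run) the steps that would bypass a possibly
# stale RSF-1 cachefile and import the pool by scanning /dev/dsk directly.
cache=/opt/HAC/RSF-1/etc/volume-cache/GIGO4.cache-live

print_recovery_steps() {
    # Move the suspect cachefile out of the way, keeping a backup.
    printf 'mv %s %s.bak\n' "$cache" "$cache"
    # Let zpool discover the devices itself instead of trusting cached paths.
    printf 'zpool import -d /dev/dsk GIGO4\n'
}

print_recovery_steps
```

The -d flag is standard zpool import behavior (search the given directory for devices); whether RSF-1 needs its service stopped around such a manual move is an assumption to check against its documentation.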

#3

Updated by Alexander Shvayakov over 5 years ago

On Friday, I deleted ZFS cache files and successfully moved the pools.

I found the reason, but I'm not sure whether a panic is normal in this situation.

It remains to find out why I got these messages in /var/adm/messages:

Feb 24 15:45:55 ad999 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c0a@3,2/pci1000,3000@0 (mpt_sas1):
Feb 24 15:45:55 ad999 Unable to allocate dma memory for extra SGL.
Feb 24 15:45:55 ad999 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3c0a@3,2/pci1000,3000@0 (mpt_sas1):
Feb 24 15:45:55 ad999 MPT SGL mem alloc failed
