Bug #11820

upstream Nexenta iSCSI and COMSTAR fixes

Added by Jerry Jelinek 6 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: comstar - iSCSI/FC/SAS target
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

The following set of iSCSI fixes is in the Nexenta source but not upstream. For people using iSCSI, having some or all of these fixes should be beneficial.

NEX-15346 COMSTAR hang with thousands of threads waiting in idm_refcnt_wait_ref() from iscsit_conn_lost()
NEX-15497 Multiple iSCSI targets stuck offlining and an old thread in idm_refcnt_wait_ref
NEX-15955 Support for more than 255 iSCSI Targets
NEX-16625 Max amount of iSCSI targets is hard limited with doorfs core definitions
NEX-16937 Poor performance observed during large delete / SCSI UNMAP operations (Comstar portion)
NEX-16711 STMF task workers do not scale correctly
NEX-17397 Getting panic: kernel heap corruption detected when trying to create big count of iSCSI targets and mappings
NEX-17551 Relax ATS overlap checking
NEX-17944 HBA drivers don't need the redundant devfs_clean step
NEX-1825 LUN's not discovered with ALUA - Commands sent down standby path
NEX-18878 Getting BAD TRAP panic when payload to iSCSI or FC shared zvol from client
NEX-19189 Very poor iSCSI throughput when VAAI enabled on ESXi host
NEX-19489 BDD tests causes node panic during stage-vm1
NEX-20098 idm_refcnt_unref_task() fails to hold mutex before calling REFCNT_AUDIT
NEX-2178 Multi-block transfers on memory constrained systems for write_same (0x93) and compare_and_write (0x89) cause memory corruption
NEX-2105 assertion failed: (scmd->flags & SBD_SCSI_CMD_TRANS_DATA) && scmd->trans_data != NULL, file: ../../common/io/comstar/lu/stmf_sbd/sbd_scsi.c, line: 2447
NEX-2359 iSCSI target connection handling is missing refcount audit information
NEX-2378 RW_LOCK_HELD assertion blown for sl_access_state_lock in sbd_flush_data_cache()
NEX-2613 There should be a tunable to enable/disable SCSI UNMAP in NexentaStor
NEX-2756 iscsit should fold the case of scsi names that we get from initiators
NEX-3023 Panics and hangs when using write_same and compare_and_write
NEX-3104 Panic in stmf_dlun0_done
NEX-3111 Comstar does not pass cstyle and hdrchk
NEX-3169 STMF has duplicate code in 6 places which is error prone.
NEX-3171 VAAI Disable not yet included in 5.0
NEX-3177 enums and string arrays required to be in sync.
NEX-3204 Panic doing FC rescan from ESXi 5.5u1 with VAAI enabled
NEX-3217 Panic running benchmark at ESX VM
NEX-3259 COMSTAR has insufficient task workers
NEX-3313 Extraneous ASSERT in idm code needs to be removed.
NEX-3414 CLONE - Port 3339 iscsi/fs:5 causes panic on initiator
NEX-3419 CLONE - Run multi initiator sessions to a single target test can panic the initiator
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
NEX-3566 assertion failed: lu == dlun0 || (ilu->ilu_state != STMF_STATE_OFFLINING && ilu->ilu_state != STMF_STATE_OFFLINE), file: ../../common/io /comstar/stmf/stmf.c, line: 4063
NEX-3622 COMSTAR should have per remote port kstats for I/O and latency
NEX-3672 IDM module panics target when PDU has AHS length between 17 and 49
NEX-3746 Panic seen if iSCSI Initiator sends a SCSI Response packet to target
NEX-3777 COMSTAR iscsi/io test can panic the target running 5.0
NEX-3785 This is on 5.0 release and iSCSI target can panic while running iSCSI mpxio disable test
NEX-4433 Deadlock among txg_wait_synced(), sbd_pgr_remove_it_handle(), and ppt_sess_lookup_create() after HA failover
NEX-4707 memory leak in stmf_sbd`sbd_attach() on successful property lookup
NEX-4905 While Deleting an iSCSI target, the appliance gets in to panic mode(restart)
NEX-5428 Backout the 5.0 changes
NEX-2937 Continuous write_same starves all other commands
NEX-6018 Return of the walking dead idm_refcnt_wait_ref comstar threads
NEX-6832 fcsm module's debug level default should be 0 (cstyle fix)
NEX-7048 COMSTAR MODE_SENSE support is broken
NEX-7681 System panics: Deadlock: cycle in blocking chain
NEX-7907 uts/common/sys/scsi/scsi_names.h should define IQN as 223 bytes, not utf-8 characters
NEX-9567 Deadlock: cycle in blocking chain panic
NEX-988 itask_lu_[read|write]_time was inadvertently removed by the Illumos 3862 fix
NEX-9981 Deadman timer panic from idm_refcnt_wait_ref thread while offlining iSCSI targets
OS-69 Open source VAAI
re #11454 rb3750 Fix inconsistent vid/pid in stmf
re #12375 rb4141 Create ALUA Support on NexentaStor; Failover causes loss of storage
re #12981, rbYYYY Panic due to possible race between LU coming to ready state and COMSTAR (stmf) sbd task
re #13660 rb4549 ALUA mode: `/usr/demo/comstar/bin/aluaadm enable 1` sets "ALUA Node" = 0
re #13796 OSX FC Initiator cannot attach to LUN if LUN id is different then other OSX
re #6919, rb2433 ATS not dropping locks upon completion
re #7550 rb2134 lint-clean nza-kernel
re #7936 rb3706 Support for COMSTAR/OEM
re #8002 rb3706 Allow setting iSCSI vendor ID via stmf_sbd.conf
re #8499 rb3117 Unreleased ATS lock possibly caused by concurrent svmotions
re #9822 - libiscsi test panics COMSTAR
Running stmfadm remove-hg-member caused a NULL pointer dereference panic in stmf_remove_lu_from_session
SUP-505 ATS LUN lock under high lun load with multiple VM migrations occuring concurrently
SUP-505 ATS LUN lock under high lun load with multiple VM migrations occuring concurrently fixed panic in the abort path of a compare_and_write
SUP-540 panic on page fault in stmf_task_lu_free()
SUP-698 Appliance is regularly crashing with a kernel memory allocator, duplicate free: buffer freed
SUP-761 sbd_flush_data_cache() call against closed zvol results in NULL pointer deref in zil_commit() call further down the stack
SUP-765 When a Windows Clustered Shared Volume is placed on a pool under Nexenta HA Cluster control the clustered shared disk looses its PGR3 reservation to the presented zvol.
SUP-770 deadlock between thread acquiring iss->iss_lockp in stmf_task_free() and thread holding sl->sl_pgr->pgr_lock from sbd_pgr_remove_it_handle()
SUP-772 COMSTAR task waiting pressure doesn't increase worker thread pool sizing above initial value
SUP-776 iSCSI target portals not isolating IQNs on different subnets
SUP-782 COMSTAR UNMAP support should limit number of LBAs per operation
SUP-930 Freed and reused idm_conn_t buffer leads to system panic.
Support simultaneous compare_and_write operations for VAAI
SUS-56 libstmfproxy incorrectly checks socket() return code
VAAI (XXX ATS support for COMSTAR, YYY Block-copy support for COMSTAR)
8226 nza-kernel needs to be buildable by itself
Adding AoE support to nza-kernel

There is also a set of tests under usr/src/test/comstar which are currently not in illumos.

History

#1

Updated by Jerry Jelinek 6 months ago

  • Subject changed from upstream Nexenta iSCSI fixes to upstream Nexenta iSCSI and COMSTAR fixes
#2

Updated by Jerry Jelinek 6 months ago

Status on porting commits:

Done - missing from my original list
9b3c7afe2 NEX-4707 memory leak in stmf_sbd`sbd_attach() on successful property lookup

Done - missing from my original list
90df7c62b NEX-15346 COMSTAR hang with thousands of threads waiting in idm_refcnt_wait_ref() from iscsit_conn_lost()

Done
89f91b4c76 NEX-15497 Multiple iSCSI targets stuck offlining and an old thread in idm_refcnt_wait_ref
25e0845fe2 NEX-15497 Multiple iSCSI targets stuck offlining and an old thread in idm_refcnt_wait_ref
6fbbe56416 NEX-16625 Max amount of iSCSI targets is hard limited with doorfs core definitions
6f2ec3d387 NEX-15955 Support for more than 255 iSCSI Targets
586990b5b7 NEX-15955 Support for more than 255 iSCSI Targets
33d72c7e70 NEX-17397 Getting panic: kernel heap corruption detected when trying to create big count of iSCSI targets and mappings
ba6bcff00c NEX-17551 Relax ATS overlap checking

Done - partial, only iscsi_lun.c
766277b697 NEX-17944 HBA drivers don't need the redundant devfs_clean step

Done
1c2722ce8d NEX-1825 LUN's not discovered with ALUA - Commands sent down standby path
480e10aafa NEX-18878 Getting BAD TRAP panic when payload to iSCSI or FC shared zvol from client

Done - Missing from original list
ba3109f0f44 NEX-16937 Poor performance observed during large delete / SCSI UNMAP operations (Comstar portion)

Done
52341e6a28 NEX-19189 Very poor iSCSI throughput when VAAI enabled on ESXi host
a15ac3edbd NEX-19489 BDD tests causes node panic during stage-vm1
37cd26e5cd NEX-2359 iSCSI target connection handling is missing refcount audit information
62d6b23930 NEX-2378 RW_LOCK_HELD assertion blown for sl_access_state_lock in sbd_flush_data_cache()
8506e1f0c8 NEX-2613 There should be a tunable to enable/disable SCSI UNMAP in NexentaStor

NA - already have it
0e9cc2b5f5 NEX-2756 iscsit should fold the case of scsi names that we get from initiators

Done
91f26de312 NEX-3023 Panics and hangs when using write_same and compare_and_write
9fee6c752d NEX-3104 Panic in stmf_dlun0_done
9370945cf7 NEX-3171 VAAI Disable not yet included in 5.0
b13d0dd247 NEX-3177 enums and string arrays required to be in sync.

NA - had all of this already from other commits
4a3250f509 NEX-3204 Panic doing FC rescan from ESXi 5.5u1 with VAAI enabled

Done - mostly would undo final work - maybe out of order
d611a48c05 NEX-3217 Panic running benchmark at ESX VM NEX-3204 Panic doing FC rescan from ESXi 5.5u1 with VAAI enabled
4ac4736b7a NEX-3259 COMSTAR has insufficient task workers

NA - bug doesn't exist in illumos - this was a fix for Nexenta code
4314fc227e NEX-3313 Extraneous ASSERT in idm code needs to be removed.

Done
b6a99ee46b NEX-3414 CLONE - Port 3339 iscsi/fs:5 causes panic on initiator NEX-3419 CLONE - Run multi initiator sessions to a single target test can panic the initiator

NA - superseded by previous work for scsi unmap and ZoL trim porting
938cb42cb6 NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos

Done
336829411c NEX-3566 assertion failed: lu == dlun0 || (ilu->ilu_state != STMF_STATE_OFFLINING && ilu->ilu_state != STMF_STATE_OFFLINE), file: ../../common/io /comstar/stmf/stmf.c, line: 4063
303d96186a NEX-3672 IDM module panics target when PDU has AHS length between 17 and 49

Already have
e857c601c9 NEX-3746 Panic seen if iSCSI Initiator sends a SCSI Response packet to target NEX-3777 COMSTAR iscsi/io test can panic the target running 5.0 NEX-3785 This is on 5.0 release and iSCSI target can panic while running iSCSI mpxio disable test

Done - Missing from original list
4a7281a2ff SUP-770 deadlock between thread acquiring iss->iss_lockp in stmf_task_free() and thread holding sl->sl_pgr->pgr_lock from sbd_pgr_remove_it_handle

Done
a52e863a86 NEX-5428 Backout the 5.0 changes NEX-2937 Continuous write_same starves all other commands
ddcde59034 NEX-6018 Return of the walking dead idm_refcnt_wait_ref comstar threads
0b7246953a NEX-7048 COMSTAR MODE_SENSE support is broken

Done - Missing from original list
c19c8d51eed NEX-2178 Multi-block transfers on memory constrained systems for write_same (0x93) and compare_and_write (0x89) cause memory corruption
NEX-2105 assertion failed: (scmd->flags & SBD_SCSI_CMD_TRANS_DATA) && scmd->trans_data != NULL, file: ../../common/io/comstar/lu/stmf_sbd/sbd_scsi.c, line: 2447
7e8b29accf8 SUP-761 sbd_flush_data_cache() call against closed zvol results in NULL pointer deref in zil_commit() call further down the stack
d4b4b46ca61 SUP-761 sbd_flush_data_cache() call against closed zvol results in NULL pointer deref in zil_commit() call further down the stack
d90ff0e842f re #12981, rbYYYY Panic due to possible race between LU coming to ready state and COMSTAR (stmf) sbd task
b42436c149a NEX-988 itask_lu_[read|write]_time was inadvertently removed by the Illumos 3862 fix

Done
db8df6803b NEX-7681 System panics: Deadlock: cycle in blocking chain
8fe9288680 NEX-7907 uts/common/sys/scsi/scsi_names.h should define IQN as 223 bytes, not utf-8 characters
998f36f501 NEX-9567 Deadlock: cycle in blocking chain panic
e48b00e789f NEX-3169 STMF has duplicate code in 6 places which is error prone.
8b476ea1d5d NEX-3622 COMSTAR should have per remote port kstats for I/O and latency
6c169dfb7c NEX-9981 Deadman timer panic from idm_refcnt_wait_ref thread while offlining iSCSI targets

NA - undoes other patches which were later applied
166eec562a re #11454 rb3750 Fix inconsistent vid/pid in stmf

Done
658ce0202b re #12375 rb4141 Create ALUA Support on NexentaStor; Failover causes loss of storage

Already have
01263d2e68 re #13660 rb4549 ALUA mode: `/usr/demo/comstar/bin/aluaadm enable 1` sets "ALUA Node" = 0

Done
f8452c88da re #13796 OSX FC Initiator cannot attach to LUN if LUN id is different then other OSX
c02d661f18 re #6919, rb2433 ATS not dropping locks upon completion

NA
de73e40157 re #7550 rb2134 lint-clean nza-kernel

Done
56ee4547fa re #7936 rb3706 Support for COMSTAR/OEM re #8002 rb3706 Allow setting iSCSI vendor ID via stmf_sbd.conf
52a2703466 re #8499 rb3117 Unreleased ATS lock possibly caused by concurrent svmotions

Already have
13f77e39c1 re #9822 - libiscsi test panics COMSTAR

Done
b96fac394f re #9822 - libiscsi test panics COMSTAR

Already have
6ae9e3c876 Running stmfadm remove-hg-member caused a NULL pointer dereference panic in stmf_remove_lu_from_session

Done
ab78d621a8 Running stmfadm remove-hg-member caused a NULL pointer dereference panic in stmf_remove_lu_from_session
d148e30c09 SUP-505 ATS LUN lock under high lun load with multiple VM migrations occuring concurrently fixed panic in the abort path of a compare_and_write
ae72f1849d SUP-505 ATS LUN lock under high lun load with multiple VM migrations occuring concurrently
b9394e9aaa SUP-540 panic on page fault in stmf_task_lu_free()
4b7f693efe SUP-765 When a Windows Clustered Shared Volume is placed on a pool under Nexenta HA Cluster control the clustered shared disk looses its PGR3 reservation to the presented zvol.
270dab30c3 SUP-772 COMSTAR task waiting pressure doesn't increase worker thread pool sizing above initial value

Already have? illumos#4458
afc5741420 SUP-776 iSCSI target portals not isolating IQNs on different subnets

Done
5862d4a0f8 SUP-782 COMSTAR UNMAP support should limit number of LBAs per operation

Already have (illumos#6742)
6c434f6584 SUP-930 Freed and reused idm_conn_t buffer leads to system panic.

Done
6ecdffb3e1 Support simultaneous compare_and_write operations for VAAI
e01d5f036 VAAI (XXX ATS support for COMSTAR, YYY Block-copy support for COMSTAR)
d0cdd3cf4c OS-69 Open source VAAI

#3

Updated by Jerry Jelinek 5 months ago

For testing, I've run this code as both an initiator and a target, with the initiator talking to multiple targets. This is only a basic sanity test. I think the real heavy-duty testing is Nexenta's production usage over the years in which they have built and deployed these changes.

#4

Updated by Electric Monk 5 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 61dfa5098dc8576d9a5e277deba6df647bb70c06

commit  61dfa5098dc8576d9a5e277deba6df647bb70c06
Author: Rick McNeal <rick.mcneal@nexenta.com>
Date:   2019-11-18T19:20:53.000Z

    11820 upstream Nexenta iSCSI and COMSTAR fixes
    Portions contributed by: Albert Lee <trisk@nexenta.com>
    Portions contributed by: Alex Deiter <alex.deiter@nexenta.com>
    Portions contributed by: Bayard G. Bell <bayard.bell@nexenta.com>
    Portions contributed by: Dan Fields <dan.fields@nexenta.com>
    Portions contributed by: Dan McDonald <danmcd@nexenta.com>
    Portions contributed by: Dmitry Gromada <dmitry.gromada@nexenta.com>
    Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Portions contributed by: kcrowenex <kevin.crowe@nexenta.com>
    Portions contributed by: Matt Barden <matt.barden@nexenta.com>
    Portions contributed by: Michael Tsymbalyuk <michael.tsymbalyuk@nexenta.com>
    Portions contributed by: Rob Gittins <rob.gittins@nexenta.com>
    Portions contributed by: Steve Ma <steve.ma@nexenta.com>
    Portions contributed by: Tony Nguyen <tony.nguyen@nexenta.com>
    Portions contributed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
    Reviewed by: Christopher Siden <christopher.siden@delphix.com>
    Reviewed by: Dan Fields <dan.fields@nexenta.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
    Reviewed by: Jason King <jason.brian.king@gmail.com>
    Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
    Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
    Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
    Reviewed by: Kody Kantor <kody.kantor@joyent.com>
    Reviewed by: Marcel Telka <marcel.telka@nexenta.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Max Grossman <max.grossman@delphix.com>
    Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
    Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
    Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
    Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
    Reviewed by: Steve Peng <steve.peng@nexenta.com>
    Reviewed by: tim Jacobson <tim.jacobson@nexenta.com>
    Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
