Bug #4187

Threads stuck in zio_wait() in CHECKSUM_VERIFY stage

Added by Marcel Telka about 7 years ago. Updated about 7 years ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2013-10-07
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Many ZFS threads have been stuck in zio_wait() for a long time (2 minutes or more), and the system is completely hung:

# echo "::stacks -c zio_wait|::findstack -v" \
> | mdb 1 | grep zio_wait \
> | sed -e 's/.*0x5b(\(.*\))/\1::zio/' \
> | mdb 1 | grep -v ADDRESS
mdb: warning: dump is from SunOS 5.11 illumos-a81df0a; dcmds and macros may not match kernel implementation
mdb: warning: dump is from SunOS 5.11 illumos-a81df0a; dcmds and macros may not match kernel implementation
ffffff05563e8e90        NULL  CHECKSUM_VERIFY  ffffff04de867740 -            
ffffffc01ff191a8        NULL  CHECKSUM_VERIFY  ffffff04de9de8a0 -            
ffffffc021309290        NULL  CHECKSUM_VERIFY  ffffff04eab0b7e0 -            
ffffff0556092b48        NULL  CHECKSUM_VERIFY  ffffff04eab0f780 -            
ffffffe25d0fd100        NULL  CHECKSUM_VERIFY  ffffff04de8bab00 -            
ffffffc020b160d0        NULL  CHECKSUM_VERIFY  ffffff04eab080e0 -            
ffffff0528f63b68        NULL  CHECKSUM_VERIFY  ffffffbffcbf7440 -            
ffffffe25d01b428        NULL  CHECKSUM_VERIFY  ffffff001e471c40 -            
ffffffe25d16dd90        NULL  CHECKSUM_VERIFY  ffffff001ee00c40 -            
ffffffe25ed09a18        NULL  CHECKSUM_VERIFY  ffffff05050cfc40 -            
ffffffc0202b17d8        NULL  CHECKSUM_VERIFY  ffffff04de51c100 -            
ffffff052b60eaa8        NULL  CHECKSUM_VERIFY  ffffff04ea9d18a0 -            
ffffff0556a2b0e8        NULL  CHECKSUM_VERIFY  ffffff00211dcc40 -            
ffffffc0217cead0        NULL  CHECKSUM_VERIFY  ffffff04de51c840 -            
ffffff05566a2830        NULL  CHECKSUM_VERIFY  ffffff04ea9a4bc0 -            
ffffff051ac11108        NULL  CHECKSUM_VERIFY  ffffff04de7f1480 -            
#
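
Every zio in the listing above is in the same pipeline stage. On a dump with a longer or more varied listing, the per-stage counts could be tallied with a small filter. The following is only a minimal sketch: the function name `tally_zio_stages` is hypothetical (it is not part of mdb), and it assumes the third whitespace-separated column of the `::zio` output is the stage, as it appears in the listing above.

```shell
#!/bin/sh
# tally_zio_stages: hypothetical helper, not an mdb dcmd.
# Reads "::zio" output on stdin and prints "<count> <stage>" per stage,
# most frequent stage first.
# Assumption: column 1 is a 16-digit hex zio address and column 3 is the
# pipeline stage, as in the listing above. mdb warnings and the ADDRESS
# header row do not match that shape and are skipped.
tally_zio_stages() {
    awk '
        $1 ~ /^[0-9a-f]+$/ && length($1) == 16 && NF >= 4 { count[$3]++ }
        END { for (s in count) print count[s], s }
    ' | sort -k1,1nr -k2,2
}
```

Because the filter skips the header row itself, it could be appended to the pipeline above in place of the final `grep -v ADDRESS`.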

This happened on my desktop running OpenIndiana Hipster (with an illumos build from September 1st).

The crash dump size is 1.3GB. I'll provide it on request.

#1

Updated by Marcel Telka about 7 years ago

There is one thread in the sata module (and no thread in the ahci module):

> ::stacks -m ahci
> ::stacks -m sata
THREAD           STATE    SOBJ                COUNT
ffffff001e3b5c40 RUN      <NONE>                  1
                 swtch+0x141
                 cv_timedwait_hires+0xec
                 cv_reltimedwait+0x51
                 sata_event_daemon+0xff
                 thread_start+8

> ffffff001e3b5c40::thread -d
            ADDR         DISPTIME BOUND PR SWITCH
ffffff001e3b5c40          c01f71d    -1  0 t-0   
> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc30d20  1f    2    0 109   no    no t-0    ffffff001e2cbc40 sched
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |          10 ffffff001e2cbc40
           QUIESCED         |           9 ffffff001e2d1c40
             EXISTS         |           - ffffff001e205c40 (idle)
             ENABLE         |
                            +-->  PRI THREAD           PROC
                                   60 ffffff00202adc40 sched
                                    0 ffffff0502ae5160 VBoxHeadless

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff04dd06f580  1f    5    0 101   no    no t-0    ffffff001e5b5c40 sched
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |           2 ffffff001e5b5c40
           QUIESCED         |           - ffffff001e57fc40 (idle)
             EXISTS         |
             ENABLE         +-->  PRI THREAD           PROC
                                   60 ffffff001e3b5c40 sched
                                   39 ffffff0502b47780 VBoxHeadless
                                    0 ffffff05072f4c20 VBoxHeadless
                                    0 ffffff051852ebc0 VBoxHeadless
                                    0 ffffff051ca41520 VBoxHeadless

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  2 fffffffffbc3b620  1b    3    0 104   no    no t-1    ffffff001e66cc40 sched
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |           5 ffffff001e66cc40
             EXISTS         |           - ffffff001e63cc40 (idle)
             ENABLE         |
                            +-->  PRI THREAD           PROC
                                   38 ffffff04ea9d1160 clock-applet
                                    0 ffffff052b393880 VBoxHeadless
                                    0 ffffff04fcd37860 VBoxHeadless

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  3 ffffff04dd2cda80  1f    0    0 101   no    no t-3    ffffff001e9eac40 sched
                       |         |
            RUNNING <--+         +--> PIL THREAD
              READY                     2 ffffff001e9eac40
           QUIESCED                     - ffffff0508002080 VBoxHeadless
             EXISTS         
             ENABLE         

>
