Bug #2248

open

assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free

Added by Will Murnane over 11 years ago. Updated over 11 years ago.

Status: New
Priority: Normal
Assignee: -
Category: comstar - iSCSI/FC/SAS target
Start date: 2012-03-06
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

My system at home panicked overnight:

$ fmdump -Vp -u bbe929b5-c7b1-c366-d952-ca71caa76cfd
TIME                           UUID                                 SUNW-MSG-ID
Mar 05 2012 19:02:53.516595000 bbe929b5-c7b1-c366-d952-ca71caa76cfd SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Mar 05 19:02:48.8222 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = bbe929b5-c7b1-c366-d952-ca71caa76cfd
        code = SUNOS-8000-KL
        diag-time = 1330992173 449387
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/openindiana/.bbe929b5-c7b1-c366-d952-ca71caa76cfd
                resource = sw:///:path=/var/crash/openindiana/.bbe929b5-c7b1-c366-d952-ca71caa76cfd
                savecore-succcess = 0
                os-instance-uuid = bbe929b5-c7b1-c366-d952-ca71caa76cfd
                panicstr = assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free, file: ../../common/io/comstar/stmf/stmf.c, line: 4587
                panicstack = genunix:process_type+18a18b () | stmf:stmf_task_lu_check_freelist+13e () | stmf:stmf_check_freetask+7f () | stmf:stmf_svc_timeout+24e () | stmf:stmf_svc+af () | genunix:taskq_thread+32f () | unix:thread_start+8 () |
                crashtime = 1330992076
                panic-time = March  5, 2012 07:01:16 PM EST EST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x4f55542d 0x1eca9d38

It's running as a VMware ESXi guest and serving iSCSI to another guest on the same host. This configuration has worked fine for a few months. The host has ECC memory and its logs show no memory errors, so I doubt this is just a bit flip.

Here are the results from savecore:

$ sudo savecore -vf vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Mar  5 19:01:16 2012

savecore: saving system crash dump in /var/crash/openindiana/{unix,vmcore}.0
Constructing namelist /var/crash/openindiana/unix.0
Constructing corefile /var/crash/openindiana/vmcore.0
pfn 4065792 not found for as=fffffffffbc3f160, va=ffffff0000000000
pfn 4065793 not found for as=fffffffffbc3f160, va=ffffff0000001000
pfn 4065794 not found for as=fffffffffbc3f160, va=ffffff0000002000
pfn 4065795 not found for as=fffffffffbc3f160, va=ffffff0000003000
pfn 4065796 not found for as=fffffffffbc3f160, va=ffffff0000004000
pfn 4065797 not found for as=fffffffffbc3f160, va=ffffff0000005000
pfn 4065798 not found for as=fffffffffbc3f160, va=ffffff0000006000
pfn 4065799 not found for as=fffffffffbc3f160, va=ffffff0000007000
pfn 4065800 not found for as=fffffffffbc3f160, va=ffffff0000008000
pfn 4065801 not found for as=fffffffffbc3f160, va=ffffff0000009000
 2:13  99% done
savecore: stream tag 3247 not in range 1..5
savecore: bad summary magic dd2b44d0

That doesn't look particularly happy. The dump is very likely incomplete because the system has 16GB of memory but only 16GB of root filesystem, which doesn't leave enough room to write a full dump. I'll reconfigure things to leave room for a complete dump in case this recurs.

The resulting core files are available at http://twoevils.net/.cores/ (about 2GB in total). The mdb command listed on the "how to report bugs" page doesn't work against them, but I'd bet someone who knows what they're doing can still get some use out of them.

#1

Updated by Will Murnane over 11 years ago

I should mention I'm running a custom-built kernel. Here's the tree I built it from:
$ hg id -i
ab0e9bcbdf3e
It's built with debugging symbols.

#2

Updated by Rich Lowe over 11 years ago

  • Category set to comstar - iSCSI/FC/SAS target
  • Tags deleted (needs-triage)

Possibly related to the fix for #2083?

#3

Updated by Will Murnane over 11 years ago

This problem has recurred several times. I have uploaded another core file to the same location as before; look at the *.1 files. I have not had a chance to resize the dump zvol, so these dumps are also incomplete and savecore still reports bad magic numbers.

#4

Updated by Dan McDonald over 11 years ago

Will Murnane wrote:

> I should mention I'm running a custom-built kernel. Here's the tree I built it from:
> $ hg id -i
> ab0e9bcbdf3e
> It's built with debugging symbols.

Did you make any changes?

#5

Updated by Dan McDonald over 11 years ago

Will Murnane wrote:

> This problem has recurred several times. I have uploaded another core file to the same location as before; look at the *.1 files. I have not had a chance to alter the dump zvol, so they are still incomplete and giving bad magic numbers.

I downloaded vmdump.1 and tried expanding it myself, with the same bad result.

Can you share /var/adm/messages? It's not much, but it may help (especially once we know what else was modified in your kernel build).

#6

Updated by Will Murnane over 11 years ago

No, I didn't make any changes.

The only interesting thing in messages:
Mar 19 14:36:59 thor drmach_acpi: [ID 779328 kern.warning] WARNING: drmach_init: number of logical CPUs (6) in physical processor is not power of 2.

The whole log of the crash and subsequent reboot:


Mar 18 13:53:22 thor unix: [ID 836849 kern.notice]
Mar 18 13:53:22 thor ^Mpanic[cpu0]/thread=ffffff001f9a7c40:
Mar 18 13:53:22 thor genunix: [ID 403854 kern.notice] assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free, file: ../../common/io/comstar/stmf/stmf.c, line: 4587
Mar 18 13:53:22 thor unix: [ID 100000 kern.notice]
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7ab0 genunix:process_type+18a18b ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b00 stmf:stmf_task_lu_check_freelist+13e ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b20 stmf:stmf_check_freetask+7f ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b60 stmf:stmf_svc_timeout+24e ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b90 stmf:stmf_svc+af ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7c20 genunix:taskq_thread+32f ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7c30 unix:thread_start+8 ()
Mar 18 13:53:22 thor unix: [ID 100000 kern.notice]
Mar 18 13:53:22 thor genunix: [ID 672855 kern.notice] syncing file systems...
Mar 18 13:53:22 thor genunix: [ID 904073 kern.notice] done
Mar 18 13:53:23 thor genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Mar 18 13:53:56 thor genunix: [ID 100000 kern.notice]
Mar 18 13:53:56 thor genunix: [ID 665016 kern.notice] ^M 9% done: 318508 pages dumped,
Mar 18 13:53:56 thor genunix: [ID 495082 kern.notice] dump failed: error 28
Mar 19 14:36:51 thor genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version illumos-gate 64-bit
Mar 19 14:36:51 thor genunix: [ID 877030 kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.

#7

Updated by Will Murnane over 11 years ago

I've set the size of the dump volume to 4G to see whether I can get a complete dump. mdb -k shows the kernel currently using about 3.3GB of memory. If that isn't enough, I'll break out the big guns and put another disk in the machine.

#8

Updated by Will Murnane over 11 years ago

It crashed again on Sunday night. 4G is still not enough, and there's no room left on the VMDK to give the dump volume more space, so I'll move everything over to a new datastore where I can give it a lot more space, and see how that goes.
