Bug #2248
assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free
Status: open
Description
My system at home panicked overnight:
$ fmdump -Vp -u bbe929b5-c7b1-c366-d952-ca71caa76cfd
TIME                           UUID                                 SUNW-MSG-ID
Mar 05 2012 19:02:53.516595000 bbe929b5-c7b1-c366-d952-ca71caa76cfd SUNOS-8000-KL
  TIME                 CLASS                                         ENA
  Mar 05 19:02:48.8222 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = bbe929b5-c7b1-c366-d952-ca71caa76cfd
        code = SUNOS-8000-KL
        diag-time = 1330992173 449387
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/openindiana/.bbe929b5-c7b1-c366-d952-ca71caa76cfd
                resource = sw:///:path=/var/crash/openindiana/.bbe929b5-c7b1-c366-d952-ca71caa76cfd
                savecore-succcess = 0
                os-instance-uuid = bbe929b5-c7b1-c366-d952-ca71caa76cfd
                panicstr = assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free, file: ../../common/io/comstar/stmf/stmf.c, line: 4587
                panicstack = genunix:process_type+18a18b () | stmf:stmf_task_lu_check_freelist+13e () | stmf:stmf_check_freetask+7f () | stmf:stmf_svc_timeout+24e () | stmf:stmf_svc+af () | genunix:taskq_thread+32f () | unix:thread_start+8 () |
                crashtime = 1330992076
                panic-time = March 5, 2012 07:01:16 PM EST EST
        (end fault-list[0])
        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x4f55542d 0x1eca9d38
It's running as a VMware ESXi guest, and serving iSCSI to another guest on the same machine. This configuration has worked just fine for a few months. I have ECC memory installed, and I don't see any errors related to it in the host logs, so I don't think it's just a bitflip.
Here are the results from savecore:
$ sudo savecore -vf vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Mar 5 19:01:16 2012
savecore: saving system crash dump in /var/crash/openindiana/{unix,vmcore}.0
Constructing namelist /var/crash/openindiana/unix.0
Constructing corefile /var/crash/openindiana/vmcore.0
pfn 4065792 not found for as=fffffffffbc3f160, va=ffffff0000000000
pfn 4065793 not found for as=fffffffffbc3f160, va=ffffff0000001000
pfn 4065794 not found for as=fffffffffbc3f160, va=ffffff0000002000
pfn 4065795 not found for as=fffffffffbc3f160, va=ffffff0000003000
pfn 4065796 not found for as=fffffffffbc3f160, va=ffffff0000004000
pfn 4065797 not found for as=fffffffffbc3f160, va=ffffff0000005000
pfn 4065798 not found for as=fffffffffbc3f160, va=ffffff0000006000
pfn 4065799 not found for as=fffffffffbc3f160, va=ffffff0000007000
pfn 4065800 not found for as=fffffffffbc3f160, va=ffffff0000008000
pfn 4065801 not found for as=fffffffffbc3f160, va=ffffff0000009000
2:13 99% done
savecore: stream tag 3247 not in range 1..5
savecore: bad summary magic dd2b44d0
That doesn't look particularly happy. This is very likely because the system has 16GB of memory and only a 16GB root filesystem, which doesn't leave enough room to write a full dump. I'll work on reconfiguring things to leave enough room for a complete dump in case this recurs.
The resultant core files are available at http://twoevils.net/.cores/ (total size about 2GB). The mdb command listed on "how to report bugs" doesn't work against them, but I'd bet someone who knows what they're doing can still get some use from them.
Updated by Will Murnane over 11 years ago
I should mention I'm running a custom-built kernel. Here's the tree I built it from:
$ hg id -i
ab0e9bcbdf3e
It's built with debugging symbols.
Updated by Rich Lowe over 11 years ago
- Category set to comstar - iSCSI/FC/SAS target
- Tags deleted (needs-triage)
Possibly related to the fix for #2083?
Updated by Will Murnane over 11 years ago
This problem has recurred several times. I have uploaded another core file to the same location as before; look at the *.1 files. I have not had a chance to alter the dump zvol, so they are still incomplete and giving bad magic numbers.
Updated by Dan McDonald over 11 years ago
Will Murnane wrote:
I should mention I'm running a custom-built kernel. Here's the tree I built it from:
$ hg id -i
ab0e9bcbdf3e
It's built with debugging symbols.
Did you make any changes?
Updated by Dan McDonald over 11 years ago
Will Murnane wrote:
This problem has recurred several times. I have uploaded another core file to the same location as before; look at the *.1 files. I have not had a chance to alter the dump zvol, so they are still incomplete and giving bad magic numbers.
I downloaded vmdump.1 and tried expanding it myself, to the same ill result.
Can you share /var/adm/messages? It's not much, but it may help (especially once we know what else was modified in your kernel build).
Updated by Will Murnane over 11 years ago
No, I didn't make any changes.
The only interesting thing in messages:
Mar 19 14:36:59 thor drmach_acpi: [ID 779328 kern.warning] WARNING: drmach_init: number of logical CPUs (6) in physical processor is not power of 2.
The whole log of the crash and subsequent reboot:
Mar 18 13:53:22 thor unix: [ID 836849 kern.notice]
Mar 18 13:53:22 thor ^Mpanic[cpu0]/thread=ffffff001f9a7c40:
Mar 18 13:53:22 thor genunix: [ID 403854 kern.notice] assertion failed: ilu->ilu_ntasks_min_free <= ilu->ilu_ntasks_free, file: ../../common/io/comstar/stmf/stmf.c, line: 4587
Mar 18 13:53:22 thor unix: [ID 100000 kern.notice]
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7ab0 genunix:process_type+18a18b ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b00 stmf:stmf_task_lu_check_freelist+13e ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b20 stmf:stmf_check_freetask+7f ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b60 stmf:stmf_svc_timeout+24e ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7b90 stmf:stmf_svc+af ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7c20 genunix:taskq_thread+32f ()
Mar 18 13:53:22 thor genunix: [ID 655072 kern.notice] ffffff001f9a7c30 unix:thread_start+8 ()
Mar 18 13:53:22 thor unix: [ID 100000 kern.notice]
Mar 18 13:53:22 thor genunix: [ID 672855 kern.notice] syncing file systems...
Mar 18 13:53:22 thor genunix: [ID 904073 kern.notice] done
Mar 18 13:53:23 thor genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Mar 18 13:53:56 thor genunix: [ID 100000 kern.notice]
Mar 18 13:53:56 thor genunix: [ID 665016 kern.notice] ^M 9% done: 318508 pages dumped,
Mar 18 13:53:56 thor genunix: [ID 495082 kern.notice] dump failed: error 28
Mar 19 14:36:51 thor genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version illumos-gate 64-bit
Mar 19 14:36:51 thor genunix: [ID 877030 kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
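For anyone reading the log above: the "dump failed: error 28" line carries an errno value, and 28 is ENOSPC, which fits the theory that the dump device is too small. The short Python sketch below decodes that number and converts the "318508 pages dumped" figure into a size. The 4 KiB page size is an assumption (the usual base page size on x86), and the function names are made up for illustration. The result is the uncompressed size of the pages processed, which may be more than what actually landed on the dump zvol if the dump is compressed.

```python
import errno
import os

# Values copied from the messages log above.
DUMP_ERRNO = 28        # "dump failed: error 28"
PAGES_DUMPED = 318508  # "9% done: 318508 pages dumped"

# Assumption: 4 KiB base pages, as is usual on x86.
PAGE_SIZE = 4096


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to mebibytes (uncompressed)."""
    return pages * page_size / 2**20


if __name__ == "__main__":
    print(describe_errno(DUMP_ERRNO))
    print(f"{pages_to_mib(PAGES_DUMPED):.1f} MiB of pages processed before the failure")
```

So the dump gave up after roughly 1.2 GiB worth of pages, far short of a complete kernel dump.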
Updated by Will Murnane over 11 years ago
I've set the size of the dump volume to 4G to see if I can get a complete dump. mdb -k shows I'm currently using something like 3.3GB as kernel memory. If that doesn't suffice I'll break out the big guns and put another disk in the machine.
Updated by Will Murnane over 11 years ago
It crashed again on Sunday night. 4G is still not enough, and there's not enough room on the VMDK to give it more space, so I'll work on moving everything over to a new datastore where I can give it a lot more space and see how that goes.