Bug #593
opencmlb causes panic in xdf
0%
Description
After replacing the second PV disk (xvdb) or adding additional disks to system running as a Xen domU, the system panics when I/O to the second disk is attempted (reading the fdisk table).
[2]> $C fffffffffbc9f6f0 kmdb_enter+0xb() fffffffffbc9f710 debug_enter+0x38(fffffffffbc10a88) fffffffffbc9f7e0 panicsys+0x497(fffffffffb9598f8, ffffff00068014a0, fffffffffbc9f7f0, 1) ffffff00068013e0 vpanic+0x15c() ffffff00068014d0 panic+0x94() ffffff0006801550 die+0x10f(e, ffffff0006801670, ffffff098c43effc, 2) ffffff0006801660 trap+0x1791(ffffff0006801670, ffffff098c43effc, 2) ffffff0006801670 0xfffffffffb80020b() ffffff0006801790 gnttab_grant_foreign_access_ref+0x27(ffffffff, 0, a9560, 0) ffffff00068017d0 xdf`gs_grant+0x5b(ffffff018cae78a0, a9560) ffffff0006801850 xdf`xdf_process_rreq+0x1d6(ffffff01940b8000, ffffff018fa45240, ffffff0194daa120) ffffff00068018a0 xdf`xdf_io_start+0xa8(ffffff01940b8000) ffffff0006801920 xdf`xdf_lb_rdwr+0xeb(ffffff018fe990c8, 0, 0, 0, 0, 0) ffffff0006801a50 cmlb`cmlb_read_fdisk+0x8e(ffffff01946ab000, 0, 0) ffffff0006801aa0 cmlb`cmlb_dkio_set_ext_part+0x48(ffffff01946ab000, 8047b84, 100005, 0) ffffff0006801b20 cmlb`cmlb_ioctl+0x163(ffffff01946ab000, c100000050, 42e, 8047b84, 100005, ffffff0195414f18, ffffff0006801d54, 0) ffffff0006801c30 xdf`xdf_ioctl+0x41c(c100000050, 42e, 8047b84, 100005, ffffff0195414f18, ffffff0006801d54) ffffff0006801c70 cdev_ioctl+0x45(c100000050, 42e, 8047b84, 100005, ffffff0195414f18, ffffff0006801d54) ffffff0006801cb0 specfs`spec_ioctl+0x5a(ffffff019791dd00, 42e, 8047b84, 100005, ffffff0195414f18, ffffff0006801d54, 0) ffffff0006801d30 fop_ioctl+0x7b(ffffff019791dd00, 42e, 8047b84, 100005, ffffff0195414f18, ffffff0006801d54, 0) ffffff0006801e30 ioctl+0x18e(5, 42e, 8047b84) ffffff0006801ec0 dtrace_systrace_syscall32+0x11a(5, 42e, 8047b84, 1c3, 1, 0) ffffff0006801f10 sys_syscall32+0x13e() [2]> ffffff018cae78a0::print ge_slot_t { gs_vreq_link = { list_next = 0xffffff018f525f08 list_prev = 0xffffff018f525f08 } gs_vreq = 0xffffff018f525ee8 gs_oeid = 0 gs_isread = 0x1 gs_ghead = 0xffffffff gs_ngrefs = 0xc gs_ge = [ 0x31, 0x193, 0x192, 0x19a, 0x18f, 0x1a2, 0x6c, 0x190, 0x19e, 0x19c , 0x19d ] } 263 264 static grant_ref_t 265 gs_grant(ge_slot_t *gs, mfn_t mfn) 266 { 267 grant_ref_t gr = gnttab_claim_grant_reference(&gs->gs_ghead); 268 269 ASSERT(gr != -1); 270 ASSERT(gs->gs_ngrefs < BLKIF_MAX_SEGMENTS_PER_REQUEST); 271 gs->gs_ge[gs->gs_ngrefs++] = gr; 272 gnttab_grant_foreign_access_ref(gr, gs->gs_oeid, mfn, !gs->gs_isread); 273 274 return (gr); 275 } 276 [2]> gs_grant::dis xdf`gs_grant: pushq %rbp xdf`gs_grant+1: movq %rsp,%rbp xdf`gs_grant+4: subq $0x10,%rsp xdf`gs_grant+8: movq %rdi,-0x8(%rbp) xdf`gs_grant+0xc: movq %rsi,-0x10(%rbp) xdf`gs_grant+0x10: pushq %rbx xdf`gs_grant+0x11: pushq %r12 xdf`gs_grant+0x13: pushq %r13 xdf`gs_grant+0x15: subq $0x8,%rsp xdf`gs_grant+0x19: movq %rsi,%r13 xdf`gs_grant+0x1c: movq %rdi,%r12 xdf`gs_grant+0x1f: addq $0x20,%rdi xdf`gs_grant+0x23: call +0x3df5c50 <gnttab_claim_grant_refe rence> xdf`gs_grant+0x28: movl %eax,%ebx xdf`gs_grant+0x2a: movl 0x24(%r12),%ecx xdf`gs_grant+0x2f: leal 0x1(%rcx),%eax xdf`gs_grant+0x32: movl %eax,0x24(%r12) xdf`gs_grant+0x37: movslq %ecx,%r8 xdf`gs_grant+0x3a: movl %ebx,0x28(%r12,%r8,4) xdf`gs_grant+0x3f: cmpl $0x0,0x1c(%r12) xdf`gs_grant+0x45: sete %al xdf`gs_grant+0x48: movzbl %al,%ecx xdf`gs_grant+0x4b: movzwl 0x18(%r12),%esi xdf`gs_grant+0x51: movl %ebx,%edi xdf`gs_grant+0x53: movq %r13,%rdx xdf`gs_grant+0x56: call +0x3df578d <gnttab_grant_foreign_access_ref> xdf`gs_grant+0x5b: movl %ebx,%eax xdf`gs_grant+0x5d: addq $0x8,%rsp xdf`gs_grant+0x61: popq %r13 xdf`gs_grant+0x63: popq %r12 xdf`gs_grant+0x65: popq %rbx xdf`gs_grant+0x66: leave xdf`gs_grant+0x67: ret
The condition of gs would've triggered both asserts if they were present.
Related issues
Updated by Albert Lee over 12 years ago
More progress. ddi_dma_addr_bind_handle
is returning successfully with zero cookies, somehow.
panic[cpu2]/thread=ffffff018f98ac60: assertion failed: ndcs > 0, file: ../../common/xen/io/xdf.c, line: 535 ffffff0005e72760 genunix:assfail+7e () ffffff0005e72850 xdf:vreq_setup+6d6 () ffffff0005e728a0 xdf:xdf_io_start+71 () ffffff0005e72920 xdf:xdf_lb_rdwr+100 () ffffff0005e72a50 cmlb:cmlb_read_fdisk+8e () ffffff0005e72aa0 cmlb:cmlb_dkio_set_ext_part+48 () ffffff0005e72b20 cmlb:cmlb_ioctl+163 () ffffff0005e72c30 xdf:xdf_ioctl+42c () ffffff0005e72c70 genunix:cdev_ioctl+45 () ffffff0005e72cb0 specfs:spec_ioctl+5a () ffffff0005e72d30 genunix:fop_ioctl+7b () ffffff0005e72e30 genunix:ioctl+18e () ffffff0005e72ec0 genunix:dtrace_systrace_syscall32+11a () ffffff0005e72f10 unix:brand_sys_syscall32+1ca () panic: entering debugger (continue to save dump) Welcome to kmdb kmdb: unable to determine terminal type: assuming `vt100' Loaded modules: [ scsi_vhci crypto mac neti ptm xpv_psm ufs unix sv zfs krtld s1394 rdc uhci ii hook lofs genunix idm ip nsctl logindmux nsmb sdbc usba specfs nfs random sctp arp stmf sockfs xpv_uppc smbsrv ] [2]> $C fffffffffbc9f6f0 kmdb_enter+0xb() fffffffffbc9f710 debug_enter+0x38(fffffffffbc10a88) fffffffffbc9f7e0 panicsys+0x497(fffffffffbbcfa00, ffffff0005e726f0, fffffffffbc9f7f0, 1) ffffff0005e72630 vpanic+0x15c() ffffff0005e72720 panic+0x94() ffffff0005e72760 assfail+0x7e(fffffffff7a61128, fffffffff7a61108, 217) ffffff0005e72850 xdf`vreq_setup+0x6d6(ffffff01908d9500, ffffff018cae8b50) ffffff0005e728a0 xdf`xdf_io_start+0x71(ffffff01908d9500) ffffff0005e72920 xdf`xdf_lb_rdwr+0x100(ffffff018c6acd20, 0, 0, 0, 0, 0) ffffff0005e72a50 cmlb`cmlb_read_fdisk+0x8e(ffffff0194d72000, 0, 0) ffffff0005e72aa0 cmlb`cmlb_dkio_set_ext_part+0x48(ffffff0194d72000, 8047b84, 100005, 0) ffffff0005e72b20 cmlb`cmlb_ioctl+0x163(ffffff0194d72000, c100000090, 42e, 8047b84, 100005, ffffff018c5ae008, ffffff0005e72d54, 0) ffffff0005e72c30 xdf`xdf_ioctl+0x42c(c100000090, 42e, 8047b84, 100005, ffffff018c5ae008, ffffff0005e72d54) ffffff0005e72c70 cdev_ioctl+0x45(c100000090, 42e, 8047b84, 100005, ffffff018c5ae008, ffffff0005e72d54) ffffff0005e72cb0 specfs`spec_ioctl+0x5a(ffffff0194d31c80, 42e, 8047b84, 100005, ffffff018c5ae008, ffffff0005e72d54, 0) ffffff0005e72d30 fop_ioctl+0x7b(ffffff0194d31c80, 42e, 8047b84, 100005, ffffff018c5ae008, ffffff0005e72d54, 0) [2]> ffffff018cae8b50::print v_req_t { v_link = { list_size = 0xffffff01908d9540 list_offset = 0xffffff01908d9540 list_head = { list_next = 0x58 list_prev = 0 } } v_gs = { pg = 0x58 position = 0 } v_status = 0x2 v_buf = { ps_buflen = 0x97d08a80 ps_channel_name_len = 0xffffff01 ps_id = 0 ps_op = 0 ps_type = 0 } v_ndmacs = 0 v_dmaw = 0 v_ndmaws = 0 v_nslots = 0 v_blkno = 0 v_memdmahdl = 0 v_align = 0 (LGRP_LAT_CPU_TO_MEM) v_dmahdl = 0xffffff019426b0c0 v_dmac = 0xfffffffffffff000 v_abuf = { _q = <FLOAT> _l = [ 0, 0, 0, 0 ] } v_flush_diskcache = { tv_sec = 0 tv_nsec = 0 } v_runq = [ 0, 0, 0x5800000000, 0, 0, 0, 0x700000000, 0x9485690000000000, 0x1ffffff01, 0x100000000, 0xbf2d1300000000, 0, 0, 0x9426b0c000000000, 0x2edd2000ffffff01, 0x40000000000, 0, 0xbaddcafe, 0, 0, 0, 0x5800000000, 0, 0, 0 , 0x700000000, 0x97d0888000000000, 0x1ffffff01 ] } [2]>
Updated by Albert Lee over 12 years ago
[0]> $C fffffffffbc9f6f0 kmdb_enter+0xb() fffffffffbc9f710 debug_enter+0x38(fffffffffbc10a88) fffffffffbc9f7e0 panicsys+0x497(fffffffffb959928, ffffff000707c650, fffffffffbc9f7f0, 1) ffffff000707c590 vpanic+0x15c() ffffff000707c680 panic+0x94() ffffff000707c710 die+0xdd(e, ffffff000707c830, 1be, 0) ffffff000707c820 trap+0x1791(ffffff000707c830, 1be, 0) ffffff000707c830 0xfffffffffb80020b() ffffff000707ca50 bcopy+0x178() ffffff000707caa0 cmlb`cmlb_dkio_set_ext_part+0x48(ffffff019020e000, 8047b84, 100005, 0) ffffff000707cb20 cmlb`cmlb_ioctl+0x163(ffffff019020e000, c100000110, 42e, 8047b84, 100005, ffffff018c5ae208, ffffff000707cd54, 0) ffffff000707cc30 xdf`xdf_ioctl+0x42c(c100000110, 42e, 8047b84, 100005, ffffff018c5ae208, ffffff000707cd54) ffffff000707cc70 cdev_ioctl+0x45(c100000110, 42e, 8047b84, 100005, ffffff018c5ae208, ffffff000707cd54) ffffff000707ccb0 specfs`spec_ioctl+0x5a(ffffff0195bc5d00, 42e, 8047b84, 100005, ffffff018c5ae208, ffffff000707cd54, 0) ffffff000707cd30 fop_ioctl+0x7b(ffffff0195bc5d00, 42e, 8047b84, 100005, ffffff018c5ae208, ffffff000707cd54, 0) ffffff000707ce30 ioctl+0x18e(5, 42e, 8047b84) [0]> ffffff019020e000::print struct cmlb_lun { cl_devi = 0xffffff018fd72350 cl_vtoc = { v_bootinfo = [ 0, 0, 0 ] v_sanity = 0 v_version = 0 v_volume = [ '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' ] v_sectorsz = 0 v_nparts = 0 v_reserved = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ] v_part = [ ... ] timestamp = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ] v_asciilabel = [ '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', ' \0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0 ', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', ... ] } cl_g = { dkg_ncyl = 0 dkg_acyl = 0 dkg_bcyl = 0 dkg_nhead = 0 dkg_obs1 = 0 dkg_nsect = 0 dkg_intrlv = 0 dkg_obs2 = 0 dkg_obs3 = 0 dkg_apc = 0 dkg_rpm = 0 dkg_pcyl = 0 dkg_write_reinstruct = 0 dkg_read_reinstruct = 0 dkg_extra = [ 0, 0, 0, 0, 0, 0, 0 ] } cl_blockcount = 0 cl_tgt_blocksize = 0 cl_solaris_size = 0 cl_solaris_offset = 0 cl_map = [ ... ... ] cl_offset = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ] cl_fmap = [ ... ... ] cl_asciilabel = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ] cl_lgeom = { g_ncyl = 0 g_acyl = 0 g_nhead = 0 g_nsect = 0 g_secsize = 0 g_capacity = 0 g_intrlv = 0 g_rpm = 0 } cl_pgeom = { g_ncyl = 0 g_acyl = 0 g_nhead = 0 g_nsect = 0 g_secsize = 0 g_capacity = 0 g_intrlv = 0 g_rpm = 0 } cl_dkg_skew = 0 cl_def_labeltype = 0x1 (CMLB_LABEL_VTOC) cl_last_labeltype = 0x1 (CMLB_LABEL_VTOC) cl_cur_labeltype = 0 (0) cl_label_from_media = 0 (0) cl_state = 0x1 (CMLB_ATTACHED) cl_f_geometry_is_valid = 0 (0) cl_sys_blocksize = 0x200 cl_mutex = { _opaque = [ 0xffffff0195c263a0 ] } cl_is_removable = 0 (0) cl_is_hotpluggable = 0x1 (B_TRUE) cl_alter_behavior = 0x8 cl_node_type = 0xfffffffffbe9a9d0 "ddi_block:xvmd" cl_device_type = 0 cl_reserved = 0xffffffff cmlb_tg_ops = xdf`xdf_lb_ops cl_update_ext_minor_nodes = 0 cl_logical_drive_count = 0 cl_msglog_flag = 0x1 } [0]>
So the root cause is actually in cmlb, which doesn't ensure the struct cmlb_lun
cl_tgt_blocksize
and cl_blockcount
members are set before handling ioctls, which results in zero-length I/O requests. The structure's locking scheme is also extremely inconsistent.
Updated by Albert Lee over 12 years ago
http://src.illumos.org/source/xref/illumos-opengrok/usr/src/uts/common/xen/io/xdf.c#1695 shows that xdf_lb_rdwr
supports zero-length for cache flush operations.
Updated by Albert Lee over 12 years ago
- Subject changed from Panic in xdf (Xen PV disk) I/O to cmlb causes panic in xdf
Updated by Garrett D'Amore over 12 years ago
This looks like a problem in the xdf driver, not in cmlb.
Updated by Albert Lee over 12 years ago
cmlb itself clearly has a problem - it's not ensuring initialisation of certain structures before handling ioctls (and also get_prop). xdf probably should handle zero-side I/O though.
Updated by Garrett D'Amore over 12 years ago
Actually, I think the problem is in xdf.
cmlb initializes some structures by calling the "tg" operations, and I think its likely based on the call stack I see in the code that xdf is attaching cmlb before it has assigned reasonable values to its own underlying parameters including sector size and other geometry details.
The solution probably is to move cmlb_attach() in xdf to a point after critical geometry has been established.
Updated by Pavan Chandrashekar over 12 years ago
I had seen a similar issue while I was working on the extended partition project.
Check out the following on bugs.opensolaris.org
Bug ID: 6840756
Synopsis: Incorrect handling of virtualized disk state in hvm
I am not sure how much more information is available from the opensolaris site, but the bug description might be of some help in figuring out if what you are seeing in another manifestation of the same problem.