Bug #1414

libdiskmgmt does not clean up invalid paths

Added by George Wilson about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: lib - userland libraries
Start date: 2011-08-26
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

The installer coredumps when it probes the disks on the hpux box. Here's what the corefile reveals:

$c

libc.so.1`strlen+0x30(fef664fc)
libc.so.1`fprintf+0xa8(fef34be0, fef664fc, 302c64, 814fbf8)
libdiskmgt.so.1`remove_controller+0xb4(8149b98, 84d9310, 2b97859, fef614e2)
libdiskmgt.so.1`remove_invalid_controller+0x133(8376120, 84d9310, fd7ae4b0,
$c
libc.so.1`strlen+0x30(9, 20, 1, fe8a91fa)
libnvpair.so.1`nvlist_add_common+0x60(85039c8, fdb96044, 9, 1, 20, 3)
libnvpair.so.1`nvlist_add_string+0x18(85039c8, fdb96044, 20, fdb8a321)
libdiskmgt.so.1`controller_get_attributes+0x55(80a6358, fd7de4f8, fd7de4d8,
fdb85bb7)
libdiskmgt.so.1`dm_get_attributes+0xcf(80a6358)
libtd.so.1`ddm_drive_set_ctype+0x7f(80a6100, 0)
libtd.so.1`ddm_get_disk_attributes+0x3f9(80a6100, 0, fd7de598, fdbe5ed1)
libtd.so.1`td_attributes_get+0x53(0, fdc58a5c, fd7de5d8, fdc33c13)
tgt.so`TgtDisk_enumerate+0x64(9, fd7de63c, fd7de658, fdc33741)
tgt.so`discover_target_data+0x4e(0, 0, fd7de6a8, fed58cc5)
libpython2.6.so.1.0`call_function+0x311(fd7de71c, 0, 14397506, 0)
libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb(84fcbdc, 0, 829135c, 0)
libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb(82a7530, 829135c, 0, 825d478, 2,
80a1620)
libpython2.6.so.1.0`function_call+0x13d(82cbca4, 825d46c, 83fc934, fecc2641)
libpython2.6.so.1.0`PyObject_Call+0x67(82cbca4, 825d46c, 83fc934, fd7de930)
libpython2.6.so.1.0`ext_do_call+0x156(82cbca4, fd7de930, 3, 0, 0, 1f)
libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2d88(84fca6c, 0, 817302c, 0)
libpython2.6.so.1.0`fast_function+0x10b(81869cc, fd7dea6c, 1, 1, 0, fedce804)
libpython2.6.so.1.0`call_function+0xe4(fd7dea6c, 0, b6, 0)
libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb(84fc8bc, 0, 817302c, 0)
libpython2.6.so.1.0`fast_function+0x10b(8186a74, fd7debac, 1, 1, 0, fedce804)
libpython2.6.so.1.0`call_function+0xe4(fd7debac, 0, 11, 0)
libpython2.6.so.1.0`PyEval_EvalFrameEx+0x2cdb(84fc754, 0, 817302c, 0)
libpython2.6.so.1.0`PyEval_EvalCodeEx+0x8cb(8178920, 817302c, 0, 83fd0d8, 1, 0)
libpython2.6.so.1.0`function_call+0x190(8186a04, 83fd0cc, 0, fecc2641)
libpython2.6.so.1.0`PyObject_Call+0x67(8186a04, 83fd0cc, 0, fecd1f9c)
libpython2.6.so.1.0`instancemethod_call+0x151(82bc0cc, 806202c, 0, fecc2641)
libpython2.6.so.1.0`PyObject_Call+0x67(82bc0cc, 806202c, 0, fed58a5d)
libpython2.6.so.1.0`PyEval_CallObjectWithKeywords+0xb8(82bc0cc, 806202c, 0,
fed87575)
libpython2.6.so.1.0`t_bootstrap+0x41(82d2ff8, fef6f000, fd7defe8, feefe22a)
libc.so.1`_thrp_setup+0x9b(fd961240)
libc.so.1`_lwp_start(fd961240, 0, 0, 0, 0, 0)

Let's look closer at the object we're inspecting:

80a6100::print descriptor_t

{
p = {
generic = 0x822f7d0
disk = 0x822f7d0
controller = 0x822f7d0
bus = 0x822f7d0
path = 0x822f7d0
}
name = 0
secondary_name = 0
next = 0x80a60d8
prev = 0x80a6128
type = 0 (DM_DRIVE)
refcnt = 0x1
}

0x822f7d0::print disk_t

{
device_id = 0x834d7b8 "id1,sd@n600601602f2026009a3c0439df5ce011"
devid = 0x80a5fc0
kernel_name = 0x82d2a40 "disk8"
product_id = 0x84fed98 "VRAID"
vendor_id = 0x84feda8 "DGC"
controllers = 0x82d2940
paths = 0
aliases = 0x82d7750
next = 0x82864f8
drv_type = 0x1
removable = 0
sync_speed = 0xffffffff
rpm = 0x1130
wide = 0xffffffff
cd_rom = 0
}

0x82d7750::print alias_t

{
kstat_name = 0x84fedb8 "disk8"
alias = 0x8547ac8 "c5t5006016044600E5Dd1"
devpaths = 0x85484b8
orig_paths = 0
wwn = 0x8547b28 "50060160c4600e5d"
cluster = 0
lun = 0x1
target = 0x10300
next = 0
}

This is what format shows for this device:

5. c5t5006016044600E5Dd1 <DGC -VRAID -0428 cyl 6524 alt 2 hd 255 sec 63>

/pci@39,0/pci1166,142@10/pci10df,fe00@0/fp@0,/disk@w5006016044600e5d,1

So it's trying to get some information from the controller associated with this
device. We can see the controller path above, but let's see what libdiskmgt
thinks it is:

80a6358::print descriptor_t

{
p = {
generic = 0x8149b98
disk = 0x8149b98
controller = 0x8149b98
bus = 0x8149b98
path = 0x8149b98
}
name = 0
secondary_name = 0
next = 0x80a6218
prev = 0
type = 1 (DM_CONTROLLER)
refcnt = 0x1
}

0x8149b98::print controller_t

{
name = 0x827c848
kstat_name = 0x814fbf8
ctype = 0x20
freq = 0x8
disks = 1
paths = 6
bus = 0x74767264
next = 0x657079
multiplex = 0x1
scsi_options = 0
}

Hmm, this is garbage: disks, paths, bus, and next do not hold plausible
pointers. It looks like this structure may have been freed before.
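One way to check the "freed and reused" theory is to read the bogus pointer values as raw bytes. The helper below is a minimal sketch, not part of libdiskmgt: `words_to_bytes_le` is a hypothetical name, and it assumes a 32-bit little-endian process (consistent with the 32-bit addresses in the dump) in which `bus` and `next` are adjacent 4-byte fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy the bytes of 32-bit words, least significant
 * byte first (the order they occupy in memory on a little-endian machine),
 * into out and NUL-terminate the result.
 * out must hold at least 4 * nwords + 1 bytes.
 */
static void
words_to_bytes_le(const uint32_t *words, size_t nwords, char *out)
{
    size_t i, b;

    assert(words != NULL && out != NULL);
    for (i = 0; i < nwords; i++) {
        for (b = 0; b < 4; b++)
            out[i * 4 + b] = (char)((words[i] >> (8 * b)) & 0xff);
    }
    out[nwords * 4] = '\0';
}
```

Fed the two values from the dump (bus = 0x74767264, next = 0x657079), this produces the NUL-terminated string "drvtype". Under the assumptions above, the memory behind this controller_t now holds string data, which is what a freed and reallocated buffer would look like.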

Enabling debugging on libdiskmgt confirms that the controller was removed:

INFO: Removed controller /pci@39,0/pci1166,142@10/pci10df,fe00@0,1 from list
INFO: Removed controller /pci@39,0/pci1166,142@10/pci10df,fe00@0 from list
INFO: removing inbound management controller with disk ptrs.

So this particular disk is still pointing to a controller which should have
been removed. From format we see there is another disk which uses the same
controller:

4. c5t5006016844600E5Dd0 <drive type unknown>

/pci@39,0/pci1166,142@10/pci10df,fe00@0/fp@0,/disk@w5006016844600e5d,0

Let's see what that disk thinks its controller looks like:

0x82864f8::print disk_t

{
device_id = 0x8351390 "id1,sd@n50060160c4600e5d50060160c4600e5d"
devid = 0x80a5f70
kernel_name = 0x82d2930 "disk5"
product_id = 0x82d2a00 "LUNZ"
vendor_id = 0x82d2a10 "DGC"
controllers = 0x82d2a80
paths = 0
aliases = 0x835a098
next = 0x8354220
drv_type = 0x1
removable = 0
sync_speed = 0xffffffff
rpm = 0xffffffff
wide = 0xffffffff
cd_rom = 0
}

0x82d2a80/K

0x82d2a80: 0

Its controller list is NULL which is what it should be. So somehow the
controller for disk8 was not updated correctly.

When libdiskmgt finds devices which it considers "invalid", it tries to
remove the controller information from its cache by calling
remove_controller(). The comment from the code explains what it's trying to do:

/*
 * loop through the disks and remove the reference to the
 * controller for this disk structure. The disk itself
 * is still a valid device, the controller being removed
 * is a 'path' so any disk that has a reference to it
 * as a controller needs to have this reference removed.
 */

Unfortunately the implementation doesn't do this: it walks the wrong list
during the cleanup. As a result it only cleans up the first disk associated
with the controller and misses all the others, leaving them (disk8 here)
pointing at freed memory.
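The intent of that comment can be sketched as follows. This is an illustration only, not the code in findevs.c and not the committed fix: the `sk_` types and both function names are hypothetical, and the sketch assumes each disk keeps a NULL-terminated array of controller pointers and that all cached disks are linked through `next`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the cached structures. */
typedef struct sk_controller {
    const char *name;
} sk_controller_t;

typedef struct sk_disk {
    const char *kernel_name;
    sk_controller_t **controllers;  /* NULL-terminated array (assumed) */
    struct sk_disk *next;           /* next disk in the cache (assumed) */
} sk_disk_t;

/*
 * Remove cp from one disk's controller array, shifting later entries
 * (including the NULL terminator) down by one. Returns 1 if a reference
 * was removed, 0 otherwise.
 */
static int
disk_drop_controller(sk_disk_t *dp, const sk_controller_t *cp)
{
    size_t i, j;

    if (dp->controllers == NULL)
        return (0);
    for (i = 0; dp->controllers[i] != NULL; i++) {
        if (dp->controllers[i] != cp)
            continue;
        for (j = i; dp->controllers[j] != NULL; j++)
            dp->controllers[j] = dp->controllers[j + 1];
        return (1);
    }
    return (0);
}

/*
 * Walk every disk in the cache, not just the first one attached to the
 * controller, and drop each reference to cp. Returns the number of disks
 * that were updated. Only after this is it safe to free cp.
 */
static int
drop_controller_refs(sk_disk_t *disks, const sk_controller_t *cp)
{
    sk_disk_t *dp;
    int removed = 0;

    assert(cp != NULL);
    for (dp = disks; dp != NULL; dp = dp->next)
        removed += disk_drop_controller(dp, cp);
    return (removed);
}
```

The point of the sketch is the outer loop: every disk that references the controller has to be visited before the controller is freed, otherwise a disk such as disk8 keeps a dangling pointer.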

History

#1

Updated by Eric Schrock almost 8 years ago

  • Status changed from New to Resolved

changeset: 13485:d90c0839897e
tag: tip
user: George Wilson
date: Tue Oct 18 12:16:24 2011 -0700

description:
1414 libdiskmgmt does not clean up invalid paths
Reviewed by: Robert Mustacchi
Reviewed by: Jason King
Reviewed by: Igor Kozhukhov
Reviewed by: Garrett D'Amore
Reviewed by: Dan McDonald
Reviewed by: Eric Schrock
Reviewed by: Adam Leventhal
Approved by: Gordon Ross

modified:
usr/src/lib/libdiskmgt/common/findevs.c
