Bug #396

format crashes

Added by Roy Sigurd Karlsbakk over 8 years ago. Updated over 7 years ago.

Status: New
Start date: 2010-11-04
Priority: High
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

Hi all

With an LSI SAS9211-8i, format crashes and won't let me go further. OI installs ok, but I can't find the other drives. See below for output.

root@tos-backup:~# format
Searching for disks...Arithmetic Exception (core dumped)
root@tos-backup:~# ls -l /dev/rdsk/core
-rw------- 1 root root 2463431 2010-11-04 17:41 /dev/rdsk/core
root@tos-backup:~# pstack /dev/rdsk/core
core '/dev/rdsk/core' of 1217: format
fee62e4a UDiv (4, 0, 8046c80, 80469a0, 8046a30, 8046a50) + 2a
08079799 auto_sense (4, 0, 8046c80, 0) + 281
080751a6 add_device_to_disklist (80479c0, 80475c0, fefd995b, feffb140) + 62a
080746ff do_search (0, 1, 8047e28, 8066576) + 273
0806658d main (1, 8047e58, 8047e60, 8047e4c) + c1
0805774d _start (1, 8047f00, 0, 8047f07, 8047f0b, 8047f1f) + 7d
root@tos-backup:~# zpool status
pool: rpool
state: ONLINE
scan: none requested
config:

NAME                       STATE     READ WRITE CKSUM
rpool                      ONLINE       0     0     0
  c4t5000C50019891202d0s0  ONLINE       0     0     0

errors: No known data errors
root@tos-backup:~# cfgadm -a
Ap_Id                          Type         Receptacle   Occupant     Condition
c6                             scsi-sas     connected    configured   unknown
c6::es/ses0                    ESI          connected    configured   unknown
c6::smp/expd0                  smp          connected    configured   unknown
c6::w5000c50019891202,0        disk-path    connected    configured   unknown
c6::w5000c50019890fed,0        disk-path    connected    configured   unknown
c7                             scsi-sas     connected    unconfigured unknown
usb8/1                         unknown      empty        unconfigured ok
usb8/2                         unknown      empty        unconfigured ok
usb9/1                         unknown      empty        unconfigured ok
usb9/2                         usb-device   connected    configured   ok
usb10/1                        unknown      empty        unconfigured ok
usb10/2                        unknown      empty        unconfigured ok
usb10/3                        unknown      empty        unconfigured ok
usb10/4                        unknown      empty        unconfigured ok
usb11/1                        unknown      empty        unconfigured ok
usb11/2                        unknown      empty        unconfigured ok
usb12/1                        unknown      empty        unconfigured ok
usb12/2                        unknown      empty        unconfigured ok
usb13/1                        unknown      empty        unconfigured ok
usb13/2                        unknown      empty        unconfigured ok
usb14/1                        usb-hub      connected    configured   ok
usb14/1.1                      unknown      empty        unconfigured ok
usb14/1.2                      unknown      empty        unconfigured ok
usb14/1.3                      usb-hub      connected    configured   ok
usb14/1.3.1                    usb-device   connected    configured   ok
usb14/1.3.2                    unknown      empty        unconfigured ok
usb14/1.3.3                    unknown      empty        unconfigured ok
usb14/1.3.4                    unknown      empty        unconfigured ok
usb14/1.4                      unknown      empty        unconfigured ok
usb14/2                        unknown      empty        unconfigured ok
usb14/3                        unknown      empty        unconfigured ok
usb14/4                        unknown      empty        unconfigured ok
usb14/5                        unknown      empty        unconfigured ok
usb14/6                        unknown      empty        unconfigured ok
root@tos-backup:~#

core - coredump (2.29 MB) Aurélien ROUGEMONT, 2011-11-17 04:20 PM


Related issues

Duplicated by illumos gate - Bug #3303: format: Searching for disks...Arithmetic Exception New 2012-10-25

History

#1 Updated by Bill Sommerfeld over 8 years ago

looks like a divide-by-zero. from disassembly of the oi_147 version of usr/bin/format, the division by zero appears to be in the generic_disk_sense() function in
usr/src/cmd/format/auto_sense.c (auto_sense+0x281 is right after a call to generic_disk_sense()).

my guess is that either pcyl or nhead is zero in:

    if (nblocks > (pcyl * nhead)) {
        err_print("Mode sense page(3) reports nsect value"
            " as %d, adjusting it to %llu\n",
            nsect, nblocks / (pcyl * nhead));
        nsect = nblocks / (pcyl * nhead);
    } else {

What output do you get from format -M? That option turns on "option_msg" and "diag_msg", which makes format print pcyl, nhead and a few other variables earlier in generic_disk_sense().

#2 Updated by Roy Sigurd Karlsbakk almost 8 years ago

It seems there was some bogus data on the drives that format/OI didn't like. After clearing the drives (I just put a dummy pool on them), it worked.

#3 Updated by Rich Lowe almost 8 years ago

  • Project changed from OpenIndiana Distribution to illumos gate
  • Category deleted (Drivers)

#4 Updated by Gary Mills over 7 years ago

  • Difficulty set to Medium
  • Tags set to needs-triage

Has anyone else experienced this problem? The code in
auto_sense() takes care to ensure that the variables in
question are not zero. It would certainly be possible to
reproduce the problem and fix it, but only if we knew what
was on that disk in the first place. Now that the data is
gone, we have nothing to go on.

I'd recommend closing this bug report, or at least reducing
its priority until somebody else encounters the problem.

#5 Updated by Roy Sigurd Karlsbakk over 7 years ago

Gary Mills wrote:

I'd recommend closing this bug report, or at least reducing
its priority until somebody else encounters the problem.

I agree - just close it for now. What's on the disk is long gone anyway.

roy

#6 Updated by Aurélien ROUGEMONT over 7 years ago

Bill Sommerfeld wrote:

looks like a divide-by-zero. from disassembly of the oi_147 version of usr/bin/format, the division by zero appears to be in the generic_disk_sense() function in
usr/src/cmd/format/auto_sense.c (auto_sense+0x281 is right after a call to generic_disk_sense()).

my guess is that either pcyl or nhead is zero in:

    if (nblocks > (pcyl * nhead)) {
        err_print("Mode sense page(3) reports nsect value"
            " as %d, adjusting it to %llu\n",
            nsect, nblocks / (pcyl * nhead));
        nsect = nblocks / (pcyl * nhead);
    } else {

I got the same problem here:

root@r710-testdisk:~# format -M
Searching for disks...
c6t5001517959515252d0: attempting auto configuration
Inquiry:
00 00 05 12 45 00 00 02 41 54 41 20 20 20 20 20     ....E...ATA     
49 4e 54 45 4c 20 53 53 44 53 41 32 43 57 31 36     INTEL SSDSA2CW16
30 33 30 32 43 56 50 52 31 31 32 33 30 33 45 47     0302CVPR112303EG
31 36 30 44 47 4e 20 20 00 00 00 77 1e a0 03 00     160DGN  ...w....
03 20 0b fd 16 00 00 00 00                          . .......
Product id: INTEL SSDSA2CW16
Capacity: 12 a1 9e af 00 00 02 00 
blocks:  312581807 (0x12a19eaf)
blksize: 512
disk name:  `SA2CW16`

Mode sense page 0x3 (current):
header: 23 00 00 08 00 00 00 00 00 00 02 00 
data:   03 16 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
        00 00 00 00 00 00 00 00 

Mode sense page 0x4 (current):
header: 23 00 00 08 00 00 00 00 00 00 02 00 
data:   04 16 02 54 34 00 00 00 00 00 00 00 00 00 00 00 
        00 00 00 00 00 00 00 00 
Geometry:
    pcyl:    152628
    ncyl:    152626
    heads:   0
    nsects:  0
    acyl:    2
    bcyl:    0
    rpm:     0
    nblocks:     312581808
The current rpm value 0 is invalid, adjusting it to 3600
Arithmetic Exception (core dumped)

Which makes sense, since heads = 0: pcyl * nhead is then zero, and dividing nblocks by it raises the arithmetic exception.

core file attached
