Bug #15302
Broadcom 3108 kernel driver update (status: open)
Description
Greetings,
I have a shiny new fileserver running OmniOS 151038 LTS (also tested against 151044, same problem). The host includes a Broadcom 3108 "Invader" HBA, and it is having problems booting because the HBA is reporting errors. Digging into it, we're seeing this in the device's internal log:
35992: 22-12-19,18:46:59 WARNING:Host driver needs to be upgraded to enable extended LD support
/var/adm/messages has the following line:
Jan 4 12:17:21 hvfs1 mr_sas: [ID 728489 kern.info] mr_sas0: 0x1000:0x5d 0x15d9:0x809, irq:11 drv-ver:6.503.00.00ILLUMOS-20170524
If I'm reading it right, the current kernel driver version is 6.503.00.00.
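For anyone who wants to pull the driver version out of /var/adm/messages on several hosts, here is a minimal Python sketch that parses the attach line quoted above. The regular expression and the helper name parse_mr_sas_banner are my own. Reading the two hex pairs as PCI vendor:device and subsystem vendor:device is an assumption, not something taken from the mr_sas documentation.

```python
import re

# Pattern for the mr_sas attach line quoted above. Interpreting the two
# hex pairs as PCI vendor:device and subsystem vendor:device is an
# assumption made for this sketch.
BANNER_RE = re.compile(
    r"(?P<inst>mr_sas\d+): "
    r"(?P<vid>0x[0-9a-fA-F]+):(?P<did>0x[0-9a-fA-F]+) "
    r"(?P<subvid>0x[0-9a-fA-F]+):(?P<subdid>0x[0-9a-fA-F]+), "
    r"irq:(?P<irq>\d+) drv-ver:(?P<ver>\S+)"
)


def parse_mr_sas_banner(line):
    """Return a dict of fields from an mr_sas attach line, or None."""
    m = BANNER_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Split "6.503.00.00ILLUMOS-20170524" into the dotted numeric part
    # and whatever suffix follows it.
    vm = re.match(r"(\d+(?:\.\d+)*)(.*)", info["ver"])
    info["version"] = vm.group(1) if vm else None
    info["suffix"] = vm.group(2) if vm else None
    return info


line = ("Jan 4 12:17:21 hvfs1 mr_sas: [ID 728489 kern.info] mr_sas0: "
        "0x1000:0x5d 0x15d9:0x809, irq:11 "
        "drv-ver:6.503.00.00ILLUMOS-20170524")
info = parse_mr_sas_banner(line)
print(info["inst"], info["version"], info["suffix"])
```

The dotted part is kept as a string on purpose, since it is not clear that the illumos version numbering is directly comparable with the vendor's Linux driver numbering.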
Looking at https://www.supermicro.com/wdl/driver/SAS/Broadcom/3108/Driver/ I see a bunch of drivers, none of which has a version number that low. The highest I see is 07.723.02.00-1, dated 29 Nov 2022.
This file server is unstable so we've had to pull it out of service. I'd really like to get the driver updated so I can put it back into service soon (I had to press my old test server into production to cover this).
thanks,
nomad
Updated by Lee Damon 5 months ago
We've purchased a new card so we can get this critical server back online. I suspect this problem will crop up again in the near future as older cards get harder to find.
Once I get the new hardware and verify it is working, I am willing to loan this card to the developer who works on this bug so they have hardware to work against. I would very much appreciate someone updating the driver soon so we don't get bitten by this again.
Updated by Dan McDonald 5 months ago
A quick peek over in FreeBSD's mrsas driver indicates that we need to recognize whether a (firmware upgraded?!?) board supports "extended LD support". This is done by its function mrsas_update_ext_vd_details(), which checks whether the control info's "max_lds" is greater than 64 (which is our #defined MAX_LOGICAL_DRIVES in fusion.h).
We'll need to check for that max_lds value, and make adjustments. Alternatively we could just make everything big enough to hold FreeBSD's MAX_LOGICAL_DRIVES_EXT value of 256 instead.
Nope, it's not that simple: the control information in their mrsas driver has evolved quite a bit since the one in our Fusion update, so there is much more to bring in as part of this.
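To make the two options above concrete, here is a rough sketch of the decision logic only. It is written in Python for brevity, while the real driver is C. The function names are hypothetical, and the constants are the values quoted in this comment. As noted above, the real change also needs the newer control-info layout, which this sketch ignores.

```python
# Values quoted in the comment above.
MAX_LOGICAL_DRIVES = 64        # illumos fusion.h
MAX_LOGICAL_DRIVES_EXT = 256   # FreeBSD mrsas


def wants_extended_ld(max_lds):
    """True when the controller reports more LDs than the legacy limit.

    Mirrors the check described for mrsas_update_ext_vd_details():
    control info max_lds greater than 64.
    """
    return max_lds > MAX_LOGICAL_DRIVES


def ld_table_size(max_lds, always_extended=False):
    """Pick the size for per-LD tables (hypothetical helper).

    always_extended=True models the alternative of sizing everything
    for 256 logical drives regardless of what the controller reports.
    """
    if always_extended or wants_extended_ld(max_lds):
        return MAX_LOGICAL_DRIVES_EXT
    return MAX_LOGICAL_DRIVES


for reported in (64, 65, 240):
    print(reported, ld_table_size(reported))
```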
Updated by Lee Damon 3 months ago
- File mega_errors.txt added
- File PXL_20221219_183121402.jpg added
Symptoms: the host boots and sees all of the drives. Over time, as it runs, it accumulates errors in the HBA's internal log.
Eventually, when the host is rebooted (e.g. for patching), it refuses to boot until those errors are cleared by going into the BIOS and resetting the card.
The rest of this is from the exchange I had with SuperMicro when the problem first manifested:
On boot it reports "AVAGO EFI SAS Driver is Unhealthy" then it refuses to continue booting without interaction. (See photo, attached).
When I go into the BIOS and go to Advanced -> Driver Health I see
"AVAGO EFI SAS Driver - Failed"
Selecting that I see
"AVAGO 3108 MegaRAID Configuration Required"
Selecting that I get a wall of text about "Backup power source needs attention or replacement"
I ack that by hitting 'y' (IIRC there is no backup power for this card; we don't need or want hardware RAID) and I'm told "Critical Message handling completed. Please exit".
Hitting <ESC> I'm then told
"AVAGO 3108 MegaRAID Reconnect Required"
I hit <ENTER> and say "Ok" to "Proceed with Reconnecting the Controller"
It then says Healthy.
I save configuration and reset to exit the BIOS .... and the system reboots to exactly the same error.
mega_errors.txt is an excerpt from the system log.
From the HBA's log, note line 35992. That's the culprit:
35991: 22-12-19,18:46:05 Info:Time established as 12/19/22 18:46:05; (63 seconds since power on)
35992: 22-12-19,18:46:59 WARNING:Host driver needs to be upgraded to enable extended LD support
35993: 22-12-19,18:47:14 Info:Unexpected sense: PD 29(e0x27/s6) Path 5003048020d09086, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 10 0
35994: 22-12-19,18:47:14 Info:Unexpected sense: PD 37(e0x27/s2) Path 5003048020d09082, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 C 0
35995: 22-12-19,18:47:14 Info:Unexpected sense: PD 38(e0x27/s1) Path 5003048020d09081, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 B 0
35996: 22-12-19,18:47:14 Info:Unexpected sense: PD 39(e0x27/s3) Path 5003048020d09083, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 D 0
35997: 22-12-19,18:47:14 Info:Unexpected sense: PD 3a(e0x27/s4) Path 5003048020d09084, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 E 0
35998: 22-12-19,18:47:14 Info:Unexpected sense: PD 3b(e0x27/s0) Path 5003048020d09080, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 A 0
35999: 22-12-19,18:47:14 Info:Unexpected sense: PD 3c(e0x27/s5) Path 5003048020d09085, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 F 0
36000: 22-12-19,18:47:14 Info:Unexpected sense: PD 29(e0x27/s6) Path 5003048020d09086, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 10 0
36001: 22-12-19,18:47:14 Info:Unexpected sense: PD 37(e0x27/s2) Path 5003048020d09082, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 C 0
36002: 22-12-19,18:47:14 Info:Unexpected sense: PD 38(e0x27/s1) Path 5003048020d09081, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 B 0
36003: 22-12-19,18:47:14 Info:Unexpected sense: PD 39(e0x27/s3) Path 5003048020d09083, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 D 0
36004: 22-12-19,18:47:14 Info:Unexpected sense: PD 3a(e0x27/s4) Path 5003048020d09084, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 E 0
36005: 22-12-19,18:47:14 Info:Event log wrapped
36006: 22-12-19,18:47:14 Info:Unexpected sense: PD 3b(e0x27/s0) Path 5003048020d09080, CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00, Sense: 72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 A 0
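Since the log is almost entirely "Info" entries, a small filter helps surface lines like 35992. The following is a minimal Python sketch that assumes every entry has the "sequence: date,time Severity:message" shape seen in the excerpt above. The regular expression and function names are my own, and the sample reuses three entries from the excerpt.

```python
import re

# Entry shape assumed from the excerpt above:
#   <seq>: <yy-mm-dd>,<hh:mm:ss> <Severity>:<message>
ENTRY_RE = re.compile(
    r"^(?P<seq>\d+): (?P<date>\d\d-\d\d-\d\d),(?P<time>\d\d:\d\d:\d\d) "
    r"(?P<sev>[A-Za-z]+):(?P<msg>.*)$"
)


def parse_entries(text):
    """Parse log text into a list of dicts, skipping unmatched lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            entries.append(m.groupdict())
    return entries


def non_info(entries):
    """Return entries whose severity is anything other than Info."""
    return [e for e in entries if e["sev"].lower() != "info"]


sample = """\
35991: 22-12-19,18:46:05 Info:Time established as 12/19/22 18:46:05; (63 seconds since power on)
35992: 22-12-19,18:46:59 WARNING:Host driver needs to be upgraded to enable extended LD support
36005: 22-12-19,18:47:14 Info:Event log wrapped
"""

for e in non_info(parse_entries(sample)):
    print(e["seq"], e["sev"], e["msg"])
```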
Updated by Jason King 3 months ago
Just to provide a bit more detail here: if I'm decoding things correctly, it's doing an ATA PASS-THROUGH (16) command, specifically the SMART command with a sub code of RETURN STATUS.
The sense code is:
Descriptor format, current data (0x72)
Sense Key is RECOVERED ERROR (0x01)
ASC/ASCQ is ATA PASS-THROUGH INFORMATION AVAILABLE (0x00/0x1D)
The status (last byte) is 0.
I guess it'd be good to confirm these are SATA drives. It seems like whatever is doing the SCSI<->SATA translation (which in this case isn't the OS) is behaving in a way that the HBA is not expecting. I'm not familiar enough with the existing driver to know if there's anything that could be added to handle this or not (but hopefully the detail is useful).
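For anyone who wants to check the decoding, here is a small Python sketch that pulls the same fields out of the CDB and sense bytes logged above. The function names are my own, and the byte offsets follow my reading of the ATA PASS-THROUGH (16) CDB and the ATA Status Return descriptor (code 0x09) layouts. Note that the HBA log prints sense bytes as unpadded hex, so they are split on whitespace rather than parsed as a packed hex string.

```python
def decode_ata_passthrough16(cdb_hex):
    """Decode selected fields of an ATA PASS-THROUGH (16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH (16) CDB")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # 3 = non-data
        "extend": cdb[1] & 0x01,
        "ck_cond": (cdb[2] >> 5) & 0x01,   # 1 = always return ATA registers
        "features": cdb[4],                # 0xDA = SMART RETURN STATUS
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "command": cdb[14],                # 0xB0 = SMART
    }


def decode_sense(tokens):
    """Decode descriptor-format sense data given as hex byte tokens."""
    s = [int(t, 16) for t in tokens.split()]
    if (s[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    out = {"sense_key": s[1] & 0x0F, "asc": s[2], "ascq": s[3]}
    desc = s[8:8 + s[7]]                   # s[7] = additional sense length
    i = 0
    while i + 1 < len(desc):
        code, length = desc[i], desc[i + 1]
        body = desc[i:i + 2 + length]
        if code == 0x09 and len(body) >= 14:   # ATA Status Return
            out.update(error=body[3], lba_mid=body[9],
                       lba_high=body[11], device=body[12],
                       status=body[13])
        i += 2 + length
    return out


cdb = decode_ata_passthrough16(
    "85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00")
sense = decode_sense("72 1 0 1D 0 0 0 E 9 C 0 0 0 0 0 0 0 4F 0 C2 10 0")
print(cdb)
print(sense)
```

In the returned registers, LBA mid/high come back as 0x4F/0xC2. As far as I know, that is the SMART RETURN STATUS signature for "threshold not exceeded", which would fit with these being informational rather than drive failures.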
Updated by Lee Damon 3 months ago
Expander information, in case that's needed.
Device0 /devices/pci@33,0/pci8086,2032@2/pci15d9,808@0/iport@ff/enclosure@w5003048020d090bd,0:0
Class [ses]
Vendor : SMC
Product : SC846-P
Firmware revision : 100b
Chassis Serial Number : 5003048020d090bf
Target-port identifier : w5003048020d090bd
Device1 /devices/pci@33,0/pci8086,2032@2/pci15d9,808@0/iport@ff/enclosure@w5003048021259c3d,0:0
Class [ses]
Vendor : SMC
Product : SC826N4
Firmware revision : 100d
Chassis Serial Number : 5003048021259c3f
Target-port identifier : w5003048021259c3d