Bug #7273

nvme_identify() fails to identify controller sporadically

Added by Youzhong Yang almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: driver - device drivers
Start date: 2016-08-04
% Done: 0%
Difficulty: Medium
Tags: needs-triage

Description

On one of our evaluation systems, we saw random failures like the following every time the host was rebooted:

2016-07-30T23:11:53.468013-04:00 batfs9995 nvme: [ID 265585 kern.warning] WARNING: nvme3: command timeout, OPC = 6, CFS = 0
2016-07-30T23:11:53.468018-04:00 batfs9995 nvme: [ID 265585 kern.warning] WARNING: nvme3: command timeout, OPC = 8, CFS = 0
2016-07-30T23:11:53.468024-04:00 batfs9995 nvme: [ID 176450 kern.warning] WARNING: nvme3: nvme_admin_cmd failed for ABORT
2016-07-30T23:11:53.468032-04:00 batfs9995 nvme: [ID 366983 kern.warning] WARNING: nvme3: nvme_admin_cmd failed for IDENTIFY
2016-07-30T23:11:53.468038-04:00 batfs9995 nvme: [ID 318795 kern.warning] WARNING: nvme3: failed to identify controller
2016-07-30T23:11:53.468045-04:00 batfs9995 genunix: [ID 408114 kern.info] /pci@6d,0/pci8086,6f04@2/pci10b5,9765@0/pci10b5,9765@7/pci8086,370a@0 (nvme3) down
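
The OPC values in these warnings are NVMe admin command opcodes. In the NVMe base specification, 06h is Identify and 08h is Abort. The sequence therefore suggests that the IDENTIFY command timed out, the ABORT issued for it timed out as well, and controller identification then failed. CFS appears to be the Controller Fatal Status bit of the controller's CSTS register, so a value of 0 would mean the controller was not reporting a fatal error. The helper below is a hypothetical decoding sketch, not part of the nvme driver: `nvme_admin_opc_name()` and `decode_timeout_line()` are illustrative names, and the OPC field is parsed as hexadecimal on the assumption that the driver prints it that way (6 and 8 read the same in either base).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Subset of NVMe admin command opcodes from the NVMe base specification.
 * Hypothetical helper for reading the log, not taken from nvme.c.
 */
static const char *
nvme_admin_opc_name(unsigned int opc)
{
	switch (opc) {
	case 0x02: return "GET LOG PAGE";
	case 0x06: return "IDENTIFY";
	case 0x08: return "ABORT";
	case 0x09: return "SET FEATURES";
	case 0x0a: return "GET FEATURES";
	case 0x0c: return "ASYNC EVENT REQUEST";
	default:   return "UNKNOWN";
	}
}

/*
 * Extract the OPC value from a "command timeout, OPC = n, CFS = m"
 * warning and return the admin command name, or NULL if the line has
 * no OPC field. OPC is assumed to be printed in hex.
 */
static const char *
decode_timeout_line(const char *line)
{
	unsigned int opc;
	const char *p;

	assert(line != NULL);
	p = strstr(line, "OPC = ");
	if (p == NULL || sscanf(p, "OPC = %x", &opc) != 1)
		return NULL;
	return nvme_admin_opc_name(opc);
}
```

Applied to the two "command timeout" lines above, this yields "IDENTIFY" and "ABORT" respectively.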

Here is the host spec:

Baseboard: Supermicro X10DRU-i+ (System SYS-2028U-TN24R4T+)
Memory: 768G
CPU: 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, each with 16 physical cores (32 threads with Hyper-Threading)
SATA SSDs: 2 x Intel DC S3100 1TB (SSDSC2KI010X601)
NVMe SSDs: 2 x Intel DC P3700 800GB (SSDPE2MD800G4)
NVMe SSDs: 22 x Intel DC P3600 400GB (SSDPE2ME400G4)
Ethernet: Ethernet Controller X710 for 10GbE SFP+

Once the admin queue's interrupt was switched from FIXED to MSI-X, everything worked fine and we have not seen a single failure since.
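
The report does not include the patch itself. As a hypothetical sketch only (not the nvme.c code, and not necessarily how the eventual fix works), the preference implied here can be expressed as: use MSI-X for the admin queue when the device supports it, fall back to MSI, and use FIXED only as a last resort. The `INTR_TYPE_*` flags are stand-ins modelled on the illumos DDI interrupt type mask, and `pick_admin_intr_type()` is an illustrative name.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DDI interrupt type bits. */
#define	INTR_TYPE_FIXED	0x1
#define	INTR_TYPE_MSI	0x2
#define	INTR_TYPE_MSIX	0x4

/*
 * Choose the admin queue interrupt type from the mask of types the
 * device supports, preferring MSI-X, then MSI, then FIXED.
 * Returns 0 if none of the known types is supported.
 */
static int
pick_admin_intr_type(int supported)
{
	static const int pref[] = {
		INTR_TYPE_MSIX, INTR_TYPE_MSI, INTR_TYPE_FIXED
	};

	/* This sketch only understands the three bits defined above. */
	assert((supported &
	    ~(INTR_TYPE_FIXED | INTR_TYPE_MSI | INTR_TYPE_MSIX)) == 0);

	for (unsigned int i = 0; i < sizeof (pref) / sizeof (pref[0]); i++) {
		if (supported & pref[i])
			return (pref[i]);
	}
	return (0);
}
```

For a device advertising FIXED, MSI and MSI-X, this returns INTR_TYPE_MSIX; for one advertising only FIXED, it returns INTR_TYPE_FIXED.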

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/nvme/nvme.c#1752

#1

Updated by Hans Rosenfeld over 5 years ago

  • Status changed from New to Closed

Fixed by #7296.
