Bug #12920

Newer releases falsely report CPU level 2 cache as faulty on E5-4600 platform

Added by Jason Matthews 10 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Some time after SmartOS 20190619, and definitely before 20200603, the fault manager began reporting the L2 cache on all CPU packages as faulty. It then takes the affected cores out of service, which reduces system performance. Rolling back to an older release prevents the cores from being disabled.

The system starts disabling cores shortly after boot and gradually disables cores across all physical CPU packages. One bad package is plausible, but four going bad at the same time seems very unlikely.

I first noticed this back in May, when I installed a May release of SmartOS on a Dell R820.
I rolled that first system back to 20190605 on 5/15/2020, and the errors did not recur.
I deployed 20200603 on another system on 6/30/2020, and the errors started shortly after boot. I rolled that system back to 20190605, and 18 hours later there are still no errors.

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jul 01 06:29:54 f8ec8b53-ec0c-c3e2-89a4-80e0bfecab44 INTEL-8000-MQ  Major

Host        : dba009
Platform    : PowerEdge-R820    Chassis_id  : DFC5CZ1
Product_sn  :

Fault class : fault.cpu.intel.l2dcache
Affects     : cpu:///cpuid=48
                  faulted and taken out of service
FRU         : hc://:product-id=PowerEdge-R820:server-id=dba009:chassis-id=DFC5CZ1/motherboard=0/chip=0
                  faulty

Description : A level 2 Data Cache on this cpu is faulty. Refer to
              http://illumos.org/msg/INTEL-8000-MQ for more information.

Response    : The system will attempt to offline this cpu to remove it from
              service.

Impact      : Performance of this system may be affected.

Action      : Schedule a repair procedure to replace the affected CPU. Use
              'fmadm faulty' to identify the module.
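To back up the observation that faults show up on every package rather than on one, the faulted CPUs in `fmadm faulty` output can be grouped by the `chip=` index in their FRU path. This is a minimal Python sketch, assuming the report layout shown above, where each fault lists its `Affects` line before its `FRU` line. The helper name `faulted_cpus_by_chip` is hypothetical and not part of illumos or SmartOS.

```python
import re
from collections import defaultdict

# Matches an "Affects" line such as "Affects     : cpu:///cpuid=48".
AFFECTS_RE = re.compile(r"^Affects\s*:\s*cpu:///cpuid=(\d+)")
# Matches the chip index in an hc:// FRU path such as ".../motherboard=0/chip=0".
FRU_CHIP_RE = re.compile(r"^FRU\s*:.*?/chip=(\d+)")


def faulted_cpus_by_chip(report: str) -> dict:
    """Hypothetical helper: group faulted cpuids by the chip named in the FRU line.

    Assumes each fault block has an "Affects" line followed by its "FRU" line,
    as in the fmadm faulty output pasted in this bug report.
    """
    by_chip = defaultdict(list)
    pending_cpu = None
    for raw in report.splitlines():
        line = raw.strip()
        affects = AFFECTS_RE.match(line)
        if affects:
            # Remember the cpuid until we see the FRU line for this fault.
            pending_cpu = int(affects.group(1))
            continue
        fru = FRU_CHIP_RE.match(line)
        if fru and pending_cpu is not None:
            by_chip[int(fru.group(1))].append(pending_cpu)
            pending_cpu = None
    return dict(by_chip)


# The single fault from this report: cpuid 48, FRU chip=0.
sample = """
Fault class : fault.cpu.intel.l2dcache
Affects     : cpu:///cpuid=48
                  faulted and taken out of service
FRU         : hc://:product-id=PowerEdge-R820:server-id=dba009:chassis-id=DFC5CZ1/motherboard=0/chip=0
                  faulty
"""
print(faulted_cpus_by_chip(sample))  # {0: [48]}
```

Run over the full `fmadm faulty` output from an affected host, a result with entries for all four chips would match the pattern described above, which points more toward a software regression than toward four packages failing at once.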

root@dba009:/root# smbios |grep -i xeon
Version: Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
Family: 179 (Intel Xeon)
Version: Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
Family: 179 (Intel Xeon)
Version: Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
Family: 179 (Intel Xeon)
Version: Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz
Family: 179 (Intel Xeon)

System Configuration: Dell Inc. PowerEdge R820
BIOS Configuration: Dell Inc. 2.0.20 01/16/2014

Let me know what other information I can provide to be helpful.