Feature #15522

closed

Want support for Zen 4 CPU Performance Counters

Added by Robert Mustacchi about 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Category:
driver - device drivers
Start date:
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

AMD has several public docs that describe Zen 4's CPU performance counters:

We will create a data file for Zen 4-based CPUs and enable it for known processors.
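
As a rough illustration of "enable this for known processors", the sketch below returns an event table only when the CPUID family and model fall in a recognized range. Otherwise it returns nothing, so unknown parts are left alone. The Family 19h models named in the PPR titles cited by cpustat (11h, 61h and 70h) fall inside these ranges. However, the exact ranges, the function name zen4_event_table, and the returned name (borrowed from the amd_f19h_zen4_events(3CPC) manual page) are assumptions for illustration. This is not how illumos actually encodes the check.

```python
from typing import Optional

# ASSUMED Family 19h model ranges treated as Zen 4, for illustration only;
# this is not the table illumos uses.
ZEN4_RANGES = [
    (0x19, 0x10, 0x1F),
    (0x19, 0x60, 0x7F),
    (0x19, 0xA0, 0xAF),
]


def zen4_event_table(family: int, model: int) -> Optional[str]:
    """Return a (hypothetical) Zen 4 event table name for a known CPU, else None."""
    for fam, lo, hi in ZEN4_RANGES:
        if family == fam and lo <= model <= hi:
            return "amd_f19h_zen4_events"
    # Unrecognized processor: the caller falls back to other handling.
    return None
```

For example, zen4_event_table(0x19, 0x61) returns "amd_f19h_zen4_events", while zen4_event_table(0x17, 0x31) returns None. Returning None rather than guessing keeps unrecognized processors from being handed an event list that may not match their hardware.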

Actions #1

Updated by Robert Mustacchi about 1 year ago

On a Zen 4 system, I took a basic look at what we now have for CPU performance counters. Here is the basic cpustat output:

rm@gilgamesh ~ $ pfexec cpustat -h
Usage:
        cpustat -c spec [-c spec]... [-p period] [-T u|d]
                [-sntD] [interval [count]]

        -c spec   specify processor events to be monitored
        -n        suppress titles
        -p period cycle through event list periodically
        -s        run user soaker thread for system-only events
        -t        include tsc register
        -T d|u    Display a timestamp in date (d) or unix time_t (u)
        -D        enable debug mode
        -h        print extended usage information

        Use cputrack(1) to monitor per-process statistics.

        CPU performance counter interface: AMD Family 19h

        event specification syntax:
        [picn=]<eventn>[,attr[n][=<val>]][,[picn=]<eventn>[,attr[n][=<val>]],...]

        Generic Events:

        event[0-5]: PAPI_br_cn PAPI_br_ins PAPI_tot_cyc PAPI_tot_ins 
                 PAPI_tlb_dm PAPI_tlb_im 

        See generic_events(3CPC) for descriptions of these events

        Platform Specific Events:

        event[0-5]: FpRetx87FpOps FpRetx87FpOps.DivSqrROps 
                 FpRetx87FpOps.MulOps FpRetx87FpOps.AddSubOps 
                 FpRetSseAvxOps FpRetSseAvxOps.BfloatMacFLOPs 
                 FpRetSseAvxOps.MacFLOPs FpRetSseAvxOps.DivFLOPs 
                 FpRetSseAvxOps.MultFLOPs FpRetSseAvxOps.AddSubFLOPs 
                 FpRetiredSerOps FpRetiredSerOps.SseBotRet 
                 FpRetiredSerOps.SseCtrlRet FpRetiredSerOps.X87BotRet 
                 FpRetiredSerOps.X87CtrlRet FpOpsRetiredByWidth 
                 FpOpsRetiredByWidth.Pack512uOpsRetired 
                 FpOpsRetiredByWidth.Pack256uOpsRetired 
                 FpOpsRetiredByWidth.Pack128uOpsRetired 
                 FpOpsRetiredByWidth.ScalaruOpsRetired 
                 FpOpsRetiredByWidth.MMXuOpsRetired 
                 FpOpsRetiredByWidth.x87uOpsRetired FpOpsRetiredByType 
                 SseAvxOpsRetired FpPackOpsRetired PackedIntOpType 
                 FpDispFaults FpDispFaults.YmmSpillFault 
                 FpDispFaults.YmmFillFault FpDispFaults.XmmFillFault 
                 FpDispFaults.x87FillFault LsBadStatus2 
                 LsBadStatus2.StliOther LsLocks LsLocks.BusLock 
                 LsRetClClush LsRetCpuid LsDispatch LsSmiRx LsIntTaken 
                 LsIntTaken.IntTaken LsSTLF LsStCommitCancel2 
                 LsStCommitCancel2.StCommitCancelWcbFull LsMabAlloc 
                 LsDmndFillsFromSys 
                 LsDmndFillsFromSys.AlternateMemories_NearFar 
                 LsDmndFillsFromSys.Dram_IO_Far 
                 LsDmndFillsFromSys.FarCache_NearFar 
                 LsDmndFillsFromSys.Dram_IO_Near 
                 LsDmndFillsFromSys.NearCache_NearFar 
                 LsDmndFillsFromSys.LocalCcx LsDmndFillsFromSys.LocalL2 
                 LsAnyFillsFromSys 
                 LsAnyFillsFromSys.AlternateMemories_NearFar 
                 LsAnyFillsFromSys.Dram_IO_Far 
                 LsAnyFillsFromSys.FarCache_NearFar 
                 LsAnyFillsFromSys.Dram_IO_Near 
                 LsAnyFillsFromSys.ExtCacheLocal LsAnyFillsFromSys.LocalCcx 
                 LsAnyFillsFromSys.LocalL2 LsL1DTlbMiss 
                 LsL1DTlbMiss.TlbReload1GL2Miss 
                 LsL1DTlbMiss.TlbReload2ML2Miss 
                 LsL1DTlbMiss.TlbReloadCoalescedPageMiss 
                 LsL1DTlbMiss.TlbReload4KL2Miss 
                 LsL1DTlbMiss.TlbReload1GL2Hit 
                 LsL1DTlbMiss.TlbReload2ML2Hit 
                 LsL1DTlbMiss.TlbReloadCoalescedPageHit 
                 LsL1DTlbMiss.TlbReload4KL2Hit LsMisalLoads 
                 LsMisalLoads.MA4K LsMisalLoads.MA64 LsPrefInstrDisp 
                 LsPrefInstrDisp.PREFETCHNTA LsPrefInstrDisp.PREFETCHW 
                 LsPrefInstrDisp.PREFETCH LsWcbCloseFlush LsInefSwPref 
                 LsInefSwPref.MabMchCnt LsInefSwPref.DataPipeSwPfDcHit 
                 LsSwPfDcFills LsSwPfDcFills.AlternateMemories_NearFar 
                 LsSwPfDcFills.Dram_IO_Far LsSwPfDcFills.FarCache_NearFar 
                 LsSwPfDcFills.Dram_IO_Near LsSwPfDcFills.NearCache_NearFar 
                 LsSwPfDcFills.LocalCcx LsSwPfDcFills.LocalL2 LsHwPfDcFills 
                 LsHwPfDcFills.AlternateMemories_NearFar 
                 LsHwPfDcFills.Dram_IO_Far LsHwPfDcFills.FarCache_NearFar 
                 LsHwPfDcFills.Dram_IO_Near LsHwPfDcFills.NearCache_NearFar 
                 LsHwPfDcFills.LocalCcx LsHwPfDcFills.LocalL2 
                 LsAllocMabCount LsNotHaltedCyc LsTlbFlush LsNotHaltedP0Cyc 
                 LsNotHaltedP0Cyc.P0FreqCyc IcCacheFillL2 IcCacheFillSys 
                 BpL1TlbMissL2TlbHit BpL1TlbMissL2TlbMiss 
                 BpL1TlbMissL2TlbMiss.Coalesced4K BpL1TlbMissL2TlbMiss.IF1G 
                 BpL1TlbMissL2TlbMiss.IF2M BpL1TlbMissL2TlbMiss.IF4K 
                 BpL2BTBCorrect BpDynIndPred BpDeReDirect BpL1TlbFetchHit 
                 BpL1TlbFetchHit.IF1G BpL1TlbFetchHit.IF2M 
                 BpL1TlbFetchHit.IF4K ResyncsOrNcRedirects IcTagHitMiss 
                 OpCacheHitMiss DeOpQueueEmpty DeSrcOpDisp 
                 DeSrcOpDisp.LoopBuffer DeSrcOpDisp.OpCache 
                 DeSrcOpDisp.Decoder DeDisOpsFromDecoder 
                 DeDisDispatchTokenStalls1 
                 DeDisDispatchTokenStalls1.FpFlushRecoveryStall 
                 DeDisDispatchTokenStalls1.FPSchRsrcStall 
                 DeDisDispatchTokenStalls1.FpRegFileRsrcStall 
                 DeDisDispatchTokenStalls1.TakenBrnchBufferRsrc 
                 DeDisDispatchTokenStalls1.StoreQueueRsrcStall 
                 DeDisDispatchTokenStalls1.LoadQueueRsrcStall 
                 DeDisDispatchTokenStalls1.IntPhyRegFileRsrcStall 
                 DeDisDispatchTokenStalls2 
                 DeDisDispatchTokenStalls2.RetireTokenStall 
                 DeDisDispatchTokenStalls2.IntSch3TokenStall 
                 DeDisDispatchTokenStalls2.IntSch2TokenStall 
                 DeDisDispatchTokenStalls2.IntSch1TokenStall 
                 DeDisDispatchTokenStalls2.IntSch0TokenStall 
                 DeNoDispatchPerSlot DeAdditionalResourceStalls ExRetInstr 
                 ExRetOps ExRetBrn ExRetBrnMisp ExRetBrnTkn ExRetBrnTknMisp 
                 ExRetBrnFar ExRetNearRet ExRetNearRetMispred 
                 ExRetBrnIndMisp ExRetMmxFpInstr ExRetMmxFpInstr.SseInstr 
                 ExRetMmxFpInstr.MmxInstr ExRetMmxFpInstr.X87Instr 
                 ExRetIndBrchInstr ExRetCond ExDivBusy ExDivCount 
                 ExNoRetire ExRetUcodeInstr ExRetUcodeOps 
                 ExRetMsprdBrnchInstrDirMsmtch ExRetUncondBrnchInstrMispred 
                 ExRetUncondBrnchInstr ExTaggedIbsOps 
                 ExTaggedIbsOps.IbsCountRollover 
                 ExTaggedIbsOps.IbsTaggedOpsRet ExTaggedIbsOps.IbsTaggedOps 
                 ExRetFusedInstr L2RequestG1 L2RequestG1.RdBlkL 
                 L2RequestG1.RdBlkX L2RequestG1.LsRdBlkC_S 
                 L2RequestG1.CacheableIcRead L2RequestG1.ChangeToX 
                 L2RequestG1.PrefetchL2Cmd L2RequestG1.L2HwPf 
                 L2RequestG1.Group2. Read-write L2CacheReqStat 
                 L2CacheReqStat.LsRdBlkCS L2CacheReqStat.LsRdBlkLHitX 
                 L2CacheReqStat.LsRdBlkLHitS L2CacheReqStat.LsRdBlkX 
                 L2CacheReqStat.LsRdBlkC L2CacheReqStat.IcFillHitX 
                 L2CacheReqStat.IcFillHitS L2CacheReqStat.IcFillMiss 
                 L2PfHitL2 L2PfHitL2.L1Region L2PfHitL2.L1Stride 
                 L2PfHitL2.L1Stream L2PfHitL2.L2Stride L2PfHitL2.L2Burst 
                 L2PfHitL2.L2Up_Down L2PfHitL2.L2NextLine 
                 L2PfHitL2.L2Stream L2PfMissL2HitL2 
                 L2PfMissL2HitL2.L1Region L2PfMissL2HitL2.L1Stride 
                 L2PfMissL2HitL2.L1Stream L2PfMissL2HitL2.L2Stride 
                 L2PfMissL2HitL2.L2Burst L2PfMissL2HitL2.L2Up_Down 
                 L2PfMissL2HitL2.L2NextLine L2PfMissL2HitL2.L2Stream 
                 L2PfMissL2L3 L2PfMissL2L3.L1Region L2PfMissL2L3.L1Stride 
                 L2PfMissL2L3.L1Stream L2PfMissL2L3.L2Stride 
                 L2PfMissL2L3.L2Burst L2PfMissL2L3.L2Up_Down 
                 L2PfMissL2L3.L2NextLine L2PfMissL2L3.L2Stream 

        attributes: edge pc inv cmask umask nouser sys 

        See "Processor Programming Reference (PPR) for AMD Family 19h Model 
        11h, Revision B1 Processors Volume 1 of 6" (AMD publication 55901), 
        "Processor Programming Reference (PPR) for AMD Family 19h Model 
        61h, Revision B1 Processors" (AMD publication 56713), "Processor 
        Programming Reference (PPR) for AMD Family 19h Model 70h, Revision 
        A0 Processors" (AMD publication 57019), and 
        amd_f19h_zen4_events(3CPC) 
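
The event specification syntax above packs several counter requests into one comma-separated string. The sketch below is a simplified reading of that grammar. A picN=event token or a bare event name starts a new request. Any token whose name, ignoring an optional numeric suffix, is one of the listed attributes modifies the most recent request. This is an assumption-laden illustration, not the parser that libcpc or cpustat actually uses, and Request and parse_spec are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Attribute names from the "attributes:" line of the cpustat -h output above.
ATTRS = {"edge", "pc", "inv", "cmask", "umask", "nouser", "sys"}
PIC_RE = re.compile(r"pic(\d+)")


@dataclass
class Request:
    """One counter request: an event, an optional counter, and attributes."""
    event: str
    pic: Optional[int] = None
    attrs: Dict[str, int] = field(default_factory=dict)


def parse_spec(spec: str) -> List[Request]:
    """Simplified (assumed) parse of a cpustat-style -c event spec."""
    reqs: List[Request] = []
    for tok in spec.split(","):
        if not tok:
            raise ValueError("empty token in spec")
        key, sep, val = tok.partition("=")
        pic = PIC_RE.fullmatch(key)
        base = key.rstrip("0123456789")  # drop the optional [n] suffix
        if pic and sep:
            # "picN=event" starts a new request pinned to counter N.
            reqs.append(Request(event=val, pic=int(pic.group(1))))
        elif base in ATTRS:
            if not reqs:
                raise ValueError(f"attribute {key!r} appears before any event")
            # Flag attributes such as "sys" or "nouser" carry no value; record 1.
            reqs[-1].attrs[base] = int(val, 0) if sep else 1
        else:
            # A bare event name starts a new request with no counter specified.
            reqs.append(Request(event=tok))
    return reqs
```

For example, parse_spec("pic0=PAPI_tot_cyc,sys,pic1=ExRetOps,cmask=2,inv") yields two requests: PAPI_tot_cyc on counter 0 with sys set, and ExRetOps on counter 1 with cmask=2 and inv set. Dotted names such as FpRetSseAvxOps.MacFLOPs are kept whole as event names.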

I then looked at the auto-generated manual page. It's better than nothing! Next, in one window, I ran:

rm@gilgamesh ~ $ pfexec pbind -b 190 $$
process id 100530: was not bound, now 190
rm@gilgamesh ~ $ while :; do :; done

In another window, I ran pfexec cpustat -c PAPI_tot_cyc 1 and watched the output for CPU 190. While the loop was running, it reported values around 3432314581 per interval. When I killed the loop, it dropped back to a more expected idle value, such as 7321.
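
Because the cpustat interval here was 1 second, each PAPI_tot_cyc sample can be read directly as an average clock rate. That works out to roughly 3.43 GHz while the loop spun, versus only a few thousand cycles at idle. The small idle count is consistent with a cycle counter that stops while the core is halted, though that reading of how PAPI_tot_cyc is mapped here is an assumption. The following minimal helper (effective_ghz is a hypothetical name) performs the conversion:

```python
def effective_ghz(cycles: int, interval_s: float = 1.0) -> float:
    """Average clock rate (GHz) implied by a cycle count over one interval.

    Assumes the count covers exactly one sampling interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return cycles / interval_s / 1e9


busy = effective_ghz(3432314581)  # loop running on CPU 190
idle = effective_ghz(7321)        # after the loop was killed
print(f"busy: {busy:.2f} GHz, idle: {idle * 1e6:.1f} kHz")
```

Running this prints "busy: 3.43 GHz, idle: 7.3 kHz". The same arithmetic is what makes a boosted clock visible when comparing idle and busy CPUs.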

I also did a brief check on Zen 2 and Zen 3 systems by looking at the cpustat -h output and then at the PAPI_tot_cyc values for all CPUs. I then bound a thread to a CPU and ran a while :; do :; done loop in the shell to confirm that the frequency changed and actually boosted.

Actions #2

Updated by Electric Monk about 1 year ago

  • Gerrit CR set to 2749

Actions #3

Updated by Robert Mustacchi about 1 year ago

In addition, with Andy's help we were able to test on a Family 0x10-era Athlon II X2 (K10). We confirmed that the refactoring of the event data did not change anything there (compared via cpustat -h output). However, we did discover #15539 as part of this.

Actions #4

Updated by Electric Monk about 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e6bda3ff4c6fb3d2cf0980c3f0ed23a2f1826de8

commit  e6bda3ff4c6fb3d2cf0980c3f0ed23a2f1826de8
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2023-04-05T15:29:20.000Z

    15522 Want support for Zen 4 CPU Performance Counters
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
    Approved by: Richard Lowe <richlowe@richlowe.net>
