Analyzing cryptomining CPU instruction data with Intel VTune Profiler

Monero (XMR) mining command:

Collection and Platform Info
    Application Command Line:    D:\share\xmrig-6.18.0-msvc-win64\xmrig-6.18.0\xmrig.exe -o fr.minexmr.com:443 -u 4971qQbWrJRUGDvEUUvqsw29MNz68Cus7d6DAsmTmGoZd4o9AL9FAJiFSvo5uZK1ezguR46n689Rk3zApMZTcB3gQfDMULX -p x --tls
    Operating System:    Microsoft Windows 10
    Computer Name:    DESKTOP-ALRVTLS
    Result Size:    1.7 GB (total size of the collected data)
    Collection start time:    15:29:48 02/08/2022 UTC
    Collection stop time:    15:32:55 02/08/2022 UTC
    Collector Type:    Event-based sampling driver
    Finalization mode: Fast. If the number of collected samples exceeds the threshold, this mode limits the number of processed samples to speed up post-processing.
    CPU
        Name:    Intel(R) microarchitecture code named Rocketlake
        Frequency:    2.6 GHz
        Logical CPU Count:    12
        Cache Allocation Technology
            Level 2 capability:    not detected
            Level 3 capability:    not detected

Analysis type:

 

Screenshots of the run:

 


 

 

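After letting it run for roughly three minutes (the collection window is 15:29:48 to 15:32:55 UTC), let's look at the results: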

 

 

 

The full data collection came to 1.7 GB! That's pretty scary...

Here is the overall result:

 

 

 

 Purely from a performance standpoint, the bottleneck is in the back end.
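VTune derives this back-end verdict from the TOPDOWN and related counters that are exported further down in this post. As a rough cross-check, the level-1 top-down split can be recomputed from those raw counts. This is a minimal sketch using simplified formulas that I am assuming here: frontend = (IDQ_UOPS_NOT_DELIVERED.CORE - INT_MISC.UOP_DROPPING) / SLOTS, backend = TOPDOWN.BACKEND_BOUND_SLOTS / SLOTS, retiring = UOPS_RETIRED.SLOTS / SLOTS, and bad speculation is whatever remains. VTune's exact formulas may include extra correction terms, so the numbers need not match the GUI exactly.

```python
"""Rough top-down level-1 breakdown recomputed from raw VTune event counts.

The formulas are a simplified approximation (an assumption), not VTune's exact ones.
"""

# Counts copied from the "Hardware Events" export in this post.
EVENTS = {
    "TOPDOWN.SLOTS": 13_658_454_097_535,
    "TOPDOWN.BACKEND_BOUND_SLOTS": 9_234_752_770_425,
    "UOPS_RETIRED.SLOTS": 3_905_945_858_910,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 351_316_053_945,
    "INT_MISC.UOP_DROPPING": 16_350_049_050,
    "INST_RETIRED.ANY": 3_769_987_000_000,
    "CPU_CLK_UNHALTED.THREAD": 3_122_854_800_000,
}


def topdown_level1(ev: dict) -> dict:
    """Return the four level-1 categories as fractions of all pipeline slots."""
    slots = ev["TOPDOWN.SLOTS"]
    frontend = (ev["IDQ_UOPS_NOT_DELIVERED.CORE"] - ev["INT_MISC.UOP_DROPPING"]) / slots
    backend = ev["TOPDOWN.BACKEND_BOUND_SLOTS"] / slots
    retiring = ev["UOPS_RETIRED.SLOTS"] / slots
    # Simplification: whatever is left over is attributed to bad speculation.
    bad_spec = max(0.0, 1.0 - frontend - backend - retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "backend_bound": backend,
        "retiring": retiring,
    }


def ipc(ev: dict) -> float:
    """Instructions retired per unhalted thread cycle."""
    return ev["INST_RETIRED.ANY"] / ev["CPU_CLK_UNHALTED.THREAD"]


if __name__ == "__main__":
    for name, share in topdown_level1(EVENTS).items():
        print(f"{name:16s} {share:6.1%}")
    print(f"IPC              {ipc(EVENTS):.2f}")
```

With this run's counts the back-end share comes out at about 68% of slots and retiring at about 29%, with an IPC of roughly 1.2, which is consistent with the back-end-bound verdict.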

 

Drilling into Retiring on its own: what are the main CPU instructions actually doing?

 

 

 

There is a fair amount of floating-point (FP) arithmetic: 13%.

 

On the front-end side, cache misses, branch mispredictions and the like account for only a small share:
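The branch side can also be quantified directly from the event export further down in this post. A small sketch: the misprediction ratio is BR_MISP_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.ALL_BRANCHES, and MPKI (mispredictions per thousand instructions) normalizes by INST_RETIRED.ANY. The helper name `branch_stats` is mine, for illustration only.

```python
# Counts copied from the VTune "Hardware Events" export in this post.
BR_INST_RETIRED_ALL = 179_656_042_170
BR_MISP_RETIRED_ALL = 695_542_005
BACLEARS_ANY = 24_000_720
INST_RETIRED_ANY = 3_769_987_000_000


def branch_stats(branches: int, mispredicts: int, instructions: int) -> dict:
    """Basic branch-behaviour ratios from retired-instruction counters."""
    return {
        # Fraction of retired branches that were mispredicted.
        "mispredict_ratio": mispredicts / branches,
        # Mispredictions per 1000 retired instructions.
        "mpki": 1000 * mispredicts / instructions,
        # Retired branches per 1000 retired instructions.
        "branches_per_kilo_inst": 1000 * branches / instructions,
    }


if __name__ == "__main__":
    stats = branch_stats(BR_INST_RETIRED_ALL, BR_MISP_RETIRED_ALL, INST_RETIRED_ANY)
    for key, value in stats.items():
        print(f"{key:24s} {value:.4f}")
    # Front-end resteers (BACLEARS) relative to all retired branches.
    print(f"{'baclears_per_branch':24s} {BACLEARS_ANY / BR_INST_RETIRED_ALL:.6f}")
```

With these numbers only about 0.39% of branches are mispredicted (around 0.18 MPKI), which agrees with the small share reported here.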

 

 

 

On the back-end side:

 

 

 

Long-latency operations like divides and memory operations can cause this, as can too many operations being directed to a single execution port (for example, more multiply operations arriving in the back-end per cycle than the execution unit can support).
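The second half of this description (too many operations directed to a single execution port) can be sanity-checked with the UOPS_DISPATCHED.PORT_* counts in the event export below. A minimal sketch that computes each port group's share of all dispatched uops; note that PORT_2_3, PORT_4_9 and PORT_7_8 each combine two ports, so their shares are not directly comparable with the single ALU ports.

```python
# UOPS_DISPATCHED.PORT_* counts copied from the VTune export in this post.
PORT_UOPS = {
    "PORT_0": 910_771_366_155,
    "PORT_1": 994_651_491_975,
    "PORT_2_3": 534_780_802_170,
    "PORT_4_9": 223_530_335_295,
    "PORT_5": 850_201_275_300,
    "PORT_6": 899_491_349_235,
    "PORT_7_8": 207_810_311_715,
}


def port_shares(counts: dict) -> dict:
    """Share of all dispatched uops that went to each port (or port group)."""
    total = sum(counts.values())
    return {port: n / total for port, n in counts.items()}


if __name__ == "__main__":
    # Print the busiest port groups first.
    for port, share in sorted(port_shares(PORT_UOPS).items(), key=lambda kv: -kv[1]):
        print(f"{port:9s} {share:6.1%}")
```

Here the four ALU ports (0, 1, 5, 6) each take roughly 18 to 22% of the uops, so no single port stands out as a hotspot; that suggests the long-latency part of the description (divides and memory operations) is the more relevant one.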

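Judging from the description, the L2 cache is what's holding things back: L1 is at 100% while L2 is far too low. At least, that seems to be what it is saying.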
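The exported MEM_LOAD_RETIRED and CYCLE_ACTIVITY counters let us put rough numbers on this. A minimal sketch, assuming (a) retired loads ≈ L1_HIT + FB_HIT + L1_MISS, and (b) a simplified L2-bound estimate of (STALLS_L1D_MISS - STALLS_L2_MISS) / CPU_CLK_UNHALTED.THREAD, i.e. the share of cycles stalled on loads that missed L1D but did not miss L2. VTune's own L2 Bound formula is more involved, so treat the result as an approximation.

```python
# Counts copied from the VTune "Hardware Events" export in this post.
MEM_LOAD_RETIRED = {
    "L1_HIT": 336_031_008_090,
    "FB_HIT": 136_277_038_725,  # missed L1D but hit an already pending fill buffer
    "L1_MISS": 60_759_911_385,
    "L2_HIT": 54_858_822_870,
    "L3_HIT": 4_997_549_265,
    "L3_MISS": 456_191_520,
}
STALLS_L1D_MISS = 1_527_559_582_665
STALLS_L2_MISS = 226_650_679_950
CLK_THREAD = 3_122_854_800_000


def load_breakdown(m: dict) -> dict:
    """Where retired loads were served, as fractions."""
    loads = m["L1_HIT"] + m["FB_HIT"] + m["L1_MISS"]  # assumption (a)
    return {
        "l1_hit_share": m["L1_HIT"] / loads,
        "fb_hit_share": m["FB_HIT"] / loads,
        "l1_miss_share": m["L1_MISS"] / loads,
        "l2_hit_of_l1_miss": m["L2_HIT"] / m["L1_MISS"],
    }


def approx_l2_bound(stalls_l1d_miss: int, stalls_l2_miss: int, clks: int) -> float:
    """Assumption (b): share of cycles stalled on L1D misses that were served by L2."""
    return (stalls_l1d_miss - stalls_l2_miss) / clks


if __name__ == "__main__":
    for key, value in load_breakdown(MEM_LOAD_RETIRED).items():
        print(f"{key:18s} {value:6.1%}")
    l2 = approx_l2_bound(STALLS_L1D_MISS, STALLS_L2_MISS, CLK_THREAD)
    print(f"{'approx_l2_bound':18s} {l2:6.1%}")
```

With this run's counts about 90% of L1D misses are served by L2, and the stall cycles attributed to them come to roughly 42% of unhalted thread cycles, which fits the reading that L2 latency, rather than DRAM, is the main memory-side limiter.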

 

 

 

Looking at the call stack, almost all of the time is spent in a single module.

 

 

Now let's look at the event counts:

 

 

 

Exporting the hardware event types gives:

Hardware Events
    Hardware Event Type Hardware Event Count
    ARITH.DIVIDER_ACTIVE    571,366,714,095   ==> Cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations. ******Divide and square-root operations: this fits the profile of mining!!!
    BACLEARS.ANY 24,000,720   ==> The BACLEARS event counts the number of times the front end is resteered, mainly when the Branch Prediction Unit cannot provide a correct prediction and this is corrected by the Branch Address Calculator at the front end. The BACLEARS.ANY event counts the number of baclears for any type of branch. ==> So this is where the branch prediction misses show up!
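    BR_INST_RETIRED.ALL_BRANCHES    179,656,042,170   ==> ALL_BRANCHES counts the number of retired branch instructions of any type. Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch's true execution path is known. All branches are predicted using the Branch Prediction Unit (BPU). This unit predicts the target address based not only on the EIP of the branch but also on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types: conditional branches, direct calls and jumps, indirect calls and jumps, returns.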
    BR_MISP_RETIRED.ALL_BRANCHES    695,542,005
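    CPU_CLK_UNHALTED.DISTRIBUTED    2,762,526,000,000  ==> This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.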
    CPU_CLK_UNHALTED.REF_TSC    2,522,358,800,000
    CPU_CLK_UNHALTED.THREAD 3,122,854,800,000
    CPU_CLK_UNHALTED.THREAD_P   3,103,054,654,575
    CYCLE_ACTIVITY.CYCLES_L1D_MISS  2,207,076,621,210 ==》Cycles while L1 cache miss demand load is outstanding.
    CYCLE_ACTIVITY.CYCLES_MEM_ANY   2,970,053,910,135
    CYCLE_ACTIVITY.STALLS_L1D_MISS  1,527,559,582,665
    CYCLE_ACTIVITY.STALLS_L2_MISS   226,650,679,950
    CYCLE_ACTIVITY.STALLS_L3_MISS   162,225,486,675
    CYCLE_ACTIVITY.STALLS_MEM_ANY   1,551,274,653,810
    CYCLE_ACTIVITY.STALLS_TOTAL 1,592,284,776,840
    DSB2MITE_SWITCHES.PENALTY_CYCLES    1,669,550,085
    DTLB_LOAD_MISSES.STLB_HIT:cmask=1   5,694,170,820
    DTLB_LOAD_MISSES.WALK_ACTIVE    84,254,527,560
    DTLB_STORE_MISSES.STLB_HIT:cmask=1  292,508,775
    DTLB_STORE_MISSES.WALK_ACTIVE   370,511,115
    EXE_ACTIVITY.1_PORTS_UTIL   273,300,409,950
    EXE_ACTIVITY.2_PORTS_UTIL   390,990,586,485
    EXE_ACTIVITY.BOUND_ON_STORES    195,000,585
    FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE    563,478,403,845
    FRONTEND_RETIRED.ANY_DSB_MISS   24,163,691,340
    FRONTEND_RETIRED.DSB_MISS   660,046,200
    FRONTEND_RETIRED.L2_MISS    24,001,680
    FRONTEND_RETIRED.LATENCY_GE_16  45,003,150
    FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1  25,053,253,605
    FRONTEND_RETIRED.LATENCY_GE_4   232,516,275
    ICACHE_16B.IFDATA_STALL 2,205,039,690
    ICACHE_64B.IFTAG_STALL  1,176,017,640
    IDQ.DSB_CYCLES_ANY  710,761,066,140
    IDQ.DSB_CYCLES_OK   619,500,929,250
    IDQ.DSB_UOPS    3,580,955,371,425
    IDQ.MITE_CYCLES_ANY 92,280,138,420
    IDQ.MITE_CYCLES_OK  67,200,100,800
    IDQ.MITE_UOPS   335,040,502,560
    IDQ.MS_SWITCHES 657,019,710
    IDQ.MS_UOPS 4,468,634,055
    IDQ_UOPS_NOT_DELIVERED.CORE 351,316,053,945
    IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE 38,835,116,505
    ILD_STALL.LCP   7,500,135
    INST_RETIRED.ANY    3,769,987,000,000
    INST_RETIRED.NOP    90,000,135
    INT_MISC.CLEAR_RESTEER_CYCLES   7,215,129,870
    INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes  975,017,550
    INT_MISC.UOP_DROPPING   16,350,049,050
    L1D_PEND_MISS.FB_FULL   3,135,009,405
    L1D_PEND_MISS.FB_FULL_PERIODS   180,000,540
    L1D_PEND_MISS.L2_STALL  2,910,008,730
    L1D_PEND_MISS.PENDING   2,753,288,259,840
    L2_RQSTS.ALL_RFO    37,389,560,835
    L2_RQSTS.RFO_HIT    24,540,368,100
    LD_BLOCKS.STORE_FORWARD 3,000,090
    LD_BLOCKS_PARTIAL.ADDRESS_ALIAS 7,704,231,120
    MACHINE_CLEARS.COUNT    85,502,565
    MEM_INST_RETIRED.ALL_STORES 200,160,600,480
    MEM_INST_RETIRED.ANY    732,047,196,135
    MEM_INST_RETIRED.LOCK_LOADS 15,001,050
    MEM_INST_RETIRED.SPLIT_LOADS    9,000,270
    MEM_INST_RETIRED.SPLIT_STORES   12,000,360
    MEM_INST_RETIRED.STLB_MISS_LOADS    1,413,042,390
    MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT    600,330
    MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM   2,401,320
    MEM_LOAD_RETIRED.FB_HIT 136,277,038,725
    MEM_LOAD_RETIRED.L1_HIT 336,031,008,090
    MEM_LOAD_RETIRED.L1_MISS    60,759,911,385
    MEM_LOAD_RETIRED.L2_HIT 54,858,822,870
    MEM_LOAD_RETIRED.L3_HIT 4,997,549,265
    MEM_LOAD_RETIRED.L3_MISS    456,191,520
    OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4    9,735,029,205
    OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD    2,673,818,021,430
    OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO 1,002,168,006,495
    RESOURCE_STALLS.SCOREBOARD  5,067,152,010
    TOPDOWN.BACKEND_BOUND_SLOTS 9,234,752,770,425
    TOPDOWN.SLOTS   13,658,454,097,535
    UOPS_DECODED.DEC0   33,000,099,000
    UOPS_DECODED.DEC0:cmask=1   17,385,052,155
    UOPS_DISPATCHED.PORT_0  910,771,366,155
    UOPS_DISPATCHED.PORT_1  994,651,491,975
    UOPS_DISPATCHED.PORT_2_3    534,780,802,170
    UOPS_DISPATCHED.PORT_4_9    223,530,335,295
    UOPS_DISPATCHED.PORT_5  850,201,275,300
    UOPS_DISPATCHED.PORT_6  899,491,349,235
    UOPS_DISPATCHED.PORT_7_8    207,810,311,715
    UOPS_EXECUTED.CYCLES_GE_3   855,031,282,545
    UOPS_EXECUTED.THREAD    4,300,326,450,480
    UOPS_ISSUED.ANY 4,063,476,095,205
    UOPS_RETIRED.SLOTS  3,905,945,858,910

 

Whoa, that's way too many. Let me write a small program to sort them before analyzing.
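The post does not include the sorting program itself; below is a minimal Python sketch of one way to do it (my own reconstruction, not the original code). It parses `EVENT_NAME   1,234,567 ...` lines from the exported summary, ignores any trailing annotation, and sorts by count in descending order. The regular expression is an assumption about the export format shown above.

```python
import re

# Event name (optionally with ":cmask=1"-style modifiers), whitespace, comma-grouped count.
EVENT_LINE = re.compile(r"\s*([A-Z0-9_.]+(?::[\w=]+)*)\s+([\d,]+)")


def parse_events(text: str) -> list:
    """Return (event_name, count) pairs; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = EVENT_LINE.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2).replace(",", ""))))
    return rows


def sort_events(rows: list) -> list:
    """Sort (name, count) pairs by count, largest first."""
    return sorted(rows, key=lambda kv: kv[1], reverse=True)


# A few lines in the export format, including a trailing annotation.
SAMPLE = """\
Hardware Events
    Hardware Event Type Hardware Event Count
    ARITH.DIVIDER_ACTIVE    571,366,714,095   ==> Cycles when divide unit is busy
    BACLEARS.ANY    24,000,720
    INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes  975,017,550
    TOPDOWN.SLOTS   13,658,454,097,535
"""

if __name__ == "__main__":
    for name, count in sort_events(parse_events(SAMPLE)):
        print(name, count)
```

The output uses the same `NAME COUNT` layout as the sorted list in this post; in practice the whole exported summary would be passed in place of `SAMPLE`.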

Definitions for many of these events can be found at https://perfmon-events.intel.com/icelake.html .

 

TOPDOWN.SLOTS 13658454097535   ==> skip; probably only used for the top-down analysis
TOPDOWN.BACKEND_BOUND_SLOTS 9234752770425 ==> same as above
UOPS_EXECUTED.THREAD 4300326450480  ==> Number of uops to be executed per-thread each cycle. Probably not much use for mining detection.
UOPS_ISSUED.ANY 4063476095205   ==> Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS). Probably not much use for mining detection.
UOPS_RETIRED.SLOTS 3905945858910  ==> Counts number of retirement slots used.
INST_RETIRED.ANY 3769987000000  ==> This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers.
IDQ.DSB_UOPS 3580955371425     ==> μops coming from the Decoded ICache.
CPU_CLK_UNHALTED.THREAD 3122854800000  ==> Counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling.
CPU_CLK_UNHALTED.THREAD_P 3103054654575 ==> same as above
CYCLE_ACTIVITY.CYCLES_MEM_ANY 2970053910135 ==> Cycles while memory subsystem has an outstanding load.
CPU_CLK_UNHALTED.DISTRIBUTED 2762526000000 ==> This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.
L1D_PEND_MISS.PENDING 2753288259840  ==> Counts duration of L1D miss outstanding, that is each cycle number of Fill Buffers (FB) outstanding required by Demand Reads. FB either is held by demand loads, or it is held by non-demand loads and gets hit at least once by demand. The valid outstanding interval is defined until the FB deallocation by one of the following ways: from FB allocation, if FB is allocated by demand from the demand Hit FB, if it is allocated by hardware or software prefetch. Note: In the L1D, a Demand Read contains cacheable or noncacheable demand loads, including ones causing cache-line splits and reads due to page walks resulted from any request type.
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD 2673818021430
CPU_CLK_UNHALTED.REF_TSC 2522358800000
CYCLE_ACTIVITY.CYCLES_L1D_MISS 2207076621210 ==> Cycles while L1 cache miss demand load is outstanding.
CYCLE_ACTIVITY.STALLS_TOTAL 1592284776840
CYCLE_ACTIVITY.STALLS_MEM_ANY 1551274653810
CYCLE_ACTIVITY.STALLS_L1D_MISS 1527559582665 ==> Execution stalls while L1 cache miss demand load is outstanding.
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO 1002168006495  ==> Counts the number of offcore outstanding demand rfo Reads transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.
UOPS_DISPATCHED.PORT_1 994651491975 
UOPS_DISPATCHED.PORT_0 910771366155
UOPS_DISPATCHED.PORT_6 899491349235
UOPS_EXECUTED.CYCLES_GE_3 855031282545
UOPS_DISPATCHED.PORT_5 850201275300
MEM_INST_RETIRED.ANY 732047196135
IDQ.DSB_CYCLES_ANY 710761066140
IDQ.DSB_CYCLES_OK 619500929250
ARITH.DIVIDER_ACTIVE 571366714095
FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE 563478403845
UOPS_DISPATCHED.PORT_2_3 534780802170
EXE_ACTIVITY.2_PORTS_UTIL 390990586485
IDQ_UOPS_NOT_DELIVERED.CORE 351316053945
MEM_LOAD_RETIRED.L1_HIT 336031008090
IDQ.MITE_UOPS 335040502560
EXE_ACTIVITY.1_PORTS_UTIL 273300409950
CYCLE_ACTIVITY.STALLS_L2_MISS 226650679950
UOPS_DISPATCHED.PORT_4_9 223530335295
UOPS_DISPATCHED.PORT_7_8 207810311715
MEM_INST_RETIRED.ALL_STORES 200160600480
BR_INST_RETIRED.ALL_BRANCHES 179656042170
CYCLE_ACTIVITY.STALLS_L3_MISS 162225486675
MEM_LOAD_RETIRED.FB_HIT 136277038725
IDQ.MITE_CYCLES_ANY 92280138420
DTLB_LOAD_MISSES.WALK_ACTIVE 84254527560
IDQ.MITE_CYCLES_OK 67200100800
MEM_LOAD_RETIRED.L1_MISS 60759911385
MEM_LOAD_RETIRED.L2_HIT 54858822870
IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE 38835116505
L2_RQSTS.ALL_RFO 37389560835
UOPS_DECODED.DEC0 33000099000
FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1 25053253605
L2_RQSTS.RFO_HIT 24540368100
FRONTEND_RETIRED.ANY_DSB_MISS 24163691340
UOPS_DECODED.DEC0:cmask=1 17385052155
INT_MISC.UOP_DROPPING 16350049050
OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD:cmask=4 9735029205
LD_BLOCKS_PARTIAL.ADDRESS_ALIAS 7704231120
INT_MISC.CLEAR_RESTEER_CYCLES 7215129870
DTLB_LOAD_MISSES.STLB_HIT:cmask=1 5694170820
RESOURCE_STALLS.SCOREBOARD 5067152010
MEM_LOAD_RETIRED.L3_HIT 4997549265
IDQ.MS_UOPS 4468634055
L1D_PEND_MISS.FB_FULL 3135009405
L1D_PEND_MISS.L2_STALL 2910008730
ICACHE_16B.IFDATA_STALL 2205039690
DSB2MITE_SWITCHES.PENALTY_CYCLES 1669550085
MEM_INST_RETIRED.STLB_MISS_LOADS 1413042390
ICACHE_64B.IFTAG_STALL 1176017640
INT_MISC.RECOVERY_CYCLES:cmask=1:e=yes 975017550
BR_MISP_RETIRED.ALL_BRANCHES 695542005
FRONTEND_RETIRED.DSB_MISS 660046200
IDQ.MS_SWITCHES 657019710
MEM_LOAD_RETIRED.L3_MISS 456191520
DTLB_STORE_MISSES.WALK_ACTIVE 370511115
DTLB_STORE_MISSES.STLB_HIT:cmask=1 292508775
FRONTEND_RETIRED.LATENCY_GE_4 232516275
EXE_ACTIVITY.BOUND_ON_STORES 195000585
L1D_PEND_MISS.FB_FULL_PERIODS 180000540
INST_RETIRED.NOP 90000135
MACHINE_CLEARS.COUNT 85502565
FRONTEND_RETIRED.LATENCY_GE_16 45003150
FRONTEND_RETIRED.L2_MISS 24001680
BACLEARS.ANY 24000720
MEM_INST_RETIRED.LOCK_LOADS 15001050
MEM_INST_RETIRED.SPLIT_STORES 12000360
MEM_INST_RETIRED.SPLIT_LOADS 9000270
ILD_STALL.LCP 7500135
LD_BLOCKS.STORE_FORWARD 3000090
MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM 2401320
MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT 600330

 Still too many. Let's compress further:

TOPDOWN 22893206867960
CPU_CLK_UNHALTED 11510794254575
CYCLE_ACTIVITY 10237125711285
IDQ 5410863762360 ==》Instruction Decode Queue (IDQ) 
UOPS_EXECUTED 5155357733025 ==》Counts the number of uops from any logical processor.
UOPS_DISPATCHED 4621236931845
UOPS_ISSUED 4063476095205
UOPS_RETIRED 3905945858910  ==> Counts the number of micro-ops retired, (macro-fused=1, micro-fused=2, others=1 - maximum count of 8). The processor decodes complex macro instructions into a sequence of simpler micro-ops. Most instructions are composed of one or two micro-ops. Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. Possibly mining-related!
INST_RETIRED 3770077000135  ==> Counts the number of instructions retired from execution. For instructions that consist of multiple micro-ops, counts the retirement of the last micro-op of the instruction. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. INST_RETIRED.ANY_P is counted by a programmable counter and it is an architectural performance event. Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not count as retired instructions.
OFFCORE_REQUESTS_OUTSTANDING 3685721057130  ==> Counts the number of offcore outstanding cacheable Core Data Read transactions in the super queue every cycle. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). See corresponding Umask under OFFCORE_REQUESTS.
L1D_PEND_MISS 2759513278515 ==> Number of times a request needed a FB (Fill Buffer) entry but there was no entry available for it. A request includes cacheable/uncacheable demands that are load, store or SW prefetch instructions.
MEM_INST_RETIRED 933656840685
EXE_ACTIVITY 664485997020
MEM_LOAD_RETIRED 593380521855
ARITH 571366714095  ==> Cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations. Mining-related!!!
FP_ARITH_INST_RETIRED 563478403845   ==> Counts once for most SIMD 128-bit packed computational double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events. Mining-related!!!
IDQ_UOPS_NOT_DELIVERED 390151170450
BR_INST_RETIRED 179656042170
DTLB_LOAD_MISSES 89948698380
L2_RQSTS 61929928935  ==》Counts the total number of L2 code requests
UOPS_DECODED 50385151155
FRONTEND_RETIRED 50178512250
INT_MISC 24540196470
LD_BLOCKS_PARTIAL 7704231120
RESOURCE_STALLS 5067152010 ==》Counts resource-related stall cycles.
ICACHE_16B 2205039690
DSB2MITE_SWITCHES 1669550085
ICACHE_64B 1176017640
BR_MISP_RETIRED 695542005
DTLB_STORE_MISSES 663019890
MACHINE_CLEARS 85502565
BACLEARS 24000720
ILD_STALL 7500135
MEM_LOAD_L3_HIT_RETIRED 3001650
LD_BLOCKS 3000090 ==> The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.
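Judging by the totals (for example TOPDOWN 22893206867960 = TOPDOWN.SLOTS + TOPDOWN.BACKEND_BOUND_SLOTS, and DTLB_LOAD_MISSES 89948698380 = WALK_ACTIVE + STLB_HIT:cmask=1), this compressed list sums every event that shares the same prefix before the first '.'. A minimal sketch of that grouping step, using a few rows from the sorted list:

```python
from collections import defaultdict


def group_by_prefix(rows: list) -> list:
    """Sum counts of events sharing the prefix before the first '.'; sort descending."""
    totals = defaultdict(int)
    for name, count in rows:
        prefix = name.split(".", 1)[0]
        totals[prefix] += count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# A few (name, count) pairs copied from the sorted list in this post.
ROWS = [
    ("TOPDOWN.SLOTS", 13_658_454_097_535),
    ("TOPDOWN.BACKEND_BOUND_SLOTS", 9_234_752_770_425),
    ("INST_RETIRED.ANY", 3_769_987_000_000),
    ("INST_RETIRED.NOP", 90_000_135),
    ("DTLB_LOAD_MISSES.WALK_ACTIVE", 84_254_527_560),
    ("DTLB_LOAD_MISSES.STLB_HIT:cmask=1", 5_694_170_820),
]

if __name__ == "__main__":
    for prefix, total in group_by_prefix(ROWS):
        print(prefix, total)
```

This is only a way to shrink the list: the summed events mix units (slots, cycles, uops, instructions), so a group total such as TOPDOWN or CPU_CLK_UNHALTED has no direct physical meaning of its own.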

 

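If mining were to be detected from CPU instructions, could high CPU usage plus these instruction characteristics show that a process is, at its core, mining??? That would still need more data analysis and context, and cost-effectiveness is a big question.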
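To make the question concrete, here is a minimal sketch of how the counters singled out above (ARITH.DIVIDER_ACTIVE, FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE, and the L1D/L2 miss stall cycles) could be turned into features for a "mining-like profile" check. The feature names, the `cpu_utilization` input and every threshold are hypothetical, chosen only for illustration and not validated against any benign workload; as noted, much more data would be needed before such a rule could be trusted.

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds, for illustration only (not validated on real workloads).
THRESHOLDS = {
    "cpu_utilization": 0.80,
    "divider_busy": 0.10,
    "fp128d_per_inst": 0.10,
    "approx_l2_bound": 0.25,
}


@dataclass
class Counters:
    inst_retired_any: int
    clk_unhalted_thread: int
    arith_divider_active: int
    fp_arith_128b_packed_double: int
    stalls_l1d_miss: int
    stalls_l2_miss: int


def features(c: Counters, cpu_utilization: float) -> dict:
    """Derive ratio features from raw counts (cpu_utilization comes from elsewhere, e.g. the OS)."""
    clks = c.clk_unhalted_thread
    return {
        "cpu_utilization": cpu_utilization,
        "divider_busy": c.arith_divider_active / clks,
        "fp128d_per_inst": c.fp_arith_128b_packed_double / c.inst_retired_any,
        "approx_l2_bound": (c.stalls_l1d_miss - c.stalls_l2_miss) / clks,
    }


def mining_like(feats: dict) -> bool:
    """True only if every feature exceeds its (hypothetical) threshold."""
    return all(feats[k] > t for k, t in THRESHOLDS.items())


# Counts from this post's run; the 0.95 CPU utilization is an assumed example value.
RUN = Counters(3_769_987_000_000, 3_122_854_800_000, 571_366_714_095,
               563_478_403_845, 1_527_559_582_665, 226_650_679_950)

if __name__ == "__main__":
    f = features(RUN, cpu_utilization=0.95)
    for k, v in f.items():
        print(f"{k:16s} {v:.3f}")
    print("mining-like profile:", mining_like(f))
```

A rule like this says nothing about false positives: scientific or media workloads can also be FP- and divide-heavy, which is exactly where the cost-effectiveness concern comes in.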

 

posted @ bonelee  Reads (1175)  Comments (0)