lmbench-2 命令
https://lmbench.sourceforge.net/man/par_mem.8.html
https://francisz.cn/2022/05/12/lmbench/
执行过程
执行过程分析:
results: lmbench
@env OS="${OS}" ../scripts/config-run # 配置文件生成于 ./bin/$OS/CONFIG.xxx
@env OS="${OS}" ../scripts/results
scripts/results:
. ../bin/$OS/$CONFIG
...
RESULTS=results/$OS/xxx # 了解测试进度可以通过tail 这个文件进行了解
...
./lmbench $CONFIG 2>../${RESULTS}
. ../bin/x86_64-linux-gnu/CONFIG.localhost.localdomain
# bash -x ./lmbench CONFIG.localhost.localdomain 2> my.log 这一步其实就是lmbench脚本调用`./bin/`下的可执行文件去执行
# grep BENCHMARK_HARDWARE scripts/lmbench -A 50
if [ X$BENCHMARK_HARDWARE = XYES ]; then
if [ X"$DISKS" != X ]
then for i in $DISKS
do if [ -r $i ]
then echo "Calculating disk zone bw & seek times" \
>> ${OUTPUT}
msleep 250
disk $i
echo "" 1>&2
fi
done
fi
fi
if [ X$BENCHMARK_OS = XYES -o X$BENCHMARK_HARDWARE = XYES -o X$BENCHMARK_BCOPY = XYES ]; then
echo "" 1>&2
echo "libc bcopy unaligned" 1>&2
for i in $HALF; do bw_mem -P $SYNC_MAX $i bcopy; done; echo "" 1>&2
echo "libc bcopy aligned" 1>&2
for i in $HALF; do bw_mem -P $SYNC_MAX $i bcopy conflict; done; echo "" 1>&2
echo "Memory bzero bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i bzero; done; echo "" 1>&2
echo "unrolled bcopy unaligned" 1>&2
for i in $HALF; do bw_mem -P $SYNC_MAX $i fcp; done; echo "" 1>&2
echo "unrolled partial bcopy unaligned" 1>&2
for i in $HALF; do bw_mem -P $SYNC_MAX $i cp; done; echo "" 1>&2
echo "Memory read bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i frd; done; echo "" 1>&2
echo "Memory partial read bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i rd; done; echo "" 1>&2
echo "Memory write bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i fwr; done; echo "" 1>&2
echo "Memory partial write bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i wr; done; echo "" 1>&2
echo "Memory partial read/write bandwidth" 1>&2
for i in $ALL; do bw_mem -P $SYNC_MAX $i rdwr; done; echo "" 1>&2
fi
if [ X$BENCHMARK_HARDWARE = XYES -o X$BENCHMARK_MEM = XYES ]; then
if [ $SYNC_MAX = 1 ]; then
date >> ${OUTPUT}
echo "Calculating effective TLB size" >> ${OUTPUT}
msleep 250
tlb -L $LINE_SIZE -M ${MB}M
echo "" 1>&2
date >> ${OUTPUT}
echo "Calculating memory load parallelism" >> ${OUTPUT}
msleep 250
echo "Memory load parallelism" 1>&2
par_mem -L $LINE_SIZE -M ${MB}M
echo "" 1>&2
# date >> ${OUTPUT}
# echo Calculating cache parameters >> ${OUTPUT}
# msleep 250
# cache -L $LINE_SIZE -M ${MB}M
fi
date >> ${OUTPUT}
echo "McCalpin\'s STREAM benchmark" >> ${OUTPUT}
msleep 250
stream -P $SYNC_MAX -M ${MB}M
stream -P $SYNC_MAX -v 2 -M ${MB}M
date >> ${OUTPUT}
echo "Calculating memory load latency" >> ${OUTPUT}
msleep 250
echo "" 1>&2
echo "Memory load latency" 1>&2
if [ X$FASTMEM = XYES ]
then lat_mem_rd -P $SYNC_MAX $MB 128
else lat_mem_rd -P $SYNC_MAX $MB 16 32 64 128 256 512 1024
fi
echo "" 1>&2
echo "Random load latency" 1>&2
lat_mem_rd -t -P $SYNC_MAX $MB 16
echo "" 1>&2
fi
lat_mem_rd
测试内存时延:lat_mem_rd, 用来测试操作不同数据大小的时延
Usage: ./bin/x86_64-linux-gnu/lat_mem_rd [-P <parallelism>] [-W <warmup>] [-N <repetitions>] [-t] len [stride...]
Memory load latency
+ echo 'Memory load latency'
+ ./bin/lat_mem_rd -P 1 1024 16 32 64 128 256 512 1024
"stride=16
0.00049 1.490
...
"stride=32
0.00049 1.490
...
"stride=64
...
"stride=128
...
"stride=256
...
"stride=512
...
"stride=1024
...
Random load latency
+ echo 'Random load latency'
+ lat_mem_rd -t -P 1 1024 16
"stride=16
0.00049 1.490
stream
测试内存带宽:stream主要用于测试带宽,对应的时延是在带宽跑满情况下的带宽。
Usage: ./bin/x86_64-linux-gnu/stream [-v <stream version 1|2>] [-M <len>[K|M]] [-P <parallelism>] [-W <warmup>] [-N <repetitions>]
点击查看代码
+ echo 'McCalpin'\''s' STREAM benchmark
+ msleep 250
+ stream -P 1 -M 1024M
STREAM copy latency: 1.27 nanoseconds
STREAM copy bandwidth: 12605.27 MB/sec
STREAM scale latency: 1.30 nanoseconds
STREAM scale bandwidth: 12325.71 MB/sec
STREAM add latency: 1.54 nanoseconds
STREAM add bandwidth: 15632.16 MB/sec
STREAM triad latency: 1.87 nanoseconds
STREAM triad bandwidth: 12817.28 MB/sec
+ stream -P 1 -v 2 -M 1024M
STREAM2 fill latency: 0.77 nanoseconds
STREAM2 fill bandwidth: 10348.52 MB/sec
STREAM2 copy latency: 1.27 nanoseconds
STREAM2 copy bandwidth: 12588.20 MB/sec
STREAM2 daxpy latency: 1.34 nanoseconds
STREAM2 daxpy bandwidth: 17869.19 MB/sec
STREAM2 sum latency: 1.54 nanoseconds
STREAM2 sum bandwidth: 5202.24 MB/sec
par_mem
测试mem load并行个数:par_mem
Usage: par_mem [-c] [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
+ echo 'Memory load parallelism'
./bin/par_mem -L 128 -M 64M
load并行度,测试方式为,for循环中使用多个load,For example, the inner loop which measures parallelism 2 would look something like:
for (i = 0; i < N; ++i) {
p0 = (char **)*p0;
p1 = (char **)*p1;
}
in a for loop (the overhead of the for loop is not significant; the loop is an unrolled loop 100 loads long). In this case, if the hardware can process two LOAD operations in parallel, then the overall latency of the loop should be equivalent to that of a single pointer chain, so the measured parallelism would be roughly two.
其他内容和测试原理待阅读源码研究
本文来自博客园,作者:LiYanbin,转载请注明原文链接:https://www.cnblogs.com/stellar-liyanbin/p/18268104