lmbench-2 命令

https://lmbench.sourceforge.net/man/par_mem.8.html
https://francisz.cn/2022/05/12/lmbench/

执行过程

执行过程分析:

results: lmbench
        @env OS="${OS}" ../scripts/config-run # 配置文件生成于 ./bin/$OS/CONFIG.xxx
        @env OS="${OS}" ../scripts/results
        

scripts/results:

. ../bin/$OS/$CONFIG
...
RESULTS=results/$OS/xxx # 了解测试进度可以通过tail 这个文件进行了解
...
./lmbench $CONFIG 2>../${RESULTS}
. ../bin/x86_64-linux-gnu/CONFIG.localhost.localdomain
# bash -x ./lmbench CONFIG.localhost.localdomain 2> my.log 这一步其实就是lmbench脚本调用`./bin/`下的可执行文件去执行

# grep BENCHMARK_HARDWARE scripts/lmbench -A 50

if [ X$BENCHMARK_HARDWARE = XYES ]; then
        if [ X"$DISKS" != X ]
        then    for i in $DISKS
                do      if [ -r $i ]
                        then    echo "Calculating disk zone bw & seek times" \
                                        >> ${OUTPUT}
                                msleep 250
                                disk $i
                                echo "" 1>&2
                        fi
                done
        fi
fi

if [ X$BENCHMARK_OS = XYES -o X$BENCHMARK_HARDWARE = XYES -o X$BENCHMARK_BCOPY = XYES ]; then
        echo "" 1>&2
        echo "libc bcopy unaligned" 1>&2
        for i in $HALF; do bw_mem -P $SYNC_MAX $i bcopy; done; echo "" 1>&2

        echo "libc bcopy aligned" 1>&2
        for i in $HALF; do bw_mem -P $SYNC_MAX $i bcopy conflict; done; echo "" 1>&2

        echo "Memory bzero bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i bzero; done; echo "" 1>&2

        echo "unrolled bcopy unaligned" 1>&2
        for i in $HALF; do bw_mem -P $SYNC_MAX $i fcp; done; echo "" 1>&2

        echo "unrolled partial bcopy unaligned" 1>&2
        for i in $HALF; do bw_mem -P $SYNC_MAX $i cp; done; echo "" 1>&2

        echo "Memory read bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i frd; done; echo "" 1>&2

        echo "Memory partial read bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i rd; done; echo "" 1>&2

        echo "Memory write bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i fwr; done; echo "" 1>&2

        echo "Memory partial write bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i wr; done; echo "" 1>&2

        echo "Memory partial read/write bandwidth" 1>&2
        for i in $ALL; do bw_mem -P $SYNC_MAX $i rdwr; done; echo "" 1>&2
fi


if [ X$BENCHMARK_HARDWARE = XYES -o X$BENCHMARK_MEM = XYES ]; then
        if [ $SYNC_MAX = 1 ]; then
            date >> ${OUTPUT}
            echo "Calculating effective TLB size" >> ${OUTPUT}
            msleep 250
            tlb -L $LINE_SIZE -M ${MB}M
            echo "" 1>&2

            date >> ${OUTPUT}
            echo "Calculating memory load parallelism" >> ${OUTPUT}
            msleep 250
            echo "Memory load parallelism" 1>&2
            par_mem -L $LINE_SIZE -M ${MB}M
            echo "" 1>&2

#           date >> ${OUTPUT}
#           echo Calculating cache parameters >> ${OUTPUT}
#           msleep 250
#           cache -L $LINE_SIZE -M ${MB}M
        fi

        date >> ${OUTPUT}
        echo "McCalpin\'s STREAM benchmark" >> ${OUTPUT}
        msleep 250
        stream -P $SYNC_MAX -M ${MB}M
        stream -P $SYNC_MAX -v 2 -M ${MB}M

        date >> ${OUTPUT}
        echo "Calculating memory load latency" >> ${OUTPUT}
        msleep 250
        echo "" 1>&2
        echo "Memory load latency" 1>&2
        if [ X$FASTMEM = XYES ]
        then    lat_mem_rd -P $SYNC_MAX $MB 128
        else    lat_mem_rd -P $SYNC_MAX $MB 16 32 64 128 256 512 1024 
        fi
        echo "" 1>&2
        echo "Random load latency" 1>&2
        lat_mem_rd -t -P $SYNC_MAX $MB 16
        echo "" 1>&2
fi

lat_mem_rd

测试内存时延:lat_mem_rd, 用来测试操作不同数据大小的时延

Usage: ./bin/x86_64-linux-gnu/lat_mem_rd [-P <parallelism>] [-W <warmup>] [-N <repetitions>] [-t] len [stride...]
Memory load latency
+ echo 'Memory load latency'
+ ./bin/lat_mem_rd -P 1 1024 16 32 64 128 256 512 1024
"stride=16
0.00049 1.490
...

"stride=32
0.00049 1.490
...
"stride=64
...
"stride=128
...
"stride=256
...
"stride=512
...
"stride=1024
...

Random load latency
+ echo 'Random load latency'
+ lat_mem_rd -t -P 1 1024 16
"stride=16
0.00049 1.490

stream

测试内存带宽:stream主要用于测试带宽,对应的时延是在带宽跑满情况下的带宽。

Usage: ./bin/x86_64-linux-gnu/stream [-v <stream version 1|2>] [-M <len>[K|M]] [-P <parallelism>] [-W <warmup>] [-N <repetitions>]
点击查看代码
+ echo 'McCalpin'\''s' STREAM benchmark
+ msleep 250
+ stream -P 1 -M 1024M
STREAM copy latency: 1.27 nanoseconds
STREAM copy bandwidth: 12605.27 MB/sec
STREAM scale latency: 1.30 nanoseconds
STREAM scale bandwidth: 12325.71 MB/sec
STREAM add latency: 1.54 nanoseconds
STREAM add bandwidth: 15632.16 MB/sec
STREAM triad latency: 1.87 nanoseconds
STREAM triad bandwidth: 12817.28 MB/sec
+ stream -P 1 -v 2 -M 1024M
STREAM2 fill latency: 0.77 nanoseconds
STREAM2 fill bandwidth: 10348.52 MB/sec
STREAM2 copy latency: 1.27 nanoseconds
STREAM2 copy bandwidth: 12588.20 MB/sec
STREAM2 daxpy latency: 1.34 nanoseconds
STREAM2 daxpy bandwidth: 17869.19 MB/sec
STREAM2 sum latency: 1.54 nanoseconds
STREAM2 sum bandwidth: 5202.24 MB/sec

par_mem

测试mem load并行个数:par_mem

Usage: par_mem [-c] [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
+ echo 'Memory load parallelism'
./bin/par_mem -L 128 -M 64M

load并行度,测试方式为,for循环中使用多个load,For example, the inner loop which measures parallelism 2 would look something like:

for (i = 0; i < N; ++i) {
   p0 = (char **)*p0;
   p1 = (char **)*p1;
}

in a for loop (the overhead of the for loop is not significant; the loop is an unrolled loop 100 loads long). In this case, if the hardware can process two LOAD operations in parallel, then the overall latency of the loop should be equivalent to that of a single pointer chain, so the measured parallelism would be roughly two.

其他内容和测试原理待阅读源码研究

posted @ 2024-06-26 01:32  LiYanbin  阅读(22)  评论(0编辑  收藏  举报