STREAM Usage Tutorial

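CPU speeds have advanced far faster than memory access speeds. As a result, more and more programs are limited by memory bandwidth rather than by the CPU's compute rate.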

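The STREAM benchmark is a simple synthetic benchmark program that measures memory bandwidth (in MB/s) and the corresponding computation rate of simple vector kernels.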

Each of the four tests adds independent information to the results:

  • ``Copy'' measures transfer rates in the absence of arithmetic.
  • ``Scale'' adds a simple arithmetic operation.
  • ``Sum'' adds a third operand to allow multiple load/store ports on vector machines to be tested. The program output labels this test ``Add''.
  • ``Triad'' allows chained/overlapped/fused multiply/add operations.
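To make the four tests concrete, here is a minimal Python sketch of the arithmetic each one performs on three arrays a, b and c. The function name stream_kernels, the tiny array length and the initial values are assumptions chosen for illustration; the real benchmark is written in C and Fortran, and a Python loop like this illustrates the operations only and says nothing about memory bandwidth.

```python
def stream_kernels(n=8, q=3.0):
    """Run the four STREAM-style operations once on small lists.

    n and q are illustrative values, not the benchmark's configuration.
    """
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n

    # Copy: one load and one store per element, no arithmetic.
    for j in range(n):
        c[j] = a[j]

    # Scale: same memory traffic as Copy, plus one multiply.
    for j in range(n):
        b[j] = q * c[j]

    # Add (called "Sum" in the list above): two loads and one store.
    for j in range(n):
        c[j] = a[j] + b[j]

    # Triad: two loads and one store, with a multiply feeding an add.
    for j in range(n):
        a[j] = b[j] + q * c[j]

    return a, b, c


if __name__ == "__main__":
    a, b, c = stream_kernels()
    print(a[0], b[0], c[0])
```

Copy and Scale each touch two arrays per element, while Add and Triad touch three. That difference matters when a measured time is converted into a bandwidth figure.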

 

[root@RedHat stream]# make
gcc -O2 stream.c -o stream
[root@RedHat stream]# ls
Makefile  stream  stream.c  stream.f
[root@RedHat stream]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 6 microseconds.
Each test below will take on the order of 17340 microseconds.
   (= 2890 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2264.0171       0.0154       0.0141       0.0169
Scale:       2212.8422       0.0154       0.0145       0.0164
Add:         2936.3309       0.0196       0.0163       0.0214
Triad:       2673.8123       0.0199       0.0180       0.0216
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
[root@RedHat stream]#
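The figures in this run can be cross-checked by hand. The Python sketch below reproduces the arithmetic as I understand it from stream.c: three arrays of 8-byte words for the memory total, and bytes moved divided by the best time for the rate. The helper names (total_memory_mb, rate_mb_s, clock_ticks) are hypothetical and exist only for this illustration. The printed Min time is rounded to four decimals, so the recomputed rates land close to the reported ones rather than exactly on them.

```python
BYTES_PER_WORD = 8  # from the output: 8 bytes per DOUBLE PRECISION word

# Number of arrays each test reads or writes per element.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mb(n):
    """Memory for the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * n / 1048576.0


def rate_mb_s(kernel, n, min_time_s):
    """Bandwidth in units of 10**6 bytes per second, from the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return 1.0e-6 * bytes_moved / min_time_s


def clock_ticks(test_time_us, granularity_us):
    """How many timer ticks one test spans."""
    return test_time_us / granularity_us


if __name__ == "__main__":
    n = 2000000  # "Array size" from the run above
    print("Total memory: %.1f MB" % total_memory_mb(n))
    print("Clock ticks per test: %.0f" % clock_ticks(17340, 6))
    # Min times copied from the results table above.
    for kernel, t in [("Copy", 0.0141), ("Scale", 0.0145),
                      ("Add", 0.0163), ("Triad", 0.0180)]:
        print("%-6s %10.1f MB/s" % (kernel, rate_mb_s(kernel, n, t)))
```

For this run the memory total comes out at 45.8 MB and the tick count at 2890, matching the program output. The tick count is far above the 20-tick minimum the program asks for, so the timer resolution is not the limiting factor here.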

 

Posted by YoungerChina
