OpenCL performance on Intel i5-11400, measured with clpeak

Platform: NVIDIA CUDA
Device: NVIDIA GeForce RTX 4090
Driver version  : 550.127.05 (Linux x64)
Compute units   : 128
Clock frequency : 2520 MHz

Global memory bandwidth (GBPS)
  float   : 873.20
  float2  : 901.24
  float4  : 917.89
  float8  : 928.70
  float16 : 938.94

Single-precision compute (GFLOPS)
  float   : 84761.26
  float2  : 80760.14
  float4  : 80512.55
  float8  : 79900.18
  float16 : 79513.42
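
As a rough sanity check, the scalar float result can be compared with a nominal peak derived from the reported compute units and clock. The sketch below is only an estimate: the 128 FP32 lanes per compute unit and the convention of counting one fused multiply-add as 2 FLOPs are assumptions that clpeak does not report.

```python
# Figures copied from the clpeak report above.
compute_units = 128
clock_mhz = 2520
measured_gflops = 84761.26  # scalar "float" result

# Assumptions (not part of the clpeak output):
fp32_lanes_per_cu = 128        # FP32 lanes per compute unit on this GPU
flops_per_lane_per_clock = 2   # one fused multiply-add counted as 2 FLOPs

# Nominal peak at the reported clock, in GFLOPS.
lanes = compute_units * fp32_lanes_per_cu
peak_gflops = lanes * flops_per_lane_per_clock * clock_mhz / 1000.0

# Fraction of the nominal peak that was measured.
ratio = measured_gflops / peak_gflops

# Clock the GPU would need to sustain to reach the measured figure.
implied_clock_mhz = measured_gflops * 1000.0 / (lanes * flops_per_lane_per_clock)

print(f"nominal peak  : {peak_gflops:.2f} GFLOPS")
print(f"measured/peak : {ratio:.3f}")
print(f"implied clock : {implied_clock_mhz:.0f} MHz")
```

Under these assumptions the measured value comes out slightly above the nominal peak, which would be consistent with the GPU running above the reported 2520 MHz during the test.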

No half precision support! Skipped

Double-precision compute (GFLOPS)
  double   : 1398.84
  double2  : 1397.85
  double4  : 1394.48
  double8  : 1387.83
  double16 : 1374.64
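
The gap between single and double precision is easier to read as a ratio. The sketch below only divides the figures reported above for each vector width; the dictionary names are chosen for illustration.

```python
# Throughput in GFLOPS, copied from the clpeak report above, keyed by vector width.
single = {1: 84761.26, 2: 80760.14, 4: 80512.55, 8: 79900.18, 16: 79513.42}
double = {1: 1398.84, 2: 1397.85, 4: 1394.48, 8: 1387.83, 16: 1374.64}

# Single-to-double throughput ratio for each vector width.
ratios = {width: single[width] / double[width] for width in single}

for width, r in ratios.items():
    print(f"width {width:>2}: single precision is about {r:.1f}x double precision")
```

For every width the ratio lands between roughly 57 and 61, so double-precision throughput on this device is about one sixtieth of single-precision throughput.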

Integer compute (GIOPS)
  int   : 44124.49
  int2  : 44080.14
  int4  : 43970.14
  int8  : 44089.10
  int16 : 44104.19

Integer compute Fast 24bit (GIOPS)
  int   : 44067.89
  int2  : 44081.56
  int4  : 44038.71
  int8  : 43851.83
  int16 : 43369.82

Integer char (8bit) compute (GIOPS)
  char   : 38655.31
  char2  : 38334.73
  char4  : 37103.88
  char8  : 30839.88
  char16 : 28388.27

Integer short (16bit) compute (GIOPS)
  short   : 36869.31
  short2  : 35287.81
  short4  : 36894.71
  short8  : 32896.40
  short16 : 28145.07

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 10.68
  enqueueReadBuffer               : 15.51
  enqueueWriteBuffer non-blocking : 10.08
  enqueueReadBuffer non-blocking  : 13.46
  enqueueMapBuffer(for read)      : 19.79
    memcpy from mapped ptr        : 11.54
  enqueueUnmap(after write)       : 25.13
    memcpy to mapped ptr          : 11.41

Kernel launch latency : 4.06 us
Posted by 可乐马