[AWS GPU] Performance and pricing

Ref: Choosing the right GPU for deep learning on AWS

P series: suited to training;

G series: suited to inference.

 

 

Amazon EC2 G4 instance product details

Ref: https://aws.amazon.com/ec2/instance-types/g4/

Ref: How can I get 65Tflops performance with NVIDIA T4

Speed calculation:
If YOLO runs at 160 fps with 100% GPU utilization on a GPU that provides 18.7 TFLOPS,
then on a GPU that provides 65 TFLOPS it should run at about 555 fps with 100% GPU utilization (assuming no bottlenecks).
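
The estimate above is a simple proportion. A minimal sketch follows, assuming throughput scales linearly with peak TFLOPS (in practice memory bandwidth, precision support, and data loading can break this). The function name `scaled_fps` is hypothetical. The exact quotient comes out near 556, which the note rounds to 555.

```python
def scaled_fps(measured_fps: float, measured_tflops: float, target_tflops: float) -> float:
    """Estimate fps on another GPU, assuming fps scales linearly with peak TFLOPS."""
    return measured_fps * target_tflops / measured_tflops


# Figures from the note: 160 fps at 18.7 TFLOPS, target GPU rated at 65 TFLOPS.
estimate = scaled_fps(measured_fps=160, measured_tflops=18.7, target_tflops=65)
print(f"Estimated YOLO throughput: {estimate:.0f} fps")
```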

 

It can handle at least 6 video streams, or 6 models. Detecting 200 categories becomes possible!
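
One way to read the stream count is as a per-stream frame budget. The sketch below splits the roughly 550 fps mixed-precision estimate across streams. The 30 fps per-stream requirement is an assumption for illustration and does not come from the note, and both function names are hypothetical.

```python
def per_stream_fps(total_fps: float, streams: int) -> float:
    """Frame budget each stream gets if the GPU is shared evenly."""
    return total_fps / streams


def max_streams(total_fps: float, required_fps_per_stream: float) -> int:
    """Largest number of streams that each still get the required fps."""
    return int(total_fps // required_fps_per_stream)


TOTAL_FPS = 550  # mixed-precision estimate from the note

print(f"6 streams -> {per_stream_fps(TOTAL_FPS, 6):.1f} fps each")
# Assumption: a typical video stream needs about 30 fps.
print(f"Streams at 30 fps each: {max_streams(TOTAL_FPS, 30)}")
```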

| Precision | Peak performance | fps |
|---|---|---|
| Single Precision Floating Point (GPU Boost Clock) | 8.1 TFLOPS | 60 fps |
| Mixed Precision (FP16 / FP32) | 65 TFLOPS | 550 fps |
| INT8 Precision | 130 TOPS | |
| INT4 Precision | 260 TOPS | |

 

G4dn

| Group | Instance Size | GPU | vCPUs | Memory (GB) | Storage (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* (Linux) | 3-yr Reserved Instance Effective Hourly* (Linux) |
|---|---|---|---|---|---|---|---|---|---|---|
| Single GPU VMs | g4dn.xlarge | 1 | 4 | 16 | 125 | Up to 25 | Up to 3.5 | $0.526 | $0.316 | $0.210 |
| Single GPU VMs | g4dn.2xlarge | 1 | 8 | 32 | 225 | Up to 25 | Up to 3.5 | $0.752 | $0.452 | $0.300 |
| Single GPU VMs | g4dn.4xlarge | 1 | 16 | 64 | 225 | Up to 25 | 4.75 | $1.204 | $0.722 | $0.482 |
| Single GPU VMs | g4dn.8xlarge | 1 | 32 | 128 | 1x900 | 50 | 9.5 | $2.176 | $1.306 | $0.870 |
| Single GPU VMs | g4dn.16xlarge | 1 | 64 | 256 | 1x900 | 50 | 9.5 | $4.352 | $2.612 | $1.740 |
| Multi GPU VMs | g4dn.12xlarge | 4 | 48 | 192 | 1x900 | 50 | 9.5 | $3.912 | $2.348 | $1.564 |
| Multi GPU VMs | g4dn.metal | 8 | 96 | 384 | 2x900 | 100 | 19 | $7.824 | $4.694 | $3.130 |

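G4ad

| Group | Instance Size | GPU | vCPUs | Memory (GB) | Storage (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* (Linux) | 3-yr Reserved Instance Effective Hourly* (Linux) |
|---|---|---|---|---|---|---|---|---|---|---|
| Single GPU VMs | g4ad.xlarge | 1 | 4 | 16 | 150 | Up to 10 | Up to 3 | $0.379 | $0.227 | $0.178 |
| Single GPU VMs | g4ad.2xlarge | 1 | 8 | 32 | 300 | Up to 10 | Up to 3 | $0.541 | $0.325 | $0.254 |
| Single GPU VMs | g4ad.4xlarge | 1 | 16 | 64 | 600 | Up to 10 | Up to 3 | $0.867 | $0.520 | $0.405 |
| Multi GPU VMs | g4ad.8xlarge | 2 | 32 | 128 | 1200 | 15 | 3 | $1.734 | $1.040 | $0.810 |
| Multi GPU VMs | g4ad.16xlarge | 4 | 64 | 256 | 2400 | 25 | 6 | $3.468 | $2.081 | $1.619 |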
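
To compare instance sizes on equal terms, it can help to divide the hourly price by the GPU count. The sketch below uses on-demand prices copied from the G4 tables in this note (a snapshot that may not match current AWS pricing). The dictionary layout and the function name `price_per_gpu_hour` are illustrative choices.

```python
# (GPU count, on-demand USD/hour), copied from the G4dn and G4ad tables in this note.
ON_DEMAND = {
    "g4dn.xlarge": (1, 0.526),
    "g4dn.12xlarge": (4, 3.912),
    "g4dn.metal": (8, 7.824),
    "g4ad.xlarge": (1, 0.379),
    "g4ad.8xlarge": (2, 1.734),
    "g4ad.16xlarge": (4, 3.468),
}


def price_per_gpu_hour(instance: str) -> float:
    """On-demand price divided by the number of GPUs in the instance."""
    gpus, price = ON_DEMAND[instance]
    return price / gpus


for name in ON_DEMAND:
    print(f"{name:14s} ${price_per_gpu_hour(name):.3f} per GPU-hour")
```

On these figures the multi-GPU sizes cost more per GPU-hour than the smallest single-GPU size, because they also bundle more vCPUs, memory, and storage.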
 

 

 

Amazon EC2 P3 instance product details

Ref: https://aws.amazon.com/ec2/instance-types/p3/

Ref: Performance metrics of V100 servers and T4 servers

| Instance Size | GPUs - Tesla V100 | GPU Peer to Peer | GPU Memory (GB) | vCPUs | Memory (GB) | Network Bandwidth | EBS Bandwidth | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* | 3-yr Reserved Instance Effective Hourly* |
|---|---|---|---|---|---|---|---|---|---|---|
| p3.2xlarge | 1 | N/A | 16 | 8 | 61 | Up to 10 Gbps | 1.5 Gbps | $3.06 | $1.99 | $1.05 |
| p3.8xlarge | 4 | NVLink | 64 | 32 | 244 | 10 Gbps | 7 Gbps | $12.24 | $7.96 | $4.19 |
| p3.16xlarge | 8 | NVLink | 128 | 64 | 488 | 25 Gbps | 14 Gbps | $24.48 | $15.91 | $8.39 |
| p3dn.24xlarge | 8 | NVLink | 256 | 96 | 768 | 100 Gbps | 19 Gbps | $31.218 | $18.30 | $9.64 |
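
The reserved-instance columns are easier to judge as a percentage saving against on-demand. This sketch uses the P3 prices listed above. The function name `reserved_saving` is hypothetical, and the figures are the snapshot from this note rather than current AWS pricing.

```python
# (on-demand, 1-yr reserved, 3-yr reserved) in USD/hour, from the P3 table in this note.
P3_PRICES = {
    "p3.2xlarge": (3.06, 1.99, 1.05),
    "p3.8xlarge": (12.24, 7.96, 4.19),
    "p3.16xlarge": (24.48, 15.91, 8.39),
    "p3dn.24xlarge": (31.218, 18.30, 9.64),
}


def reserved_saving(on_demand: float, reserved: float) -> float:
    """Fraction saved by the reserved effective hourly rate versus on-demand."""
    return 1 - reserved / on_demand


for name, (on_demand, one_year, three_year) in P3_PRICES.items():
    print(
        f"{name:14s} 1-yr: {reserved_saving(on_demand, one_year):.0%}  "
        f"3-yr: {reserved_saving(on_demand, three_year):.0%}"
    )
```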
 

V100 vs 2080 Ti

How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? 

Ref: hackmd.io 2080Ti and V100 Benchmarks [very good]

2080 Ti

Running warm up
Done warm up
Step   Img/sec                                     total_loss
1      images/sec: 115.1 +/- 0.0 (jitter = 0.0)    9.865
10     images/sec: 113.0 +/- 0.6 (jitter = 1.2)    9.741
20     images/sec: 112.8 +/- 0.4 (jitter = 1.5)    10.067
30     images/sec: 112.9 +/- 0.3 (jitter = 1.2)    9.834
40     images/sec: 112.9 +/- 0.2 (jitter = 1.1)    10.052
50     images/sec: 113.0 +/- 0.2 (jitter = 0.9)    9.889
60     images/sec: 113.0 +/- 0.2 (jitter = 1.0)    9.771
70     images/sec: 112.8 +/- 0.2 (jitter = 1.2)    9.697
80     images/sec: 112.6 +/- 0.2 (jitter = 1.3)    9.946
90     images/sec: 112.5 +/- 0.1 (jitter = 1.3)    9.611
100    images/sec: 112.3 +/- 0.1 (jitter = 1.6)    9.870
----------------------------------------------------------------
total images/sec: 112.24
----------------------------------------------------------------

V100

Running warm up
Done warm up
Step   Img/sec                                     total_loss
1      images/sec: 122.4 +/- 0.0 (jitter = 0.0)    9.924
10     images/sec: 123.3 +/- 0.6 (jitter = 2.2)    9.732
20     images/sec: 124.4 +/- 0.4 (jitter = 1.6)    10.058
30     images/sec: 124.6 +/- 0.3 (jitter = 1.0)    9.818
40     images/sec: 124.9 +/- 0.2 (jitter = 0.9)    10.044
50     images/sec: 125.1 +/- 0.2 (jitter = 1.0)    9.893
60     images/sec: 125.0 +/- 0.2 (jitter = 1.1)    9.798
70     images/sec: 125.1 +/- 0.2 (jitter = 1.1)    9.733
80     images/sec: 125.1 +/- 0.2 (jitter = 1.1)    9.947
90     images/sec: 125.1 +/- 0.1 (jitter = 1.1)    9.631
100    images/sec: 125.1 +/- 0.1 (jitter = 1.2)    9.861
----------------------------------------------------------------
total images/sec: 125.05
----------------------------------------------------------------
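
The two logs can be reduced to a single ratio. The sketch below takes the `total images/sec` lines from the runs above and computes how fast the 2080 Ti is relative to the V100 in this particular benchmark. On these numbers the ratio is close to 90%, a little above the 80% quoted in the question.

```python
# "total images/sec" values from the two benchmark logs above.
RTX_2080_TI_IMAGES_PER_SEC = 112.24
V100_IMAGES_PER_SEC = 125.05

# Relative throughput of the 2080 Ti, with the V100 as the baseline.
ratio = RTX_2080_TI_IMAGES_PER_SEC / V100_IMAGES_PER_SEC
print(f"2080 Ti runs at {ratio:.1%} of V100 throughput in this benchmark")
```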

 

With two 2080 Ti cards, it seems to converge a little faster~

Running warm up
Done warm up
Step   Img/sec                                     total_loss
1      images/sec: 205.7 +/- 0.0 (jitter = 0.0)    9.789
10     images/sec: 206.1 +/- 0.3 (jitter = 1.3)    9.812
20     images/sec: 205.9 +/- 0.4 (jitter = 1.5)    9.996
30     images/sec: 205.9 +/- 0.3 (jitter = 1.5)    9.851
40     images/sec: 205.6 +/- 0.3 (jitter = 1.3)    10.102
50     images/sec: 205.5 +/- 0.2 (jitter = 1.3)    9.877
60     images/sec: 205.3 +/- 0.2 (jitter = 1.5)    9.866
70     images/sec: 205.2 +/- 0.2 (jitter = 1.4)    9.916
80     images/sec: 205.1 +/- 0.2 (jitter = 1.5)    9.897
90     images/sec: 205.1 +/- 0.2 (jitter = 1.5)    9.799
100    images/sec: 205.0 +/- 0.2 (jitter = 1.5)    9.787
----------------------------------------------------------------
total images/sec: 204.94
----------------------------------------------------------------
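
A common way to summarize the two-card run is scaling efficiency: measured multi-GPU throughput divided by the ideal of N times the single-GPU throughput. A minimal sketch using the `total images/sec` values from the logs above (the function name is hypothetical):

```python
def scaling_efficiency(multi_gpu_rate: float, single_gpu_rate: float, gpus: int) -> float:
    """Measured throughput as a fraction of perfect linear scaling."""
    return multi_gpu_rate / (single_gpu_rate * gpus)


SINGLE_2080_TI = 112.24  # total images/sec, one 2080 Ti
DUAL_2080_TI = 204.94    # total images/sec, two 2080 Ti

speedup = DUAL_2080_TI / SINGLE_2080_TI
efficiency = scaling_efficiency(DUAL_2080_TI, SINGLE_2080_TI, gpus=2)
print(f"Speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.1%}")
```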

 

 

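Summary

Training for 10 hours costs $30.60 (10 hours at the p3.2xlarge on-demand rate of $3.06/hr); ten such training runs cost $306 (about AUD 430 or CNY 1,960).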
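
The cost figures follow from rate × hours × runs. The sketch below assumes the p3.2xlarge on-demand price of $3.06/hr from the P3 table, which matches the $30.6 figure. The exchange rates it prints are only those implied by the note's own conversions, not current market rates.

```python
P3_2XLARGE_ON_DEMAND = 3.06  # USD/hour, from the P3 table in this note


def training_cost(hourly_rate: float, hours: float, runs: int = 1) -> float:
    """Total on-demand cost in USD for a number of equal-length training runs."""
    return hourly_rate * hours * runs


one_run = training_cost(P3_2XLARGE_ON_DEMAND, hours=10)
ten_runs = training_cost(P3_2XLARGE_ON_DEMAND, hours=10, runs=10)
print(f"1 run:   ${one_run:.2f}")
print(f"10 runs: ${ten_runs:.2f}")

# Exchange rates implied by the note's conversions (430 AUD and 1960 CNY for 306 USD).
aud_per_usd = 430 / ten_runs
cny_per_usd = 1960 / ten_runs
print(f"Implied rates: {aud_per_usd:.2f} AUD/USD, {cny_per_usd:.2f} CNY/USD")
```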

 

posted @ 郝壹贰叁