[AWS GPU] Performance and pricing
Ref: Choosing the right GPU for deep learning on AWS
P series: suited to training;
G series: suited to inference.
Amazon EC2 G4 instance product details
Ref: https://aws.amazon.com/ec2/instance-types/g4/
Ref: How can I get 65Tflops performance with NVIDIA T4
Speed calculation:
If YOLO runs at 160 fps with 100% GPU utilization on a GPU that delivers 18.7 TFLOPS,
then on a GPU that delivers 65 TFLOPS it should run at about 555 fps with 100% GPU utilization (assuming no other bottlenecks).
That is enough for at least 6 video streams, or 6 models. Detecting 200 categories becomes possible!
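The calculation above assumes that frame rate scales linearly with peak TFLOPS. The sketch below is a minimal Python version of that estimate. `scale_fps` is a hypothetical helper. The linear model ignores memory bandwidth, pre/post-processing, and whether the network can actually use mixed-precision Tensor Cores, so treat the result as an optimistic upper bound rather than a measurement.

```python
def scale_fps(base_fps: float, base_tflops: float, target_tflops: float) -> float:
    """Naive linear estimate: assumes fps is proportional to peak TFLOPS."""
    if base_tflops <= 0:
        raise ValueError("base_tflops must be positive")
    return base_fps * target_tflops / base_tflops


if __name__ == "__main__":
    # Baseline from the text: YOLO at 160 fps on an 18.7 TFLOPS GPU.
    estimate = scale_fps(160.0, 18.7, 65.0)
    print(f"Estimated YOLO fps at 65 TFLOPS: {estimate:.0f}")  # prints 556
```

This reproduces the roughly 555 fps figure quoted above.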
Precision | NVIDIA T4 peak performance | Estimated YOLO fps
---|---|---
Single precision (FP32) | 8.1 TFLOPS (GPU Boost Clock) | 60 fps
Mixed precision (FP16 / FP32) | 65 TFLOPS | 550 fps
INT8 precision | 130 TOPS |
INT4 precision | 260 TOPS |
G4dn (NVIDIA T4)

Type | Instance Size | GPUs | vCPUs | Memory (GB) | Storage (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* (Linux) | 3-yr Reserved Instance Effective Hourly* (Linux)
---|---|---|---|---|---|---|---|---|---|---
Single GPU | g4dn.xlarge | 1 | 4 | 16 | 125 | Up to 25 | Up to 3.5 | $0.526 | $0.316 | $0.210
Single GPU | g4dn.2xlarge | 1 | 8 | 32 | 225 | Up to 25 | Up to 3.5 | $0.752 | $0.452 | $0.300
Single GPU | g4dn.4xlarge | 1 | 16 | 64 | 225 | Up to 25 | 4.75 | $1.204 | $0.722 | $0.482
Single GPU | g4dn.8xlarge | 1 | 32 | 128 | 1x900 | 50 | 9.5 | $2.176 | $1.306 | $0.870
Single GPU | g4dn.16xlarge | 1 | 64 | 256 | 1x900 | 50 | 9.5 | $4.352 | $2.612 | $1.740
Multi GPU | g4dn.12xlarge | 4 | 48 | 192 | 1x900 | 50 | 9.5 | $3.912 | $2.348 | $1.564
Multi GPU | g4dn.metal | 8 | 96 | 384 | 2x900 | 100 | 19 | $7.824 | $4.694 | $3.130

G4ad (AMD Radeon Pro V520)

Type | Instance Size | GPUs | vCPUs | Memory (GB) | Storage (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* (Linux) | 3-yr Reserved Instance Effective Hourly* (Linux)
---|---|---|---|---|---|---|---|---|---|---
Single GPU | g4ad.xlarge | 1 | 4 | 16 | 150 | Up to 10 | Up to 3 | $0.379 | $0.227 | $0.178
Single GPU | g4ad.2xlarge | 1 | 8 | 32 | 300 | Up to 10 | Up to 3 | $0.541 | $0.325 | $0.254
Single GPU | g4ad.4xlarge | 1 | 16 | 64 | 600 | Up to 10 | Up to 3 | $0.867 | $0.520 | $0.405
Multi GPU | g4ad.8xlarge | 2 | 32 | 128 | 1200 | 15 | 3 | $1.734 | $1.040 | $0.810
Multi GPU | g4ad.16xlarge | 4 | 64 | 256 | 2400 | 25 | 6 | $3.468 | $2.081 | $1.619
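This short Python sketch compares the G4dn rows by computing the on-demand price per GPU-hour and the percentage saved with 1-year and 3-year reserved pricing. The figures are copied from the G4dn rows of the table (Linux, USD/hr). `G4DN` and `savings_pct` are hypothetical names. AWS prices change over time, so re-check the live pricing page before relying on these numbers.

```python
# (instance, GPUs, on-demand, 1-yr RI effective, 3-yr RI effective), USD/hr,
# copied from the G4dn rows of the table above.
G4DN = [
    ("g4dn.xlarge", 1, 0.526, 0.316, 0.210),
    ("g4dn.2xlarge", 1, 0.752, 0.452, 0.300),
    ("g4dn.4xlarge", 1, 1.204, 0.722, 0.482),
    ("g4dn.8xlarge", 1, 2.176, 1.306, 0.870),
    ("g4dn.16xlarge", 1, 4.352, 2.612, 1.740),
    ("g4dn.12xlarge", 4, 3.912, 2.348, 1.564),
    ("g4dn.metal", 8, 7.824, 4.694, 3.130),
]


def savings_pct(on_demand: float, reserved: float) -> float:
    """Percentage saved by a reserved rate relative to the on-demand rate."""
    return 100.0 * (1.0 - reserved / on_demand)


for name, gpus, od, ri1, ri3 in G4DN:
    print(f"{name:<14} ${od / gpus:.3f}/GPU-hr on-demand | "
          f"1-yr: -{savings_pct(od, ri1):.0f}% | 3-yr: -{savings_pct(od, ri3):.0f}%")
```

For example, g4dn.xlarge saves roughly 40% with a 1-year reservation and roughly 60% with a 3-year reservation.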
Amazon EC2 P3 instance product details
Ref: https://aws.amazon.com/ec2/instance-types/p3/
Ref: Performance metrics of V100 servers vs. T4 servers
Instance Size | GPUs - Tesla V100 | GPU Peer to Peer | GPU Memory (GB) | vCPUs | Memory (GB) | Network Bandwidth | EBS Bandwidth | On-Demand Price/hr* | 1-yr Reserved Instance Effective Hourly* | 3-yr Reserved Instance Effective Hourly* |
---|---|---|---|---|---|---|---|---|---|---|
p3.2xlarge | 1 | N/A | 16 | 8 | 61 | Up to 10 Gbps | 1.5 Gbps | $3.06 | $1.99 | $1.05 |
p3.8xlarge | 4 | NVLink | 64 | 32 | 244 | 10 Gbps | 7 Gbps | $12.24 | $7.96 | $4.19 |
p3.16xlarge | 8 | NVLink | 128 | 64 | 488 | 25 Gbps | 14 Gbps | $24.48 | $15.91 | $8.39 |
p3dn.24xlarge | 8 | NVLink | 256 | 96 | 768 | 100 Gbps | 19 Gbps | $31.218 | $18.30 | $9.64 |
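The sketch below applies the same per-GPU arithmetic to the P3 table, using g4dn.xlarge as a cheap inference baseline. It is a hedged example: prices are the on-demand USD/hr figures copied from the tables in this post, and the dictionary and function names are hypothetical.

```python
# (GPUs, on-demand USD/hr) copied from the P3 table above.
P3_ON_DEMAND = {
    "p3.2xlarge": (1, 3.06),
    "p3.8xlarge": (4, 12.24),
    "p3.16xlarge": (8, 24.48),
    "p3dn.24xlarge": (8, 31.218),
}
# Single-T4 instance from the G4dn table, used as the baseline.
G4DN_XLARGE = (1, 0.526)


def per_gpu_hour(gpus: int, price: float) -> float:
    """On-demand instance price divided evenly across its GPUs."""
    return price / gpus


baseline = per_gpu_hour(*G4DN_XLARGE)
for name, (gpus, price) in P3_ON_DEMAND.items():
    per_gpu = per_gpu_hour(gpus, price)
    print(f"{name:<14} ${per_gpu:.3f}/GPU-hr ({per_gpu / baseline:.1f}x g4dn.xlarge)")
```

On demand, a V100 GPU-hour on p3.2xlarge, p3.8xlarge, or p3.16xlarge costs about 5.8x a T4 GPU-hour on g4dn.xlarge. On p3dn.24xlarge the ratio is about 7.4x.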
V100 vs 2080 Ti
How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price?
Ref: hackmd.io 2080Ti and V100 Benchmarks (highly recommended)
2080 Ti
Running warm up
Done warm up
Step Img/sec total_loss
1 images/sec: 115.1 +/- 0.0 (jitter = 0.0) 9.865
10 images/sec: 113.0 +/- 0.6 (jitter = 1.2) 9.741
20 images/sec: 112.8 +/- 0.4 (jitter = 1.5) 10.067
30 images/sec: 112.9 +/- 0.3 (jitter = 1.2) 9.834
40 images/sec: 112.9 +/- 0.2 (jitter = 1.1) 10.052
50 images/sec: 113.0 +/- 0.2 (jitter = 0.9) 9.889
60 images/sec: 113.0 +/- 0.2 (jitter = 1.0) 9.771
70 images/sec: 112.8 +/- 0.2 (jitter = 1.2) 9.697
80 images/sec: 112.6 +/- 0.2 (jitter = 1.3) 9.946
90 images/sec: 112.5 +/- 0.1 (jitter = 1.3) 9.611
100 images/sec: 112.3 +/- 0.1 (jitter = 1.6) 9.870
----------------------------------------------------------------
total images/sec: 112.24
----------------------------------------------------------------
V100
Running warm up
Done warm up
Step Img/sec total_loss
1 images/sec: 122.4 +/- 0.0 (jitter = 0.0) 9.924
10 images/sec: 123.3 +/- 0.6 (jitter = 2.2) 9.732
20 images/sec: 124.4 +/- 0.4 (jitter = 1.6) 10.058
30 images/sec: 124.6 +/- 0.3 (jitter = 1.0) 9.818
40 images/sec: 124.9 +/- 0.2 (jitter = 0.9) 10.044
50 images/sec: 125.1 +/- 0.2 (jitter = 1.0) 9.893
60 images/sec: 125.0 +/- 0.2 (jitter = 1.1) 9.798
70 images/sec: 125.1 +/- 0.2 (jitter = 1.1) 9.733
80 images/sec: 125.1 +/- 0.2 (jitter = 1.1) 9.947
90 images/sec: 125.1 +/- 0.1 (jitter = 1.1) 9.631
100 images/sec: 125.1 +/- 0.1 (jitter = 1.2) 9.861
----------------------------------------------------------------
total images/sec: 125.05
----------------------------------------------------------------
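Judging by their format, the logs above appear to come from TensorFlow's `tf_cnn_benchmarks` script, but that is an assumption. When comparing many runs, a small regex can pull out the summary line. `total_images_per_sec` is a hypothetical helper and is not part of the benchmark tooling.

```python
import re
from typing import Optional

# Matches only the summary line, e.g. "total images/sec: 125.05";
# per-step lines ("100 images/sec: 125.1 ...") lack the "total" prefix.
TOTAL_RE = re.compile(r"total images/sec:\s*([0-9]+(?:\.[0-9]+)?)")


def total_images_per_sec(log: str) -> Optional[float]:
    """Return the final throughput figure from a benchmark log, or None."""
    match = TOTAL_RE.search(log)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = """\
100 images/sec: 125.1 +/- 0.1 (jitter = 1.2) 9.861
----------------------------------------------------------------
total images/sec: 125.05
----------------------------------------------------------------
"""
    print(total_images_per_sec(sample))  # 125.05
```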
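Two 2080 Ti cards: convergence looks slightly faster, and throughput rises to 204.94 images/sec (about 1.83× a single card).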
Running warm up
Done warm up
Step Img/sec total_loss
1 images/sec: 205.7 +/- 0.0 (jitter = 0.0) 9.789
10 images/sec: 206.1 +/- 0.3 (jitter = 1.3) 9.812
20 images/sec: 205.9 +/- 0.4 (jitter = 1.5) 9.996
30 images/sec: 205.9 +/- 0.3 (jitter = 1.5) 9.851
40 images/sec: 205.6 +/- 0.3 (jitter = 1.3) 10.102
50 images/sec: 205.5 +/- 0.2 (jitter = 1.3) 9.877
60 images/sec: 205.3 +/- 0.2 (jitter = 1.5) 9.866
70 images/sec: 205.2 +/- 0.2 (jitter = 1.4) 9.916
80 images/sec: 205.1 +/- 0.2 (jitter = 1.5) 9.897
90 images/sec: 205.1 +/- 0.2 (jitter = 1.5) 9.799
100 images/sec: 205.0 +/- 0.2 (jitter = 1.5) 9.787
----------------------------------------------------------------
total images/sec: 204.94
----------------------------------------------------------------
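Taking the three `total images/sec` figures above at face value, a few lines of Python give the relative speed and the two-GPU scaling efficiency. This is plain arithmetic on these single runs. Batch size and model settings are not stated here, so the ratios may not generalize to other workloads.

```python
# Summary throughput (images/sec) from the three runs above.
RESULTS = {"2080 Ti": 112.24, "V100": 125.05, "2x 2080 Ti": 204.94}

relative_speed = RESULTS["2080 Ti"] / RESULTS["V100"]
speedup = RESULTS["2x 2080 Ti"] / RESULTS["2080 Ti"]
efficiency = speedup / 2  # fraction of ideal linear scaling on 2 GPUs

print(f"2080 Ti vs V100: {relative_speed:.1%}")
print(f"2x 2080 Ti speedup: {speedup:.2f}x ({efficiency:.1%} scaling efficiency)")
```

In these particular runs, the 2080 Ti reaches about 89.8% of the V100's throughput. A second card gives a 1.83x speedup, which is about 91.3% of ideal scaling.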
Summary
Training for 10 hours costs $30.6 (p3.2xlarge on demand, $3.06/hr); training ten times costs $306 (about AUD 430 or CNY 1,960).
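The summary figure corresponds to 10 hours at the p3.2xlarge on-demand rate of $3.06/hr from the P3 table. The hedged sketch below reproduces that arithmetic and also applies the reserved-instance effective rates from the same table. Currency conversion is left out because exchange rates vary, and `training_cost` is a hypothetical helper.

```python
# p3.2xlarge effective hourly rates (USD) copied from the P3 table above.
P3_2XLARGE_RATES = {"on-demand": 3.06, "1-yr reserved": 1.99, "3-yr reserved": 1.05}


def training_cost(hours_per_run: float, hourly_rate: float, runs: int = 1) -> float:
    """Total instance cost for `runs` training jobs of `hours_per_run` hours each."""
    return hours_per_run * hourly_rate * runs


for plan, rate in P3_2XLARGE_RATES.items():
    one = training_cost(10, rate)
    ten = training_cost(10, rate, runs=10)
    print(f"{plan:<14} 1 run: ${one:.2f}  10 runs: ${ten:.2f}")
```

Reserved rates are "effective hourly" prices for a 1- or 3-year commitment, so they only pay off with sustained use. For a handful of training runs, the on-demand figure is the relevant one.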