【2014-11-23】Heterogeneous Parallel Programming – Section 1 - sjtujoe - 博客园

Latency devices (CPU cores)
Throughput devices (GPU cores)
Use the best match for the job (heterogeneity in mobile SoCs)
CPU: Latency Oriented Design

Powerful ALU

Reduced operation latency

Large caches

Convert long-latency memory accesses to short-latency cache accesses

Sophisticated control

Branch prediction for reduced branch latency
Data forwarding for reduced data latency

GPU: Throughput Oriented Design

Small caches

To boost memory throughput

Simple control

No branch prediction
No data forwarding

Energy efficient ALUs

Many ALUs, each with long latency but heavily pipelined for high throughput
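
A rough way to see why pipelining trades latency for throughput, under an idealized model that is an assumption here and not part of the course notes: suppose one ALU operation takes d cycles, and the pipeline can accept a new independent operation every cycle. The symbols d, N, and T are introduced only for this illustration.

```latex
T(N) = d + N - 1
\qquad
\text{throughput}(N) = \frac{N}{T(N)} = \frac{N}{d + N - 1} \;\longrightarrow\; 1 \ \text{op/cycle as } N \to \infty
```

For example, with d = 20 and N = 1000, the pipelined unit needs 1019 cycles (about 0.98 operations per cycle), while running the same operations one after another without pipelining would need 20 × 1000 = 20000 cycles. Each individual operation is still slow; only the aggregate rate improves, which is why this design needs many independent operations in flight.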

Scalability

Portability

SPMD – Single Program, Multiple Data
Threads within a block cooperate via shared memory, atomic operations, and barrier synchronization
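
The two points above can be sketched in CUDA C++ as a small block-level sum. This is a minimal illustration, not code from the course: the names `sumKernel`, `sumOnGpu`, and `BLOCK_SIZE` are hypothetical, the block size of 256 is an arbitrary choice, and error checking is left out for brevity. Every thread runs the same kernel on a different element (SPMD); threads in one block combine their values through shared memory with barriers between steps, and one thread per block publishes the block's result with an atomic add.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK_SIZE = 256;  // threads per block (illustrative; must be a power of two here)

// SPMD: every thread executes this same kernel; the block and thread
// indices decide which array element a given thread handles.
__global__ void sumKernel(const int *in, int n, int *total)
{
    __shared__ int partial[BLOCK_SIZE];  // shared by all threads of this block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread loads one element, or 0 if it falls past the end of the array.
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();                     // barrier: all loads are visible

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();                 // barrier between reduction steps
    }

    // One thread per block adds the block's sum to the global total.
    if (tid == 0)
        atomicAdd(total, partial[0]);
}

// Hypothetical host-side helper: returns the sum of `data`, computed on the GPU.
// CUDA error checking is omitted to keep the sketch short.
int sumOnGpu(const std::vector<int> &data)
{
    int n = static_cast<int>(data.size());
    if (n == 0) return 0;

    int *dIn = nullptr, *dTotal = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;  // enough blocks to cover n elements
    sumKernel<<<blocks, BLOCK_SIZE>>>(dIn, n, dTotal);

    int total = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dTotal);
    return total;
}
```

Calling `sumOnGpu` on a vector of 1000 ones launches four blocks of 256 threads. Barriers only synchronize threads inside one block, so the atomic add is what combines results across blocks.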

posted on 2014-11-23 21:41 by sjtujoe