【2014-11-23】Heterogeneous Parallel Programming – Section 1

  1. Latency devices (CPU cores)
  2. Throughput devices (GPU cores)
  3. Use the best match for the job (heterogeneity in mobile SoC)
  4. image
  5. image
  6. CPU: Latency Oriented Design
    • Powerful ALU
      • Reduced operation latency
    • Large caches
      • Convert long-latency memory accesses into short-latency cache accesses
    • Sophisticated control
      • Branch prediction for reduced branch latency
      • Data forwarding for reduced data latency
  7. GPU: Throughput Oriented Design
    • Small caches
      • To boost memory throughput
    • Simple control
      • No branch prediction
      • No data forwarding
    • Energy efficient ALUs
      • Many ALUs, each long-latency but heavily pipelined for high throughput
  8. Scalability
    • image
  9. Portability
    • image
  10. SPMD – Single Program, Multiple Data
  11. Threads within a block cooperate via shared memory, atomic operations, and barrier synchronization
  12. image
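
The SPMD and block-cooperation points above can be sketched with ordinary CPU threads. This is only a simulation in Python, not GPU code. The names `block_sum_kernel` and `launch_block_sum` are made up for illustration, a `threading.Barrier` stands in for barrier synchronization, a plain list stands in for per-block shared memory, and a lock stands in for an atomic add.

```python
import threading

def block_sum_kernel(block_idx, thread_idx, block_dim,
                     data, shared, barrier, total, lock):
    """The single program: every thread runs this same body (SPMD)."""
    # Each thread derives its own global index from its block and thread ids.
    i = block_idx * block_dim + thread_idx
    # Write one element into the block's shared buffer (0 if out of range).
    shared[thread_idx] = data[i] if i < len(data) else 0
    # Barrier: wait until every thread of this block has written its slot.
    barrier.wait()
    # One thread per block combines the block's values ...
    if thread_idx == 0:
        partial = sum(shared)
        # ... and adds them to the global total (the lock mimics an atomic add).
        with lock:
            total[0] += partial

def launch_block_sum(data, block_dim):
    """Hypothetical launcher: one Python thread per simulated GPU thread."""
    grid_dim = (len(data) + block_dim - 1) // block_dim  # ceil division
    total = [0]
    lock = threading.Lock()
    threads = []
    for block_idx in range(grid_dim):
        shared = [0] * block_dim                # visible only to this block
        barrier = threading.Barrier(block_dim)  # one barrier per block
        for thread_idx in range(block_dim):
            threads.append(threading.Thread(
                target=block_sum_kernel,
                args=(block_idx, thread_idx, block_dim,
                      data, shared, barrier, total, lock)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

result = launch_block_sum(list(range(1, 11)), block_dim=4)
print(result)  # 55
```

Blocks never wait on each other here, only threads inside the same block do, so the blocks could run in any order or on any number of cores. That independence is what the scalability item relies on.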

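The "long latency but heavily pipelined" point in the GPU design notes can be checked with a little arithmetic. The sketch below assumes an idealized pipeline that accepts one independent operation per cycle. The latency of 8 cycles and the count of 1000 operations are made-up numbers, not figures from the course.

```python
def cycles_unpipelined(n_ops, latency):
    """Each operation must finish before the next one may start."""
    return n_ops * latency

def cycles_pipelined(n_ops, latency):
    """A new independent operation enters the pipeline every cycle.

    The first result appears after `latency` cycles, then one per cycle.
    """
    if n_ops == 0:
        return 0
    return latency + n_ops - 1

n_ops = 1000        # illustrative number of independent operations
long_latency = 8    # illustrative ALU latency in cycles

print(cycles_unpipelined(n_ops, long_latency))  # 8000
print(cycles_pipelined(n_ops, long_latency))    # 1007
```

With enough independent work in flight the long latency is almost hidden, which is why a throughput-oriented design can trade per-operation latency for simpler, more numerous ALUs.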
posted on 2014-11-23 21:41  sjtujoe