On the recent CUDA atomic operations problem

Avoid atomic operations if you possibly can, because the performance impact really is dramatic. For example, throughput plunged from 800 MBps to 110 MBps.
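The code behind those numbers isn't shown here, so the following is only a minimal sketch of the kind of pattern that tends to cause such a drop: every thread issuing an atomic on the same global address. The kernel `count_above_naive`, the helper `count_above_naive_host`, and the counting task itself are hypothetical, picked just for illustration. It assumes `n > 0`, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: counts the elements greater than `threshold`.
// Every matching thread updates the same global address, so the
// updates to `counter` contend with each other.
__global__ void count_above_naive(const int* data, int n, int threshold,
                                  unsigned int* counter) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) {
        atomicAdd(counter, 1u);  // one global atomic per matching element
    }
}

// Hypothetical host wrapper: copies the input over, launches the kernel,
// and returns the final count.
unsigned int count_above_naive_host(const int* host_data, int n, int threshold) {
    int* d_data = nullptr;
    unsigned int* d_counter = nullptr;
    unsigned int result = 0;

    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    count_above_naive<<<blocks, threads>>>(d_data, n, threshold, d_counter);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_counter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_counter);
    return result;
}
```

The result is correct; the cost is that all matching threads in the grid funnel through one memory location.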

Reading the forum, I came across someone relaying a string of remarks, which I'm recording below:

honestly, if you're trying to do this you're probably going down the wrong path, but general rules of thumb are

- don't have multiple threads within a warp contending for a lock, that leads to all sorts of confusing issues for most people because inter-warp branches are not the same as intra-warp branches
- avoid global memory contention as much as possible (e.g., if you need to have a critical section among all warps in all CTAs, do per-CTA shared memory locks then a global lock)
- traditional threading primitives implemented with atomics are a pretty terrible idea, if you can avoid atomics as much as possible (or entirely) you can get a big perf win (and there are very interesting ways you can do this, and when I say big perf win, I mean on the order of 5-10x)

("well," you think, "it sounds like tim is speaking from experience!" oh yes, I am)
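The quote doesn't say which techniques it has in mind, so what follows is only a rough sketch of two common ways to act on that advice, using a hypothetical task (counting the elements above a threshold). Variant 1 keeps contention inside each CTA by accumulating in shared memory and then issuing a single global atomic per block, in the spirit of "per-CTA first, then global". Variant 2 drops atomics entirely: each block does a shared-memory tree reduction and writes its partial count to its own slot. All names (`kThreads`, `count_above_per_block`, `count_above_no_atomics`, `count_above_host`) are made up for this example. It assumes `n > 0`, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed block size; must be a power of two for the tree reduction below.
constexpr int kThreads = 256;

// Variant 1: threads contend only on a shared-memory counter,
// then each block issues exactly one global atomic.
__global__ void count_above_per_block(const int* data, int n, int threshold,
                                      unsigned int* counter) {
    __shared__ unsigned int block_count;
    if (threadIdx.x == 0) block_count = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) atomicAdd(&block_count, 1u);
    __syncthreads();

    if (threadIdx.x == 0) atomicAdd(counter, block_count);
}

// Variant 2: no atomics. Each thread records a 0/1 flag, the block sums
// the flags with a tree reduction, and thread 0 writes the block's partial
// count to its own slot in `partials`.
__global__ void count_above_no_atomics(const int* data, int n, int threshold,
                                       unsigned int* partials) {
    __shared__ unsigned int flags[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    flags[threadIdx.x] = (i < n && data[i] > threshold) ? 1u : 0u;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) flags[threadIdx.x] += flags[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) partials[blockIdx.x] = flags[0];
}

// Hypothetical host wrapper: runs one of the two variants and returns the total.
unsigned int count_above_host(const int* host_data, int n, int threshold,
                              bool use_atomics) {
    int blocks = (n + kThreads - 1) / kThreads;
    size_t out_count = use_atomics ? 1 : static_cast<size_t>(blocks);
    int* d_data = nullptr;
    unsigned int* d_out = nullptr;  // one counter, or one slot per block

    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_out, out_count * sizeof(unsigned int));
    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, out_count * sizeof(unsigned int));

    if (use_atomics)
        count_above_per_block<<<blocks, kThreads>>>(d_data, n, threshold, d_out);
    else
        count_above_no_atomics<<<blocks, kThreads>>>(d_data, n, threshold, d_out);

    std::vector<unsigned int> out(out_count);
    cudaMemcpy(out.data(), d_out, out_count * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_out);

    // Variant 2 leaves one partial per block; the final sum is done on the host.
    unsigned int total = 0;
    for (unsigned int v : out) total += v;
    return total;
}
```

In Variant 2 the final sum is done on the host only to keep the sketch short; for many blocks a second reduction kernel would be the more usual choice. How much either variant gains depends on the hardware and the access pattern, so measure before relying on it.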

posted on 2011-07-30 12:01 by 馒头山小八路
