Managing memory hierarchy in software is painful
Problem:
- local memory is scarce, e.g.
  - Cell SPE: 256KB shared by code & data
  - CSX600 PE: 6KB for data
- data-movement hardware imposes constraints, e.g. alignment
- this forces premature optimisation (the root of all evil!), e.g.
  - choosing data transfer/buffer sizes
  - using double/triple buffering schemes
- such optimisations are not portable and disrupt the code base
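To make the hand-written optimisations concrete, here is a minimal Python sketch of sizing buffers for a small local store and double-buffering a stream through them. Everything in it is an assumption for illustration: `ALIGN = 16` is an illustrative transfer granularity, `dma_get` is a hypothetical stand-in for an asynchronous DMA "get", and a one-thread `ThreadPoolExecutor` only models the schedule of a DMA engine running alongside the core; it does not measure real overlap.

```python
from concurrent.futures import ThreadPoolExecutor

ALIGN = 16  # assumed DMA alignment/size granularity in bytes (illustrative)


def choose_chunk_size(local_store_bytes, num_buffers, reserved_bytes=0):
    """Largest ALIGN-multiple chunk such that num_buffers chunks fit in the
    local store left over after reserved_bytes (e.g. code, stack)."""
    chunk = (local_store_bytes - reserved_bytes) // num_buffers
    return chunk - chunk % ALIGN


def dma_get(memory, offset, size):
    """Hypothetical stand-in for a DMA 'get': copy a block of main memory
    into a fresh local buffer."""
    return bytes(memory[offset:offset + size])


def sum_double_buffered(memory, chunk):
    """Sum every byte of memory, streaming it through two local buffers so
    that fetching chunk i+1 is issued before computing on chunk i."""
    total = 0
    n = len(memory)
    # A single worker thread models one DMA engine working alongside the core.
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        in_flight = dma_engine.submit(dma_get, memory, 0, chunk)  # prefetch chunk 0
        offset = 0
        while offset < n:
            next_offset = offset + chunk
            # Issue the next transfer before touching the current buffer.
            nxt = (dma_engine.submit(dma_get, memory, next_offset, chunk)
                   if next_offset < n else None)
            current = in_flight.result()  # block until chunk i has arrived
            total += sum(current)         # compute on chunk i
            in_flight, offset = nxt, next_offset
    return total
```

With a hypothetical 64KB reserved for code on a 256KB SPE, `choose_chunk_size(256 * 1024, 2, reserved_bytes=64 * 1024)` gives 98304 bytes per buffer, while triple buffering in 6KB, `choose_chunk_size(6 * 1024, 3)`, leaves 2048 bytes per buffer. Every such number is baked into the code by hand, which is exactly the portability problem listed above.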
Solution:
- use a suitable description of memory access patterns
- generate efficient data movement code automatically
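The slide does not say what the access-pattern description looks like. As a minimal sketch, assume a hypothetical `StridedPattern` (fixed-size blocks at a constant stride) and a hypothetical generator `plan_transfers` that turns it into aligned `(address, size)` transfer commands sized to a local buffer. The 16-byte default alignment is illustrative, and widening runs to aligned boundaries assumes that over-fetching a few neighbouring bytes is harmless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedPattern:
    """Hypothetical access-pattern description: `count` blocks of
    `block_bytes` each, starting at `base`, `stride_bytes` apart."""
    base: int
    count: int
    block_bytes: int
    stride_bytes: int


def _round_down(x, a):
    return x - x % a


def _round_up(x, a):
    return -(-x // a) * a


def plan_transfers(pattern, buffer_bytes, align=16):
    """Derive (address, size) transfer commands from a pattern description.

    Every command starts on an `align` boundary, moves a multiple of `align`
    bytes, and fits in a local buffer of `buffer_bytes`."""
    if buffer_bytes < align:
        raise ValueError("buffer smaller than one aligned unit")
    max_size = _round_down(buffer_bytes, align)
    p = pattern
    if p.stride_bytes == p.block_bytes:
        # Contiguous blocks: merge them into one run, giving fewer, larger transfers.
        runs = [(p.base, p.count * p.block_bytes)]
    else:
        runs = [(p.base + i * p.stride_bytes, p.block_bytes) for i in range(p.count)]
    commands = []
    for start, length in runs:
        # Widen each run to aligned boundaries (this over-fetches a few bytes).
        addr, end = _round_down(start, align), _round_up(start + length, align)
        while addr < end:
            size = min(max_size, end - addr)  # split runs larger than the buffer
            commands.append((addr, size))
            addr += size
    return commands
```

For instance, four contiguous 32-byte blocks at `0x1000` with a 64-byte buffer, `plan_transfers(StridedPattern(0x1000, 4, 32, 32), buffer_bytes=64)`, yield two 64-byte transfers at `0x1000` and `0x1040`. The point of the design is that buffer size and alignment become parameters of the generator rather than constants scattered through application code.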