[Article translation] Software Controls Cache Memory to Speed CPUs
Original article: http://spectrum.ieee.org/semiconductors/memory/software-controls-cache-memory-to-speed-cpus
The page at that link also provides the paper, which was published at PACT'13.
Subtitle:
Letting the operating system control cache memory management saves power too
Letting the operating system manage the cache also saves energy.
Main text:
A new process for managing the fast-access memory inside a CPU has led to as much as a twofold speedup and to energy-use reductions of up to 72 percent. According to its designers, realizing such stunning gains requires a big shift in what part of the computer controls this crucial memory: Right now that control is hard-wired into the CPU’s circuitry, but the substantial speedup came when the designers let the operating system handle things instead.
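A new approach to cache management has delivered speedups of up to 2x while cutting energy use by up to 72 percent. According to its designers, gains this large require a major change in how the cache is managed: today the cache is controlled by CPU hardware, but the big speedup came from letting the OS take over.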
(The original article has a figure here: an AMD CPU architecture diagram.)
The CPU uses high-speed internal memory caches as a kind of digital staging area. Caches are a CPU’s workbench, whether they’re holding onto instructions a CPU may need soon or data it may need to crunch. And from smartphones to servers, nearly every CPU today manages the flow of bits in and out of its caches using algorithms built into its own circuits.
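The CPU uses its high-speed on-chip caches as a kind of staging area for data. The cache is the CPU’s workbench, whether it holds instructions the CPU will need soon or data it may have to process. From smartphones to servers, nearly every CPU today has its cache-management logic built into its circuitry.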
But, say two MIT researchers, as computers and portable devices accumulate more and more memory and CPU cores, it makes less and less sense to leave cache management entirely up to the CPU. Instead, they say, it might be better to let the operating system share the burden.
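But two MIT researchers argue that, as computers and portable devices gain ever more CPU cores and memory, it makes less and less sense to leave all cache management to the CPU hardware. Instead, they think it is time for the OS to share the job.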
In itself, this idea is not completely new. Some of IBM’s Cell processors, as well as Sony’s PlayStation 3—which runs on Cell technology—allow their applications and OS kernels to fiddle with low-level CPU memory management. What’s new about the MIT technology, called Jigsaw, is its middle-ground approach, which enables software to configure some on-chip memory caches but without requiring so much control that programming becomes a memory-management nightmare.
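The idea is in fact not entirely new. Some of IBM’s Cell processors, and Sony’s PlayStation 3, which is built on Cell technology, let applications and the OS kernel manipulate low-level CPU memory management. MIT’s approach, Jigsaw, is a middle ground: software configures some of the on-chip caches, without taking on so much control that programming turns into a memory-management nightmare.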
“If you go back six or seven years, you’ll see that everybody was complaining that they launched the PlayStation 3 and nobody could program it well,” says Daniel Sanchez, the assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory and one of the inventors of Jigsaw.
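“If you go back six or seven years, you’ll find everyone complaining that the PlayStation 3 had launched and nobody could program it well,” says Daniel Sanchez, an assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory and one of Jigsaw’s authors.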
Today, CPU hardware typically controls all the on-chip caches. So those caches must be designed to handle every conceivable job, from pure floating-point number crunching (which places a small burden on caches) to intensive searches and queries of a computer’s memory banks (which can stretch their limits). Moreover, CPUs have no higher-level knowledge of the kinds of jobs they’re doing. This means a self-contained numerical simulation with complex equations but little need for memory access would run with exactly the same cache resources as would a graph search, a memory-hogging hunt for relationships between stored data.
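Today, CPU hardware controls all the on-chip caches, so those caches are designed to handle every possible job, from pure floating-point computation (light on the cache) to intensive search and query operations (which can push it to its limits). The CPU also has no high-level knowledge of the jobs it runs. So a self-contained numerical simulation, full of complex equations but needing few memory accesses, gets the same cache resources as a memory-hungry graph search.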
So Sanchez and his graduate student Nathan Beckmann thought, Why not let the OS trim the cache size for pure computation and swell its ranks for graph search?
So Sanchez and his graduate student Nathan Beckmann asked: why not have the OS shrink the cache allotted to pure computation and give the freed space to memory-hungry graph searches?
The first step, they say, would be to give perhaps 1 percent of the CPU’s footprint to a simple piece of hardware that could monitor in real time the cache activity in each core. Hardware cache monitors would give Jigsaw the independent oversight it would need to play air traffic controller with the CPU’s caches.
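The first step, the two authors say, is to devote roughly 1 percent of the CPU’s chip area to a simple piece of hardware that monitors cache activity in each core in real time. These hardware cache monitors give Jigsaw the independent view it needs to act as an air traffic controller for the CPU’s caches.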
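One well-known way to turn a stream of observed accesses into a "how many misses at each cache size" curve is LRU stack-distance counting. The sketch below is a software illustration of that general technique, assuming a fully associative LRU cache; the function name and the sample trace are hypothetical, and this is not a description of Jigsaw's actual monitoring hardware.

```python
def miss_curve(trace, max_lines):
    """Return misses[c]: misses of a fully associative LRU cache with c lines."""
    stack = []                  # LRU stack, most recently used address first
    hits_at = [0] * max_lines   # hits_at[d]: hits found at stack distance d
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)
            if d < max_lines:
                hits_at[d] += 1
            stack.remove(addr)
        stack.insert(0, addr)   # the accessed address becomes most recent
    # A cache with c lines hits exactly the accesses at distance d < c.
    misses = [len(trace)]
    for d in range(max_lines):
        misses.append(misses[-1] - hits_at[d])
    return misses

# Hypothetical access trace over three addresses.
curve = miss_curve(["a", "b", "a", "c", "b", "a"], max_lines=3)
```

A single pass over the trace yields the miss count for every cache size at once, which is what makes this kind of monitor cheap enough to run continuously.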
Second, Sanchez and Beckmann say, the OS’s kernel needs at most a few thousand more lines of code. That’s not much of an addition, considering that Linux’s kernel in 2012 weighed in with 15 million lines and Apple’s and Microsoft’s kernels unofficially contained tens of millions more than that.
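Second, Sanchez and Beckmann say, the OS kernel needs at most a few thousand additional lines of code. That is a small addition, given that the Linux kernel had 15 million lines in 2012, and Apple’s and Microsoft’s kernels are unofficially estimated to contain tens of millions more than that.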
One of Jigsaw’s more prominent features is a software module, to be folded in with the OS, that the researchers call Peekahead. This module was adapted from the Lookahead Cache, developed more than a decade ago by Beijing computer scientists. Peekahead computes the best configuration of CPU caches based on the upcoming jobs it expects the cores to do in the coming clock cycles.
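One of Jigsaw’s more prominent features is a software module, built into the OS, called Peekahead. The module was adapted from the Lookahead Cache, which computer scientists in Beijing developed more than a decade ago. Peekahead computes the best configuration of the CPU caches from the jobs the cores are expected to run in the coming clock cycles.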
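To make "computing a cache configuration" concrete, here is a minimal sketch of a greedy allocator that hands out cache ways one at a time to whichever job would lose the most misses. It is not Peekahead itself: the function name, the job names, and the miss-curve numbers are all hypothetical, and plain greedy allocation is only guaranteed to be optimal when every miss curve is convex.

```python
def partition_cache(miss_curves, total_ways):
    """Greedily assign cache ways to the job with the largest marginal gain.

    miss_curves[job][w] is the (hypothetical) miss count when the job owns w ways.
    """
    alloc = {job: 0 for job in miss_curves}

    def gain(job):
        # Misses avoided by giving this job one more way.
        curve, w = miss_curves[job], alloc[job]
        return curve[w] - curve[w + 1] if w + 1 < len(curve) else 0

    for _ in range(total_ways):
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc

# Hypothetical miss curves: the simulation stops benefiting after one way,
# while the graph search keeps improving with every extra way.
miss_curves = {
    "numeric_sim": [500, 50, 45, 44, 44, 44, 44, 44, 44],
    "graph_search": [1000, 900, 800, 700, 600, 500, 400, 300, 200],
}
allocation = partition_cache(miss_curves, total_ways=8)
```

With these numbers the compute-bound job ends up with a single way and the graph search with the remaining seven, which mirrors the "trim one, grow the other" idea in the article.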
“When you let software be in charge, you have to be careful of your overhead,” Sanchez says. A poorly designed cache management system, he says, might trim the cache to its optimum size and do it again every fraction of a second. But doing so taxes the CPU. And what’s the point of a CPU efficiency algorithm that requires extraordinary amounts of CPU time? “The exact solution is really expensive. So we have to come up with a quick way of getting the job done so that the overhead doesn’t negate the gains you get,” he says.
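“When you put software in charge, you have to watch the overhead you introduce,” Sanchez says. A poorly designed cache management system might trim the cache to its optimal size over and over, every fraction of a second, but that burdens the CPU, and a CPU-efficiency algorithm that itself consumes large amounts of CPU time defeats its purpose. “The exact solution is really expensive. So we have to come up with a quick way of getting the job done so that the overhead doesn’t negate the gains you get,” he says.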
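The overhead argument comes down to simple arithmetic: a reconfiguration only pays off if the cycles it saves before the next reconfiguration exceed the cycles spent computing it. The sketch below illustrates that trade-off; every number and name in it is a hypothetical assumption, not a measurement from Jigsaw.

```python
def net_cycles_saved(savings_per_ms, solver_cost_cycles, interval_ms):
    """Cycles saved over one reconfiguration interval, after paying for the solver."""
    return savings_per_ms * interval_ms - solver_cost_cycles

# Hypothetical numbers, chosen only to show the trade-off.
SAVINGS_PER_MS = 2_000   # cycles saved per millisecond by a good cache layout
SOLVER_COST = 50_000     # cycles spent computing a new layout

too_often = net_cycles_saved(SAVINGS_PER_MS, SOLVER_COST, interval_ms=10)
amortized = net_cycles_saved(SAVINGS_PER_MS, SOLVER_COST, interval_ms=100)
```

Under these assumptions, reconfiguring every 10 ms costs more than it saves, while reconfiguring every 100 ms comes out ahead. A cheaper solver moves the break-even point toward shorter intervals.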
Linley Gwennap of the Linley Group, a semiconductor consulting firm based in Mountain View, Calif., says he’s impressed with Jigsaw but cautions that it’s not quite ready for chip-fab prime time. “The problem is generally that a scheme that’s effective on one processor may not be effective on another processor with a different hardware design,” he says. “Every time the processor changes, you have to redo your software, which customers generally don’t like.”
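Linley Gwennap of the Linley Group, a semiconductor consulting firm in Mountain View, Calif., is impressed with Jigsaw but cautions that it is not yet ready for chip production. “The problem is that a scheme that works on one processor may not work on another with a different hardware design,” he says. “Every time the processor changes, you have to redo the software, and customers generally don’t like that.”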
Sanchez counters that software applications and utilities would remain unaffected by Jigsaw. “Only the operating system code needs to be aware of that intimate knowledge of the hardware, like the topology of the different portions of the cache,” he says.
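Sanchez responds that applications and utilities would not be affected by Jigsaw. “Only the operating system code needs that intimate knowledge of the hardware, such as the topology of the different portions of the cache,” he says.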
Jason Mars, an assistant professor of computer science at the University of Michigan, says Jigsaw works well as a proof of concept, which he says chipmakers might adapt as they see fit.
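Jason Mars of the University of Michigan, an outside observer here (not the “I’m Yours” singer???), thinks Jigsaw works well as a proof of concept, and that chipmakers can adapt it as they see fit.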
“The crisp novelty in this work has to do with the codesign between hardware and software,” Mars says. “Much of the prior work was biased in one direction. More was expected to be done in hardware, and there was a little bit less flexibility. Jigsaw really...builds a holistic system that spans both the hardware and the software.”
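“The novelty of this work lies in the codesign of hardware and software,” Mars says. “Much of the prior work was biased in one direction. More was expected to be done in hardware, which left less flexibility. Jigsaw builds a holistic system that spans both the hardware and the software.”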
=======================