Intel CPU microarchitectures: the Bonnell architecture (designed for ultra-mobile PCs, embedded devices, and similar targets)

Translated from: https://en.wikichip.org/wiki/intel/microarchitectures/bonnell

Bonnell was a microarchitecture for Intel's 45 nm ultra-low voltage microprocessors, first introduced in 2008 for the then-new Atom family.

Bonnell, named after Mount Bonnell, the highest point in Austin, was Intel's first x86-compatible microarchitecture designed to target the ultra-low power market.

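Translator's notes:
1. nm stands for nanometer; 45 nm refers to the chip manufacturing process.
2. The way a transistor works is simple: its two states represent the binary values "0" and "1". A channel lies between the source and the drain. When no voltage is applied to the gate (G), no effective charge gathers in the channel, no effective current flows between the source (S) and the drain (D), and the transistor is off; this state can be read as "0". When a voltage is applied to the gate (G), charge gathers in the channel and forms a conducting path from the source (S) to the drain (D); the transistor is on, and this state can be read as "1". The on and off states of the transistor thus represent the two binary states.
3. The gate can be compared to a valve on a water pipe: open it and water flows, close it and the flow stops. How fast transistors can switch on and off bounds the clock frequency; at a 1 GHz clock, a transistor can switch on and off a billion times per second. From 65 nm onward the gate dielectric could no longer be made thinner, and at 45 nm the transistors shrink further, so the source and drain sit even closer together. Without a solution to leakage downward through the gate and leakage between source and drain, the next generation of processors might have been delayed indefinitely.
4. Processor performance keeps improving thanks to good core microarchitecture design, but new manufacturing processes are what turn those designs into real products. Every process generation has paved the way for a new round of rapid processor development: the smaller the line width, the smaller the transistor, the lower the voltage and current needed to operate it, and the faster it switches. Transistors on the new process can therefore run at higher frequencies, which brings higher chip performance.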

 

Bonnell (then known as project Silverthorne) was designed by a then-new low-power design team that Intel created at its Texas Development Center in Austin in 2004, along with a new chipset (Poulsbo) design team.

The design team was led by Elinora Yoeli. Although Yoeli had previously worked in her native country, Bonnell was a US design and was unconnected to any of the Intel projects worked on by the Israel Design Center in Haifa. Yoeli had previously led the Israeli team in the development of the Pentium M.

Platform | Chipset | Core | Target
Menlow | Poulsbo | Silverthorne | MIDs
Menlow | Poulsbo | Diamondville | Nettops
Moorestown | Langwell | Lincroft | MIDs
Pine Trail | Tiger Point | Pineview | Nettops
Queens Bay | Topcliff | Tunnel Creek | Embedded
Queens Bay | Topcliff | Stellarton | Embedded + Altera FPGA
- | - | Sodaville | CE
- | - | Groveland | CE
- | - | Elk Rock | CE

Translator's notes: Chipset and Core are development codenames. Silverthorne is a new-generation Atom CPU that Intel developed for the mobile-device market; Diamondville is likewise Atom-based; Lincroft integrates an Atom processor core. MID stands for mobile internet device. A nettop is usually smaller than a full desktop or laptop computer and draws less power.

 

Generation successors

First Generation | Second Generation | Third Generation
Silverthorne | Lincroft | -
Diamondville | Pineview | -
- | Tunnel Creek | -
- | Stellarton | -
- | Sodaville | Groveland

Brands 

Intel sold Bonnell-based processors under the Atom brand. Additionally, manufacturers were allowed to use the Centrino Atom brand if the system consisted of a Bonnell-based processor, the chipset, and wireless capabilities (WiFi, 3G, WiMAX), was battery powered, and had a screen size of up to 6".

 

Release Dates

The Atom family was officially announced on March 2, 2008 under the Intel Atom and Intel Centrino Atom brands.
Bonnell was first introduced on April 2, 2008 during the Intel Developer Forum in Shanghai.

Process Technology

45 nm Manufacturing Fabs
Fab | Location
D1D | Hillsboro, Oregon
Fab 32 | Chandler, Arizona
Fab 28 | Kiryat Gat, Israel

Bonnell is designed to be manufactured using a 45 nm process.

Intel's 45 nm process is the first high-volume manufacturing process to introduce High-k + metal gate transistors. (Translator's note: the high-k metal gate process improves the gate's control over the channel; compare low-k and high-k dielectrics.)

 

intel 45nm transistor.png
Bonnell process metrics (45 nm)
Gate Pitch | 180 nm
Interconnect Pitch | 160 nm
SRAM bit cell (HD) | 0.346 µm²
SRAM bit cell (LP) | 0.3816 µm²
 

Compatibility

Vendor | OS | Version | Notes
Microsoft | Windows | Windows XP Embedded SP2 | Support
Microsoft | Windows | Windows Embedded CE 6.0 | Support
Microsoft | Windows | Windows 7 | Support
Microsoft | Windows | Windows Embedded Standard 7 | Support
Linux | Linux | Kernel 2.4/2.6? | Initial Support
Linux | MeeGo | 1 | Support

Compiler support

Compiler | Arch-Specific | Arch-Favorable
GCC | -march=bonnell | -mtune=bonnell
LLVM | -march=bonnell | -mtune=bonnell
Visual Studio | /arch:SSE3 | -
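
For GCC and LLVM, the two columns correspond to two different promises. With -march=bonnell the compiler may use every instruction-set extension Bonnell supports, so the binary is not guaranteed to run on older x86 CPUs; with -mtune=bonnell the default instruction set is kept and only scheduling is tuned for Bonnell. The following is a minimal sketch of how a build script might choose between them. The helper names and the file name demo.c are hypothetical, and the code only assembles an argument list without invoking a compiler.

```python
def bonnell_cflags(specific: bool) -> list:
    """Return the GCC/Clang flag for Bonnell.

    specific=True  -> -march=bonnell (arch-specific: may use all of Bonnell's
                      ISA extensions; the binary may not run on older CPUs).
    specific=False -> -mtune=bonnell (arch-favorable: default instruction set,
                      scheduling tuned for Bonnell's in-order pipeline).
    """
    return ["-march=bonnell"] if specific else ["-mtune=bonnell"]


def compile_command(compiler: str, source: str, specific: bool) -> list:
    # Hypothetical helper: builds an argv list that could be handed to
    # subprocess.run(); nothing is executed here.
    return [compiler, "-O2", *bonnell_cflags(specific), "-c", source]


print(" ".join(compile_command("gcc", "demo.c", specific=True)))
print(" ".join(compile_command("clang", "demo.c", specific=False)))
```

Keeping the choice behind one boolean makes it easy to ship a portable build (tuned only) and a Bonnell-only build from the same script.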

Architecture

Silverthorne processor next to a penny.
Bonnell features a brand-new architecture not based on any previous Intel design. The architecture was specifically designed for ultra-mobile PCs (UMPCs), mobile internet devices (MIDs), and other embedded devices.

Bonnell's primary goals were:

1. Reduce power consumption,
2. while staying fully x86-compatible,
3. at acceptable performance.
New performance/power rule: +1% performance for at most +1% power consumption.
In addition to full x86 compatibility and the power requirements, Bonnell was also required to maintain 100% compatibility with Intel's Core architecture (specifically the then-new Core 2 Duo processors).

 

Architecture

  • Strictly ultra-low power
    • 90%+ lower power than the 90 nm Pentium M
  • 45 nm process, 9 metal layers, CMOS
  • 500 mW to 2 W TDP
    • Average 220 mW
    • Idle under 80 mW
  • 533 MT/s dual-mode (GTL & CMOS) FSB
  • In-order
  • 2-issue decode
  • Simple 2-way SMT (simultaneous multithreading)
  • Instruction Queue of 32 entries (16 entries/thread)
  • FP Register File (per thread); a register file is the array of registers inside the CPU, usually implemented with fast SRAM
  • Integer Register File (per thread)
  • Private L1 cache for each core
  • Shared L2 cache for the entire chip

The number of functional units was kept to a minimum to cut power consumption.

  • 2 address generation units (AGUs)
  • 2 Integer ALUs (1 for jumps, 1 for shifts)
  • 2 FP ALUs (1 adder, 1 for others); ALU = arithmetic logic unit, FP = floating point
  • No Integer multiplier & divider (shared with FP ALU instead)

Block Diagram

bonnell block diagram.svg

Memory Hierarchy

  • Cache
    • Hardware prefetchers
    • C6 cache
      • 10.5 KiB array to hold the architectural state during deep power down state
      • 1-read, 1-write ported
    • L1 Instruction Cache
      • 36 KiB
      • 1 read and 1 write port
      • 8 transistors (instead of 6) to reduce voltage
      • 1-bit parity (but no ECC)
    • L1 Data Cache
      • 24 KiB
        • 6-way set associative
      • 1 read and 1 write port
      • 8 transistors (instead of 6) to reduce voltage
      • 1-bit parity (but no ECC)
      • Per core
    • L2 Cache:
      • 512 KiB 8-way set associative
      • ECC support
      • Shrinkable from 512 KiB to 128 KiB (2-way)
      • 64-byte cache line
      • Per core
    • Tag/LRU/State bit
      • All in one array
      • Tag + data = 4.5 KiB + 17.5 KiB
    • L3 Cache:
      • No level 3 cache
    • RAM
      • Maximum of 2 GiB, 4 GiB, or 8 GiB

Note that the L1 caches for data and instructions were both originally 32 KiB (8-way); due to power restrictions, however, the L1d$ was later reduced to 24 KiB.

  • TLB
    • ITLB
      • 32-entry
      • fully associative
    • DTLB
      • 4 KiB pages
        • 64-entry TLB
          • 4-way set associative
        • 16-entry micro-TLB
          • fully associative
          • duplicated for each thread
        • 16-entry PDE cache
          • fully associative
      • Large Pages
        • 8 entries, 4-way set associative

Overview

Bonnell's architecture shares very little with other Intel designs. To achieve its strict ultra-low power objectives, Bonnell features a very slimmed-down design that discards many techniques used by Intel's high-performance architectures, such as aggressive speculative execution, out-of-order execution, and µop transformation.

Part of the design requirement was that Bonnell retain full x86 compatibility, up to the latest extension - at one tenth of the power consumption of the Pentium M.

This meant that existing software is 100% compatible, but it forced engineers to deal with all the baggage the architecture brought along.

The decision to offer full compatibility brought its own set of benefits such as access to the largest software code base in the world, including the ability to run any other x86 operating system unmodified. At the same time it forced the design team to resort to other means of reducing power.

Up to Bonnell, all of Intel's existing architectures put very low priority on power efficiency (note that this has changed significantly since the introduction of Sandy Bridge). High-performance, high-throughput, complex designs were simply inadequate for the kind of power goals set for Bonnell, even if they were trimmed down. It was decided that Bonnell would be designed from scratch with power goals in mind. For those reasons Bonnell resembles the P5 microarchitecture.

Pipeline

Much like the original P5 microarchitecture, Bonnell consists of an in-order dual-issue pipeline. The pipeline is shown below. Note the pipeline is duplicated for dual-issue execution.

 

bonnell pipeline.svg


Unlike P5, which only had 5 stages, Bonnell has 16 to 19 pipeline stages. The longer pipeline allows a more even spreading of heat across the chip with more units. This also allows a higher clock rate.

Front End 

Bonnell's front end is very simple compared to Intel's high-performance architectures. Out-of-order execution (OoOE), which is ubiquitous in high-performance architectures, was rejected: Bonnell's power and area constraints simply couldn't allow for the complex logic needed to support that capability. Instruction Fetch consists of 3 stages and can go through up to 16 bytes per cycle. Like fetch, Instruction Decode is also 3 stages, capable of decoding instructions with up to 3 prefixes each cycle (considerably longer for more complex instructions).

Bonnell is a departure from all modern x86 architectures with respect to decoding (including those developed by AMD and VIA, and every Intel architecture since P6). Whereas modern architectures transform complex x86 instructions into a more easily digestible µop form, Bonnell does almost no such transformation. The pipeline is tailored to execute regular x86 instructions as single atomic operations consisting of a single destination register and up to three source registers (the typical load-operate-store format). Most internal operations correspond very closely to the original x86 instructions. This design choice lowers complexity at the cost of performance.

Bonnell has two identical decoders capable of decoding complex x86 instructions. Because x86 is a variable-length instruction set, finding where each instruction ends adds a layer of complexity. To assist the decoders, Bonnell implements predecoders that determine instruction boundaries and mark them using a single-bit marker. Two cycles are allocated for predecoding as well as L1 storage. The boundary marks are stored in the L1 alongside the instructions, eliminating the need to perform redundant predecoding: repeated operations are retrieved pre-marked, saving two cycles. Bonnell has a 36 KiB L1 instruction cache consisting of a 32 KiB instruction cache and a 4 KiB instruction boundary mark cache. All instructions (whether they come from the cache or from predecode) must still undergo full decode.

It is worth noting that Intel describes Bonnell as a 16-stage pipeline because, for the most part, an instruction passes through 16 stages after a cache hit. This also holds in some cases where the processor can simultaneously decode the next instruction. On a miss, however, it costs 3 additional stages to catch up and locate the boundary of that instruction, for a total of 19 stages.
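
The cache sizes quoted above are consistent with one boundary-mark bit per instruction byte: 4 KiB of marks is exactly 32 Ki bits, one for each of the 32 Ki instruction bytes. The sketch below checks that arithmetic and illustrates how cached marks let a fetcher cut a byte stream into instructions without rescanning it. Placing the mark on the last byte of each instruction is an assumption made for illustration, and the function names are hypothetical; the source does not describe the actual encoding.

```python
L1I_INSTRUCTION_BYTES = 32 * 1024  # from the text: 32 KiB of instruction bytes
L1I_MARK_BYTES = 4 * 1024          # from the text: 4 KiB of boundary marks


def mark_bits_per_instruction_byte() -> float:
    """Ratio of boundary-mark bits to instruction bytes in the L1I."""
    return (L1I_MARK_BYTES * 8) / L1I_INSTRUCTION_BYTES


def boundary_marks(lengths):
    """Toy predecoder: one bit per byte, set on the LAST byte of each
    instruction (assumed convention, not confirmed by the source)."""
    marks = []
    for n in lengths:
        marks.extend([0] * (n - 1) + [1])
    return marks


def split_with_marks(code: bytes, marks):
    """Cut a byte stream into instructions using previously cached marks."""
    instructions, start = [], 0
    for i, bit in enumerate(marks):
        if bit:
            instructions.append(code[start:i + 1])
            start = i + 1
    return instructions


marks = boundary_marks([1, 3, 2])          # three instructions, 6 bytes total
chunks = split_with_marks(bytes(range(6)), marks)
print(mark_bits_per_instruction_byte(), marks, [len(c) for c in chunks])
```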

Some x86 instructions are simply too complex to handle directly. Those select few get diverted into the microcode sequencer ROM (MSROM) for decoding, producing much simpler RISC-like operations at the cost of 2 additional cycles. Intel estimates that only about 5% of instructions in common software need to be split up. Only decoder 0 can request a transfer to the MSROM. All instructions longer than 8 bytes, or instructions having more than three prefixes, result in an MSROM transfer unconditionally and therefore incur the two-cycle delay. The inability to execute out of order eliminates many optimization opportunities at this stage. One thing Bonnell can do is lockstep operations that can be executed simultaneously, as in the case of instructions that perform a memory access along with an arithmetic operation. In those instances Bonnell issues the instruction as if it were two separate instructions executing simultaneously. In addition, only one x87 instruction can be decoded per cycle.
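
The two unconditional MSROM triggers named above (length over 8 bytes, more than three prefixes) can be written as a small predicate. This is only a sketch of those two stated rules; other complex instructions also go through the MSROM, and the function and constant names are hypothetical.

```python
MAX_DIRECT_LENGTH_BYTES = 8   # from the text: longer instructions go to the MSROM
MAX_DIRECT_PREFIXES = 3       # from the text: more prefixes go to the MSROM
MSROM_PENALTY_CYCLES = 2      # from the text: two additional cycles


def forces_msrom(length_bytes: int, prefix_count: int) -> bool:
    """True if one of the two unconditional MSROM rules applies."""
    return (length_bytes > MAX_DIRECT_LENGTH_BYTES
            or prefix_count > MAX_DIRECT_PREFIXES)


def msrom_penalty(instructions) -> int:
    """Sum the extra decode cycles for (length, prefixes) pairs that are
    forced into the MSROM by the two rules above."""
    return sum(MSROM_PENALTY_CYCLES
               for length, prefixes in instructions
               if forces_msrom(length, prefixes))


stream = [(3, 0), (9, 1), (5, 4), (8, 3)]
print(msrom_penalty(stream))
```

A compiler or hand-written assembly targeting Bonnell can use the same check to avoid encodings that fall off the fast decode path.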

Because Bonnell has support for Hyper-Threading, Intel's brand name for their own simultaneous multithreading technology, a number of modifications had to be done. The prefetch buffer and the instruction queue have been duplicated for each thread.

Branch predictor

No aggressive speculative execution is done in Bonnell; however, it does implement a lightweight Gshare branch predictor: a two-level adaptive predictor with a 12-bit global history. The pattern history table has 4096 entries and is competitively shared between threads. The branch target buffer has 128 entries (4-way by 32 sets). While unconditional jumps are not recorded in the table, always-taken and never-taken jumps are.

The branch-misprediction penalty is 11 to 13 cycles. Some of the rare or complex x86 instructions detour into the microcode sequencer for decoding, necessitating two additional clock cycles. Additionally, there is a penalty of roughly 7 cycles for branches that are correctly predicted but whose target cannot be predicted because the branch target buffer (BTB) has no entry for them. Bonnell's return stack buffer is 8 entries deep.
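
To make the Gshare idea concrete, here is a minimal textbook-style sketch sized like the predictor described above: 12 bits of global history and a 4096-entry pattern history table. The 2-bit saturating counters, the weakly-not-taken initial state, the use of the raw low PC bits in the XOR, and the single-thread view are all assumptions for illustration; the source does not give Bonnell's actual hashing or counter scheme.

```python
class GsharePredictor:
    """Textbook gshare: index = (PC XOR global history), 2-bit counters."""

    def __init__(self, history_bits: int = 12):
        self.mask = (1 << history_bits) - 1
        self.history = 0                        # global branch history register
        self.pht = [1] * (1 << history_bits)    # counters start weakly not-taken

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2   # 2 or 3 means "taken"

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the real outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


bp = GsharePredictor()
hits = 0
for step in range(200):
    outcome = (step % 2 == 0)                   # alternating taken / not-taken
    hits += (bp.predict(0x400) == outcome)
    bp.update(0x400, outcome)
print(f"{hits} of 200 predictions correct")
```

Because the history is folded into the index, the alternating branch ends up using two different counters, one per history pattern, which is why a pattern that defeats a single 2-bit counter becomes predictable once the history register has filled.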

Back End

Each cycle, two instructions are dispatched in order. The scheduler can take a pair of instructions from a single thread or across threads. Bonnell's in-order back end resembles a traditional early-90s design featuring a dual ALU, a dual FPU, and a dual AGU. As in the front end, in order to accommodate simultaneous multithreading, the Bonnell design team chose to duplicate both the floating-point and integer register files. Duplicating the register files allows Bonnell to switch context at each stage by maintaining separate state for each thread. The decision to duplicate this logic directly results in more transistors and a larger silicon area. Overall, implementing SMT still required less power and less die area than the heavyweight alternatives (i.e., out-of-order execution and a larger superscalar design). Nonetheless, the register files account for 50% of the entire core's die area, which makes them an important contributor to overall chip power consumption.

FP/SIMD Execution Cluster
SIMD/FP Execution Cluster Ports
Port 0 | Port 1
SIMD ALU (128-bit / 64-bit int) | SIMD ALU (128-bit)
Shuffle unit (128-bit / 64-bit int) | FP Adder
SIMD/FP multiply unit (128-bit / 64-bit int) | -
Divide unit (support IMUL, IDIV) | -

In the further pursuit of power savings, specialized execution units were minimized as much as possible. Bonnell's floating-point and SIMD execution cluster does most of the heavy lifting. It features a 128-bit SIMD integer path containing 2 SIMD ALUs and 1 shuffle unit. Bonnell's SIMD integer multiplier and floating-point divider are also responsible for scalar integer multiply and integer divide operations. Additionally, the cluster includes 64-bit FP and SIMD integer multipliers and a 128-bit FP adder.

Additionally, this cluster contains a Safe Instruction Recognition (SIR) unit responsible for supporting out-of-order commits. The idea behind the SIR unit is fairly simple: when conditions are met (i.e., when there is no dependency between instructions of differing latency), the two instructions execute simultaneously, allowing the shorter-latency instruction to execute and finish before a possibly longer-latency floating-point operation ends. This reduces the needless stalls that plague traditional in-order pipelines.

 
Integer Execution Cluster
Integer Execution Cluster Ports
Port 0 | Port 1
Load/Store | Jump unit and LEA
ALU0 | ALU1
Shift/Rotate unit | Bit processing unit

The integer execution cluster contains two ALUs, a shifter, and a jump execution unit capable of performing single-cycle 64 bit integer operations. The Integer cluster has store-forwarding support allowing for a 0-cycle latency effective load-to-use.

 
Memory Subsystem

Bonnell has two address generation units (AGUs). For data, there is a 24 KiB write-back L1 cache with a 2-level DTLB hierarchy, a hardware page walker, and integer store-to-load forwarding support. Additionally, there is a rather large 512 KiB L2 cache with inline ECC and hardware prefetchers. The tag, LRU, and state bits are all stored in a single array to minimize area. The tag and data storage consists of 8 tag sub-arrays of 4.5 KiB each and 32 data sub-arrays of 17.5 KiB each, built from 256 cells on the bit line and 136 cells on the word line.

As a power-saving feature, the L2 cache can be configured down to 2-way dynamically (i.e., programmatically) for applications that do not require the full performance. Doing so reduces power and downsizes the cache to 128 KiB. Additionally, for less demanding tasks, Bonnell power-gates unused ways. It is interesting to note that the design team placed the tag blocks at the bottom of the data arrays, which in theory would allow the L2 to be expanded to 1 MiB should they want to.
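
The shrink from 512 KiB (8-way) to 128 KiB (2-way) is what one would expect from switching off ways while leaving the set indexing untouched: both configurations have the same number of sets. The sketch below works through that arithmetic, assuming 64-byte cache lines (an assumption on my part; the helper name is hypothetical).

```python
KIB = 1024
LINE_BYTES = 64  # assumed line size in bytes


def num_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets in a set-associative cache: size / (ways * line)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("size is not a whole number of sets")
    return sets


l1d_sets = num_sets(24 * KIB, ways=6)          # L1 data cache
l2_full_sets = num_sets(512 * KIB, ways=8)     # full L2
l2_small_sets = num_sets(128 * KIB, ways=2)    # L2 shrunk to 2 ways

print(l1d_sets, l2_full_sets, l2_small_sets)
# Same set count before and after shrinking: only the ways are removed.
print(l2_full_sets == l2_small_sets)
```

The same formula also shows why the L1 data cache is 6-way: dropping two ways from a 32 KiB 8-way design yields 24 KiB without changing the number of sets.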

I/O Bus

dual-mode fsb driver.png
Main article: Front-Side Bus

Traditionally, Intel has used AGTL+ transceivers (Advanced Gunning Transceiver Logic) for front-side bus communication. With Bonnell (and the chipset), Intel also introduced a CMOS signaling mode. CMOS has the advantage of only drawing power during transitions. The switch to CMOS saves 200-500 mW at the cost of worse latency and a slower bus, which ranges from 400 to 533 MHz. Because Bonnell's intended applications are not heavy processing machines, the lower bus speed was likely a worthy compromise.

Bonnell implements both modes, so designers who prefer the faster bus can opt for the traditional AGTL+ transceivers, while those who seek low power can opt for the CMOS implementation. Intel offers both types by simply fusing the appropriate circuitry. This is done by programming the NFET pull-down control and the PFET control accordingly, activating or deactivating the resistor and switching the voltage.
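
As a rough guide to what these bus speeds mean in practice, the sketch below converts a bus clock into a transfer rate and a peak bandwidth. The factor of four transfers per clock matches the processor list in this article (100 MHz paired with 400 MT/s, 133.33 MHz with 533.33 MT/s); the 64-bit (8-byte) data width is an assumption, and the function names are hypothetical.

```python
def transfer_rate_mt_s(bus_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Transfers per second (in millions) for a multi-pumped bus."""
    return bus_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(rate_mt_s: float, bus_width_bytes: int = 8) -> float:
    """Peak bandwidth in MB/s; bus_width_bytes=8 is an assumed 64-bit bus."""
    return rate_mt_s * bus_width_bytes


for clock_mhz in (100.0, 400.0 / 3.0):
    rate = transfer_rate_mt_s(clock_mhz)
    bandwidth = peak_bandwidth_mb_s(rate)
    print(f"{clock_mhz:.2f} MHz -> {rate:.2f} MT/s -> {bandwidth:.0f} MB/s peak")
```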

bonnell split power planes.png

Note that for deep sleep, the design team split the power rails into two power planes. To further save power, only 21 pins are kept alive, while the other 182 I/Os, which are not necessary in that state, are powered off; this reduces the average power by another 10%.

 

Features 

Multithreading

Bonnell supports Intel's Hyper-Threading, the company's marketing term for its implementation of simultaneous multithreading. Implementing simultaneous multithreading on such a low-power architecture might seem unusual at first; in fact, Bonnell is one of only a handful of ultra-low power architectures to support such a feature. Intel justified this design choice by demonstrating that performance enjoys an uplift of anywhere from 30% to 50% while power consumption worsens by up to 20% (on average, a 30% performance increase for 15% more power). The toll on die area was a mere 8%.

In the front-end, the prefetch buffer and the instruction queue have been duplicated for each thread, everything else is competitively shared between the threads. In the back-end, only the integer and floating register files are duplicated, everything else is competitively shared as well. Note that both threads compete over the L1 instruction and data caches as well as the L2 and the TLBs with the exception of a 16-entry micro-TLB that's duplicated for each thread.
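
The Hyper-Threading figures can be checked against the design rule quoted earlier in this article (+1% performance for at most +1% power). The sketch below is simple arithmetic on the quoted percentages; the function names are hypothetical.

```python
def perf_per_watt_gain(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance per watt, e.g. 1.13 means 13% better."""
    return (1.0 + perf_gain) / (1.0 + power_gain)


def meets_one_percent_rule(perf_gain: float, power_gain: float) -> bool:
    """Rule from the design goals: each +1% performance may cost at most +1% power."""
    return power_gain <= perf_gain


average = perf_per_watt_gain(0.30, 0.15)    # quoted average: +30% perf, +15% power
pessimistic = perf_per_watt_gain(0.30, 0.20)  # low end of perf, high end of power

print(f"average case: {average:.3f}x performance per watt")
print(f"pessimistic case: {pessimistic:.3f}x performance per watt")
print(meets_one_percent_rule(0.30, 0.15), meets_one_percent_rule(0.30, 0.20))
```

Even when the smallest quoted speedup is paired with the largest quoted power increase, the feature still improves performance per watt, which is the kind of trade the rule is meant to admit.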

Low-power features 

bonnell c-states.png

Bonnell implements a number of features to enhance battery life, including several low-power states (C-states). Bonnell is capable of achieving a 2 GHz core frequency at 1 V and can go all the way down to 600 MHz at 0.75 V by dialing down the core phase-locked loop (PLL) ratio. Bonnell supports C-states up to C6; a higher-numbered C-state saves more power, which in turn means more components (i.e., features) are turned off.

• C-0 state: the processor can operate at its highest frequency (high-frequency mode, HFM) or at its lowest frequency (low-frequency mode, LFM).
• C-1 state: the core is power-gated and the L1 caches are flushed, yielding lower dynamic power; exit latency is under 1 µs
• C-4 state: the PLLs are shut down as well; exit latency is on the order of 30 µs
• C-6 state: the state of the machine is kept alive in a 10.5 KiB register file (SRAM kept at a VCC of 0.3 V) while the core power is completely shut off; exit latency is on the order of 100 µs

Intel estimates C-6 residency to be between 80% and 90% resulting in an average power in the order of 220 mW. Likewise Idle power, which is dominated by leakage power of the functional units, is below 80 mW.
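
The link between C-6 residency and average power is a time-weighted average. The sketch below illustrates it with two placeholder numbers: an 80 mW floor while in C-6 (the source only says idle power is below 80 mW) and a hypothetical 1000 mW while active (chosen from within the 500 mW to 2 W TDP range). Neither value is measured data, and the function name is hypothetical.

```python
def average_power_mw(c6_residency: float,
                     c6_power_mw: float,
                     active_power_mw: float) -> float:
    """Time-weighted average of power in C-6 and power while active."""
    if not 0.0 <= c6_residency <= 1.0:
        raise ValueError("residency must be a fraction between 0 and 1")
    return (c6_residency * c6_power_mw
            + (1.0 - c6_residency) * active_power_mw)


C6_POWER_MW = 80.0        # assumed upper bound taken from the idle figure
ACTIVE_POWER_MW = 1000.0  # hypothetical active power, not from the source

for residency in (0.80, 0.85, 0.90):
    avg = average_power_mw(residency, C6_POWER_MW, ACTIVE_POWER_MW)
    print(f"{residency:.0%} in C-6 -> about {avg:.0f} mW on average")
```

With these placeholder inputs the 85% case lands near the 220 mW average quoted above, which shows how strongly the average depends on the time spent in the deepest state.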

 

Modularity

Bonnell is a highly modular architecture: almost all features can be disabled via built-in fuses, allowing for many binning variations.

Both virtualization support (VT-x/d) and Hyper-Threading may be disabled to cut down on power.

Bonnell implements both AGTL+ and CMOS transceiver logic for front-side bus signaling, with either one capable of being fused off.
CMOS signaling allows for lower power but cannot reach the high bus speeds that AGTL+ can. This may or may not be a restriction for system designers.

Second Generation Enhancements 

lincroft goals.png
bonnell system board size goals.png

With the introduction of Lincroft, Intel made substantial improvements to the overall platform. The Silverthorne-based systems had a great core in terms of power and performance, but they were dragged down when combined with a far less efficient chipset and system design. These deficiencies were addressed in the second generation of Bonnell-based models.

The first variant was Lincroft, which set out to reduce the original system standby power of 1.6 W down to 32 mW (a 50x reduction) while reducing the overall board size by 2x. To achieve those goals Intel turned to higher integration, moving the graphics, CPU core, video acceleration, display controller, and memory controller all onto a single system on a chip. Those components were previously incorporated in the 130 nm process chipset. This leaves the Langwell chipset with just the low-power southbridge functionality. The new chipset is also manufactured on a considerably better 65 nm process.

Performance Features

To address the higher performance goals, Intel introduced a number of new features into Lincroft including Bus Turbo Mode and Burst Mode.

Clock Domains

Each of Lincroft's multimedia engines is assigned a specific clock ratio; a farm of clock dividers and clock selectors generates the appropriate clock for each individual engine. The complex clocking architecture implemented in Lincroft was designed to allow greater flexibility and a wider range of devices. This is done by simply tweaking the ratio for each engine based on the desired performance and power goals.

lincroft clock domains.png

Bus Turbo Mode & Burst Mode 

Intel also introduced Bus Turbo Mode, a feature designed to reduce memory latency by dynamically increasing bus frequencies in sync with CPU bursts. At predefined CPU frequencies, the bus gets dynamically overclocked to reduce the bottlenecking that might occur. This is implemented directly in hardware using the clock dividers (see § Clock Domains) without the need to re-clock the PLLs.

Another feature that was introduced was Burst Mode, the ability for the CPU to opportunistically take advantage of the thermal headroom on the Tjunction and Tskin by temporarily increasing the CPU frequency. Upon violation of Tjunction/Tskin, the system throttles down back to recovery points (LFM c-state).
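
The Burst Mode behavior described above amounts to a small decision rule: burst while there is thermal headroom on both Tjunction and Tskin, and fall back to a recovery point when either limit is violated. The sketch below is one hedged way to express that rule. The 800 MHz and 1200 MHz values are taken from the Z600 row of the processor list; the 600 MHz recovery frequency and both temperature limits are hypothetical placeholders, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstPolicy:
    base_mhz: int             # normal operating frequency
    burst_mhz: int            # opportunistic burst frequency
    recovery_mhz: int         # recovery point after a thermal violation
    tjunction_limit_c: float  # hypothetical junction temperature limit
    tskin_limit_c: float      # hypothetical device skin temperature limit

    def next_frequency(self, tjunction_c: float, tskin_c: float, busy: bool) -> int:
        # A violation of either limit throttles back to the recovery point.
        if (tjunction_c >= self.tjunction_limit_c
                or tskin_c >= self.tskin_limit_c):
            return self.recovery_mhz
        # With thermal headroom, burst only when there is work to do.
        return self.burst_mhz if busy else self.base_mhz


policy = BurstPolicy(base_mhz=800, burst_mhz=1200, recovery_mhz=600,
                     tjunction_limit_c=90.0, tskin_limit_c=40.0)

print(policy.next_frequency(tjunction_c=60.0, tskin_c=30.0, busy=True))
print(policy.next_frequency(tjunction_c=95.0, tskin_c=30.0, busy=True))
```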

Low-power features 

In order to further reduce power Intel introduced a number of new features:

  • Low power architecture features
  • Enhanced Geyserville for ULFM
  • Extended CPU Power C-States
  • Distributed Power Gating

Enhanced Geyserville (eGVL) 

lincroft new egvl mode.png

Enhanced Geyserville is a new mode that allows the CPU to run below LFM at Vmin. This enables a linear saving of average power during instances where the CPU is idle while in the C0 C-state (dynamic power follows CV²f, and leakage is mostly constant because V = Vmin the entire time). Equivalently, the bus frequency is also down-clocked at predefined frequencies (see § Bus Turbo Mode). The additional ultra-low-power mode is exposed as a P-state to the operating system.
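
The CV²f relationship explains why the saving below LFM is only linear: once the voltage is pinned at Vmin, frequency is the only term left to reduce. The sketch below compares that with the HFM-to-LFM step, using the 2 GHz at 1 V and 600 MHz at 0.75 V operating points quoted earlier. Treating 0.75 V as Vmin and using 300 MHz as the ultra-low frequency are assumptions for illustration, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, f_new: float,
                        v_ref: float, f_ref: float) -> float:
    """Ratio of dynamic power (proportional to C * V^2 * f) between two
    operating points on the same circuit, so the capacitance C cancels."""
    return (v_new ** 2 * f_new) / (v_ref ** 2 * f_ref)


# HFM (2000 MHz, 1.0 V) down to LFM (600 MHz, 0.75 V): voltage and frequency drop.
hfm_to_lfm = dynamic_power_ratio(0.75, 600.0, 1.0, 2000.0)

# Below LFM at constant voltage (assumed Vmin = 0.75 V, hypothetical 300 MHz):
# only frequency drops, so the saving is linear in f.
lfm_to_ulfm = dynamic_power_ratio(0.75, 300.0, 0.75, 600.0)

print(f"LFM uses {hfm_to_lfm:.1%} of HFM dynamic power")
print(f"ULFM uses {lfm_to_ulfm:.1%} of LFM dynamic power")
```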

Below is the C-State chart with the additional Ultra-low LFM state added, enabling further decrease in average power consumption.

lincroft extended c-states.png

Extensive power-gating

Lincroft introduces an extensive system of power-gating. The entire SoC is divided up into multiple physical power islands. Each island can be individually controlled through a distributed power-gating system. Lincroft allows for a fine-grained management of power through both hardware and software to be able to disable areas of the chip that are not being actively utilized.

 

lincroft all off.png
lincroft all on.png

Die (translator's note: the die is the bare silicon chip, the CPU itself; it contains the cores, caches, and the rest of the CPU's internal components)

Silverthorne 

  • 45 nm process
  • 9 metal layers
  • 47,212,207 transistors
  • 3.1 mm x 7.8 mm
  • 24.18 mm² die size
  • packaged in a Halide-Free 441 ball, 14 mm x 13 mm µFCBGA

Silverthorne die shot.jpg

Silverthorne die shot 2.jpg

Silverthorne die shot (marked).png

Function Unit Blocks (FUBs):

  • BIC/BIU - Bus Interface Cluster/Unit
  • MEC - Memory Execution Cluster & L1d$
  • FPC - FP/SIMD execution Cluster
  • IEC - Integer Execution Cluster
  • FEC - Front-End Cluster & L1i$
  • FSB - Front Side Bus

Physical layout

bonnell die size areas.svg
bonnell die size areas 2.svg

The Atom design team was considerably smaller than Intel's typical design teams, which forced it to work in a slightly different way. The team used a methodology it described as a "sea of Functional Unit Blocks" (FUBs), whereby all cluster hierarchies (including unit-level hierarchies) are flattened at the chip level. This methodology allowed for faster iteration. The various FUB designs were divided among the team members, making the design more manageable. All in all, Bonnell's physical database consisted of 205 unique FUBs interlinked via 41,000 FUB-to-FUB interconnects. Bonnell is manufactured on Intel's 45 nm process. 91% of the FUBs use pre-characterized standard cells (45% structured data-path and 46% fully synthesized random logic blocks), with only the remaining 9% being full-custom blocks. The unusually high utilization of standard cells (at least for Intel) is likely due to the limited resources given to the Bonnell design team.

Type | Unique | Instances
Random Logic Synthesized | 92 | 92
Structured Data Paths | 88 | 140
L2 sub-arrays | 2 | 40
Custom | 18 | 19
Repeater Station | - | 317
Total | 200 | 608

Cluster | Transistor Count
Core | 13,828,574
Uncore | 2,738,951
L2 & L2 tag | 30,644,682
Total | 47,212,207
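
The totals in both tables, and the die area listed for Silverthorne, can be re-derived from the individual rows. The sketch below does that arithmetic and also computes two figures the tables imply but do not state: the transistor density and the share of transistors spent on the L2. All inputs are copied from this section; the variable names are my own.

```python
# (unique FUB designs, instances); None where the table shows "-"
fub_table = {
    "Random Logic Synthesized": (92, 92),
    "Structured Data Paths": (88, 140),
    "L2 sub-arrays": (2, 40),
    "Custom": (18, 19),
    "Repeater Station": (None, 317),
}
unique_total = sum(u for u, _ in fub_table.values() if u is not None)
instance_total = sum(i for _, i in fub_table.values())

transistors = {
    "Core": 13_828_574,
    "Uncore": 2_738_951,
    "L2 & L2 tag": 30_644_682,
}
transistor_total = sum(transistors.values())

die_area_mm2 = 3.1 * 7.8                      # Silverthorne die dimensions
density_per_mm2 = transistor_total / die_area_mm2
l2_share = transistors["L2 & L2 tag"] / transistor_total

print(unique_total, instance_total, transistor_total)
print(f"{die_area_mm2:.2f} mm^2, {density_per_mm2 / 1e6:.2f} M transistors/mm^2")
print(f"L2 share of transistors: {l2_share:.1%}")
```

Roughly two thirds of the transistor budget goes to the L2 and its tags, which is why the ability to power down L2 ways matters for the overall power figures.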
 

Lincroft 

Moorestown Platform 

  • 45 nm process
  • 140,000,000 transistors
  • Die size 7.34 mm × 8.89 mm
  • Die area 65.2526 mm²

lincroft die shot.png

lincroft die shot (annotated).png


lincroft die shot 2.png

lincroft die shot 2 (annotated).png

Oak Trail Platform

lincroft oak trail die shot.png

Cores 

Bonnell has lived through a number of iterations unlike the mainstream variants which followed a far more ambitious development cycle. Products based on Bonnell can more or less be split into two generations:

  • First Generation - initial Bonnell processor models. Those relied on a number of external chipset chips for the I/O, graphics, and various other system features.
  • Second Generation - considerably higher integration was introduced. The original CPU was now incorporated along with many of its peripherals on a single chip to create a System on a Chip.

First generation 

First generation of Bonnell-based microprocessors introduced 2 cores: Silverthorne for ultra-mobile PCs and mobile Internet devices (MIDs) and Diamondville for ultra cheap notebooks and desktops.

  • Silverthorne was the codename for a series of processors for Mobile Internet Devices (MIDs) introduced in 2008. These processors had 1 core and 2 threads with an FSB operating at 400 MHz-533 MHz. Those models were branded as Atom MIDs and were paired with the Poulsbo chipset.
  • Diamondville was the codename for the series of ultra cheap notebooks and desktops introduced in 2008. Diamondville is very much a soldered-on-motherboard derivative of Silverthorne with faster FSB (operating at 533 MHz - 667 MHz). The dual-core version is a Multi Chip Module (MCM) Silverthorne variant operating on the same FSB.

Second Generation 

First generation of Bonnell-based microprocessors while being low power had to work with the older 90 nm process 945GSE chipset and 82801GBM I/O controller with a TDP of almost 9.5 watts - almost 4 times that of the processor itself. Second generation Bonnell-based microprocessors aimed to address this issue by integrating a memory controller and GPU on-chip. This drastically reduced power consumption and cost.

  • Lincroft is the codename for Bonnell-based Silverthorne's successor. Lincroft integrates on-die the graphics and memory controller. Lincroft effectively replaces the original Silverthorne offering 2x reduction in average circuit board size and up to 50x standby power reduction vs Menlow equivalent. Lincroft also introduces a 2x reduction in the overall active power consumption of the system.

Pineview 

Main article: Pineview

Pineview was the codename for second-generation Bonnell-based processors which integrated a memory controller, a Direct Media Interface (DMI) link, and the GMA 3150 GPU. Pineview is the successor to Diamondville, targeting the same ultra cheap desktops, nettops and netbooks.

Tunnel Creek 

Main article: Tunnel Creek

Tunnel Creek was the codename for a series of MPUs for embedded applications.

Stellarton 

Main article: Stellarton

Stellarton was the codename for a series of MPUs for embedded applications. Stellarton is the Tunnel Creek core packaged with an Altera FPGA.

Sodaville 

Main article: Sodaville

Sodaville is the codename for a series of consumer electronics system on a chip (e.g. set-top box).

Groveland 

Main article: Groveland

Groveland is the codename for a series of consumer electronics MPUs (e.g. smart TVs).

All Bonnell Chips 

 List of Bonnell-based Processors
 Column groups: Main processor | Bus | IGP | Features
Model Price Core Launched C T Freq Burst TDP SDP Speed Rate Name Frequency Package HT VT-x EIST
230 $ 29.00 Diamondville 3 June 2008 1 2 1.6 GHz   4,000 mW   133.33 MHz 533.33 MT/s     FCBGA-437
330 $ 43.00 Diamondville 21 September 2008 2 4 1.6 GHz   8,000 mW   133.33 MHz 533.33 MT/s     FCBGA-437
N270 $ 44.00 Diamondville 3 June 2008 1 2 1.6 GHz   2,500 mW   133.33 MHz 533.33 MT/s     FCBGA-437
N280   Diamondville 7 February 2009 1 2 1.667 GHz   2,500 mW   166.66 MHz 666.66 MT/s     FCBGA-437
Z500 $ 45.00 Silverthorne 2 April 2008 1 2 0.8 GHz   650 mW 960 mW 100 MHz 400 MT/s     FCBGA-441
Z510 $ 45.00 Silverthorne 2 April 2008 1 1 1.1 GHz   2,000 mW 960 mW 100 MHz 400 MT/s     FCBGA-441
Z510P   Silverthorne 2 March 2009 1 2 1.1 GHz   2,000 mW   100 MHz 400 MT/s     FCBGA-437
Z510PT   Silverthorne 2 March 2009 1 2 1.1 GHz   2,000 mW   100 MHz 400 MT/s     FCBGA-437
Z515   Silverthorne 8 April 2009 1 2 1.2 GHz   650 mW   100 MHz 400 MT/s     FCBGA-441
Z520 $ 65.00 Silverthorne 2 April 2008 1 2 1.333 GHz   2,000 mW 960 mW 133.33 MHz 533.33 MT/s     FCBGA-441
Z520PT   Silverthorne 2 March 2009 1 2 1.333 GHz   2,000 mW   133.33 MHz 533.33 MT/s     FCBGA-437
Z530 $ 95.00 Silverthorne 2 April 2008 1 2 1.6 GHz   2,200 mW   133.33 MHz 533.33 MT/s     FCBGA-441
Z530P   Silverthorne 2 March 2009 1 2 1.6 GHz   2,000 mW   133.33 MHz 533.33 MT/s     FCBGA-437
Z540 $ 160.00 Silverthorne 2 April 2008 1 2 1.867 GHz   2,400 mW 960 mW 133.33 MHz 533.33 MT/s     FCBGA-441
Z550   Silverthorne 8 April 2009 1 2 2 GHz   2,400 mW   133.33 MHz 533.33 MT/s     FCBGA-441
Z560   Silverthorne June 2010 1 2 2.133 GHz   2,500 mW   133.33 MHz 533.33 MT/s     FCBGA-441
Z600   Lincroft 4 May 2010 1 2 0.8 GHz 1.2 GHz 1,300 mW   100 MHz 400 MT/s PowerVR SGX535 200 MHz FCBGA-518
Z605   Lincroft 4 May 2010 1 2 1 GHz   2,200 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z610   Lincroft 4 May 2010 1 2 0.8 GHz 1.2 GHz 1,300 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z612   Lincroft 4 May 2010 1 2 0.9 GHz 1.5 GHz 1,300 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z615   Lincroft 4 May 2010 1 2 1.2 GHz 1.6 GHz 2,200 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z620   Lincroft 4 May 2010 1 2 0.9 GHz 1.6 GHz 1,300 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z625   Lincroft 4 May 2010 1 2 1.5 GHz 1.9 GHz 2,200 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z650   Lincroft 11 April 2011 1 2 1.2 GHz   3,000 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Z670 $ 75.00 Lincroft 11 April 2011 1 2 1.5 GHz   3,000 mW   100 MHz 400 MT/s PowerVR SGX535 400 MHz FCBGA-518
Count: 25

posted @ 2022-04-12 00:46  jinzi