GDDR6 vs DDR4 vs HBM2? Why don't CPUs use GDDR yet? Where is the future of heterogeneous memory?


High Bandwidth Memory is a high-speed computer memory interface for 3D-stacked synchronous dynamic random-access memory initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices, high-performance datacenter AI ASICs and FPGAs and in some supercomputers. The first HBM memory chip was produced by SK Hynix in 2013, and the first devices to use HBM were the AMD Fiji GPUs in 2015.

24 May 2017: The HBM2 memory that ships with Pascal P100 cards is extremely expensive. Nvidia dropped the price of Tesla (compute) P100 PCIe 16GB cards to just above 7,000 Euro, while just a few months ago Nvidia wanted 10,000 Euro for the same card. According to multiple well-informed industry insiders, a 4GB HBM2 memory stack costs around $80. Vega needs two HBM stacks to get to 8GB of memory, putting the price at $160 just for the memory. By the same math, Nvidia spends $320 for the four 4GB stacks on its Pascal P100 cards. The price is tailored to each customer and, quite likely, different customers get different prices, so the $80 should be regarded as only a rough guideline. [link]
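The per-card arithmetic above is simple enough to spell out. The small C sketch below just multiplies the rough $80-per-stack estimate by the stack counts mentioned in the article, so treat its output as the same rough guideline, not a confirmed bill of materials.

    /* Rough per-card HBM2 memory cost, using the ~$80-per-4GB-stack estimate
     * quoted above. The stack price is a rough guideline, not a confirmed
     * bill-of-materials figure. */
    #include <stdio.h>

    int main(void) {
        const double usd_per_stack = 80.0;  /* ~4 GB HBM2 stack (rough estimate) */
        const int vega_stacks = 2;          /* 8 GB total  */
        const int p100_stacks = 4;          /* 16 GB total */

        printf("Vega  (8 GB, 2 stacks): ~$%.0f\n", vega_stacks * usd_per_stack);
        printf("P100 (16 GB, 4 stacks): ~$%.0f\n", p100_stacks * usd_per_stack);
        return 0;
    }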

A supercomputer is a computer with a high level of performance compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of millions of instructions per second (MIPS). Since 2017, there have been supercomputers that can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, i.e. 100 petaFLOPS or 100 PFLOPS). Since November 2017, all of the world's 500 fastest supercomputers have run Linux-based operating systems.

Supercomputers play an important role in the field of computational science, and are used for a wide range of computationally intensive tasks in various fields, including quantum mechanics, weather forecasting, climate research, oil and gas exploration, molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), and physical simulations (such as simulations of the early moments of the universe, airplane and spacecraft aerodynamics, the detonation of nuclear weapons, and nuclear fusion). They have been essential in the field of cryptanalysis.

Supercomputers were introduced in the 1960s... Through the decade, increasing amounts of parallelism were added, with one to four processors being typical. In the 1970s, vector processors operating on large arrays of data came to dominate. A notable example is the highly successful Cray-1 of 1976. Vector computers remained the dominant design into the 1990s. From then until today, massively parallel supercomputers with tens of thousands of off-the-shelf processors became the norm.

The US has long been the leader in the supercomputer field, first through Cray's almost uninterrupted dominance of the field, and later through a variety of technology companies. Japan made major strides in the field in the 1980s and 90s, with China becoming increasingly active in the field. As of June 2020, the fastest supercomputer on the TOP500 supercomputer list is Fugaku, in Japan, with a LINPACK benchmark score of 415 PFLOPS, ahead of second-place Summit by around 266.7 PFLOPS. The US has four of the top 10; China and Italy have two each, and Switzerland has one. In June 2018, the combined performance of all supercomputers on the list broke the 1 exaFLOPS mark. E, P, T and G each differ by a factor of 1000; 1 P = 10^15.
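As a quick sanity check of those prefixes, here is a tiny C sketch converting the Fugaku figure quoted above; only the 415 PFLOPS value comes from the list, the rest is plain unit arithmetic.

    /* SI prefixes used in the TOP500 numbers: each step G -> T -> P -> E is 1000x.
     * Worked example: Fugaku's 415 PFLOPS expressed in raw FLOPS and in exaFLOPS. */
    #include <stdio.h>

    int main(void) {
        const double PETA = 1e15;
        const double EXA  = 1e18;
        double fugaku = 415.0 * PETA;  /* Rmax from the June 2020 list */

        printf("415 PFLOPS = %.2e FLOPS\n", fugaku);           /* 4.15e+17     */
        printf("           = %.3f exaFLOPS\n", fugaku / EXA);  /* 0.415 EFLOPS */
        return 0;
    }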

Cray left CDC in 1972 to form his own company, Cray Research. Four years after leaving CDC, Cray delivered the 80 MHz Cray-1 in 1976, which became one of the most successful supercomputers in history. The Cray-2 was released in 1985. It had eight central processing units (CPUs) and liquid cooling, with the electronics coolant Fluorinert pumped through the supercomputer's architecture. It performed at 1.9 gigaFLOPS and was the world's second fastest, after the M-13 supercomputer in Moscow. Nowadays a single core at 2 GHz, retiring one floating-point instruction per clock cycle, already gives 2 GFLOPS. Modern CPUs are monsters: even Intel's Atom can, thanks to pipelining, issue two floating-point instructions per clock cycle. Taking 48 hours of computation to forecast tomorrow's weather would be pointless; as for 2 minutes versus 2 seconds, well, 2 seconds is admittedly fast. And cryptanalysis presumably doesn't need floating point at all.
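The back-of-the-envelope arithmetic above (cores times clock rate times floating-point operations per cycle) can be written out as a small C sketch; the per-cycle throughput figures below, including the Cray-1's chained add-plus-multiply, are illustrative assumptions rather than vendor specifications.

    /* Back-of-the-envelope peak throughput: GFLOPS = cores * GHz * FLOPs per cycle.
     * The per-cycle figures are illustrative assumptions, not vendor specifications. */
    #include <stdio.h>

    static double peak_gflops(int cores, double ghz, int flops_per_cycle) {
        return cores * ghz * flops_per_cycle;
    }

    int main(void) {
        printf("1 core, 2 GHz, 1 FLOP/cycle : %.2f GFLOPS\n", peak_gflops(1, 2.0, 1));
        printf("1 core, 2 GHz, 2 FLOPs/cycle: %.2f GFLOPS\n", peak_gflops(1, 2.0, 2));
        /* Cray-1: 80 MHz, assuming a chained add+multiply (2 FLOPs) per cycle. */
        printf("Cray-1, 80 MHz, 2 FLOPs/cycle: %.2f GFLOPS\n", peak_gflops(1, 0.08, 2));
        return 0;
    }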

In 1982, Osaka University's LINKS-1 Computer Graphics System used a massively parallel processing architecture, with 514 microprocessors, including 257 Zilog Z8001 control processors and 257 iAPX 86/20 floating-point processors. It was mainly used for rendering realistic 3D computer graphics. Seagate's boss once said hard drives are mostly used to store a certain kind of video; surely graphics cards aren't mostly used to play games, are they? And Meta... people really spare no effort when it comes to virtualization.

Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994 with a peak speed of 1.7 gigaFLOPS (GFLOPS) per processor. The Lockheed Martin F-22 Raptor is an American single-seat, twin-engine, all-weather stealth tactical fighter aircraft developed exclusively for the United States Air Force (USAF). First flight: 7 September 1997. The Mikoyan-Gurevich MiG-25 (Russian: Микоян и Гуревич МиГ-25) (NATO reporting name: Foxbat) is a supersonic interceptor and reconnaissance aircraft that was among the fastest military aircraft to enter service. First flight: 6 March 1964.

The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed. Software tools for distributed processing include standard APIs such as MPI and PVM, VTL, and open source software such as Beowulf.

In the most common scenario, environments such as PVM and MPI for loosely connected clusters and OpenMP for tightly coordinated shared memory machines are used. Significant effort is required to optimize an algorithm for the interconnect characteristics of the machine it will be run on; the aim is to prevent any of the CPUs from wasting time waiting on data from other nodes. GPGPUs have hundreds of processor cores and are programmed using programming models such as CUDA or OpenCL.
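As a concrete taste of the shared-memory model mentioned above, here is a minimal OpenMP sketch in C (a dot product with a parallel-for reduction); it only illustrates the programming model, not a tuned HPC kernel.

    /* Minimal OpenMP example of the shared-memory model: a dot product split
     * across threads with a parallel-for reduction.
     * Build with, e.g.: gcc -fopenmp dot.c -o dot */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N];

    int main(void) {
        double sum = 0.0;

        for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        #pragma omp parallel for reduction(+:sum)   /* each thread sums a chunk */
        for (int i = 0; i < N; i++)
            sum += a[i] * b[i];

        printf("dot = %.1f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }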

In 2016 Penguin Computing, R-HPC, Amazon Web Services, Univa, Silicon Graphics International, Sabalcore, and Gomput started to offer HPC cloud computing. The Penguin On Demand (POD) cloud is a bare-metal compute model for executing code, but each user is given a virtualized login node. POD computing nodes are connected via non-virtualized 10 Gbit/s Ethernet or QDR InfiniBand networks. User connectivity to the POD data center ranges from 50 Mbit/s to 1 Gbit/s. Citing Amazon's EC2 Elastic Compute Cloud, Penguin Computing argues that virtualization of compute nodes is not suitable for HPC. Penguin Computing has also criticized HPC clouds for allocating computing nodes to customers that are far apart, causing latency that impairs performance for some HPC applications.

No single number can reflect the overall performance of a computer system, yet the goal of the Linpack benchmark is to approximate how fast the computer solves numerical problems and it is widely used in the industry. The FLOPS measurement is either quoted based on the theoretical floating point performance of a processor (derived from manufacturer's processor specifications and shown as "Rpeak" in the TOP500 lists), which is generally unachievable when running real workloads, or the achievable throughput, derived from the LINPACK benchmarks and shown as "Rmax" in the TOP500 list. The LINPACK benchmark typically performs LU decomposition of a large matrix. The LINPACK performance gives some indication of performance for some real-world problems, but does not necessarily match the processing requirements of many other supercomputer workloads, which for example may require more memory bandwidth, or may require better integer computing performance, or may need a high performance I/O system to achieve high levels of performance.
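Since the LINPACK benchmark is essentially timed LU decomposition, a minimal, unoptimized C sketch of that kernel (Doolittle form, no pivoting) shows what the benchmark spends its time on; real HPL uses a blocked, partially pivoted, distributed LU, so this is illustration only.

    /* Minimal, unoptimized LU decomposition (Doolittle form, no pivoting):
     * the kind of kernel that LINPACK/HPL times. A is overwritten in place, with
     * U in the upper triangle and the unit-lower-triangular L multipliers below. */
    #include <stdio.h>

    #define N 3

    int main(void) {
        double a[N][N] = { {4, 3, 2}, {8, 8, 5}, {4, 7, 9} };

        for (int k = 0; k < N; k++) {
            for (int i = k + 1; i < N; i++) {
                a[i][k] /= a[k][k];                  /* L multiplier          */
                for (int j = k + 1; j < N; j++)
                    a[i][j] -= a[i][k] * a[k][j];    /* update trailing block */
            }
        }

        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++)
                printf("%7.3f ", a[i][j]);
            printf("\n");
        }
        return 0;
    }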

The stages of supercomputer application may be summarized in the following table:

  • 1970s Weather forecasting, aerodynamic research (Cray-1).
  • 1980s Probabilistic analysis,[105] radiation shielding modeling[106] (CDC Cyber).
  • 1990s Brute force code breaking (EFF DES cracker).
  • 2000s 3D nuclear test simulations as a substitute for live testing, in compliance with the Nuclear Non-Proliferation Treaty (ASCI Q).
  • 2010s Molecular dynamics simulation (Tianhe-1A).
  • 2020s Scientific research for outbreak prevention / electrochemical reaction research.

The IBM Blue Gene/P computer has been used to simulate a number of artificial neurons equivalent to approximately one percent of a human cerebral cortex, containing 1.6 billion neurons with approximately 9 trillion connections. The same research group also succeeded in using a supercomputer to simulate a number of artificial neurons equivalent to the entirety of a rat's brain.

To study a live rat's brain you can stick probes into it, but a human brain? Is a dead person's brain of any research value? So I think this really is just a simulation. Blue Gene/P: 200 m², 1 PFLOPS, 2.9 MW. [link] 1 kcal = 1 Calorie = 1000 cal; 1 cal = 4.186 joules. Roughly speaking, an adult needs at least 1500 kcal a day to keep the body running. 1 watt = 1 joule per second. Over an 8-hour working day, 1500*4186/8/3600 ≈ 218 watts; silicon-based life really is wasteful. And we carbon-based folks haven't even put in overtime yet :-)
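For readers who want the unit conversion spelled out, here is a tiny C sketch; the 8-hour divisor mirrors the working-day framing in the note above, and the 24-hour figure is added only for comparison.

    /* Unit conversion from the note above: kilocalories per day -> average watts.
     * 1 kcal = 4186 J, 1 W = 1 J/s. The 8-hour case matches the working-day
     * framing above; the 24-hour case is the whole-day average. */
    #include <stdio.h>

    int main(void) {
        const double kcal   = 1500.0;
        const double joules = kcal * 4186.0;

        printf("1500 kcal over  8 h: %.0f W\n", joules / (8.0 * 3600.0));   /* ~218 W */
        printf("1500 kcal over 24 h: %.0f W\n", joules / (24.0 * 3600.0));  /* ~73 W  */
        return 0;
    }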

Modern-day weather forecasting also relies on supercomputers. The National Oceanic and Atmospheric Administration uses supercomputers to crunch hundreds of millions of observations to help make weather forecasts more accurate.

In early 2020, the coronavirus was front and center in the world. Supercomputers ran various simulations to find compounds that could potentially stop its spread. These computers run for tens of hours using many CPUs working in parallel to model the different processes.

CET-6 / postgraduate entrance exam vocabulary: compute, flop, million, instruct, intensive, mechanic, forecast, molecule, compound, crystal, physics, airplane, fusion, indispensable, parallel, array, data, dominate, notable, norm, stride, summit, liquid, electron, pump, atom, render, realistic, numerical, tunnel, tactic, unite, militant, march, seldom, dictate, exploit, hardware, scenario, cluster, coordinate, penguin, web, silicon, execute, cite, elastic, allocate, impair, float, derive, manufacture, summary, radiate, shield, cracker, substitute, illicit, conduct, treaty, dynamic, outbreak, simulate, equivalent, billion, trillion, rat, nationwide, administer, crunch, multiple
