New Xilinx SmartNIC Looks More Like a 100Gbps Arm-Based Coprocessor


Latest entry into the server disaggregation space has an FPGA chip and an Arm SoC, with a way to split network traffic between the two — and maybe a new way to make use of that FPGA.
Scott Fulton III | Feb 23, 2021

It’s being called a high-bandwidth, “smart” network interface controller — the latest in an emerging line of distributed processing components for so-called “disaggregated” servers. But a close examination of Xilinx’s new SN1000 Alveo SmartNIC series, announced Tuesday, reveals an emerging softness about the classes of workloads this new accelerator may be capable of performing.

“The SN1000 is what we’re calling the industry’s first SmartNIC at 100-gig speed, with composable hardware,” Kartik Srinivasan, director of marketing for the Xilinx Data Center Group, told DCK. The new component combines a Xilinx FPGA chip with a separate 16-core Arm A72 NXP processor. It’s this Arm processor that will be responsible for dividing the data plane from the control plane for SDN processing, managing incoming traffic at 100 million packets per second (100 Mpps).

It then directs ingress data from the data plane to the FPGA for what Srinivasan calls “100 percent hardware data acceleration.” The control plane workload is processed separately, through the Arm SoC, offloading all the network control work from the server’s host CPU.
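To make that split concrete, here is a minimal sketch of the kind of classification step described above, plus the arithmetic behind the packet-rate figure. It is not Xilinx code: the protocol list, the target names, and every function name are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of control-plane protocols. Real classification rules
# would be configured by the operator, not hard-coded like this.
CONTROL_PROTOCOLS = {"bgp", "ospf", "arp", "lldp"}


@dataclass(frozen=True)
class Packet:
    protocol: str    # simplified label such as "tcp" or "bgp"
    size_bytes: int


def classify(packet: Packet) -> str:
    """Return the (hypothetical) target that should handle this packet."""
    if packet.protocol in CONTROL_PROTOCOLS:
        return "arm_control_plane"
    return "fpga_data_plane"


def per_packet_budget_ns(packets_per_second: float) -> float:
    """Average time available to handle one packet, in nanoseconds."""
    return 1e9 / packets_per_second


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum Ethernet frames per second on a link.

    Each frame also occupies 8 bytes of preamble and 12 bytes of
    inter-frame gap on the wire, hence the extra 20 bytes.
    """
    return link_bps / ((frame_bytes + 20) * 8)
```

At 100 Mpps, `per_packet_budget_ns(100e6)` works out to 10 nanoseconds per packet on average. For comparison, `line_rate_pps(100e9, 64)` gives roughly 148.8 million minimum-size Ethernet frames per second on a 100Gbps link.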

SN1000 aims to address what Srinivasan described as the second phase of SmartNIC customer adoption. The earliest adopters, in the first phase, were hyperscale cloud platforms that spread essentially uniform workloads evenly across masses of functionally identical servers, he explained.

This second phase will consist of enterprises, operators of smaller data centers, and service providers looking to compress more power and bandwidth into much tighter spaces.

“With CPU-based implementations, even though programmability comes with ease, the performance capabilities at scale come way short,” Srinivasan said. “You’re putting one processor in front of another processor.” With “the relief that you’re trying to provide to the host CPU, you’re in fact squeezing the balloon and moving [the traffic] on to another part of the motherboard.”

Interfaces on the card itself are two QSFP28 (quad small form-factor pluggable) sockets. The card’s thermal design power (TDP) is rated at 70W, which typically means its cooling must be able to dissipate 70 watts of heat to maintain a nominal operating temperature.

“Straight out-of-the-box, there is a plug-and-play experience to our SmartNIC,” Srinivasan asserted. “You plug our product into the system, and you then have hardware acceleration functions for virtualization for networking, virtualization for storage, virtual switching with OVS (Open vSwitch), and there’s also pre-programmed acceleration modules we provide for storage and security. That enables our customers to not have to worry about programming anything.”

Composable Data Plane Traffic Workflow as Differentiator

As an example, he offered a use case where a customer programs this SmartNIC for remote NVMe-over-fabric [PDF] access to an SSD storage component. Both NVMe and data compression functions in the workflow may be directed to the FPGA for hardware-based acceleration. Immediately, storage traffic should receive a boost. But if encryption and firewall functions are also part of this workflow, these functions may be made to flow to the A72 cores.

Now suppose Ceph software-defined storage is added to the traffic flow later. This new traffic class may then be directed to the FPGA as a separate workload or offloaded to additional modules by way of a custom virtual switch, whose operations are composed by customers’ development teams. Srinivasan described this as Xilinx’s differentiator from competing SmartNICs, which he said lack the ability to compose data plane traffic workflows, conceivably injecting existing accelerators and network assets into the flow. Xilinx experts may be made available, he told us, to help customers work out how to compose their traffic flows.
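One way to picture the composable workflow in this example is as a placement table that maps each function in the traffic flow to a processing target. The sketch below is only an illustration: the table, the target names, and the helper functions are all hypothetical and do not reflect Xilinx's actual tooling.

```python
from collections import defaultdict

# Hypothetical placement table: workflow function -> processing target.
# It mirrors the NVMe-over-fabric example in the text; names are illustrative.
placement = {
    "nvme_over_fabric": "fpga",
    "compression": "fpga",
    "encryption": "arm_a72",
    "firewall": "arm_a72",
}

VALID_TARGETS = {"fpga", "arm_a72", "external_module"}


def add_function(table, name, target):
    """Return a new placement table with one more function added.

    The original table is left untouched, so an existing workflow keeps
    running unchanged while the new one is being composed.
    """
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target}")
    updated = dict(table)
    updated[name] = target
    return updated


def group_by_target(table):
    """List the functions assigned to each target, sorted by name."""
    grouped = defaultdict(list)
    for function, target in table.items():
        grouped[target].append(function)
    return {target: sorted(functions) for target, functions in grouped.items()}


# Adding Ceph later, as in the scenario above, becomes a one-line change.
with_ceph = add_function(placement, "ceph", "fpga")
```

Calling `group_by_target(with_ceph)` then shows Ceph sitting alongside NVMe-over-fabric and compression on the FPGA, with encryption and the firewall still on the A72 cores.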

“There’s new data center challenges around performance, efficiency, manageability that can’t be addressed merely by extending what’s been historically done. Something new, something disruptive, is required,” Pejman Roshan, VP of marketing for Xilinx’s Data Center Group, told us.

Xilinx’s new entry joins an increasingly crowded field in what has simultaneously been described as the disaggregation space and the acceleration space. Last October, Nvidia made a major charge toward disaggregation, advancing an architecture where its supercomputer-class GPUs are paired with a new class of Arm-based processor it calls DPUs. Conceivably, a SmartNIC would be a fourth category of processor here. That August, Fungible advanced its own class of DPU, which would serve as a NIC but is based on the startup’s own Multiple Instruction/Multiple Data (MIMD) architecture rather than an FPGA accelerator.

As Srinivasan explained, however, Xilinx’s position is that disaggregation can be made more difficult, architecturally speaking, by stacking new processors in front of one another in the existing workflow.

“Virtual switching enables us to accelerate east-west traffic within the server without having to go into the network,” he said. “That’s one way we can reduce the network congestion. When you go to a disaggregated deployment strategy, what becomes critical is the network itself… We’re not trying to put a CPU in front of another CPU and push or hide the bottleneck somewhere else. We’re providing software-defined hardware acceleration for multiple use cases and we’re allowing our customers to compose the data part within the architecture for application-specific acceleration. And we’re providing this with easy-to-use, high-level programming languages like P4 and C++.”

P4 is a domain-specific language geared for routing packets through a data forwarding element. To encourage enterprises to make use of it for their own purposes, Xilinx plans to use its existing App Store as a channel for distributing custom forwarding utilities. Today, the Xilinx store features container-based, self-deploying functions that bring such things as inference engines, video search aids, and data analytics tools to its existing Alveo accelerators.
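P4 programs are built around match-action tables: a packet field is matched against table entries, and the matching entry selects an action. The following is a rough Python analogy of a longest-prefix-match forwarding table, not P4 syntax. The class name, action strings, and prefixes are invented for illustration.

```python
import ipaddress


class MatchActionTable:
    """Toy longest-prefix-match table, analogous to a lookup a P4 program declares."""

    def __init__(self, default_action):
        self.entries = []  # list of (network, action) pairs
        self.default_action = default_action

    def add_entry(self, prefix, action):
        """Install an entry such as ("10.0.0.0/8", "forward:port1")."""
        self.entries.append((ipaddress.ip_network(prefix), action))

    def lookup(self, dst_ip):
        """Return the action of the most specific matching prefix."""
        address = ipaddress.ip_address(dst_ip)
        best = None
        for network, action in self.entries:
            if address in network:
                if best is None or network.prefixlen > best[0].prefixlen:
                    best = (network, action)
        return best[1] if best else self.default_action


table = MatchActionTable(default_action="drop")
table.add_entry("10.0.0.0/8", "forward:port1")
table.add_entry("10.1.0.0/16", "forward:port2")
```

Here a packet bound for `10.1.2.3` matches both entries, and the more specific /16 prefix wins. A destination that matches no entry falls through to the default action.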

Unlike a CPU or SoC, an FPGA is highly accessible and programmable. Customers are capable of fitting out FPGAs with mission-critical, highly specialized functions, more often than not for AI workloads. NASA’s Perseverance rover, which successfully landed on Mars this month, was fitted with a pair of Xilinx Virtex FPGAs equipped with machine learning and neural networking applications using accelerated TensorFlow and PyTorch libraries.

Usually, when a vendor plants the seeds for a customer-driven “ecosystem,” it’s with an eye toward planting its brand squarely in the public consciousness. A disaggregated server model, where all processing is no longer centered solely around the CPU, would give processor makers like Xilinx, Nvidia, Marvell, and others breathing room to compete in a field that has perhaps too long been considered the domain of x86 architectures and, by extension, Intel.

Accomplishing this for the long term, however, requires building whatever would successfully disaggregate the server into something more like a computer unto itself and less like an appliance one installs and forgets.

“We’re trying to build composability into every piece of hardware that we provide,” remarked Xilinx’s Roshan. 

The SN1000 series joins Xilinx’s existing product line, which had been spearheaded by its X2 variable traffic speed offload NIC and its U25 25 Gbps SmartNIC. The first model in the series, SN1022, will be made available in March. Pricing has yet to be announced.

Correction: February 23, 2021
NASA’s Perseverance rover is fitted with a pair of Xilinx Virtex FPGAs, not a Xilinx Kintex FPGA, as this article previously said.

posted @ 2021-08-21 10:33 张同光