Allied Business Intelligence Inc.

01/08/2024 | News release | Distributed by Public on 01/08/2024 23:30

UALink versus NVLink—Open versus Closed: Chipping Away at NVIDIA’s Proprietary Wall

By Paul Schell | 3Q 2024 | IN-7460

May of this year saw the formation of the Ultra Accelerator Link (UALink) Promoter Group to endorse an open standard for the scale-up of Artificial Intelligence (AI) accelerators in data centers to rival NVIDIA's proprietary NVLink. Spearheaded by AMD, founding members include Intel, Broadcom, Microsoft, Meta, and Hewlett Packard Enterprise (HPE), representing broad ecosystem support, with Samsung reportedly in talks to join the club.


Operating as One: Scale-Up in the Modern AI Data Center

NEWS

Frontier data center AI implementations, including the training and large-scale inference of Large Language Models (LLMs), require large clusters of accelerators that communicate with low latency, which raises throughput and utilization and so improves training and inference performance. For example, Meta, with its widely implemented Llama family of LLMs, shared details of two 24,000 Graphics Processing Unit (GPU) clusters used for training Llama-3 (requiring over 30 million "GPU hours"), as part of its wider 350,000 NVIDIA H100 portfolio. NVIDIA, through a series of strategic acquisitions and internal R&D, has developed a scalable solution to support workload acceleration. One of the key elements is NVLink, which supports Central Processing Unit (CPU)-GPU and GPU-GPU communications.

UALink, backed by a newly formed consortium of semiconductor vendors from across the ecosystem, looks to challenge NVIDIA's hegemony by standardizing hardware across numerous vendors, offering an open alternative to NVLink's single-vendor proprietary solution. UALink is proposing a single front to rival NVIDIA in a standard that will be implemented by all members of the group, utilizing established Ethernet standards and the shared memory protocol of AMD's Infinity Fabric for memory sharing across accelerators. The Promoter Group consists of accelerator vendors, cloud providers Google and Microsoft (which are also captive vendors that design their own data center accelerators), and, crucially, networking and switching vendors Broadcom and Cisco. These networking players, along with the Ultra Ethernet Consortium (UEC), which is also part of UALink, complete the picture needed to build supercomputers with scalable platforms like NVIDIA's DGX and GB200 exascale systems, covering both scale-up and linked memory within a pod (of up to 1,024 GPUs for specification 1.0), and scale-out between clusters or pods.
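To make the split between scale-up and scale-out concrete, the short Python sketch below counts how many maximum-size pods a cluster would need. It takes the 1,024-accelerator pod limit for specification 1.0 and the 24,000-GPU cluster size from the text. The function names are hypothetical, and pairing a Meta-sized cluster with UALink pods is purely illustrative, since those clusters run on NVIDIA hardware.

```python
import math

# Pod limit quoted in the text for UALink specification 1.0.
UALINK_1_0_MAX_POD_SIZE = 1024


def pods_needed(total_accelerators: int, pod_size: int = UALINK_1_0_MAX_POD_SIZE) -> int:
    """Minimum number of scale-up pods needed to hold the given accelerators."""
    if total_accelerators < 0 or pod_size <= 0:
        raise ValueError("total_accelerators must be >= 0 and pod_size must be > 0")
    return math.ceil(total_accelerators / pod_size)


def spare_slots(total_accelerators: int, pod_size: int = UALINK_1_0_MAX_POD_SIZE) -> int:
    """Unused accelerator slots left over when every pod is built at full size."""
    return pods_needed(total_accelerators, pod_size) * pod_size - total_accelerators


if __name__ == "__main__":
    cluster = 24_000  # cluster size cited in the text, used here only as an example
    print(pods_needed(cluster))   # 24
    print(spare_slots(cluster))   # 576
```

Under these assumptions, a cluster of that size spans 24 pods, so traffic inside each pod would use the scale-up fabric while traffic between the 24 pods would fall to the scale-out network.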

A Unified Proposition with UEC?

IMPACT

NVIDIA's high-performance GPUs are but one element of its successful proposition. Its dominance in the Artificial Intelligence (AI)/Machine Learning (ML) space is best analyzed on a systems level, and this applies to implementations from single accelerator cards all the way up to supercomputers made up of thousands of connected GPUs scaled up within one computing node via NVLink. "Scale-outs" of several nodes, connected via NVIDIA's proprietary InfiniBand and Ethernet-based networking equipment, are equally important. The inclusion of the UEC (focused on scale-out) in UALink creates a more holistic, complementary front from which to challenge NVIDIA, as both scale-ups (of numerous accelerators operating in lock-step) and scale-outs (between such "nodes") are needed to provide a solution like that enabled by NVIDIA's proprietary hardware. Moreover, Broadcom's pre-UALink announcement that its switches will be compatible with Infinity Fabric will address the switching component of the system (NVSwitch in NVIDIA land).

As with other open consortia like the Unified Acceleration Foundation (UXL) and the Universal Chiplet Interconnect Express (UCIe), UALink's open standard should spur innovation and collaboration, resulting in more options for AI server nodes and rack-scale systems at diverse price points. Cross-vendor compatibility for UALink hardware is baked into such open efforts, and innovation will not be stifled by proprietary licensing fees. Nonetheless, and as with NVIDIA's networking solutions, UALink is years behind NVLink. The first specification is expected to be released this quarter supporting bandwidths of 128 Gigabits per Second (Gbps), which will be roughly doubled with an upgrade following shortly thereafter in the fourth quarter. NVLink's fourth-generation specification supports up to 900 Gigabytes per Second (GB/s) per GPU via 18 links per accelerator.
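The bandwidth figures above are easier to compare once they share a unit. The Python sketch below is a rough calculation rather than a benchmark. It assumes the NVLink total is the 900 gigabytes per second that NVIDIA quotes per H100 GPU, split evenly across the 18 links, and that "roughly doubled" means exactly twice 128 Gbps. The text does not say whether the UALink figure applies per lane, per link, or per accelerator, so the results are not a like-for-like comparison. The helper names are hypothetical.

```python
BITS_PER_BYTE = 8


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / BITS_PER_BYTE


def per_link_share(aggregate: float, links: int) -> float:
    """Even split of an aggregate per-GPU rate across its links (an assumption)."""
    if links <= 0:
        raise ValueError("links must be positive")
    return aggregate / links


# Figures taken from the text; the doubling is treated as exact for simplicity.
UALINK_INITIAL_GBPS = 128
UALINK_UPGRADED_GBPS = UALINK_INITIAL_GBPS * 2

# Assumption: NVLink 4 aggregate in gigabytes per second per GPU, over 18 links.
NVLINK4_AGGREGATE_GB_PER_S = 900
NVLINK4_LINKS = 18

if __name__ == "__main__":
    print(gbps_to_gigabytes_per_s(UALINK_INITIAL_GBPS))               # 16.0
    print(gbps_to_gigabytes_per_s(UALINK_UPGRADED_GBPS))              # 32.0
    print(per_link_share(NVLINK4_AGGREGATE_GB_PER_S, NVLINK4_LINKS))  # 50.0
```

Read this way, the quoted UALink rates come to 16 GB/s and 32 GB/s, against roughly 50 GB/s for each NVLink link.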

Convergence and Competition

RECOMMENDATIONS

The industry expects the first commercial implementations of UALink hardware in 2026, representing a fast turnaround for such a far-reaching standard and reflecting the need for alternatives to NVIDIA's monopoly. This monopoly will not be easy to break, but UALink-compliant hardware for operating large systems consisting of 1,000+ accelerators will go some way toward doing so. ABI Research makes the following recommendations and predictions.

Predictions:

  • Connectivity vendors like Broadcom, a member of UALink, will be clear winners, having already committed to compatibility with Infinity Fabric for its Atlas switches, and UEC compatibility with its Network Interface Cards (NICs). Broadcom's hardware will be implemented for both scale-up and scale-out in AI data centers.
  • Expect broad support from hyperscalers (all of which have significantly increased their infrastructure spending to address burgeoning AI workloads) seeking standardized, performant networking equipment and cheaper alternatives to NVIDIA.
  • As the AI accelerator market becomes more competitive and AI workloads scale, demand for large-scale compute systems will grow. This heterogeneous environment will only be possible with open-standard connectivity, driving investment and interest in UALink.
  • The level of interest and reporting around UALink and its potential will undoubtedly motivate NVIDIA to double down on its proprietary solutions and try to widen the performance gap (or slow convergence) between UALink and NVLink with increased investment in Research and Development (R&D).
  • A notable absence is semiconductor Intellectual Property (IP) designer Arm, which is shifting its focus from the embedded and client markets to more performant AI processing in data centers with its Neoverse portfolio. SoftBank, its AI-obsessed majority shareholder, recently acquired and rescued data center AI systems vendor Graphcore, with solutions that will benefit from UALink, so expect more members soon.

Recommendations:

  • AMD, with its Instinct accelerators, and Intel, with Gaudi, should invest heavily and use their influence to garner wider support and grow UALink membership to move the standard forward and catch up to NVLink, while sharing the burden among other AI ecosystem players.
  • There should be strategic marketing collaboration and technical alignment with UXL, which addresses the software side of multiple accelerators in heterogeneous computing systems. UXL attacks another of NVIDIA's walls, namely CUDA.
  • Although it is early days, emphasis should be placed on the cost savings to implementers, as this will provide a differentiating front on which to challenge NVIDIA, while UALink's performance converges with NVLink.

By any measure of success, commercial deployments by 2026 will be a huge achievement for a new standard in the increasingly complex AI semiconductor industry. Regardless of when, the fruits of UALink's labors will challenge NVIDIA's proprietary solutions in scaling up and out of individual processing units.