AMD held its Advancing AI 2024 event last week, where it launched its latest datacenter silicon—the 5th Generation EPYC processor (codenamed “Turin”) and the MI325X AI accelerator. On the networking front, the company introduced Pensando Salina and Pensando Pollara to address front-end and back-end networking, respectively. As the silicon market gets hotter and hotter, AMD’s launches have become increasingly anticipated. Let’s dig into what AMD launched and what it means for the industry.

AI Is Still Top Of Mind

For those who thought the AI hype cycle was at its peak, guess again. This trend is stronger than ever, and with good reason. As the AI market starts to move from frontier models and LLMs to operationalizing AI in the enterprise, virtually every IT organization is focused on how to best support these workloads. That is, how does IT take a model or models, integrate and tune them using organizational data and use the output in enterprise applications?

Further, organizations that have already operationalized AI to some degree are now exploring the concept of agentic AI, where AI agents learn from each other and become smarter. This trend is still a bit nascent, but we can expect it to grow rapidly.

The point is that AI in the enterprise is already here for many companies and right around the corner for many more. With this comes the need for compute platforms tailored to AI’s unique performance requirements. In addition to running traditional workloads, CPUs must handle the AI data pipeline, while GPUs perform training and inference. (CPUs can also be used for inference.)

Because of this, AI silicon market leader Nvidia has designed its own CPU (Grace) to tightly integrate and feed its GPUs. While the company’s GPUs, such as Hopper and Blackwell, will run with any CPU, their tight integration with Grace is designed to deliver the best performance. Similarly, Intel has begun to enter the AI space more aggressively as it builds tight integration among its Xeon CPUs, Gaudi AI accelerators and forthcoming GPU designs.

For AMD, the integration of CPU with GPU (and GPUs connected by DPUs) is the company’s answer to the challenges faced by enterprise IT and hyperscalers alike. This integration accelerates the creation, cleansing, training and deployment of AI across the enterprise.

5th Gen EPYC Scales Out And Scales Up

To meet the entire range of datacenter needs, AMD designed two EPYC Zen 5 cores—the Zen 5 and Zen 5c. The Zen 5, built on a 4nm process, is the workhorse CPU designed for workloads such as database, data analytics and AI. The Zen 5c is designed with efficiency in mind. This 3nm design targets scale-out cloud and virtualized workloads.

AMD has held a performance leadership position in the datacenter throughout the last few generations of EPYC. There are more than 950 cloud instances based on this CPU family, and the reason is quite simple. Thanks to AMD’s advantages in core count and per-core performance, cloud providers can put more of their customers’ virtual machines on each server. Ultimately, this means the CSP can monetize each server and processor far more effectively.

In the enterprise, even though servers are a budget line item instead of a contributor to revenue (and margin), the math still holds: those high-core-count servers can accommodate more virtual machines, which means less IT budget goes to infrastructure so that more can go to other initiatives like AI.
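
To make that math concrete, here is a minimal sketch of the consolidation arithmetic. All of the numbers (the fleet size, vCPU sizing, core counts and the 1:1 vCPU-to-thread mapping) are illustrative assumptions, not vendor figures:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical fleet: 10,000 VMs at 4 vCPUs each (illustrative only).
    const double total_vcpus = 10000.0 * 4.0;

    // vCPUs per dual-socket server = 2 sockets * cores * 2 SMT threads,
    // assuming a 1:1 vCPU-to-thread mapping with no oversubscription.
    const double per_server_64core  = 2.0 * 64.0  * 2.0;
    const double per_server_192core = 2.0 * 192.0 * 2.0;

    std::printf("64-core parts:  %.0f servers\n",
                std::ceil(total_vcpus / per_server_64core));   // 157 servers
    std::printf("192-core parts: %.0f servers\n",
                std::ceil(total_vcpus / per_server_192core));  // 53 servers
    return 0;
}
```

Under these assumptions, tripling the core count cuts the server count by roughly two-thirds, which is exactly the infrastructure budget relief described above.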

Having lots of cores doesn’t mean anything if they don’t perform well. In this regard, AMD has also delivered with Turin. Instructions per cycle (IPC) measures how many instructions a chip can process every clock cycle, making it a good gauge of how performant and efficient a CPU is. Turin’s large, double-digit percentage IPC gains over its predecessor are therefore significant.
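
As a back-of-the-envelope illustration of why IPC matters, effective throughput is simply IPC multiplied by clock frequency. The sketch below uses purely illustrative numbers (the baseline IPC and the 15% uplift are assumptions for demonstration, not measured Turin figures):

```cpp
#include <cstdio>

int main() {
    // Effective throughput (instructions/sec) = IPC * clock frequency.
    const double base_ipc   = 5.0;    // illustrative baseline IPC, not measured
    const double ipc_uplift = 1.15;   // a hypothetical 15% generational IPC gain
    const double clock_hz   = 5.0e9;  // a 5GHz boost clock

    const double old_tput = base_ipc * clock_hz;
    const double new_tput = base_ipc * ipc_uplift * clock_hz;

    std::printf("baseline: %.2e instructions/sec\n", old_tput);
    std::printf("uplifted: %.2e instructions/sec (+%.0f%%)\n",
                new_tput, (new_tput / old_tput - 1.0) * 100.0);
    return 0;
}
```

The point of the exercise: an IPC gain multiplies with every cycle the chip runs, so a double-digit uplift at the same clock is a double-digit uplift in delivered work.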

How Does 5th Gen EPYC Stack Up Against Xeon 6?

Because the new EPYC launched a couple of weeks after Intel’s Xeon 6P CPU (see my deep analysis on Forbes), we haven’t yet seen head-to-head comparisons in terms of performance. However, we can do a couple of things to get a feel for how EPYC and Xeon compare. The first is to look at the side-by-side “billboard” specifications. When comparing these chips for scale-out workloads, the 5c CCD-based CPUs have up to 192 cores with 12 DDR5 memory channels (6,400 MT/s) and 128 lanes of PCIe Gen 5.

By comparison, Intel’s Xeon 6E (efficiency core) scales up to 144 cores with 12 DDR5 memory channels and 96 lanes of PCIe Gen 5. However, in the first quarter of 2025, Intel will launch its second wave of Xeon 6E, which will scale up to 288 cores.

It’s clear that on the performance side of the equation, EPYC and Xeon are close on specs—128 cores, 12 channels of memory and lots of I/O (128 lanes of PCIe for EPYC, 96 for Xeon). Here are some of the differences between the two:

  • EPYC now supports AVX-512 with a full native 512-bit datapath, boosting its use of this advanced vector extension, which should improve its HPC performance considerably. (A minimal sketch of AVX-512 in action follows this list.)
  • Xeon 6P supports multiplexed rank DIMMs (MRDIMMs), which can boost memory throughput to 8,800 MT/s. So far, I have not seen that AMD is supporting this. To be clear, MRDIMM will not be used for traditional datacenter workloads.
  • EPYC can reach clock speeds of up to 5 gigahertz—a big boost for some HPC and AI workloads.
  • Xeon 6P has discrete accelerators integrated into its compute complex to speed up workloads such as AI and database.
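
To illustrate that first bullet, here is a minimal AVX-512 sketch that sums an array of floats 16 at a time. This is a generic example of the instruction set, not AMD-provided code, and it assumes a CPU and compiler with AVX-512F support (compile with, e.g., g++ -mavx512f):

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdio>

// Sum a float array with 512-bit vectors: 16 single-precision lanes per add.
float avx512_sum(const float* data, std::size_t n) {
    __m512 acc = _mm512_setzero_ps();
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
    float total = _mm512_reduce_add_ps(acc);  // horizontal sum of the 16 lanes
    for (; i < n; ++i)                        // scalar tail for the leftovers
        total += data[i];
    return total;
}

int main() {
    float v[100];
    for (int i = 0; i < 100; ++i) v[i] = 1.0f;
    std::printf("sum = %.1f\n", avx512_sum(v, 100));  // prints sum = 100.0
    return 0;
}
```

A chip that executes each of those 512-bit operations in a single pass, rather than splitting them into two 256-bit halves, gets through vector-heavy HPC loops like this one considerably faster.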

AMD provided many benchmarks to demonstrate Turin’s performance. I focus on the SPEC suite because it most closely and objectively measures a CPU’s core performance. In this test, the 5th Gen EPYC significantly outperforms the 5th Gen Xeon.

As I always say with any benchmark a vendor provides, take these results with a grain of salt. In the case of this benchmark, the numbers themselves are accurate. However, Xeon’s performance took a significant leap between 5th Gen and Xeon 6P, making it hard to truly know what the performance comparison looks like until both chips can be independently benchmarked. Mind you, AMD couldn’t test against Xeon 6P, so I do not fault the company for this. However, I’d like to see both companies perform this testing in the very near future.

Is The Market Responding To EPYC?

There is no doubt about it: the market is responding positively to EPYC. In the five generations that EPYC has been on the market, AMD’s datacenter CPU share has climbed from less than 2% to about 34%. Given the slow (yet accelerating) growth of EPYC in the enterprise, this tells me that the CPU’s market share just for the cloud and hyperscale space must be well north of 50%. In fact, Meta recently disclosed that it has surpassed 1.5 million EPYC CPUs deployed globally—and that’s before we get to the CSPs.

I expect that Turin will find greater adoption in the enterprise datacenter, further increasing EPYC’s market share. In the last couple of quarters, AMD CEO Lisa Su has noted that enterprise adoption of EPYC is beginning to accelerate. Additionally, the rising popularity of the company’s Instinct MI300X series GPUs should help EPYC deepen its appeal, which brings us to our next topic.

Instinct MI325X And ROCm 6.2 Close The Gap With Nvidia

While we look to the CPU to perform much of the work in the AI data pipeline, the GPU is where the training and inference magic happens. The GPU’s architecture—lots of little cores that enable parallelism, combined with high-bandwidth memory and the ability to perform matrix multiplications at high speed—delivers the throughput and efficiency these workloads demand. Paired with optimized libraries and software stacks, these capabilities form an entire AI and HPC stack that developers and data scientists can employ more easily.
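
To ground the matrix-multiplication point, here is a minimal, deliberately unoptimized HIP kernel (HIP being ROCm’s CUDA-like programming model) in which each GPU thread computes one element of the output matrix, which is exactly the kind of massive parallelism all those little cores exploit. This is a textbook sketch, not AMD library code; production workloads would lean on rocBLAS or framework kernels:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Naive matrix multiply: each thread computes one element of C = A * B.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int N = 256;
    std::vector<float> hA(N * N, 1.0f), hB(N * N, 1.0f), hC(N * N);

    float *dA, *dB, *dC;
    hipMalloc((void**)&dA, N * N * sizeof(float));
    hipMalloc((void**)&dB, N * N * sizeof(float));
    hipMalloc((void**)&dC, N * N * sizeof(float));
    hipMemcpy(dA, hA.data(), N * N * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dB, hB.data(), N * N * sizeof(float), hipMemcpyHostToDevice);

    // Launch one thread per output element: N*N threads run in parallel.
    dim3 block(16, 16);
    dim3 grid((N + 15) / 16, (N + 15) / 16);
    hipLaunchKernelGGL(matmul, grid, block, 0, 0, dA, dB, dC, N);

    hipMemcpy(hC.data(), dC, N * N * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("C[0][0] = %.1f (expected %d)\n", hC[0], N);

    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```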

While Nvidia has long been the leader in the HPC and AI space, AMD has quietly made inroads with its Instinct MI300 Series GPUs. Launched at the inaugural Advancing AI event in 2023, the MI300X presented the first legitimate alternative to the Nvidia H100 and H200 GPUs for AI training, through a combination of its hardware architecture and the ROCm 6.0 software stack (AMD’s answer to Nvidia’s CUDA).

Over the following few quarters, AMD went on to secure large cloud-scale wins with the likes of Meta, Microsoft Azure, Oracle Cloud Infrastructure and the largest independent cloud provider, Vultr. This is important because these cloud providers modified their software stacks to support Instinct GPUs out of the box. No more optimizing for CUDA and “kind of” supporting ROCm—this is full-on native support for the AMD option. The result is training and inference performance on the MI300 and MI325 that rivals Nvidia’s H100 and H200.

The Instinct MI325X is the next step in closing the gap with Nvidia. This GPU, built on AMD’s CDNA 3 architecture and boasting 256GB of HBM3E memory, is claimed to deliver substantial performance gains over the previous generation as well as leadership over Nvidia.

As mentioned, hardware is only part of the equation in the AI game. A software stack that can natively support the most broadly deployed frameworks is critical to training models and operationalizing AI through inference. On this front, AMD has just introduced ROCm 6.2. With this release, the company is making bold claims about performance gains, including a doubling of performance and support for over a million models.
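
To show what that stack looks like at its lowest level, here is a minimal device query against the HIP runtime API that ships with ROCm; frameworks such as PyTorch sit on top of this same layer. This is a generic HIP sketch and assumes nothing specific to ROCm 6.2:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    // Ask the ROCm runtime how many GPUs it can see.
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("No ROCm-visible GPUs found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        // An MI325X, for example, would report 256GB of HBM3E here.
        std::printf("GPU %d: %s, %.0f GB memory, %d compute units\n",
                    i, prop.name, prop.totalGlobalMem / 1.0e9,
                    prop.multiProcessorCount);
    }
    return 0;
}
```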

AMD Pensando Salina DPU And AMD Pensando Pollara 400 NIC

Bringing it all together is networking, which must both connect AMD’s AI cluster to the network on the front end and link all of this AI infrastructure together on the back end. First, the company introduced its third-generation DPU—the Pensando Salina. Salina marries high-performance network interconnect with acceleration engines that provide critical offload for AI and ML functions. Among the new enhancements are 2x400G transceiver support, 232 P4 match processing units, 2x DDR5 memory and 16 Arm Neoverse N1 cores.

Combined, these features should facilitate improved data transmission, enable programming for more I/O functions and provide compute density and scale-out—all within a lower power-consumption envelope—for hyperscale workloads. AMD claims that Salina will provide a twofold improvement in overall performance compared to its prior DPU generations; if it delivers on this promise, it could further the company’s design wins with public cloud service providers eager to capitalize on the AI gold rush.

Second, the AMD Pensando Pollara 400 represents a leap forward in the design of NICs. It is purpose-built for AI workloads, with an architecture based on the latest version of RDMA that can directly connect to host memory without CPU intervention. AMD claims that this new NIC, which employs unique P4 programmability and supports 400G interconnect bandwidth, can provide up to 6x improvement in performance when compared to legacy solutions using RDMA over Converged Ethernet version 2. Furthermore, the Pollara 400 is one of the industry’s first Ultra Ethernet-ready AI NICs, supported by an open and diverse ecosystem of partners within the Ultra Ethernet Consortium, including AMD, Arista, Cisco, Dell, HPE, Juniper and many others.
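
For readers curious what RDMA looks like from the software side, below is a minimal sketch using the standard Linux libibverbs API to enumerate RDMA-capable devices. Nothing here is Pollara-specific; the same verbs interface underlies RoCE v2 and, by extension, Ultra Ethernet-class NICs (link with -libverbs):

```cpp
#include <infiniband/verbs.h>  // libibverbs, part of rdma-core
#include <cstdio>

int main() {
    int num = 0;
    // Enumerate all RDMA-capable devices visible to the verbs stack.
    ibv_device** list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        std::printf("No RDMA devices found.\n");
        return 1;
    }
    for (int i = 0; i < num; ++i) {
        // Data paths built on these devices move buffers NIC-to-memory,
        // bypassing the host CPU once the queue pairs are established.
        std::printf("RDMA device %d: %s\n", i, ibv_get_device_name(list[i]));
    }
    ibv_free_device_list(list);
    return 0;
}
```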

AMD’s new NIC design could position it favorably relative to Broadcom’s 400G Thor, especially since AMD is first out of the gate with a UEC design. Both the Salina DPU and the Pollara 400 NIC are currently sampling with cloud service and infrastructure providers, with commercial shipments expected in the first half of 2025.

Putting It All Together

One of the understated elements of AMD’s AI strategy is its acquisition of Silo AI. This Finnish company, the largest private AI lab in Europe, is filled with AI experts who spend all their time helping organizations build and deploy AI.

Over the last year or so, AMD has built an AI franchise by bringing all of the critical elements together. At the chip level, the company delivered 5th Gen EPYC for compute, the MI325X for AI acceleration, and Salina and Pollara for front-end and back-end networking. ROCm 6.2 provides the software framework and stack that enables the ISV ecosystem. The acquisition of ZT Systems last month delivers the rack-scale integration that Silo AI can use to deliver the last (very long) mile to the customer.

In short, AMD has created an AI factory.

What Does All This Mean?

As I say again and again in my analyses of this market, AI is complex—and even that is an understatement. Different types of compute engines are required to effectively generate, collect, cleanse, train and use AI across hyperscalers, the cloud and the enterprise. This translates into a need for CPU, GPU and DPU architectures that are not only complementary, but indeed optimized to work with one another.

Over time, AMD has acquired the pieces that enable it to deliver this end-to-end AI experience to the market. At Advancing AI 2024, the company delivered what could be called its own AI factory. It is important to note that this goes beyond simply providing an alternative to Nvidia. AMD is now a legitimate competitor to Nvidia.

At the same time, AMD demonstrated a use for all of this technology outside of the AI realm, too. With the new EPYC, it has delivered a generation of processors that demonstrates continued value in the enterprise. And in the MI325X, we also see excellent performance across the HPC market.

Here is my final takeaway from the AMD event: The silicon market is more competitive than ever. EPYC and Xeon are both compelling for the enterprise and the cloud. On the AI/HPC front, the MI325X and H100/H200/B200 GPUs are compelling platforms. However, if I were to create a Venn diagram, AMD would be the only company strongly represented in both of these markets.

Game on.
