Oracle Cloud Infrastructure (OCI) will deploy 50,000 graphics processing units (GPUs) from Advanced Micro Devices (AMD) starting in the second half of 2026, marking one of the largest single-cloud GPU deployments to date. The decision highlights a broader shift among hyperscalers to adopt AMD’s GPU offerings as an alternative to Nvidia’s widely used chips for artificial intelligence workloads.
Announcing the deal at Oracle AI World in Las Vegas, the company said OCI will be the first hyperscaler to offer a publicly available AI supercluster based on AMD Instinct MI450 Series GPUs. The initial deployment is planned for the third quarter of 2026, with further expansion in 2027.
The announcement extends Oracle's ongoing collaboration with AMD and builds on recent deployments of AMD Instinct MI300X GPUs on OCI, as well as the upcoming availability of MI355X-powered instances. The new supercluster will be integrated into OCI’s zettascale infrastructure and is designed to meet growing demand for large-scale AI training and inference capabilities.
“Our customers are building some of the world’s most ambitious AI applications, and that requires robust, scalable, and high-performance infrastructure,” said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure.
AI Infrastructure for Scale and Speed
The AMD Instinct MI450 Series GPUs are engineered for large-scale generative AI, language models, and high-performance computing (HPC) applications. Each GPU delivers up to 432 GB of HBM4 memory and 20 TB/s of memory bandwidth, enabling in-memory training and inference of models up to 50% larger than those supported by earlier architectures.
The planned AI supercluster will use a vertically optimised, liquid-cooled “Helios” rack design that supports 72 GPUs per rack. The architecture is intended to maximise performance density and energy efficiency while minimising latency, using UALoE (UALink over Ethernet) scale-up connectivity and networking aligned with Ultra Ethernet Consortium standards.
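To put those figures in context, a rough back-of-envelope calculation based on the numbers quoted above (432 GB of HBM4 and 20 TB/s per GPU, 72 GPUs per rack) shows what they imply for raw weight capacity. This is an illustrative sketch, not an AMD or Oracle benchmark; real deployments must also budget memory for activations, optimiser state, and KV caches:

```python
# Back-of-envelope sizing from the figures quoted in the article:
# 432 GB HBM4 and 20 TB/s per GPU, 72 GPUs per Helios rack.
# Illustrative only; actual usable capacity is lower in practice.

GB = 1e9  # decimal gigabytes, as hardware specs are typically quoted

hbm_per_gpu_gb = 432
bandwidth_per_gpu_tbs = 20
gpus_per_rack = 72

bytes_per_param = {"fp8": 1, "bf16": 2, "fp32": 4}

for fmt, nbytes in bytes_per_param.items():
    params = hbm_per_gpu_gb * GB / nbytes
    print(f"{fmt}: ~{params / 1e9:.0f}B parameters of raw weights per GPU")

rack_hbm_tb = hbm_per_gpu_gb * gpus_per_rack / 1000
rack_bw_tbs = bandwidth_per_gpu_tbs * gpus_per_rack
print(f"Per rack: ~{rack_hbm_tb:.1f} TB HBM4, ~{rack_bw_tbs} TB/s aggregate bandwidth")
```

On these assumptions, a single GPU holds roughly 216 billion bf16 weights, and a full rack aggregates about 31 TB of HBM4, which is the headroom behind the claim of fitting substantially larger models in memory.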
At the head of each cluster will be next-generation AMD EPYC CPUs, codenamed “Venice,” which are designed to streamline orchestration and data processing while offering built-in security and confidential computing capabilities.
The networking architecture will incorporate AMD Pensando data processing units (DPUs) to support high-throughput data ingestion and secure traffic management. Each GPU can be connected with up to three 800 Gbps AMD Pensando “Vulcano” AI-NICs, enabling fast, lossless communication optimised for distributed AI training.
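The quoted NIC configuration translates into substantial per-GPU scale-out bandwidth. The arithmetic below is a sketch using the peak figures from the announcement (three 800 Gbps NICs per GPU); the 2 TB transfer example is a hypothetical illustration, and achievable throughput will depend on topology, congestion, and protocol overhead:

```python
# Rough network-bandwidth arithmetic: up to three 800 Gbps "Vulcano"
# AI-NICs per GPU, per the announcement. Peak numbers only.

nics_per_gpu = 3
gbps_per_nic = 800

gpu_net_gbps = nics_per_gpu * gbps_per_nic   # 2,400 Gbps per GPU
gpu_net_gb_per_s = gpu_net_gbps / 8          # ~300 GB/s per GPU

print(f"Per GPU: {gpu_net_gbps} Gbps (~{gpu_net_gb_per_s:.0f} GB/s) of scale-out bandwidth")

# Hypothetical example: time to move 2 TB of bf16 weights (roughly a
# 1-trillion-parameter model) off one GPU at that peak rate.
model_tb = 2.0
seconds = model_tb * 1000 / gpu_net_gb_per_s
print(f"Moving {model_tb} TB at peak: ~{seconds:.1f} s")
```

Numbers of this order are what make frequent gradient synchronisation across thousands of GPUs practical in distributed training.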
To further enhance performance, the system includes a rack-level memory-sharing interconnect, UALink, which allows GPUs to share memory and communicate without routing through CPUs. This hardware-coherent fabric reduces bottlenecks and latency during large-scale model orchestration.

The platform also integrates AMD’s open-source ROCm software stack, which supports widely used AI frameworks and simplifies workload migration through a flexible programming environment.
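One practical consequence of that framework support: ROCm builds of PyTorch expose AMD GPUs through the familiar `torch.cuda` API (implemented on top of HIP), so device-agnostic code written for Nvidia hardware often runs unchanged. A minimal sketch, assuming a ROCm build of PyTorch is installed:

```python
# Minimal device-agnostic PyTorch example. On a ROCm build,
# torch.cuda.is_available() returns True for AMD Instinct GPUs and the
# same code path runs via HIP, with no source changes required.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
if device == "cuda":
    # On ROCm this reports the AMD GPU name, e.g. an Instinct accelerator.
    print(torch.cuda.get_device_name(0))

model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
y = model(x)  # identical call on CUDA and ROCm backends
print(y.shape)
```

This API compatibility is a large part of what "simplifies workload migration" means in practice for teams moving existing training and inference code onto AMD hardware.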
For cloud-scale operations, the system supports fine-grained partitioning and virtualisation of GPUs and pods, enabling secure multi-tenancy and efficient resource allocation based on workload requirements.
This vertically integrated and open-standards-based architecture aims to provide customers with the flexibility, performance, and scalability needed to handle increasingly complex AI workloads at cloud scale.
“With our AMD Instinct GPUs, EPYC CPUs, and advanced AMD Pensando networking, Oracle customers gain powerful new capabilities for training, fine-tuning, and deploying the next generation of AI,” said Forrest Norrod, executive vice president, AMD.
A Growing Market for AMD in AI Cloud
The move is the latest indicator that AMD is gaining traction in a market long dominated by Nvidia’s GPUs. AMD’s Instinct MI450 chips, unveiled earlier this year, are the company’s first to support rack-scale integration with 72 GPUs operating as a single unit. This capability is essential for training and deploying the most advanced AI models.
Karan Batta, senior vice president of Oracle Cloud Infrastructure, said AMD’s chips are particularly well-suited for inference workloads, a key requirement for enterprise AI applications. “We feel like customers are going to take up AMD very, very well — especially in the inferencing space,” Batta said.
Oracle’s investment in AMD-based infrastructure also reflects its broader strategy of offering open, secure, and price-efficient cloud solutions. The deployment further positions OCI as a high-performance platform tailored for enterprise AI workloads, as demand for flexible, cost-effective alternatives continues to grow.