Oracle Corporation

09/26/2024 | Press release

Announcing General Availability of OCI Compute with AMD MI300X GPUs

We're excited to announce the general availability of Oracle Cloud Infrastructure (OCI) Compute bare metal instances with AMD Instinct MI300X GPUs, BM.GPU.MI300X.8.
As AI adoption expands into new inference, fine-tuning, and training use cases, we want to give customers more choice with our first Compute instance powered by AMD Instinct accelerators. Today's applications require larger and more complex datasets, especially for generative AI and large language models (LLMs). AI infrastructure needs three critical elements to accelerate these workloads: compute performance, cluster network bandwidth, and high GPU memory capacity and bandwidth.

OCI's bare metal instances deliver performance without hypervisor overhead. OCI Supercluster with AMD Instinct MI300X accelerators provides a high-throughput, ultra-low-latency RDMA cluster network architecture that scales to 16,384 MI300X GPUs. With 192 GB of memory per accelerator, the AMD Instinct MI300X can run a 66-billion-parameter Hugging Face OPT transformer LLM on a single GPU.
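The single-GPU claim above follows from simple memory arithmetic. A minimal sketch (an illustration, not an official sizing tool; it counts only model weights in 16-bit precision and ignores KV cache and activations):

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), assuming
    16-bit (fp16/bf16) parameters by default."""
    return num_params * bytes_per_param / 1e9

# A 66-billion-parameter model in fp16 needs ~132 GB of weights,
# which fits within the MI300X's 192 GB of HBM3 with headroom left
# for the KV cache and activations.
opt_66b = weights_gb(66e9)
print(f"OPT-66B weights: ~{opt_66b:.0f} GB vs. 192 GB HBM3 per MI300X")
```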
OCI Compute with AMD Instinct MI300X
This instance type provides competitive economics. It is offered at $6 per GPU/hour with the following specifications:

Instance name: BM.GPU.MI300X.8
Instance type: Bare metal
Price (per GPU/hour): $6.00
Number of GPUs: 8 x AMD Instinct MI300X accelerators
GPU memory: 8 x 192 GB = 1.5 TB HBM3
GPU memory bandwidth: 5.3 TB/s
CPU: 2 x 56-core Intel Sapphire Rapids
System memory: 2 TB DDR5
Storage: 8 x 3.84 TB NVMe
Front-end network: 1 x 100G
Cluster network: 8 x (1 x 400G)
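Because pricing is quoted per GPU/hour and the shape is sold as a whole bare metal instance, the effective instance rate is the per-GPU rate times eight. A quick cost sketch based on the published rate (the 730-hour month is our own illustrative assumption, not an Oracle billing term):

```python
# Published rate from the specification table above.
GPU_HOURLY_RATE = 6.00    # USD per GPU/hour
GPUS_PER_INSTANCE = 8     # BM.GPU.MI300X.8 is a full bare metal shape

instance_hourly = GPU_HOURLY_RATE * GPUS_PER_INSTANCE   # $48.00/hour
monthly_estimate = instance_hourly * 730                # ~1 month of on-demand use
print(f"Per instance: ${instance_hourly:.2f}/hour, "
      f"~${monthly_estimate:,.2f} for a 730-hour month")
```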
As we shared in June, we partnered with AMD to validate their Instinct MI300X GPUs for serving LLMs. In our validation, time to first token was within 65 milliseconds, with an average latency of 1.5 seconds at a batch size of one. As the batch size increased, the hardware scaled linearly, generating a maximum of 3,643 tokens across 256 concurrent user requests (batches). For more details, read the blog post, Early LLM serving experience and performance results with AMD Instinct MI300X GPUs.
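Time to first token and throughput are straightforward to measure against any streaming endpoint. A minimal, self-contained sketch of the measurement logic (the simulated token stream below is a hypothetical stand-in for a real LLM serving endpoint, not OCI's or AMD's benchmark harness):

```python
import time
from typing import Iterable, Optional, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (time_to_first_token_s, tokens_per_sec, token_count)
    for a stream of generated tokens."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in tokens:
        if ttft is None:
            ttft = time.perf_counter() - start  # latency to first token
        count += 1
    elapsed = time.perf_counter() - start
    return ttft, (count / elapsed if elapsed > 0 else 0.0), count

# Hypothetical stand-in for a streaming LLM response.
def fake_stream(n: int = 50, delay: float = 0.001):
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps, n = measure_stream(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.0f} tok/s over {n} tokens")
```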
Get started with BM.GPU.MI300X.8
BM.GPU.MI300X.8 is now generally available in the Oracle Cloud Console. Contact your Oracle sales representative or Kyle White, VP of AI infrastructure sales. Learn more about this bare metal instance in our documentation.