26/06/2024 | Press release | Distributed by Public on 26/06/2024 23:35
The recent Databricks Data+AI Conference included a presentation on deploying the DBRX large language model (LLM) on Oracle Cloud Infrastructure (OCI) using NVIDIA Inference Microservices (NIM) and Delta Sharing. This blog post summarizes the key points from the presentation, covering the architecture, benefits, and deployment process.
Let's start with the end architecture, where we aggregate data from disparate data sources into OCI for use in a classic retrieval-augmented generation (RAG) workflow. We're using the DBRX LLM deployed with NVIDIA NIM on NVIDIA L40S GPUs in OCI. We have also validated the workflow on NVIDIA H100 GPUs running on OCI.
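To make the RAG step concrete, the query side of this workflow can be sketched as a call to the DBRX NIM's OpenAI-compatible chat-completions endpoint. The endpoint URL, model identifier, and retrieved passages below are illustrative placeholders, not the presentation's actual configuration.

```python
# Sketch: querying a DBRX NIM with retrieved context (RAG).
# NIM_URL and MODEL_ID are assumed values for a locally deployed NIM.
import json
import urllib.request

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed NIM endpoint
MODEL_ID = "databricks/dbrx-instruct"                  # assumed model id

def build_rag_request(question, passages):
    """Pack retrieved passages and the user question into a chat payload."""
    context = "\n\n".join(passages)
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system",
             "content": "Answer using only the context below.\n\n" + context},
            {"role": "user", "content": question},
        ],
        "max_tokens": 256,
    }

def query_nim(payload):
    """POST the payload to the NIM and return the model's answer."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running NIM
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

In practice the passages would come from the vector store populated earlier in the workflow, and the same payload shape works against any NIM that serves the OpenAI-compatible API.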
This diagram outlines a workflow that integrates OCI services to process data, create embeddings, store and query vector data, and perform inference on NVIDIA GPUs. It moves through the following stages:
This NIM runs on clusters that you can equip with different GPUs, such as NVIDIA L40S, A100, H100, and H200, so you can match performance to the needs of a specific workload.
Let's take a detour and discuss the complete Oracle AI Stack.
Oracle's AI stack is designed to integrate seamlessly with various applications, from Fusion applications and NetSuite to third-party applications and industry-specific solutions, embedding generative and classic AI capabilities on top of Oracle's AI infrastructure.
Oracle offers the following services for AI usage:
The following AI services have GenAI capabilities:
The following services offer machine learning (ML) and GenAI features for data platforms:
The following services provide AI infrastructure:
Applications connect to AI services and consume the capabilities those services provide. The AI services, in turn, are built on the ML and GenAI features of the data platforms, and the data platforms rest on database-centric applications and AI infrastructure. This layering highlights the foundational role of high-performance computing and storage in supporting advanced AI and ML functionality.
NVIDIA NIM is a modular, containerized service optimized for deploying and scaling AI inference workloads. It includes the following benefits:
NIM enhances developer productivity and infrastructure efficiency, allowing enterprises to maximize their investments. For example, running Meta Llama 3-8B in NIM produces up to three times more generative AI tokens on accelerated infrastructure than without NIM.
Oracle Cloud Infrastructure
OCI is a Generation 2 cloud that offers a differentiated approach, combining high innovation and flexibility with the lowest total cost of ownership (TCO) among major hyperscalers and cloud providers. It supports robust AI and ML innovation through a full-stack AI strategy spanning infrastructure, platform, and software as a service (IaaS, PaaS, and SaaS).
OCI includes the following key highlights:
DBRX is a transformer-based, decoder-only LLM with a fine-grained mixture-of-experts (MoE) architecture, featuring 132 billion total parameters, 36 billion of which are active for any given input. It outperforms other open MoE models such as Mixtral and Grok-1 because of its larger number of smaller experts and improved model quality.
DBRX uses advanced techniques, such as rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It was pretrained on 12 trillion tokens of text and code data, using the GPT-4 tokenizer for optimal performance.
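To illustrate the fine-grained MoE idea, the following toy sketch shows top-k expert routing of the kind DBRX uses: a router scores all experts for a token, keeps the top k (DBRX selects 4 of 16 experts), and mixes their outputs with renormalized softmax weights. The gating math here is generic, and the toy experts are placeholders, not DBRX's real architecture.

```python
# Toy sketch of fine-grained top-k mixture-of-experts routing.
# NUM_EXPERTS and TOP_K follow DBRX's 4-of-16 configuration; everything
# else (dimensions, experts) is illustrative.
import math

NUM_EXPERTS, TOP_K = 16, 4

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits):
    """Return the top-k expert indices and their renormalized gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    z = sum(probs[i] for i in top)
    return top, [probs[i] / z for i in top]

def moe_layer(token, router_logits, experts):
    """Weighted sum of the selected experts' outputs for one token."""
    idx, gates = route(router_logits)
    return sum(g * experts[i](token) for g, i in zip(gates, idx))
```

Using many small experts and activating only a few per token is what lets a 132B-parameter model run with only 36B parameters active on each input.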
Delta Sharing is an open protocol for secure and seamless data sharing across organizations and platforms, working with Delta Lake for reliability and performance. It prioritizes the following core concepts:
Key advantages of Delta Sharing include live data sharing, no data copying, platform agnosticism, and secure governance.
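As a sketch of what consuming a share looks like, the delta-sharing Python client addresses a table through a profile file plus a share.schema.table coordinate and loads it live, with no upfront copy. The profile path and the share, schema, and table names below are hypothetical.

```python
# Sketch: reading a shared Delta table with the delta-sharing client
# (pip install delta-sharing). Names below are illustrative placeholders.

def table_url(profile_path, share, schema, table):
    """Delta Sharing addresses a table as <profile>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"

def load_shared_table(profile_path, share, schema, table):
    """Pull the live shared table into a pandas DataFrame."""
    import delta_sharing  # deferred so the sketch runs without the package
    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )
```

Because the recipient reads directly against the provider's live table, this is how data lands in OCI for the RAG workflow without a separate copy-and-sync pipeline.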
Deployment process on OCI
The deployment of the DBRX LLM on OCI involves the following steps:
Key benefits
Deploying DBRX LLM on OCI with NVIDIA NIM offers the following benefits:
The OCI North America Cloud Engineering AI Solutions team has automated the entire deployment using APIs and Terraform.
This presentation underscored the synergy between OCI and NVIDIA's AI capabilities, highlighting how this powerful combination facilitates efficient handling of large-scale data and advanced AI workloads. The deployment process on OCI ensures scalability, efficiency, and security, making it an ideal choice for enterprises looking to utilize cutting-edge AI solutions.
For more information and to try out NVIDIA NIM on Oracle Cloud Infrastructure, visit NVIDIA AI and Oracle Cloud Infrastructure.