
How to Run NVIDIA NeMo on Oracle Cloud Infrastructure

In this blog post, we demonstrate how easy and effective it is to run NVIDIA NeMo on Oracle Cloud Infrastructure (OCI) using the NGC container powered by NVIDIA. We focus on deploying the appropriate resources and running the DreamBooth tutorial.


What are NVIDIA NeMo and DreamBooth?

NVIDIA NeMo

NVIDIA NeMo is an end-to-end platform for developing custom generative AI, including large language models (LLMs), multimodal, vision, and speech AI. It provides precise data curation, cutting-edge customization, retrieval-augmented generation (RAG), and accelerated performance to deliver enterprise-ready models. Here are some of the key benefits of NVIDIA NeMo for generative AI:

1. Flexibility: Train and deploy generative AI anywhere, on Oracle Cloud, in data centers, and at the edge.
2. Increased ROI: Quickly train, customize, and deploy large language models (LLMs), vision, multimodal, and speech AI at scale, reducing time to solution and increasing ROI.
3. Accelerated performance: Maximize throughput and minimize LLM training time with multi-node, multi-GPU training and inference.
4. End-to-end pipeline: Cover the complete LLM pipeline, from data processing and training to inference of generative AI models.
5. Production ready: Deploy into production with a secure, optimized, full-stack solution that offers support, security, and API stability.

NVIDIA NeMo is part of the NVIDIA AI Enterprise software platform. Alternatively, you can clone the NeMo Git repository from GitHub.

DreamBooth

DreamBooth is a fine-tuning method for generative AI models, particularly diffusion models and generative adversarial networks (GANs). DreamBooth enables the generation of personalized content by training a model on a small number of images of a specific subject, such as a person, pet, or object. This training allows the model to learn the characteristics of the subject and generate new content that places it in various scenarios.


DreamBooth's Role Across Industries

As banks navigate the complex landscape of diverse customer demographics and ambitious growth goals, fraud detection remains a top priority. Traditional methods often lag behind sophisticated fraud techniques. Here, DreamBooth offers a dynamic solution to combat fraud.

DreamBooth represents a significant leap in applying AI to financial security. Unlike standard AI models that need extensive datasets, DreamBooth can understand and generate realistic synthetic data from a small number of examples. For instance, OBank uses DreamBooth to enhance its fraud detection framework. By training on authentic and fraudulent documents, DreamBooth generates synthetic documents with potential fraud indicators. This rich dataset helps train OBank's algorithms to detect subtle and sophisticated fraud techniques, resulting in a more robust detection system.

Beyond banking, DreamBooth's technology is also applicable in various verticals. It enhances data privacy and security by using synthetic data, reducing the need for large datasets. Industries such as healthcare, retail, and insurance can leverage DreamBooth to generate synthetic examples for training detection systems, ensuring adaptive and effective responses to emerging threats.


Deploying DreamBooth

In the section below, we will explore a simplified example of an inference scenario. The goal of this example is to emphasize how straightforward it is to deploy a generative AI model using NeMo on OCI, rather than focusing on the specific scenario discussed earlier.

In this demonstration, we will illustrate the ease of using a Stable Diffusion model to run inference on OCI. First, we will deploy the resources required for the inference process. This includes setting up the OCI environment and configuring the infrastructure to support our model deployment.

Next, we will prepare a dataset using Crogis images. This involves gathering and organizing the images to ensure they are suitable for training and inference. Preparing the dataset is a crucial step as it ensures the model has high-quality data to learn from, which in turn improves the accuracy and reliability of the inferences made.

Following the dataset preparation, we will fine-tune the Stable Diffusion model. Fine-tuning involves adjusting the model parameters to better fit our specific dataset and objectives. This step is essential to enhance the model's performance and ensure it is well-adapted to the unique characteristics of the Crogis images.

Finally, we will run the inference. This step demonstrates how the model processes new data to generate predictions or outputs based on what it has learned during training. By running the inference, we showcase the entire workflow from deployment to real-world application, highlighting the simplicity and efficiency of using NeMo on OCI for image generation tasks.


Deploying on OCI

We perform the following steps:

1. Deploy a bare metal node equipped with NVIDIA A100 Tensor Core GPUs.
2. Run the DreamBooth tutorial that comes with NVIDIA NeMo.
3. Prepare the dataset.
4. Fine-tune the Stable Diffusion model.
5. Test inferencing.


Deploy the bare metal node:
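
The exact deployment flow depends on your tenancy, but as a rough sketch, a bare metal A100 node (for example, the BM.GPU4.8 shape) can be launched from the OCI CLI. The availability domain, OCIDs, and image below are placeholders to replace with values from your own environment.

    # Launch a bare metal A100 node; all OCIDs and the image are placeholders.
    oci compute instance launch \
      --availability-domain "<availability-domain>" \
      --compartment-id "<compartment-ocid>" \
      --shape "BM.GPU4.8" \
      --subnet-id "<subnet-ocid>" \
      --image-id "<gpu-image-ocid>" \
      --ssh-authorized-keys-file ~/.ssh/id_rsa.pub \
      --display-name "nemo-dreambooth"

    # When the instance reaches the RUNNING state, connect over SSH
    # (the default user depends on the image you chose, for example opc or ubuntu).
    ssh opc@<instance-public-ip>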


Start the container and Jupyter:
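
On the node, pull the NeMo container from the NGC catalog and start JupyterLab inside it. The image tag below is illustrative; check NGC for the current release.

    # Pull the NeMo NGC container (tag is illustrative; use the current release).
    docker pull nvcr.io/nvidia/nemo:24.05

    # Run the container with all GPUs and expose JupyterLab on port 8888.
    docker run --gpus all -it --rm \
      --shm-size=16g \
      -p 8888:8888 \
      -v "$PWD/workspace":/workspace \
      nvcr.io/nvidia/nemo:24.05 \
      jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root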


Follow the DreamBooth tutorial:
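
The DreamBooth tutorial ships with the NeMo sources. The path below is an assumption based on the repository layout; adjust it to match the NeMo version inside your container.

    # Clone the NeMo sources if they are not already present in the container.
    git clone https://github.com/NVIDIA/NeMo.git

    # The multimodal tutorials, including the DreamBooth notebook, live under
    # the tutorials directory; open the notebook from the JupyterLab file browser.
    ls NeMo/tutorials/multimodal/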

Prepare the dataset:
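
DreamBooth only needs a handful of images of the subject. As a sketch, gather the Crogis photos into a single instance-image directory that the fine-tuning step can point to; the source path below is a placeholder.

    # Collect a small set of subject images into one instance-image directory;
    # the source path is a placeholder for wherever the Crogis photos are stored.
    mkdir -p /workspace/dreambooth/instance_images
    cp /workspace/crogis_photos/*.jpg /workspace/dreambooth/instance_images/

    # Sanity-check that the images are in place.
    ls /workspace/dreambooth/instance_images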

Fine-tune the model with our dataset and NeMo:
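
Fine-tuning is driven by the DreamBooth training script in the NeMo examples. The script path and Hydra overrides below are assumptions for illustration; the tutorial notebook shows the exact invocation, including the pretrained Stable Diffusion checkpoint to start from, for your NeMo release.

    # Launch DreamBooth fine-tuning. The script path and override names are
    # assumptions; follow the tutorial notebook for the exact parameters.
    python /opt/NeMo/examples/multimodal/text_to_image/dreambooth/dreambooth.py \
      trainer.devices=8 \
      trainer.max_steps=800 \
      model.data.instance_dir=/workspace/dreambooth/instance_images \
      model.data.instance_prompt='a photo of sks Crogis'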

Test the inference output:
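
Once training finishes, the fine-tuned checkpoint can be used for text-to-image inference. Again, the script name, checkpoint path, and prompt below are assumptions for illustration; the tutorial notebook provides the exact inference cell for the checkpoint it produces.

    # Generate images from the fine-tuned checkpoint; names and keys are assumptions.
    python /opt/NeMo/examples/multimodal/text_to_image/dreambooth/dreambooth_infer.py \
      model.restore_from_path=/workspace/dreambooth/checkpoints/dreambooth.nemo \
      infer.prompt='a photo of sks Crogis on the beach' \
      infer.num_images_per_prompt=4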

Conclusion

In a few steps, you can deploy an NVIDIA NGC container on OCI and run any of the NVIDIA tutorials. This process allows you to experiment with various models, including the one we explored.

The DreamBooth experiment is one example of what you can achieve by combining the power of OCI and NVIDIA. This strategic partnership opens up a world of possibilities for AI and machine learning (ML) workloads, providing significant benefits across various aspects, from inference to training.

By using OCI and NVIDIA's advanced technologies, you gain access to a highly efficient and scalable platform designed to meet the demanding requirements of modern AI and machine learning applications. This powerful combination allows you to enhance the performance and accelerate the development of your AI and ML projects.

Whether you're conducting complex model training or deploying inference solutions, the integration of Oracle Cloud Infrastructure and NVIDIA ensures optimal resource utilization and improved processing capabilities. The synergy between these technologies not only boosts your productivity but also offers the flexibility and support needed for innovative and large-scale AI and ML endeavors.
Experience the robust capabilities and advanced tools provided by this collaboration, and unlock new potential for your AI and ML workloads. Enjoy the strategic advantages and comprehensive support that come with harnessing the combined strengths of OCI and NVIDIA.

For more information, see the following resources:
NVIDIA NeMo page
NVIDIA AI Enterprise on Oracle Cloud Marketplace
OCI Supercluster and AI Infrastructure
Oracle Cloud Infrastructure
OCI GPU Compute Shapes