
Understanding the Performance of OCI Block Volume with Oracle Cloud VMware Solution

A year ago, we transformed the landscape of scalable VMware solutions in the public cloud by introducing Standard shapes for Oracle Cloud VMware Solution. This innovation empowered users to independently scale storage from compute by using the Block Volume Service from Oracle Cloud Infrastructure (OCI) as the primary storage solution. This decoupling provided greater flexibility and the power to fine-tune performance based on specific workload demands.
Recently, the OCI Block Volume service announced several enhancements at no extra cost to customers. This blog post dives deep into how these enhancements can significantly improve the performance of your standard shape clusters in Oracle Cloud VMware Solution. We explore how Volume Performance Units (VPUs) influence data transfer speeds within your VMware Virtual Machine File System (VMFS) datastore.
Enhancing performance with OCI Block Volume
Using OCI Block Volume for Oracle Cloud VMware Solution gives users the following key benefits:
Fast, reliable performance: OCI Block Volumes use NVMe SSDs to deliver excellent performance, guaranteed by a service level agreement (SLA). This configuration ensures smooth and consistent operation for even your most demanding workloads.
Always-available data: Block volumes provide persistent storage that survives even if the Compute instance terminates, simplifying ESXi host management. Start a new host, attach the block volumes, and access your information without data movement.
Highly durable: Block volumes are built for exceptional durability, with multiple copies of your data stored to minimize the risk of data loss. You don't need to provision extra storage for fault tolerance. However, this durability doesn't replace backups; we always recommend taking appropriate backups of your VMware workloads to counter any availability domain failures.
Cost-effective: Pay only for the storage you use with OCI Block Volume. Block Volume auto-tuning also lets you set performance limits and adjust IOPS based on workload needs, potentially leading to cost savings and meeting your peak workloads with ease.
Security you can trust: Data encryption at rest and in transit helps keep your applications secure. OCI Block Volume encrypts all data using the AES-256 encryption standard, with in-transit encryption available for supported bare metal instances.
Effortless scaling: Increase your storage space as needed without downtime, keeping your operations running smoothly.
Frees up ESXi resources: By handling storage tasks, OCI Block Volume allows your ESXi hosts to focus on core computing needs.
Important considerations
While OCI Block Volumes offer significant advantages, keep the following key points in mind:
Performance sharing: In VMware environments, volumes are multi-attached across hosts to create VMFS datastores, so a block volume's performance is shared among all connected ESXi hosts. The SLA for block volume performance applies only to the Balanced, Higher Performance, and Ultra High Performance levels, not the Lower Cost level.
Instance performance impact: The maximum IOPS and throughput of the underlying Compute instance can limit the effective performance of an attached block volume. This includes volumes configured for Ultra High Performance (UHP) where total IOPS and throughput are capped when combining UHP and non-UHP attachments.
Availability domain access: Block volumes can only be accessed by instances within the same availability domain. Multi-availability domain cluster configurations can't create a VMFS datastore that spans availability domains.
Attaching and detaching volumes: You can't attach a block volume to multiple instances simultaneously unless it's configured as read/write shareable (a minimal attachment sketch follows this list). A volume must be detached from all instances before it can be deleted or reattached to a new instance in a different configuration.
Device paths for UHP volumes: While device paths aren't important for VMware use with UHP volumes, you must still choose one from the menu. As a workaround, you can initially deploy the volume as Balanced or Higher Performance and then increase the VPU level to UHP after attaching it.
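For reference, a block volume becomes usable as shared VMFS storage only after it's attached to every ESXi host as read/write shareable. The following minimal sketch shows that multi-attach step with the OCI Python SDK; the OCIDs are placeholders, the iSCSI attachment type is an assumption of this sketch, and in practice you can also drive the same attachments from the OCI Console.

```python
# Minimal sketch: multi-attaching one block volume to several ESXi hosts as
# read/write shareable, using the OCI Python SDK (pip install oci).
# All OCIDs below are placeholders.
import oci

config = oci.config.from_file()            # reads ~/.oci/config
compute = oci.core.ComputeClient(config)

VOLUME_ID = "ocid1.volume.oc1..example"    # block volume backing the VMFS datastore
ESXI_INSTANCE_IDS = [
    "ocid1.instance.oc1..esxi-host-1",
    "ocid1.instance.oc1..esxi-host-2",
    "ocid1.instance.oc1..esxi-host-3",
]

for instance_id in ESXI_INSTANCE_IDS:
    details = oci.core.models.AttachIScsiVolumeDetails(
        instance_id=instance_id,
        volume_id=VOLUME_ID,
        is_shareable=True,     # required so every host can mount the same volume
        is_read_only=False,
    )
    attachment = compute.attach_volume(details).data
    print(f"{instance_id}: attachment {attachment.id} is {attachment.lifecycle_state}")
```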
Now that we have a good overview of the OCI Block Volume service and the impacts it brings to the VMFS datastore within your VMware environment, let's look at the performance of the block volume.
Optimizing performance with VPUs and OCI Block Volumes
Oracle Cloud VMware Solution standard shapes use OCI Block Volumes for VMFS datastores. Choosing the right VPU level is crucial for optimal performance. Standard-shaped ESXi hosts can connect to a maximum of 32 block volumes, each with VPU levels ranging from 0 to 120. In a unified management cluster, one attachment is used for the management datastore, allowing for a maximum of 31 data volumes. A good understanding of how VPUs work and how the block volume maximums come into play is crucial to finding the balance that delivers the best value for money while meeting your performance requirements.
Selecting the right VPU for your needs
In this section, we review the following VPU levels and their performance characteristics:
Performance characteristics of each VPU level

| Elastic performance level | Volume Performance Units (VPUs) | IOPS per GB | Max IOPS per volume | Size for max IOPS (GB) | KB/s per GB | Max MB/s per volume |
| --- | --- | --- | --- | --- | --- | --- |
| Balanced | 10 | 60 | 25,000 | 417 | 480 | 480 |
| Higher Performance (HP) | 20 | 75 | 50,000 | 667 | 600 | 680 |
| Ultra High Performance (UHP) | 30 to 120 | 90 to 225 | 75,000 to 300,000 | 833 to 1,333 | 720 to 1,800 | 880 to 2,680 |
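To make these numbers easier to apply, here's a minimal Python sketch that estimates a single volume's IOPS and throughput ceilings from its size and VPU level, using the figures in the table above. Treating intermediate UHP levels as linear between VPU 30 and VPU 120 is an assumption of this sketch, though it lines up with the VPU 50 = 125,000 IOPS example used later in this post.

```python
# Minimal sketch: estimate a block volume's performance ceilings from its size
# and VPU level, based on the VPU table above. Intermediate UHP levels are
# linearly interpolated here (an assumption of this sketch).

def volume_limits(size_gb: float, vpu: int) -> dict:
    if vpu == 10:                                   # Balanced
        iops_per_gb, max_iops, kbps_per_gb, max_mbps = 60, 25_000, 480, 480
    elif vpu == 20:                                 # Higher Performance
        iops_per_gb, max_iops, kbps_per_gb, max_mbps = 75, 50_000, 600, 680
    elif 30 <= vpu <= 120:                          # Ultra High Performance
        f = (vpu - 30) / 90                         # 0.0 at VPU 30, 1.0 at VPU 120
        iops_per_gb = 90 + f * (225 - 90)
        max_iops = 75_000 + f * (300_000 - 75_000)
        kbps_per_gb = 720 + f * (1_800 - 720)
        max_mbps = 880 + f * (2_680 - 880)
    else:
        raise ValueError("unsupported VPU level for this sketch")
    return {
        "max_iops": min(size_gb * iops_per_gb, max_iops),
        "max_mbps": min(size_gb * kbps_per_gb / 1000, max_mbps),
    }

# A 2,048 GB volume at VPU 10 is capped by the 25,000 IOPS / 480 MB/s per-volume limits.
print(volume_limits(2_048, 10))   # {'max_iops': 25000, 'max_mbps': 480}
```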
Consider the following factors:
VPU: Higher VPU levels generally translate to significantly increased IOPS and throughput capabilities. However, a VPU level determines a block volume's potential for IOPS or throughput, not necessarily both simultaneously.
Workload focus: For workloads emphasizing random reads (like databases), prioritize higher IOPS. For sequential workloads, such as large file transfers, prioritize throughput.
Bare metal instance limits: Consider the maximum performance limits of your bare metal instances (BM.Standard3.64 or BM.Standard.E4.128 in this example) to understand the overall performance scalability of your ESXi cluster.
Understanding maximum block volume performance
While VPUs are a major determinant of performance, the following factors also influence the maximum achievable performance of your Oracle Cloud VMware Solution cluster:
ESXi host performance limit: An individual ESXi host like BM.Standard3.64 or BM.Standard.E4.128 has a maximum IOPS capacity of 1,300,000 and a throughput of 6,000 MB/s. Assigning volumes exceeding these limits doesn't improve performance.
UHP volumes: If your cluster has enough hosts (e.g., 6 ESXi hosts), attaching a VPU 120 volume can achieve the combined maximum IOPS of 300,000 offered by the volume.
Block Volume IOPS Limit: Each attachment of a block volume to an ESXi host has a maximum IOPS limit of 50,000. For example, consider a three-node BM.Standard3.64 cluster with a VPU 120 volume. With three attachments, the maximum achievable IOPS from the attachment perspective is 150,000, even though the volume itself can deliver up to 300,000 IOPS. However, with six ESXi hosts, each attachment could reach 50,000 IOPS, allowing you to fully utilize the VPU 120 volume's potential.
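These interactions are easy to express as a small calculation. The sketch below uses the limits quoted above and reproduces the three-node versus six-node example for a VPU 120 volume.

```python
# Sketch: how the 50,000 IOPS per-attachment limit and the per-host ceiling bound
# what a single UHP volume can deliver to a cluster. Reproduces the 3-node vs.
# 6-node BM.Standard3.64 example above.

PER_ATTACHMENT_IOPS = 50_000      # per ESXi-host attachment of one block volume
PER_HOST_IOPS = 1_300_000         # BM.Standard3.64 / BM.Standard.E4.128 ceiling

def effective_volume_iops(volume_max_iops: int, num_hosts: int) -> int:
    attachment_side = num_hosts * PER_ATTACHMENT_IOPS   # summed over all attachments
    return min(volume_max_iops, attachment_side, num_hosts * PER_HOST_IOPS)

vpu_120_volume = 300_000          # max IOPS of a VPU 120 volume
print(effective_volume_iops(vpu_120_volume, num_hosts=3))   # 150000 -> attachment-limited
print(effective_volume_iops(vpu_120_volume, num_hosts=6))   # 300000 -> volume-limited
```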
By understanding these factors alongside VPUs, you can strategically configure your OCI block volumes to optimize performance for your specific workloads within the Oracle Cloud VMware Solution environment. This approach ensures you get the most out of your OCI block volumes as VMFS datastores.
Simulating real-world workloads
To showcase the performance capabilities of OCI block volumes with Oracle Cloud VMware Solution Standard shapes, we configured a comprehensive test environment mimicking real-world application scenarios with the following specifications:
Hardware: Three BM.Standard3.64 hosts, each with 64 OCPUs enabled.
Storage: We conducted two sets of tests, one with a consolidated large volume and another with a distributed environment of 31 block volumes of the same size and VPU.
For consolidation tests: To establish baseline performance, we ran tests with a single, large volume (32 TB, the maximum volume size possible), varying the VPU level from 10 to 50 between tests.
For distribution tests: We connected 31 block volumes to the hosts. Each volume had either VPU 10 or 20, depending on the specific test. With a size of 1.43 TB per volume, the total VMFS datastore cluster capacity within vCenter reached 44.33 TB.
Virtual machines:
For consolidation tests: We deployed a total of 165 VMs, each configured with two vCPUs, 12 GB of RAM, and 10 disks, each sized at 15 GB to fit within the 32 TB volume.
For distribution tests: We deployed a total of 155 VMs, spread evenly across 31 volumes. The CPU and RAM configuration remained the same, but we used slightly larger disks of 20 GB to fill the larger datastore cluster.
This translates to a significant test workload, totaling 310-330 vCPUs, 1,860-1,980 GB of RAM, and 24.2-30.2 TB of storage.
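Those totals follow directly from the per-VM specifications; a quick check (the quoted storage figures line up approximately when counted in binary terabytes, which is an assumption of this sketch):

```python
# Quick check of the aggregate test-workload figures quoted above.
def totals(num_vms: int, vcpus: int, ram_gb: int, disks: int, disk_gb: int):
    storage_gb = num_vms * disks * disk_gb
    return num_vms * vcpus, num_vms * ram_gb, round(storage_gb / 1024, 1)  # TiB

print(totals(165, 2, 12, 10, 15))  # consolidation: (330, 1980, 24.2)
print(totals(155, 2, 12, 10, 20))  # distribution:  (310, 1860, ~30.3)
```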
Workload scenarios: The workload scenarios encompassed various read-write mixes and block sizes to simulate real-world patterns, such as:
Online transaction processing (OLTP) database workloads with a mix of reads and writes.
Data warehousing applications with primarily read-heavy access patterns.
Batch processing scenarios involving a high volume of write operations.
We evaluated performance across the following workload scenarios:
Block size: 4K, 8K, 16K, 128K
Read-Write mix: 70/30, 100/0, 0/100
Each test ran for 30 minutes. By testing these diverse workload scenarios, we were able to gain a comprehensive understanding of how OCI Block Volume performance scales under various conditions on Oracle Cloud VMware Solution standard shapes.
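The post doesn't name the load generator used. As an illustration only, the sketch below enumerates the same block-size and read/write matrix as fio command lines that could be run inside a guest VM; fio, and every flag shown, is an assumption rather than the tool used in these tests.

```python
# Sketch: enumerate the block-size / read-write matrix above as fio command lines,
# 30 minutes per run. fio and its flags are assumptions for illustration only.
import shlex

BLOCK_SIZES = ["4k", "8k", "16k", "128k"]
READ_PERCENTS = [70, 100, 0]        # 70/30, 100/0, 0/100 read/write mixes
RUNTIME_SECS = 30 * 60

for bs in BLOCK_SIZES:
    for read_pct in READ_PERCENTS:
        cmd = [
            "fio",
            f"--name=ocvs-{bs}-r{read_pct}",
            "--filename=/dev/sdb",      # hypothetical test disk inside a guest VM
            "--rw=randrw",              # access pattern not specified in the post
            f"--rwmixread={read_pct}",
            f"--bs={bs}",
            "--ioengine=libaio",
            "--direct=1",
            "--iodepth=32",
            "--time_based",
            f"--runtime={RUNTIME_SECS}",
        ]
        print(shlex.join(cmd))
```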
Performance in focus: Test results analyzed
Now that we've explored the test environment and understand how VPU works, let's delve into the test results! This section analyzes the performance of OCI block volumes with Oracle Cloud VMware Solution standard shapes under various conditions.
Consolidation tests
We first ran tests with a single, large block volume (32 TB, the maximum supported size for an OCI block volume) configured at varying VPU levels (10 to 50). The following figures show the performance of the datastore at each VPU level for the workloads described earlier.
Total IOPS from one OCI block volume at different VPU levels for consolidated analysis
Total throughput from one OCI block volume at different VPU levels for consolidated analysis
Latency for one OCI block volume at different VPU levels for consolidated analysis.
We had the following observations:
VPU impact: As expected, the performance of the datastore correlates with the chosen VPU level (and the resulting IOPS and throughput) of the underlying block volume. This correlation confirms the significant influence of VPUs on performance.
Workload influence: Performance varied depending on the workload type. Workloads with a high emphasis on random reads like databases achieved higher IOPS, while sequential workloads like large file transfers showed better throughput.
Latency: With 165 VMs on this single OCI block volume, latency is notably high. While higher-VPU volumes exhibit lower latency, optimal performance is best achieved by distributing the VMs or workload across multiple volumes, as demonstrated by the distribution test results in the next section.
Distribution tests
To understand the performance of a VMFS datastore cluster backed by several OCI block volumes, we conducted scalability tests by attaching 31 block volumes to the cluster. This is the maximum number of data volume attachments possible for a standard shape cluster, because one attachment is used for the management datastore. The following graphics show the tests and their results:
Distribution analysis: Combined total IOPS from 31 OCI block volumes.
Distribution analysis: Combined total throughput from 31 OCI block volumes.
Distribution analysis: Average latency for 31 OCI block volumes.
We made the following observations:
Increased IOPS and throughput: When multiple smaller block volumes are combined to build a datastore cluster, you can achieve much higher cumulative IOPS and throughput. A single 32 TB VPU 10 block volume can achieve a maximum of 25,000 IOPS and 480 MB/s of throughput. When we spread a similar amount of storage across multiple VPU 10 volumes, as in our scalability tests, we can achieve a combined performance of up to 775,000 IOPS and 14,800 MB/s of throughput.
Scaling VPUs: Increasing the VPU level to 20 further enhances performance. Our tests showed that with VPU 20, combined performance could reach approximately 1.3 million IOPS and nearly 20,000 MB/s throughput.
Reduced latency: Distributing the workload across several volumes significantly reduced latency. We observed an average latency of 2-5 ms per volume, enhancing the overall performance and responsiveness of the system.
Bare metal compute limits: While VPU scaling improves datastore cluster performance, it is essential to consider the maximum capabilities of the bare metal Compute instances to understand the true scaling potential of the cluster. Properly aligning the performance capabilities of block volumes with the Compute instance limits ensures optimal performance and avoids bottlenecks.
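As a sanity check, these measured aggregates line up with the per-volume ceilings from the VPU table summed across the 31 data volumes:

```python
# Sketch: per-volume ceilings behind the distribution results above, summed
# across the 31 data volumes attached to the cluster.
VOLUMES = 31
PER_VOLUME = {10: (25_000, 480), 20: (50_000, 680)}   # VPU: (max IOPS, max MB/s)

for vpu, (iops, mbps) in PER_VOLUME.items():
    print(f"VPU {vpu}: up to {VOLUMES * iops:,} IOPS, {VOLUMES * mbps:,} MB/s aggregate")
# VPU 10: up to 775,000 IOPS, 14,880 MB/s   (measured: ~775K IOPS, ~14,800 MB/s)
# VPU 20: up to 1,550,000 IOPS, 21,080 MB/s (measured: ~1.3M IOPS, nearly 20,000 MB/s)
```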
How to scale your storage
To effectively scale your storage, we suggest starting with volumes around 2 TB each and setting up multiple balanced or VPU 10 volumes. This approach allows you to aggregate performance across all volumes at the datastore cluster level. As your storage needs grow, you can add more volumes, scaling up to 25-28 volumes initially, with spare attachments reserved for future expansion. At this stage, you can seamlessly increase the size of the underlying block volumes in OCI and expand your datastore capacity in vCenter to boost the overall capacity of your datastore cluster.
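A minimal sketch of that in-place growth with the OCI Python SDK follows; the volume OCIDs are placeholders, and expanding the VMFS datastore in vCenter remains a separate step, as noted above.

```python
# Sketch: growing existing data volumes in place with the OCI Python SDK.
# OCIDs are placeholders; the VMFS datastore must still be expanded from vCenter.
import oci

config = oci.config.from_file()
blockstorage = oci.core.BlockstorageClient(config)

DATA_VOLUME_IDS = [
    "ocid1.volume.oc1..datastore-vol-1",
    "ocid1.volume.oc1..datastore-vol-2",
]

for volume_id in DATA_VOLUME_IDS:
    blockstorage.update_volume(
        volume_id,
        oci.core.models.UpdateVolumeDetails(
            size_in_gbs=4_096,   # scale up only; shrinking a volume isn't supported
            vpus_per_gb=20,      # optionally raise the VPU level uniformly as well
        ),
    )
```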
While you can scale up a volume, scaling down isn't permitted. If reducing total capacity is necessary, you can gracefully unmap some unused volumes following VMware's official documentation by Broadcom. For enhanced performance, we also recommend scaling the VPU of the underlying volumes uniformly. For example, in one scenario, we scaled the performance of a six-node cluster to achieve approximately 2 million IOPS.
The following graphics show the tests and their results:
Our cluster comprised six BM.Standard3.64 hosts with 20 volumes of VPU 120 attached to create a VMFS datastore cluster.
We conducted tests with a total of 400 VMs, each equipped with 10 disks of 15 GB.
Aggregate IOPS for 20 volumes with VPU 120.
Aggregate Throughput for 20 volumes with VPU 120.
Average latency per volume with VPU 120.
The results across various workload types showed the following key observations:
Throughput and IOPS: We achieved a cumulative throughput of about 43,000 MB/s, with volumes capable of a maximum of around 53,000 MB/s. The achieved 2 million IOPS can potentially reach up to 6 million with appropriate workload planning and more hosts in the cluster.
Latency: For a 4K block size, latency observed at the datastore cluster level for all 400 VMs averaged as low as 3-4 ms.
Performance metrics: This setup was in a lab environment and not under full production load. However, with more volumes, you can attain higher performance. Notably, performance metrics are cumulative across all 20 block volumes.
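Those ceilings follow from the per-volume figures for VPU 120 in the earlier table:

```python
# Sketch: volume-side ceilings for the six-node, 20 x VPU 120 configuration above.
VOLUMES, HOSTS = 20, 6
PER_VOLUME_IOPS, PER_VOLUME_MBPS = 300_000, 2_680   # VPU 120, from the VPU table
PER_ATTACHMENT_IOPS = 50_000

# Six attachments per volume are enough to expose its full 300,000 IOPS.
assert HOSTS * PER_ATTACHMENT_IOPS >= PER_VOLUME_IOPS

print(VOLUMES * PER_VOLUME_IOPS)   # 6,000,000 IOPS ceiling (about 2M achieved in this lab run)
print(VOLUMES * PER_VOLUME_MBPS)   # 53,600 MB/s ceiling    (about 43,000 MB/s achieved)
```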
Putting your knowledge into action
This blog post has provided the insights needed to make informed decisions when configuring OCI Block Volumes for optimal performance in your Oracle Cloud VMware Solution environment. The key is to choose the right VPU level based on your specific workload demands. If your applications require frequent random reads, prioritize higher IOPS. For sequential workloads, focus on maximizing throughput. This approach not only helps you achieve peak performance but also manages costs effectively. Remember to always distribute the total required performance across multiple block volumes. For example, to achieve about 1 million IOPS, consider the following approaches (a quick sizing sketch follows the list):
Consolidation: Attach eight volumes of VPU 50 (each offering 125,000 IOPS) to reach 1 million IOPS.
Distribution: Spread the workload across multiple volumes, such as a combination of VPU 10 and VPU 20 volumes (approximately 25 volumes) to achieve 1 million IOPS.
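Both plans pencil out from the per-volume maximums; the particular VPU 10/20 mix below is just one illustrative split.

```python
# Sketch: two ways to plan for roughly 1,000,000 aggregate IOPS, using per-volume
# maximums from the VPU table (VPU 50 offers 125,000 IOPS, as noted above).
MAX_IOPS_PER_VOLUME = {10: 25_000, 20: 50_000, 50: 125_000}

def plan_iops(plan: dict) -> int:
    """plan maps a VPU level to a volume count; returns the summed IOPS ceiling."""
    return sum(count * MAX_IOPS_PER_VOLUME[vpu] for vpu, count in plan.items())

consolidation = {50: 8}            # 8 volumes at VPU 50
distribution = {10: 10, 20: 15}    # 25 volumes mixing VPU 10 and VPU 20

print(plan_iops(consolidation))    # 1,000,000 IOPS across 8 volumes
print(plan_iops(distribution))     # 1,000,000 IOPS across 25 volumes
```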
You should prefer distribution over consolidation for the following reasons:
Lower latency: Consolidation concentrates all workload activity on fewer volumes, potentially causing bottlenecks and higher read-write latency. Distribution balances the workload, keeping latency lower.
Cost optimization: While the total max IOPS and the required capacity are the same, distributing the workload onto lower VPU volumes can offer a cost advantage.
In essence, distribution provides the same performance, such as 1 million IOPS, with potentially better latency and lower cost. When the underlying block volumes are properly sized and workloads are distributed across them, they offer high-performance, reliable, and scalable block storage, making them well suited to demanding workloads and the seamless operation of mission-critical applications in VMware environments.
Looking ahead
The future of OCI block volumes and Oracle Cloud VMware Solution is promising, with continuous innovations aimed at delivering even more powerful features and functionalities. Stay tuned for exciting updates that can further enhance your storage experience!
Still have questions? Our OCI experts are here to assist you. Contact us today to discuss your specific storage needs and explore how Oracle Cloud Infrastructure Block Volume service can empower your Oracle Cloud VMware Solution environment.