Quobyte has trumpeted its first public stab at running the MLPerf Storage benchmark on its eponymous parallel file system, claiming a distinct edge on the critical 3D U-Net workload.
The benchmark is one of three tests first unveiled last year by MLCommons to measure how storage systems fare at supplying data while a model is being trained.
Needless to say, that’s a critical factor in ensuring costly and power-hungry GPUs are fully utilized; right now, they rarely come anywhere close.
Quobyte said: “Of the three benchmarks, 3D U-Net is particularly interesting as it is the most dependent on storage performance.”
Or, as Quobyte cofounder and CEO Bjorn Kolbeck said, it’s the one that “really exercises and tortures the storage system.”
To pass, Quobyte pointed out, “MLPerf Storage requires a utilization of 90 percent or above. The utilization directly translates into a specific throughput, as it determines the speed in which the workload issues IO requests.”
“The goal is to support 8 GPUs per client at a high utilization (above 90 percent) with a minimal set of storage resources (as they cost money, floor space, and energy),” it added.
For 3D U-Net, “each simulated H100 GPU at full speed requires approximately 2.8 GBps throughput. With the 200G network of a DGX, this means at most seven GPUs can be kept utilized above 90 percent.”
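To put that figure in context, here is a quick back-of-envelope check in Python. Only the 2.8 GBps per simulated H100 comes from the benchmark discussion above; the raw-link conversion and the overhead reasoning are our own assumptions for illustration, not numbers from Quobyte’s submission.

```python
# Rough arithmetic behind the "at most seven GPUs per 200G link" claim.
# Only the 2.8 GB/s per simulated H100 comes from the article above;
# the raw-link conversion and overhead allowance are assumptions.

PER_GPU_GBPS = 2.8           # required read throughput per simulated H100 (3D U-Net)
LINK_GBIT = 200              # DGX client network link, Gbit/s
RAW_GBYTES = LINK_GBIT / 8   # 25 GB/s on the wire, before protocol overhead

for gpus in (6, 7, 8):
    need = gpus * PER_GPU_GBPS
    print(f"{gpus} GPUs need {need:.1f} GB/s, "
          f"leaving {RAW_GBYTES - need:.1f} GB/s headroom on a {RAW_GBYTES:.0f} GB/s link")

# Seven GPUs need 19.6 GB/s, leaving roughly 5 GB/s for RoCE and protocol
# overhead; eight would need 22.4 GB/s, leaving almost none, hence the ceiling.
```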
How much torture?
Quobyte’s setup was an eight-node cluster connected over a 2x100G RoCE network: four client machines (Supermicro) and four server machines (Supermicro), the latter each fitted with four PCIe 4.0 NVMe drives.
For v1.0.1 of MLPerf Storage, simulating the H100, Quobyte said it was able to support “six GPUs per client (per DGX) at a 90% efficiency.” It aims to increase that to seven GPUs at higher utilization by using client machines with faster CPUs.
“On the server side, we provide this performance with a modest amount of four standard servers connected with 200G RoCE. This setup provides high availability, and can be scaled linearly with more DGX clients.”
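Taken together, the published figures imply roughly the following sustained read rates. This is a back-of-envelope sketch from the numbers above, not throughput Quobyte itself reported, and the even per-server split is our assumption.

```python
# Implied throughput for the reported configuration: 2.8 GB/s per simulated
# H100, six GPUs per client, four DGX clients, four Quobyte storage servers.
# Assumes the load is spread evenly across the storage nodes.

PER_GPU_GBPS = 2.8
GPUS_PER_CLIENT = 6
CLIENTS = 4
STORAGE_SERVERS = 4

per_client = GPUS_PER_CLIENT * PER_GPU_GBPS   # ~16.8 GB/s per DGX client
aggregate = per_client * CLIENTS              # ~67.2 GB/s across the cluster
per_server = aggregate / STORAGE_SERVERS      # ~16.8 GB/s delivered per storage node

print(f"per client: {per_client:.1f} GB/s, "
      f"aggregate: {aggregate:.1f} GB/s, "
      f"per storage server: {per_server:.1f} GB/s")
```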
Quobyte claimed this makes it the fastest and most efficient file system in the MLPerf 3D U-Net test, supporting the largest number of GPUs per client machine. Equally significantly, it also claimed to achieve the lowest cost and energy consumption per performance unit.
Cofounder and CTO Felix Hupfeld added: “Where we differ is how much resources you need on the other side to deliver that performance.” The resources needed to saturate the GPUs translate into more power draw, more floor space and, of course, more cost.
While Nvidia is grabbing all the attention in the AI world, Kolbeck said storage was critical, and that NFS-based systems were never designed for scale-up.
Pick the wrong system, he continued, “and you don’t get the efficiency that you need for the GPUs, and then you’re stuck with the solution.
“You spend millions of dollars on the storage system and can’t deliver the performance that your GPUs need, basically ruining the GPU investment.”
A new iteration of the MLPerf Storage benchmarks is due to be published shortly. While Quobyte didn’t take part in last year’s “official” round of submissions, the company said it fully intends to take part this time around.