CoreWeave tops new GPU cloud rankings from SemiAnalysis

Research firm SemiAnalysis has launched its ClusterMAX rating system to evaluate GPU cloud providers, with performance criteria that include networking, management software, and storage capabilities.

SemiAnalysis aims to help organizations evaluate GPU cloud providers – both hyperscalers like AWS, Azure, GCP, and Oracle Cloud, and what it calls “Neoclouds,” a group of newer GPU-focused providers. The initial list includes 131 companies. There are five rating classifications: Platinum, Gold, Silver, Bronze, and Underperforming. The report sorts GPU cloud suppliers into traditional hyperscalers, neocloud giants, and emerging and sovereign neoclouds. Its map of the GPU cloud market also covers brokers, platforms, and aggregators, along with management software and VC clusters.

The research company states: “The bar across the GPU cloud industry is currently very low. ClusterMAX aims to provide a set of guidelines to help raise the bar across the whole GPU cloud industry. ClusterMAX guidelines evaluate features that most GPU renters care about.”

VAST Data co-founder Jeff Denworth commented that the four neocloud giants “have standardized on VAST Data,” while the traditional hyperscalers are using “20-year-old technology.”

SemiAnalysis says the two main storage frustration areas “are when file volumes randomly unmount and when users encounter the Lots of Small File (LOSF) problem.” For the first, a program called “autofs” will automatically keep a file system mounted.

“The LOSF problem can easily be avoided as it is only an issue if you decide to roll out your own storage solution like an NFS-server instead of paying for a storage software vendor like WEKA or VAST. An end user will very quickly notice an LOSF problem on the cluster as the time even to import PyTorch into Python will lead to a complete lag out if an LOSF problem exists on the cluster.”
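As a rough illustration of the symptom described in that quote, the sketch below times the reading back of a batch of small files from a given directory. It is not SemiAnalysis's test methodology. The function name `time_small_file_reads`, the file count, and the file size are hypothetical choices made for this example, and the numbers it prints only mean something when compared across storage back ends.

```python
import os
import tempfile
import time


def time_small_file_reads(directory, file_count=200, size_bytes=1024):
    """Write file_count small files under directory, then time reading them back.

    Returns (elapsed_seconds, total_bytes_read). Hypothetical helper for
    illustration only; freshly written files may be served from the OS page
    cache, so this understates cold-read latency on a real shared mount.
    """
    payload = b"x" * size_bytes
    paths = []
    for i in range(file_count):
        path = os.path.join(directory, f"probe_{i:05d}.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)

    # Time only the read phase: one open/read/close per small file.
    start = time.perf_counter()
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            total += len(handle.read())
    elapsed = time.perf_counter() - start

    # Clean up the probe files so the directory is left as it was found.
    for path in paths:
        os.remove(path)
    return elapsed, total


if __name__ == "__main__":
    # Assumption: a local temp directory stands in for the cluster's shared mount.
    with tempfile.TemporaryDirectory() as scratch:
        elapsed, total = time_small_file_reads(scratch)
        print(f"read {total} bytes from 200 small files in {elapsed:.4f}s")
```

Pointing the probe at a directory on the cluster's shared file system instead of a local temporary directory gives a crude comparison between the two. A large gap would be consistent with the kind of lag the report describes when importing PyTorch.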

The report reckons that “efficient and performant storage solutions are essential for machine learning workloads, both for training and inference” and “high-performance storage is needed for model checkpoint loads” during training. It mentions Nvidia’s Inference Transfer Library (NIXL) as helping here.

During training, “managed object storage options are equally crucial for flexible, cost-effective, and scalable data storage, enabling teams to efficiently store, version, and retrieve training datasets, checkpoints, and model artifacts.”

On the inference side, “performance-oriented storage ensures that models are loaded rapidly from storage in production scenarios. Slow or inefficient storage can cause noticeable delays, degrading the end-user experience or reducing real-time responsiveness of AI-driven applications.”

“It is, therefore, vital to assess whether GPU cloud providers offer robust managed parallel file system and object storage solutions, ensuring that these options are optimized and validated for excellent performance across varied workloads.”

In general, SemiAnalysis sees that “most customers want managed high-performance parallel file systems such as WEKA, Lustre, VAST Data, DDN, and/or want a managed S3-compatible object storage.”

The report also examines the networking aspects of GPU server rental.

Ratings

There is only one cloud in the top-rated Platinum category, CoreWeave. “Enterprises mainly rent GPUs from Hyperscalers + CoreWeave. Enterprises rarely rent from Emerging Neoclouds,” the report says.

The Gold tier providers are Crusoe, Nebius, Oracle, Azure, Together AI, and LeptonAI. The Silver tier providers are AWS, Lambda, Firmus/Sustainable Metal Cloud, and Scaleway. The Bronze tier includes Google Cloud, DataCrunch, TensorWave, and other unnamed suppliers. The report authors say: “We believe Google Cloud is on a Rocketship path toward ClusterMAX Gold or ClusterMAX Platinum by the next time we re-evaluate them.”

The underperformers, such as Massed Compute and SaladCloud, are described as “not having even basic security certifications, such as SOC 2 or ISO 27001. Some of these providers also fall into this category by hosting underlying GPU providers that are not SOC 2 compliant either.”

Full access to the report is available to SemiAnalysis subscribers via the company’s website.