AI performance benchmarking gives vendors a level playing field and supports transparent reporting across the industry. MLCommons, a vendor-neutral organization, develops the MLPerf benchmarks, which evaluate different aspects of AI performance. The latest round, MLPerf Inference 3.1, adds testing for large language models (LLMs), and MLCommons has also introduced a new benchmark that measures how well storage systems perform on machine learning workloads.
MLPerf is committed to evolving its benchmark suite alongside advances in AI, and each round has shown continued performance improvements. Participants in MLPerf Inference 3.1 made significant progress, with many reporting performance gains of more than 20% over the 3.0 round.
Unlocking the Potential of Large Language Models
MLPerf Inference 3.1 introduces a brand-new benchmark for large language models (LLMs), reflecting the rapid growth of generative AI. MLCommons first added an LLM to the MLPerf 3.0 Training benchmarks, but it stresses that training and inference are fundamentally different workloads. Inference with an LLM is a generative task, in which the model writes output such as multiple sentences, whereas training is the work of building the model in the first place.
Wider Applicability of the Inference Benchmark
MLPerf’s LLM inference benchmark represents a wider set of use cases that organizations can actually deploy. That includes organizations without the extensive computing resources or data needed to support large models. The benchmark’s task is text summarization, performed with the 6-billion-parameter GPT-J model. This makes the benchmark relevant to organizations that want to put LLMs to work without overwhelming computational requirements.
High-end GPU accelerators have traditionally dominated MLPerf’s rankings for both training and inference, but MLPerf Inference 3.1 highlights a more diverse set of compute options. Intel submitted results on several kinds of silicon, all of which performed well in the benchmarks:
- Habana Gaudi accelerators
- 4th Gen Intel Xeon Scalable processors
- Intel Xeon CPU Max Series processors

Intel stresses that production AI deployments should be able to draw on diverse computing resources.
AI Scalability and Market Trends
Jordan Plawner, Intel’s senior director of AI products, argues that AI inference needs to be deployable across many types of computing platforms. In his view, the range of software and hardware vendors represented in MLPerf Inference 3.1 shows that the industry is shifting from simply building AI models to scaling them out in production. That shift makes scalability and adaptability increasingly important in AI deployments.
Nvidia’s Contribution to MLPerf Inference 3.1
While Intel highlights the value of CPUs for inference, Nvidia remains a major contributor to MLPerf Inference 3.1. Nvidia’s GH200 Grace Hopper Superchip pairs an Nvidia Grace CPU with a Hopper GPU. It delivered up to 17% more performance than Nvidia’s H100 GPU submissions. Nvidia positions the Grace Hopper Superchip for the most demanding AI workloads.
Leveraging Nvidia’s L4 GPUs
Nvidia also submitted results for its L4 GPUs. They delivered up to 6 times the performance of the best x86 CPUs submitted in this round, a sign of how efficiently these GPUs handle AI inference workloads.
MLPerf’s continued expansion of its benchmark suite tracks the evolving AI landscape. The new LLM benchmark and its focus on inference reflect the industry’s growing demand for generative AI and for models that can be applied in real-world scenarios. Submissions from both Intel and Nvidia underline the range of computing resources available for AI inference, paving the way for scalable, adaptable AI deployments. MLPerf benchmarks serve as useful performance milestones and offer insight into the growth and potential of AI technologies.