Neocloud Pioneer CoreWeave Goes All In on Inference
After making a name for itself as a GPU-as-a-service vendor, CoreWeave is evolving -- again.
Inference is everything.
That aphorism, and the view of AI infrastructure it captures, has been making the rounds in AI circles lately.
Now CoreWeave -- the cryptocurrency-mining startup turned major neocloud player, with a close relationship with AI chip giant Nvidia -- has begun pivoting toward one of the fastest-growing trends in AI: inference.
The vendor operates some 40 AI data centers -- largely populated by Nvidia GPUs -- and serves dozens of major customers, including generative AI vendors OpenAI, Cohere and ElevenLabs; enterprises and tech vendors such as Siemens, Mercado Libre, Salesforce and Databricks; and AI platforms Perplexity, Cursor and Runway.
Putting Inference to Use
"Inference is the way to monetize AI," Chen Goldberg, executive vice president of product and engineering at CoreWeave, said during an online media roundtable earlier this week. "We are seeing that with our customer base, no matter if it's enterprise AI, AI labs or AI platforms, customers are looking for different methods to run inference. That's what we've been doing."
Propelling the demand for inference is the surge in interest in agentic AI. Many AI users want autonomous agents that lean heavily on the reasoning capabilities of large language models. And reasoning largely relies on inference, with agents drawing new conclusions and acting independently rather than simply regurgitating information memorized by huge, pretrained LLMs.
"Instead of a single query … we have a new category of agents, which [do] a long-running task. [Agents] can complete more complicated tasks, maybe with multiple queries," Goldberg said.
Applications that are increasingly using agentic AI and inference include coding, engineering, physical AI, call centers and drug discovery, she noted.
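Goldberg's distinction between a single query and a long-running agentic task can be sketched roughly as follows -- a hypothetical illustration only, not CoreWeave's stack, with `run_inference` standing in for a call to an LLM serving endpoint:

```python
def run_inference(prompt: str) -> str:
    """Stand-in for one model inference call; a real agent would hit an LLM endpoint."""
    return f"result({prompt})"

def single_query(task: str) -> str:
    # Traditional use: one prompt, one inference call.
    return run_inference(task)

def agentic_task(task: str, steps: list[str]) -> list[str]:
    # Agentic use: a long-running task issues many inference calls,
    # each conditioned on the results accumulated so far.
    history: list[str] = []
    for step in steps:
        prompt = f"{task} | {step} | context={len(history)} prior results"
        history.append(run_inference(prompt))
    return history

results = agentic_task("review code", ["plan", "edit", "verify"])
```

The point of the sketch is the multiplier: where a chatbot interaction consumes one inference call, an agent completing a multi-step task consumes one per step, which is what drives inference demand up.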
Speed and Older GPUs
Meanwhile, CoreWeave is touting recent top marks in processing speed on the independent MLPerf Inference benchmark suite from the MLCommons consortium, using Nvidia's Grace Blackwell architecture to run two popular, powerful reasoning models: DeepSeek-R1 and OpenAI's smaller open-weight gpt-oss-120b.
That speed is important for extracting the most performance from earlier-generation GPUs, said Shadi Saba, senior director of AI/ML infrastructure at CoreWeave, during the roundtable.
With Nvidia and other chip vendors rapidly releasing newer generations of GPUs, industry observers have raised financial concerns about depreciating GPUs as faster, more capable chips arrive on the market.
"Compared with older generations, the same model will squeeze the most from whatever Nvidia is giving between generations," Saba said, noting that CoreWeave uses its own software stack to optimize performance from GPUs and CPUs, which are becoming more popular for inference tasks.
CoreWeave's strategy of wringing usable production from older GPUs, while also upgrading to the latest chips, is effective, said Steven Dickens, an analyst at HyperFrame Research.
"You've got to look at it as a sort of portfolio construction, in the same way you do your stock portfolio. You want some things that earn you money from dividends, and then you want some high growth stocks," Dickens said, adding that the vendor can provide reliable inference compute with older chips. "The same thing with CoreWeave. They have some H100 chips that are probably three or four years old. Those are still in the portfolio and still earning money."
The strategy, however, isn't unique and is also employed by neocloud competitors including Nebius, Lambda, OVH and QumulusAI.
The Neocloud Market
Dickens said neocloud vendors' specialty is using their software stacks to squeeze performance from older chips and to shift workloads onto the most cost-effective GPUs and other processors.
"That's the secret sauce of a neocloud, their ability to portfolio manage their GPU fleet and then be able to move workloads to optimize," he said. "Everybody's going to say they want their stuff to run on the latest and greatest. Very few workloads actually need to work on the latest and greatest."
As for the neocloud market landscape, Dickens said it is starting to consolidate around a handful of major players.
While there were some 150 neocloud startups 18 months ago, he said he sees that number winnowing down to 10 or so dominant players over the next five years.
"Winner-takes-most is how I see this industry panning out, not winner-takes-all," Dickens said. "It's not going to be that there's no more business for Lambda, Nebius and OVH. There's obviously going to be business for those guys, and CoreWeave is going to be one of those names."
About the Author
Senior News Director, AI Business
Shaun Sutner, a journalist with more than 25 years of daily newspaper experience and 11 years at Informa TechTarget as an editor and writer, directs news coverage for AI Business. He was previously a senior news and features writer covering health IT and HR software at TechTarget and a senior news director overseeing coverage of AI, business analytics, data management and government tech regulation.
Sutner's newspaper career included investigative reporting and covering the Massachusetts State House and politics for the Worcester Telegram & Gazette. He has written about snow sports as a T&G columnist and correspondent for 20 years. Sutner's interests also include tennis, standup paddleboarding, cooking and popular music.