Nvidia Buys the Future of Inference
The $20-billion deal that turns yesterday’s rebel into today’s rocket engine
Silicon Valley has seen its share of eye-popping acquisitions, but few that rewrite the rules of the game overnight. On a quiet Monday morning Nvidia announced a non-exclusive licensing agreement with Groq—valued by people close to the talks at roughly twenty billion dollars—that immediately becomes the largest transaction in the 31-year history of the GPU giant. The agreement covers Groq’s entire Language Processing Unit (LPU) portfolio, the secretive silicon that claims to run large language models ten times faster than today’s best GPUs while sipping a fraction of the energy. As part of the same pact, Groq co-founder and CEO Jonathan Ross and company president Sunny Madra will join Nvidia, effectively returning Ross to the cathedral he once tried to burn down.
From TPU to LPU: the making of a renegade
Jonathan Ross does not fit the profile of a serial acquirer’s trophy hire. A soft-spoken engineer who keeps a whiteboard marker in his shirt pocket the way other executives keep Montblanc pens, Ross spent eight years at Google building the Tensor Processing Unit, the custom accelerator that powers everything from Search to Bard. When he left in 2016 he carried away more than trade secrets; he carried a conviction that the future of AI would be defined not by training ever-larger models but by how cheaply and consistently those models could be served at scale. Groq was founded the same year with a radical thesis: strip away every vestige of graphics heritage, abolish caches, and build a deterministic, single-threaded, compiler-driven chip whose performance could be predicted cycle-by-cycle before silicon ever touched a server rack. The result was the LPU, a narrow, lightning-fast rectangle of math that does almost nothing except inference—and does it, according to the company’s benchmarks, with 90% lower energy per token than an H100.
The startup kept a low profile until 2022, when it emerged from stealth with a $300-million Series C led by Tiger Global. Three months ago Groq closed a $750-million Series D that valued the company at $6.9 billion, bringing total funding to just over $1.1 billion from BlackRock, Samsung, Cisco, and others. Customers range from Saudi Aramco’s research arm to a handful of hyperscalers that asked not to be named. Still, few analysts expected the six-year-old company to fetch nearly three times its last private valuation in a licensing stroke—let alone surrender its two most visible leaders.
Why Nvidia is writing the biggest check it has ever written
For Jensen Huang, the agreement is less a shopping spree than a strategic inoculation. Nvidia’s data-center revenue has grown 206% year-over-year, but every earnings call now carries the same subtext: how long can the GPU moat last? Amazon’s Trainium and Inferentia chips are already deployed on millions of nodes; Google’s TPU v5e pods are rented to external customers; Microsoft is rumored to be taping out its own 5-nanometer accelerator. Licensing Groq’s technology gives Nvidia a second lever: an ultra-efficient inference engine that can sit beside, or eventually inside, future GPU packages, extending the company’s reach from the training super-cluster to the long tail of real-time workloads—chatbots, recommendation engines, autonomous driving, industrial vision—where latency and power trump raw FLOPS.
Equally important is the talent raid. Ross arrives as senior vice-president of inference products, reporting directly to Huang, while Madra takes the newly created title of VP of strategic ecosystem. Their arrival mirrors Nvidia’s 2020 absorption of Mellanox’s networking team and its 2022 capture of Bright Computing’s cluster-software talent: every time a vertical layer of the AI stack becomes commoditized, Nvidia buys the smartest people in that layer and folds their roadmap into its own. The difference this time is the symbolism. Ross helped birth the very class of chips that now threatens Nvidia’s dominance; bringing him back inside the walls is the silicon equivalent of hiring the architect who designed the siege engines.
What Groq gains: survival, scale, and a second act
Under the terms of the deal Groq remains an independent operating company, helmed by CFO Simon Edwards, a former Dell Technologies executive who joined in 2021. The startup keeps its brand, its 350 employees, and its customer contracts. Nvidia receives a non-exclusive license to every patent, every compiler trick, and every future LPU revision. Non-exclusive is the critical clause: Groq can—and says it will—continue to sell LPUs to anyone, including cloud providers that compete directly with Nvidia’s DGX cloud. The arrangement gives Groq the balance-sheet oxygen to tape out its 5-nanometer successor, code-named “Sapphire,” without the dilution of another private round, while Nvidia gains first-mover integration rights and a hedge against the remote possibility that Sapphire outruns Blackwell in specific workloads.
Investors are hardly mourning. The Series D cohort nearly triples its money overnight, and employee stock options that were underwater at the $6.9-billion valuation are now comfortably in the money. Perhaps more valuable is the psychological relief: building a chip company is a capital guillotine; every new node costs another zero on the budget line. With Nvidia’s cash infusion, Edwards can focus on execution instead of fundraising, a luxury few semiconductor startups ever enjoy.
The technical chessboard: how the pieces fit together
Initial integration will be surgical. Nvidia plans to port Groq’s compiler stack—nicknamed “GroqComposer”—to CUDA, allowing developers to profile an entire inference pipeline on LPU and GPU side-by-side within the same Jupyter notebook. Longer term, engineers hint at a hybrid package: an H100-style giant surrounded by a ring of LPUs that handle the latency-bound, token-by-token decode phase of transformer inference. Such a module could, in theory, drop into an existing SXM socket, giving hyperscalers an incremental upgrade path rather than a forklift swap.
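Neither company has published what that shared notebook workflow will actually look like, so the following is only a rough sketch of the idea: the same prompt and token budget pushed through two interchangeable backends and timed side by side. Everything here—profile_backend, run_on_gpu, run_on_lpu—is a hypothetical stand-in, not a real GroqComposer or CUDA API.

```python
import time
from typing import Callable, List

def profile_backend(name: str,
                    generate: Callable[[str, int], List[str]],
                    prompt: str,
                    n_tokens: int) -> None:
    """Time one decode pass on a single backend and report tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    print(f"{name:>3}: {len(tokens)} tokens in {elapsed:.3f}s "
          f"({len(tokens) / elapsed:,.0f} tok/s)")

# Hypothetical stand-ins; a real notebook would call whatever GPU and LPU
# execution paths the merged toolchain eventually exposes.
def run_on_gpu(prompt: str, n_tokens: int) -> List[str]:
    time.sleep(0.050)              # placeholder for GPU decode latency
    return ["tok"] * n_tokens

def run_on_lpu(prompt: str, n_tokens: int) -> List[str]:
    time.sleep(0.005)              # placeholder for LPU decode latency
    return ["tok"] * n_tokens

for name, backend in [("GPU", run_on_gpu), ("LPU", run_on_lpu)]:
    profile_backend(name, backend, "Explain determinism in one sentence.", 128)
```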
Energy savings are the easiest metric to verify. A single Groq LPU card consumes 320 watts at peak; an H100 draws 700. In Groq’s own demonstrations, a 576-LPU cluster served the 70-billion-parameter Llama-2 model at 1,600 tokens per second per user, roughly 10× the throughput of a comparable GPU cluster. Nvidia has not confirmed those numbers, but even a 3-4× improvement would shift the economics of millions of real-time applications. The wild card is programmability: Groq’s deterministic model works only when the entire network graph is known at compile time, making it ill-suited for dynamic control flow or models that mutate on the fly. Nvidia’s challenge is to loosen that constraint without sacrificing the efficiency that made the LPU attractive in the first place.
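Those vendor figures are easy to sanity-check with back-of-envelope arithmetic. The sketch below takes the quoted per-card power draws at face value and, assuming the throughput ratio applies card-for-card rather than cluster-for-cluster, computes what energy per token looks like at the claimed 10× and at the hedged 3-4× range. It is illustrative arithmetic, not a measured benchmark.

```python
# Back-of-envelope energy-per-token comparison using the figures quoted above.
# Per-card power numbers come from the article; the throughput ratios are the
# claimed 10x and the more conservative 3-4x range, applied card-for-card.
LPU_WATTS = 320   # quoted peak draw for one Groq LPU card
GPU_WATTS = 700   # quoted peak draw for one H100

def relative_energy_per_token(throughput_ratio: float) -> float:
    """Energy per token on an LPU relative to a GPU, assuming each LPU card
    delivers `throughput_ratio` times the tokens/second of each GPU card."""
    return (LPU_WATTS / GPU_WATTS) / throughput_ratio

for ratio in (10.0, 4.0, 3.0):
    rel = relative_energy_per_token(ratio)
    print(f"{ratio:>4.0f}x throughput -> {rel:5.1%} of GPU energy per token "
          f"({1 - rel:.0%} savings)")

# Expected output:
#   10x throughput ->  4.6% of GPU energy per token (95% savings)
#    4x throughput -> 11.4% of GPU energy per token (89% savings)
#    3x throughput -> 15.2% of GPU energy per token (85% savings)
```

Even the conservative end of that range lands close to the 90 percent energy-per-token savings Groq advertises, which is why the programmability caveat, not the power math, is the real open question.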
Market ripples: who wins, who worries, who copies
Cloud providers reacted to the news with the diplomatic equivalent of a poker face. AWS reiterated its commitment to “multiple silicon suppliers” and pointed to its own Inferentia roadmap. Google Cloud simply wished both parties “continued innovation,” a statement that translates roughly to “we are not amused.” AMD, whose MI300 accelerators are still sampling, saw its stock dip 4% on the rumor before recovering by close. Startups that bet on inference-only silicon—Cerebras, SambaNova, Tenstorrent—now confront a landscape where the richest player has telegraphed its next move. Venture capitalists promptly circulated memos warning portfolio companies to “differentiate or die,” a phrase that has already become a Slack emoji.
For enterprise buyers, the deal legitimizes the LPU in one stroke. CTOs who last month asked “Groq who?” can now pencil the acronym into 2025 budgets without fear of orphaned hardware. Nvidia’s sales force, already the most feared in the data-center world, gains another SKU to slide into RFPs that might otherwise have gone to custom ASICs. The combination could accelerate the tipping point where inference workloads migrate from general-purpose GPUs to specialized silicon, shrinking the addressable market for everyone who is not Nvidia or Groq.
The bigger picture: vertical integration versus horizontal openness
Huang likes to say that Nvidia is “a full-stack accelerated-computing company,” a sentence that sounds like marketing fluff until you catalogue the acquisitions: Mellanox for networking, Cumulus for network software, Bright for cluster orchestration, Arm (attempted) for CPUs, now Groq for inference silicon. Each layer bought is a layer closed off to competitors. Critics argue that the strategy invites regulatory scrutiny; the FTC is already probing Nvidia’s cloud-service agreements. Yet the Groq transaction may dodge antitrust turbulence because it is a licensing arrangement rather than an acquisition, and because AMD, Intel, and a constellation of startups still offer credible alternatives.
Still, the philosophical question lingers: does the industry benefit when the dominant vendor absorbs every disruptive threat? Groq’s open-roadmap pledge is reassuring, but history shows that once inside the mothership, priorities have a way of drifting. Ross insists that Sapphire will remain “a Groq chip, not an Nvidia chip,” and that customers will always be able to buy it bare-metal. The proof will arrive in 2025, when the first Sapphire silicon tapes out and the purchase orders are finally public.
Looking ahead: three scenarios for 2027
In the bullish case, Nvidia packages an LPU tile inside every Grace-Blackwell superchip, cutting inference energy by half and cementing a decade-long lead. Groq becomes the inference equivalent of Arm’s big.LITTLE architecture, indispensable but invisible, and Ross succeeds Huang as CEO in a carefully choreographed succession. Semiconductor rivals double down on training-focused designs, ceding the inference battlefield the way Intel once ceded mobile.
In the neutral case, integration proves harder than advertised. CUDA and GroqComposer never fully merge; hybrid packages arrive late and overheat; customers stick with pure-play GPUs for all but the most latency-sensitive jobs. Groq survives as a boutique supplier, profitable but no longer existential to Nvidia, and the $20-billion headline becomes a cautionary tale about buying compilers, not just transistors.
In the bear case, regulators wake up. The FTC blocks future Nvidia expansions, hyperscalers accelerate in-house silicon, and open-source compilers emerge that replicate Groq’s deterministic tricks on RISC-V cores. Nvidia still owns the training tier, but inference collapses into a commodity, taking gross margins with it. The Groq deal is remembered as the moment the empire peaked.
For now, the industry is betting on the bullish script. Nvidia’s stock added $120 billion in market cap the day the licensing terms leaked, essentially paying for the deal before the ink dried. In coffee shops around Stanford and Tel Aviv, chip architects are already updating their pitch decks: “We are the next Groq—only this time Nvidia will have to pay even more.” Whether that bravado is justified depends on how quickly the combined team can turn a rebellious compiler into the quiet engine that answers every Alexa prompt, drives every robot arm, and finishes every ChatGPT sentence. If they succeed, the twenty-billion-dollar price tag will look like a bargain. If they fail, the same number will be taught in business schools as the cost of forgetting that in Silicon Valley, yesterday’s weapon is tomorrow’s commodity—and today’s hero is only as good as the next breakthrough still sitting on someone else’s whiteboard.