
A Startup Says It Can Beat Nvidia at Inference With Addition

Tensordyne's Napier processor replaces the multiply-heavy math at the core of AI with logarithms — and claims a single rack does the work of nine Nvidia ones.

Flux Desk · 2026-06-18

Every frontier model you have ever used runs on the same arithmetic primitive: multiply, then add, billions of times per token, over and over until words appear. Nvidia's empire is built on doing that multiplication faster and in greater parallel than anyone else. On June 15, 2026, a startup with offices in Sunnyvale and Munich said the smarter move is to stop multiplying altogether. Tensordyne unveiled Napier — its inference processor, marketed as the TDN — and made a claim large enough to be either a turning point or a cautionary tale: against Nvidia's flagship GB300 NVL72 rack, Napier delivers 13× the throughput in tokens per second and 17× the tokens per watt.

The trick is in the number system

The headline numbers come from a single, unglamorous design decision. Neural networks are, at the level of silicon, oceans of multiply-accumulate operations, and multiplication is expensive in transistors, energy, and heat. Tensordyne's bet is that you can convert most of that multiplication into addition by working in a logarithmic number system. In log space, multiplying two numbers becomes adding their logarithms, and an adder costs far less silicon and energy than a multiplier. Build the whole datapath around that representation ("optimising math, compute, memory, and networking from first principles," in CEO Marc Bolitho's framing) and you get more useful work out of every watt and every square millimeter.
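To make the identity concrete, here is a minimal Python sketch of multiplication in a logarithmic number system. It illustrates the textbook idea only; Tensordyne has not published its number format, so the sign-plus-log2 encoding and the helper names (to_log, log_mul, from_log) are assumptions made for illustration.

```python
import math

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero real as (sign, log2 of its magnitude).

    Hypothetical encoding for illustration; a real log-domain chip
    would use a fixed-width format and a special code for zero.
    """
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply two log-encoded values: signs multiply, logarithms add."""
    return (a[0] * b[0], a[1] + b[1])

def from_log(v: tuple[int, float]) -> float:
    """Decode (sign, log2 magnitude) back to an ordinary float."""
    return v[0] * 2.0 ** v[1]

# 6.0 * -0.5 computed with one addition in the log domain
product = from_log(log_mul(to_log(6.0), to_log(-0.5)))
```

The only arithmetic inside log_mul on the magnitudes is a single addition, which is the saving the pitch rests on. The conversions in and out of log space are not free, which is why the approach only pays off if data stays in that representation through most of the datapath.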

This is not a new idea in the abstract; logarithmic arithmetic has lived in academic papers for decades, defeated each time by the precision losses that creep in when you leave the clean grid of standard floating point, and by the awkward fact that addition, the other half of multiply-accumulate, has no exact shortcut in log space. What's new is a company claiming it has tamed those losses well enough to run real inference workloads (the comparison benchmark was DeepSeek-R1) and packaged it into a rack you can buy.
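A rough sketch of where those precision losses come from, again in Python and again not a description of Napier's actual datapath. It assumes positive inputs only, stores each logarithm on a fixed-point grid with a hypothetical number of fractional bits (frac_bits), and performs the accumulate step with the standard identity log2(a + b) = hi + log2(1 + 2^(lo - hi)), where hi and lo are the larger and smaller of the two logarithms. Hardware designs in the literature typically approximate that correction term with tables or piecewise functions; here it is computed exactly and then rounded.

```python
import math

def quantize(v: float, frac_bits: int) -> float:
    """Round a log2 value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(v * scale) / scale

def lns_add(la: float, lb: float, frac_bits: int) -> float:
    """Return log2(2**la + 2**lb), rounded to the grid (positive values only)."""
    hi, lo = max(la, lb), min(la, lb)
    return quantize(hi + math.log2(1.0 + 2.0 ** (lo - hi)), frac_bits)

def lns_dot(xs: list[float], ws: list[float], frac_bits: int) -> float:
    """Dot product of two positive vectors carried out in the log domain."""
    acc = None
    for x, w in zip(xs, ws):
        # Multiply: add the two quantized logarithms.
        term = quantize(math.log2(x), frac_bits) + quantize(math.log2(w), frac_bits)
        # Accumulate: log-domain addition, rounded after every step.
        acc = term if acc is None else lns_add(acc, term, frac_bits)
    return 2.0 ** acc

# Made-up sample values, chosen only to exercise the functions.
xs = [0.3, 1.7, 2.2, 0.9]
ws = [1.1, 0.4, 0.8, 1.5]

exact = sum(x * w for x, w in zip(xs, ws))
coarse = lns_dot(xs, ws, frac_bits=4)   # few fractional bits: visible error
fine = lns_dot(xs, ws, frac_bits=12)    # more bits: error shrinks

coarse_err = abs(coarse - exact) / exact
fine_err = abs(fine - exact) / exact
```

Every rounding step nudges the running sum, and the nudges compound across the billions of accumulations in a real model. Comparing coarse_err with fine_err shows the trade the designers face: more fractional bits buy accuracy but give back some of the silicon and energy the log representation was meant to save.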

What a rack looks like

Napier scales in pods. The flagship TDN72 pod packs 72 processors, and a full rack stacks four pods for a total of 288 processors. Tensordyne says the system stays air-cooled at that density, itself a notable claim in an era when Nvidia's top racks demand elaborate liquid loops, and that a single rack can sustain more than 1,000 tokens per second per user. To match that, the company argues, you'd need roughly nine racks of Nvidia Rubin paired with Groq inference chips. Translate the efficiency into a data-center operator's spreadsheet and Tensordyne's pitch sharpens to a number that gets meetings booked: up to $33 million more in annual revenue per rack, simply from serving more tokens within the same power envelope.

The economics are the whole argument. Inference, not training, is where the AI industry now spends most of its compute, because every query from every user runs through a model for as long as the product exists, while training happens in bursts. Power and cooling, not chip supply, are increasingly the binding constraint on how much intelligence a company can actually deliver. A processor that is dramatically more efficient per watt isn't just cheaper; in a grid-constrained world, it's the difference between scaling and stalling.

The credibility ledger

There are real reasons to take this seriously and real reasons to wait. On the credible side: Napier has completed tape-out and is in production at TSMC on its 3nm node — meaning this is silicon, not a slide deck. The company lists Broadcom and HPE Juniper Networks among its partners, names that don't attach themselves to vaporware lightly, and says it's holding more than a dozen letters of intent representing over $200 million in forecasted demand, with a Series D expected later this year.

On the skeptical side: every performance figure here is a vendor benchmark, not an independent one, and the industry's graveyard is full of "Nvidia killer" chips whose advantages evaporated once real models, real batch sizes, and real software stacks arrived. Logarithmic math's historical weakness is accuracy, and Tensordyne has not yet published the kind of third-party evaluations that would prove its precision holds across diverse workloads. And the timeline matters: initial shipments are expected late in 2026, with volume production targeted for mid-2027. By then Nvidia will have moved its own goalposts — Rubin is already ramping — so the right comparison isn't today's Nvidia rack but the one shipping alongside Napier.

Why it matters even if the numbers shrink

The deeper signal sits above any single spec. For three years the AI hardware conversation has been a monologue, and the only question was how much Nvidia silicon you could get and when. What's changing in 2026 is that the bottleneck has visibly migrated from "can you get chips" to "can you power and cool them," and that shift opens the door to architectures optimized for efficiency rather than raw peak FLOPS. Tensordyne is one answer; Groq's deterministic streaming chips and a wave of inference-specialized startups are others. They won't all be right. But the category exists now in a way it didn't, and incumbents tend to be vulnerable precisely where they're strongest: Nvidia's general-purpose GPU is a marvel partly because it does everything, which means a chip that does one thing, inference, can in principle do it better.

Even discount Tensordyne's claims by half and the story holds: someone has taped out real 3nm silicon built on the premise that the way to win the inference war is to change the math, not just the transistor count. If the shipping schedule holds, that premise will be testable in the field within a year. The most expensive assumption in AI, that compute means Nvidia, is for the first time in a while something a customer can test with a purchase order.
