Nvidia's Moat is Cracking: TSMC and Inference Create an Opening
Leading Edge Shortages Are Creating an Opportunity for Competitors
Three quick things at the top:
The audiobook edition of my book What You Need to Know About AI is available now! Check it out on Audible if you’re interested.
This Saturday, February 7th (tomorrow!), meet me at Barnes & Noble in San Jose, CA from 1-3pm. No RSVP required, but if you happen to be around, feel free to swing by and say hi!
For this article, I’m trying a voiceover version where I narrate it myself. Check it out and let me know if you enjoy it.
And now, back to our regularly scheduled programming.
A foundation, a private equity fund, a hedge fund, a journalist, and a pastor. That’s not the start of a “walked into a bar” joke. Those are entities that have had in-depth questions for me about Nvidia losing its moat. As such, it seems like the right time to write an update about my thoughts on Nvidia and its CUDA moat.
In early 2024, I wrote about CUDA’s moat and argued it was as high a barrier as the hardware itself. Higher, in fact. The argument was simple: Nvidia controls the entire stack, stuff just magically works, and years of tools, libraries, PhD blog posts, and guides all assume you’re using Nvidia GPUs. Even with a lot of capital at stake, it’s hard to magically catch up to nearly two decades of battle-tested scientific computing, with the researcher community reporting problems and helping develop Nvidia’s platform.
Even if alternatives improved, I said, “it would take quite a while for the impacts of such a heavy, long-term strategy to go away.”
I still believe that. But “quite a while” could be getting shorter. There’s at least a window for competitors. The reason isn’t that AMD or Google suddenly got better at software (the former has fumbled the ball consistently and the latter was never bad).
The reason is that TSMC can’t make enough chips, Nvidia can’t buy its way out of the constraint, and customers are being forced to try alternatives—specifically for inference.
The Supply Constraint Nobody Chose
TSMC’s most advanced node is sold out through 2026. Now, you might ask: isn’t being sold out through next year pretty normal for a leading-edge foundry? Actually, no. Not in the way that it’s sold out now.
At the Semiconductor Industry Association Awards in November 2025, TSMC CEO C.C. Wei gave a blunt assessment: “Not enough, not enough, still not enough.” He estimated that TSMC’s advanced-node capacity is roughly three times short of what its major customers want to consume. For context, in late 2022—right before ChatGPT launched—wafer starts had actually collapsed because the smartphone, PC, and server markets all went into recession. TSMC had excess capacity. That’s a far cry from where we find ourselves today.
The desperation is visible in tech leader pilgrimages to TSMC to beg for more chips. Sam Altman has made multiple secret visits to Taiwan to meet with TSMC executives about chip supply. Apple’s Tim Cook explicitly called out TSMC capacity as a reason they couldn’t fulfill demand for the new iPhone 17 Pro this quarter—and probably next quarter too.
Finally, we come to the company we care most about in this article.
Nvidia has secured over 50% of TSMC’s expanded CoWoS (Chip-on-Wafer-on-Substrate) capacity through 2026. Is that normal? Absolutely not.
Apple has been TSMC’s largest customer for over a decade. That’s now changing: Nvidia has overtaken Apple as TSMC’s biggest customer.
In 2023, Nvidia was maybe 5-10% of TSMC’s revenue. By 2026, analysts project Nvidia at $33 billion (22% of TSMC revenue) versus Apple at $27 billion (18%). Jensen Huang confirmed the shift publicly.
This is an incredible realignment—and it’s still not enough to meet demand for Nvidia chips.
I’ve criticized Nvidia for squeezing the market with suboptimal VRAM (high-bandwidth “video RAM” or, basically, GPU memory), because it creates space for competitors to come in and seize share. But maybe that critique has been unfair.
Perhaps this is the best option Nvidia has, since “give the market all it wants to keep everyone hooked on CUDA” isn’t actually on the menu. Whatever you make of the motive (it was never my argument anyway), they’re not merely constraining supply artificially to maximize profit extraction. They literally cannot get enough capacity from the only foundry capable of making their chips at the required specs.
Now, realistically, few will shift off of CUDA—even under these conditions—for training frontier models. This is, after all, what all of those AI labs have been securing all those Nvidia GPUs for. There is, however, a place where a more diversified approach can make sense.
Inference.
Why Inference is the Soft Underbelly
Let’s be clear about the market structure. Training—“teaching” AI models—is maybe 20-45% of total workload volume, but it’s where the money matters most and switching costs are highest. Training is also where CUDA’s advantages are most acute. The performance gap versus alternatives is substantial, the optimization work is deep, and the developers who specialize in training (and, if not them, their tooling) are still predominantly CUDA-trained. Nvidia isn’t losing training anytime soon—and I don’t see any serious-minded arguments to the contrary, even with AI labs’ various experiments on other platforms.
Inference is different. Inference is running a model that’s already trained—asking ChatGPT a question, generating an image, running a recommendation engine. By compute volume, inference hit ~50% in 2025 and is projected to reach two-thirds by 2026. McKinsey projects inference growing at 35% annually versus 22% for training through 2030. And by lifetime cost, inference can account for 80-90% of total AI spending in production systems—training is a one-time investment, but inference runs continuously.
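To see why the lifetime split tilts so hard toward inference, here’s a quick back-of-the-envelope sketch. The dollar figures are made up for illustration (not drawn from any vendor or from the estimates above); the point is just that a one-time training bill gets dwarfed by a serving bill that never stops.

```python
# Illustrative lifetime-cost sketch: one-time training spend vs. continuous
# inference spend over a product's life. All figures below are assumptions.
training_cost = 100e6          # $100M to train the model, paid once
monthly_inference_cost = 20e6  # $20M/month to serve users in production
months_in_production = 30      # two and a half years of serving traffic

inference_total = monthly_inference_cost * months_in_production
lifetime_total = training_cost + inference_total

print(f"Inference share of lifetime spend: {inference_total / lifetime_total:.0%}")
# -> Inference share of lifetime spend: 86%
```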
More importantly, inference costs are collapsing.
Guido Appenzeller at a16z coined the term “LLMflation”: for an LLM of equivalent performance, the cost is decreasing by roughly 10x every year. ARK Invest cites a similar figure of 85-90% per year.
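As a rough sketch of what a 10x-per-year decline does to prices (the starting price here is an assumption for illustration, not a quoted figure):

```python
# "LLMflation" sketch: cost for equivalent model quality falling ~10x per year.
start_price = 20.00  # assumed starting price, $ per million tokens

for year in range(4):
    price = start_price / (10 ** year)
    print(f"Year {year}: ${price:,.4f} per million tokens")

# Year 0: $20.0000 per million tokens
# Year 1: $2.0000 per million tokens
# Year 2: $0.2000 per million tokens
# Year 3: $0.0200 per million tokens
```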
When inference gets that cheap, two things happen. First, inference becomes economical everywhere—edge devices, internal tools, use cases that would never have existed at $20 per million tokens. Second, when the workload becomes price-sensitive, customers start asking harder questions about whether they really need Nvidia hardware for it.
Even worse, some customers can’t get Nvidia hardware even if they want it. In which case, they’re forced off Nvidia by default.
The Ecosystem Flywheel—In Reverse
I wrote that CUDA’s moat wasn’t just technical superiority—it was ecosystem lock-in. PhD students learn CUDA. They write their dissertations using CUDA. They publish blog posts about buying used RTX cards (well, back then). They write libraries that assume CUDA. Those libraries get used by the next generation of researchers. The cycle reinforces itself.
That entire ecosystem is the real lock-in and a network effect. It’s just not worth it to go where other people in this niche aren’t. Once people start going somewhere else, though...
Let’s think about what happens when a company—AI lab, hyperscaler, or anything else—can’t get H100s for inference and decides to try Google TPUs or AMD’s MI300X or Amazon Trainium or Cerebras or (... you get the idea) instead:
1. They hire or train engineers who know how to deploy on that platform
2. Those engineers write internal tools and scripts optimized for non-CUDA
3. Some of them blog about it, write guides, contribute to open-source projects (or open source what they wrote)
4. The next company trying to deploy inference sees those guides and decides that going not-Nvidia is not that bad
5. Libraries improve because there are now real users filing bugs and contributing patches
6. Switching costs drop because the ecosystem is maturing
7. More companies try alternatives… and then we get back to step 1
This doesn’t necessarily break training’s moat, but with the size of inference, losing that market is already bad enough for Nvidia.
And the more people gain experience building things for non-Nvidia hardware, the easier it gets. That two-decade advantage of battle-testing in Nvidia’s ecosystem chips away, little by little.
Midjourney reportedly saved 65% on inference costs by switching from Nvidia GPUs to Google TPUs—dropping from $2.1 million monthly to under $700,000. AMD’s ROCm now has first-class PyTorch support. Llama.cpp—the inference library that’s become standard for running open-source models locally—now has official ROCm support.
(Speaking of, Babbage has a great article on TPUs that’s a must-read on the topic.)
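As a concrete, hedged example of what “first-class PyTorch support” means in practice: on ROCm builds of PyTorch, the familiar torch.cuda API is backed by AMD’s HIP runtime, so typical deployment code doesn’t need to change. A minimal sketch (the linear layer and batch below are stand-ins, not a real serving setup):

```python
import torch

# On ROCm builds of PyTorch, torch.cuda.is_available() also returns True
# (the torch.cuda namespace is backed by AMD's HIP runtime), so the same
# script can run on an Nvidia H100 or an AMD MI300X without modification.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4096, 4096).to(device)  # stand-in for a real model
batch = torch.randn(8, 4096, device=device)     # stand-in for a real request

with torch.inference_mode():
    output = model(batch)

name = torch.cuda.get_device_name(0) if device.type == "cuda" else "CPU"
print(f"Ran inference on: {name}, output shape: {tuple(output.shape)}")
```

The catch, of course, is everything underneath that API: kernels, attention implementations, quantization libraries, and the long tail of tooling. That is exactly where the ecosystem flywheel above matters.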
How important are all of these developments? Less than their headlines suggest, honestly.
Google TPUs are unlikely to catch on with the major labs, despite some high-profile announcements (most notably Anthropic’s). Google is, after all, kind of a competitor to them all. AMD’s ROCm, the butt of many of my jokes, has famously shot itself in the face for years, and I have faith in their ability to continue to do so. But obviously folks are trying out alternatives, and a big part of the reason is the difficulty of getting Nvidia GPUs.
Eventually, everyone would prefer not to be locked in, and if the cost of switching falls enough... well, that’s the end of the moat.
Nvidia Isn’t Sitting Idle
Nvidia isn’t stupid. They can obviously read the same trends I can. That is, of course, why it’s interesting that Nvidia (and Apple) is reportedly shifting some 2028 production to Intel’s fabs. This follows a large strategic investment by Nvidia in Intel, along with reports that significant production of the upcoming “Feynman” architecture may move to Intel as well.
Obviously, this is also useful for diversifying away from Taiwan and its geopolitical complications. If we do see more movement to Intel—or even Samsung, the other leading-edge fab—Nvidia could reverse its supply fortunes.
The question is whether they can close the supply gap fast enough.
Intel’s fabs won’t be at full production until 2027 at the earliest. TSMC’s expansion takes time. And meanwhile, every quarter that customers can’t get Nvidia hardware is another quarter of forced experimentation with alternatives.
What About China?
While it isn’t an elephant in the room for me, I know it is for some readers. What about Chinese AI companies?
Despite on-again, off-again U.S. export bans, Chinese companies remain heavily Nvidia-dependent. Roughly 75% of AI training chips in Chinese data centers still run on CUDA. And there are workarounds to the U.S.’s mercurial policy on these chips. Tencent signed a $1.2 billion deal with Japanese neocloud Datasection to rent access to Nvidia’s B200 chips—a fairly common route for Chinese companies to get access to Nvidia GPUs. DeepSeek reportedly operates a fleet of 50,000 Hopper GPUs stockpiled before restrictions tightened. Companies are going to extraordinary lengths to stay on Nvidia hardware—enough so that the Chinese government has imposed its own on-and-off restrictions, including discouraging (and at times banning) H200 purchases.
Huawei’s Ascend chips and CANN, its version of CUDA, are the domestic alternative. (Grace Shao has done a good job covering the topic.) And there is a lot of real progress there.
For example, ByteDance signed a $5.6 billion order for Ascend chips, which is a little too much to be pure window-dressing for the sake of looking like you’re supporting the China option. But CANN still has plenty of problems. Again, you can’t catch up to two decades of refinement overnight. Chinese companies are dragging their feet on switching despite government pressure. (As I’ve said before, the Chinese ecosystem is not a monolithic community or hive mind. Contrary to what some Western observers think, if individual companies can get an edge on other Chinese companies, they’ll do it.)
The China situation is a great demonstration of the continued durability of CUDA’s moat: even with maximum government pressure and supply restrictions, Nvidia not only still exists there but dominates. Still, enough forced adoption, progress from Huawei, and (again) movement of the broader AI community toward other platforms do create a real threat.
So, is Nvidia doomed to lose its moat?
No, not really. Despite all the doomsaying, it isn’t inevitable. After all, this story could reverse quickly if Nvidia instead finds capacity to flood the market with their chips, backed by their superior ecosystem of libraries, experienced researchers/developers, and battle-tested software. It is a window of opportunity for competitors, though.
TSMC might or might not make enough investments to ease constraints by 2027. Intel’s capacity will come online (theoretically) around the same time. We might end up with overcapacity. The semiconductor industry, fabs included, has an infamous tendency toward large boom-and-bust cycles—it’s inevitable given the huge lead times and massive capex inherent to the business.
In the meantime, however, we might see significant ecosystem diversification away from Nvidia. And that starts with inference... which could eventually erode Nvidia’s moat everywhere else.
Thanks for reading!
I hope you enjoyed this article. If you’d like to learn more about AI’s past, present, and future in an easy-to-understand way, I’ve published a book titled What You Need to Know About AI, now also available in audiobook format!
You can learn more and order it on Amazon, Audible, Barnes and Noble, or select indie bookstores in the Bay Area and beyond.