I used to be one of those finance guys who could almost code, but not really. CUDA was awesome. Stuff just worked and one did not need a CS degree to do it. Furthermore, like 15 years ago one could get guys at Nvidia to help you. It was easy. Microsoft used to be the same way with SQL Server. They would send you a DLL and other crap and your stuff would just work. I built a huge financial modeling software thingy with a SQL Server backend, and I did it with a history degree (I am good at math, or at least I used to be).
CUDA is the ultimate lock-in. AMD has made more powerful cards than Nvidia in the past, but even in gaming it would take them years to optimize the drivers to get the performance out of the cards. In some ways it was kind of crazy. You could buy a card, and if you kept your drivers up to date, that AMD card would double its performance over a couple years. The problem is, you want the performance NOW.
For business, AMD was never serious about general-purpose compute for non-graphics problems. Even with TensorFlow, I think 99% of the people using it were running some other dude's Docker image with TensorFlow and Python on a Quadro or even a GeForce card.
AMD is catching up, and they finally have the money to hire the staff to fix this. Let's all hope that they do. But even then, we need easy solutions that guys like me can use. That takes an ecosystem, not simply the tools.
Excellent insights..!
Great article. It described Nvidia's software/ecosystem moat qualitatively. Thank you for covering this topic 😃
An interesting follow-up might be an attempt to capture it quantitatively: the number of projects using CUDA, Google search trends, research output (impact factor), etc.
PS: You were referred by Devansh from AI Made Simple.
The issue is not only whether CUDA is better/faster than the alternatives, but also that the existing software requires CUDA. It's not about the technology, but about the agreements between corporations. In the end, that's always the bottleneck.
Why isn't there a DirectX-style layer to abstract the underlying technology from the applications? Why won't applications consume a common interface no matter what sits at the lower levels? Not because of technology limitations, but because of politics and $.
And why is Nvidia allowed to sell dummy bricks at rocket prices when what they are actually selling is software in a dumb box? People are buying "hardware" that has not much more in it than what they bought 8 years ago. So we're almost buying software updates, à la FIFA games.
Again, why? Because of $. Why no regulation? Because whoever regulates gets enough $ not to really regulate.
NVDA's market cap will be $5 trillion within 2 years.
Big still, yes. It puzzles me that AMD hasn't yet found its way around this moat. They didn't have competitive hardware for tensor ops until the MI300, but now they do, and perhaps they will turn their attention to seriously addressing the software gap. CUDA isn't really important. What is important is PyTorch, and the Python numerical ecosystem generally.
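To illustrate what I mean, here's a rough sketch (assuming a stock PyTorch install; as far as I know the ROCm build reports its HIP devices through the same torch.cuda namespace): code written at the PyTorch level never has to mention the vendor.

```python
import torch

# Use whatever accelerator PyTorch can see, falling back to CPU otherwise.
# On ROCm builds, torch.cuda.is_available() is (as I understand it) also how
# HIP devices are reported, so this line is vendor-agnostic.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy matmul workload; nothing below cares which backend is underneath.
x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)
y = x @ w

print(device, y.shape)
```

The lock-in question really starts below that level, in the kernels and libraries PyTorch itself calls into.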
If you're using PyTorch mainly for ML, you're already outside of the audience NVIDIA needs for lock-in. Not an insult or whatever; I do agree industry practitioners are almost always using these higher-level tools, in a similar way that industry programmers don't need to be mucking around in assembler. What I'm mainly referring to is the levels below PyTorch or similar higher-level libraries (or even numerical Python itself, where a lot of bottleneck code, and much of NumPy itself, is built in C/C++/etc., including code that uses CUDA, and not in pure Python).
When I was doing projects in these classes, I could never use PyTorch, because it didn't support a ton of the stuff that I wanted to do well: Bayesian hierarchical models, LDAs, whatever. A lot of it was (horrifyingly) actually only available as R code, as mentioned, but that became the basis for porting over to C++ or Python.
Anyway, if you're at that point, you will often need to write parallel code for bottlenecks yourself—and the least painful and most performant way is probably using CUDA.
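For a sense of what that looks like from the Python side, here's a minimal sketch using Numba's CUDA support (a hypothetical toy kernel, assuming numba and a CUDA-capable card are available); writing the equivalent against ROCm or OpenCL is where the pain usually starts.

```python
import numpy as np
from numba import cuda

# Toy bottleneck: a fused multiply-add over a large array, written as a CUDA kernel.
@cuda.jit
def fma_kernel(x, y, out, a):
    i = cuda.grid(1)          # global thread index
    if i < x.size:
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

# Copy inputs to the GPU, launch enough blocks to cover every element, copy back.
d_x, d_y, d_out = cuda.to_device(x), cuda.to_device(y), cuda.to_device(out)
threads = 256
blocks = (n + threads - 1) // threads
fma_kernel[blocks, threads](d_x, d_y, d_out, np.float32(2.0))

print(d_out.copy_to_host()[:5])
```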
That's part of the point I was making: the low-level libraries that don't randomly bubble up numerical problems (which ROCm's ecosystem absolutely does), and the "high level" of bleeding-edge new models that researchers are making, absolutely use CUDA and pretty much nothing else (not ROCm, not OpenCL, etc.).