This is true for making things just work, but to really squeeze performance out of a GPU you need to go lower level, and that is tied to the architecture.
This has happened before, and it will probably go the same way: software and compilers will make up the difference, or hardware will become so cheap and ubiquitous that it won't matter much.
In 3-5 years, how much will a 10% performance difference matter to you? Then calculate how much that 10% difference is going to cost in real dollars to run on Nvidia hardware, and the fun math should start.
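To put rough numbers on that fun math (every figure below is a hypothetical assumption, not something from this thread), here is a minimal sketch of just the electricity side:

    # Back-of-the-envelope only: fleet size, power draw, and electricity
    # price are made-up assumptions for illustration.
    gpus          = 10_000
    watts_per_gpu = 700            # assumed average draw per card
    usd_per_kwh   = 0.10           # assumed electricity price
    hours         = 4 * 365 * 24   # call the 3-5 year horizon 4 years

    baseline_kwh = gpus * watts_per_gpu * hours / 1000
    extra_kwh    = baseline_kwh * 0.10   # the 10% gap, paid in energy

    print(f"extra energy: {extra_kwh / 1e3:,.0f} MWh")            # ~24,500 MWh
    print(f"extra power bill: ${extra_kwh * usd_per_kwh:,.0f}")   # ~$2.5M

And that is only the power bill; the extra GPU-hours themselves typically cost far more than the electricity they burn.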
GPU performance isn't just speed; it's also power efficiency. The complexity of GPUs doesn't lend itself to just being solved with better tooling. They are also not going to get cheaper... especially the high-end ones with tons of HBM3 memory.
Given that data centers only have so much power, and AI really needs to be in the same data center as the data, squeezing out a bit more power efficiency lets you fit more cards, so you are getting gains there as well.
When I was mining Ethereum, the guy who wrote the mining software used an oscilloscope to squeeze an extra 5-10% out of our cards, and that was after having used them for years. That translated to saving about 1 MW of power across all of our data centers.
Let me also remind you that GPUs are silicon snowflakes: no two perform exactly the same. They all require very specific individual tuning to get the best performance out of them, and that tuning is not even at the software level; it's actual changes to voltage, memory timings, and clock speeds.
You are right to worry about power efficiency. Though do keep in mind that power is also fungible with money, especially in a data centre.
I suspect a lot of AI inference (though probably not the majority) will happen on mobile devices in the future. There, power is also at a premium, and less fungible with money.
> Though do keep in mind that power is also fungible with money, especially in a data centre.
Untrue. I have filled 3 very large data centers where there was no more power to be had. Data centers are constrained by power. At some limit, you can't just spend more money to get more power.
It also becomes a cooling issue: the more power your GPUs consume, the more heat they generate, the more cooling is required, and the more power that cooling itself draws. That overhead is what PUE (power usage effectiveness) captures.
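As a toy example of that multiplier (the numbers are made up, not from any real facility):

    # PUE (power usage effectiveness) = total facility power / IT equipment power.
    it_load_mw = 10.0   # assumed GPU/server load
    cooling_mw = 3.5    # assumed cooling overhead
    other_mw   = 0.5    # assumed lighting, conversion losses, etc.

    pue = (it_load_mw + cooling_mw + other_mw) / it_load_mw
    print(f"PUE = {pue:.2f}")   # 1.40: every GPU watt costs 1.4 W at the meter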
Hate to break it to you, but that is getting harder and harder. Certainly you can get a couple of MW, but if you want 50, you are going to find that extremely challenging, if not impossible.
The large ones being built today are already spoken for. That is how crazy the demand is right now. People aren't talking about it in the general news. They can't build them fast enough because things like transformers, generators, and cooling equipment all have supply issues.
Even so, power is limited. You can build DCs, but if you can't power them... what are you going to do? This isn't a problem you can just throw more money at.
Have you noticed that data center stocks, like EQIX, are at all-time highs?
> Have you noticed that data center stocks, like EQIX, are at all-time highs?
The FTSE All-World index (and the S&P 500) is also at all-time highs, though, so I would expect most stocks to be at all-time highs, too.
> Even so, power is limited. You can build DCs, but if you can't power them... what are you going to do? This isn't a problem you can just throw more money at.
I guess you can try to outbid other people? But thanks: I didn't know the data-centre-building industry was so supply constrained at the moment.
I knew you'd say that. Look at SMCI, though. It isn't just the macro; it's this AI stuff, and it has been going on quietly behind the scenes for the last 1.5-2 years now. Unless you're deep in the business, it just doesn't make the news headlines, because it is all so intrinsically hush-hush.
> I didn't know the data-centre-building industry was so supply constrained at the moment.
The whole supply chain is borked. Try to buy 800G Mellanox networking gear: 52-week lead time. I've got a fairly special $250 cable I need that I can't get until April. I could go on and on...
I've seen some of that playing out in a business that was using GPUs for deep learning applied to financial market making. They were throwing a lot of money at Nvidia, too.
I wonder if the total money going into AI is enough to show up in countrywide GDP figures anytime soon. Because either AI's hunger for ever more computing power has to slow down, or world GDP has to increase markedly.
Well, I'm talking about the rate of increase slowing.
Given the speed of light as an upper limit, in the very long run we can have at most cubic growth, not exponential growth: the volume of space (and thus hardware) reachable in time t only grows like (ct)^3. Something will have to give eventually.
(OK, you probably also need the Bekenstein bound. Otherwise, you could try sticking more and more information into the same amount of space. But there's a limit to that before things turn into black holes.)
We are so, so, so far away from compilers that could automatically help you, say, rewrite an operation to achieve high warp occupancy. These are not trivial performance optimizations - sometimes the algorithm itself fundamentally changes when you target the CUDA runtime, because of complexities in the scheduler and memory subsystems.
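To give a feel for why: occupancy is capped by whichever per-SM resource (resident warps, registers, or shared memory) runs out first, and when a kernel is register- or shared-memory-hungry, the fix is usually restructuring the computation, not a compiler flag. Here is a simplified sketch of that bookkeeping, using made-up limits for a hypothetical SM (it ignores allocation granularity and other real-scheduler details):

    def occupancy(block_size, regs_per_thread, smem_per_block,
                  warp_size=32, max_warps_per_sm=64, max_blocks_per_sm=32,
                  regs_per_sm=64 * 1024, smem_per_sm=100 * 1024):
        """Fraction of the SM's warp slots a kernel can actually keep resident."""
        warps_per_block = block_size // warp_size
        # Each resource independently limits how many blocks fit on one SM.
        by_warps = max_warps_per_sm // warps_per_block
        by_regs  = regs_per_sm // (regs_per_thread * block_size)
        by_smem  = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm
        blocks   = min(max_blocks_per_sm, by_warps, by_regs, by_smem)
        return blocks * warps_per_block / max_warps_per_sm

    # A register- and shared-memory-hungry kernel vs. a leaner rewrite of it:
    print(occupancy(256, regs_per_thread=128, smem_per_block=48 * 1024))  # 0.25
    print(occupancy(256, regs_per_thread=32,  smem_per_block=8 * 1024))   # 1.0

Getting from the first case to the second is an algorithmic change (tiling differently, recomputing instead of caching, splitting the kernel), which is exactly the part a compiler can't do for you.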
I think there is no way that you will see compilers that advanced within 3 years, sadly.