It doesn't stop there, though. OpenAI is currently mired in a capital crunch. Their last round just about sucked all the dry powder out of the private markets. Folks are now starting to ask difficult questions about their burn rate and revenue. It is increasingly looking like they might not follow through on the purchase order that kick-started this whole panic over RAM.
Soo ... how sure are we that the memory makers themselves are not going to be the ones holding the bag?
If they could make this stuff and sell it to regular people a decade ago at very palatable prices, why have they now decided that this is the technology of the gods, unaffordable to mere mortals?
That is, memory capacity is being reserved for datacenters yet to be built, and that will do weird things to the market if said datacenter construction is postponed or cancelled altogether.
The real issue is everyone wanting to upgrade to HBM, DDR5, and PCIe 5.0 NVMe at the same time.
There’s virtually infinite capital: if needed, more can be reallocated from the federal government (funded with debt), from public companies (funded with people’s retirement funds), from people’s pockets via wealth redistribution upwards, or from offshore investment.
They will be allowed to strangle any part of the supply chain they want.
Another point: I often see the money argument - that country X has more money, so it can afford to do more and better R&D and make more stuff.
This stuff comes out of factories that need to be built, machinery that needs to be procured, and engineers who need to be trained and hired.
> more can be reallocated from the federal government (funded with debt)
While this is the most reliable funding, it's still not very accessible. OpenAI is a money pit, and its demands are growing quickly. The US government has already committed to a bunch of very expensive spending. If OpenAI were to require yearly bundles the size of its recent "$120B" deal, that's 6% of the US's discretionary budget, or 12.5% of the non-military discretionary budget. (And the military is going to ask for a lot more money this year.) Even the idea of just issuing more debt is dubious, because they're going to want to do that to pay for the wars that are rapidly spiralling out of control.
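For scale, here's a back-of-envelope check of those percentages - a quick Python sketch, where the budget figures (roughly $2.0T total discretionary, about $1.04T of it military) are my own assumptions:

    # Back-of-envelope check of the percentages above.
    # The budget figures are assumptions, not official numbers.
    deal = 120e9                             # a yearly "$120B" bundle
    discretionary = 2_000e9                  # assumed total discretionary budget
    non_military = discretionary - 1_040e9   # minus the assumed military share

    print(f"{deal / discretionary:.1%} of discretionary")  # -> 6.0%
    print(f"{deal / non_military:.1%} of non-military")    # -> 12.5%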
None of this is saying that the US government can't or wouldn't pay for it, but it's non-trivial, and it's unclear how long Altman can keep threatening the US government with "give me a trillion dollars or the economy explodes" without consequences.
Further deficit spending isn't without its risks for the US government either. Interest rates are already creeping up, and a careless explosion of the deficit may well trigger a debt crisis.
> from public companies (funded with people’s retirement funds)
This would come at great cost. OpenAI would need to open up about its financial performance to go public itself. With its CFO being put on what is effectively administrative leave for pushing against going public, we can assume the financials are so catastrophic that an IPO might bomb and take the company down with it. And nobody's going to invest privately in a company that has no public takers.
Getting money through other companies is also running into limits. Big Tech has deep pockets, but they've already started slowing down, switching to debt to finance AI investment, and they're increasingly pressured by their own shareholders to show results.
> from people’s pockets via wealth redistribution upwards
The practical mechanism of this is "AI companies raise their prices". That might also just crash the bubble if demand evaporates. For all the hype, the productivity benefit hasn't really shown up in economy-wide aggregates. The moment AI becomes "expensive", all the casual users will drop it, and the non-casual users are likely to follow. The idea of "AI tokens" as a job perk is cute, but exceedingly few people are going to accept a lower salary in order to use AI at their job.
There's simply not much money left to take out of people's pockets these days, with how high the cost of living has gotten.
> from offshore investment.
This is a pretty good source of money. The wealthy Arabian oil states have very deep slush funds, and they've been investing extensively in AI to build ties to US businesses and in the hope of diversifying their resource economies.
...
...
"Was". Was a good source of money.
We aren't. The remaining memory manufacturers fear getting caught in a "pork cycle" yet again - that's why there are only three large ones left anyway.
Oh no!
Given that TurboQuant results in a 6x reduction in memory usage for KV caches and up to an 8x boost in speed, this optimization is already showing up in llama.cpp, enabling significantly bigger contexts without having to run a smaller model to fit it all in memory.
Some people thought it might significantly improve the RAM situation, though I remain a bit skeptical - the demand is probably still larger than the reduction TurboQuant brings.
Current "TurboQuant" implementations are about 3.8X-4.9X on compression (w/ the higher end taking some significant hits of GSM8K performance) and with about 80-100% baseline speed (no improvement, regression): https://github.com/vllm-project/vllm/pull/38479
For those not paying attention, it's probably worth sending this and the ongoing discussions for vLLM https://github.com/vllm-project/vllm/issues/38171 and llama.cpp through your summarizer of choice - TurboQuant is fine, but not a magic bullet. Personally, I've been experimenting with DMS, and I think it has a lot more promise and can be stacked with various quantization schemes.
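To put those compression ratios in perspective, here's a rough KV-cache sizing sketch in Python; the model shape (48 layers, 8 KV heads, head dim 128) is an illustrative assumption, not any particular model:

    # KV cache size = 2 (K and V) * layers * kv_heads * head_dim
    #                 * context_len * bytes_per_element
    def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem):
        return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

    ctx = 128_000
    fp16 = kv_cache_bytes(48, 8, 128, ctx, 2)  # ~23.4 GiB at fp16
    compressed = fp16 / 4                      # ~4x compression, per the PR

    print(f"fp16 KV cache at {ctx:,} tokens: {fp16 / 2**30:.1f} GiB")
    print(f"after ~4x compression: {compressed / 2**30:.1f} GiB")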
The biggest savings in KV cache, though, come from improved model architecture. Gemma 4's SWA/global hybrid saves up to 10x on KV cache, MLA/DSA does as well (the latter also helps solve global-attention compute), and using linear/SSM layers saves even more; see the sketch below.
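A quick Python sketch of why the SWA/global hybrid saves so much - the 5:1 SWA-to-global layer ratio and the 1,024-token window are illustrative assumptions, not any model's actual config:

    # Sliding-window layers only cache the last `window` tokens;
    # only the global layers cache the full context.
    def hybrid_kv_entries(context, layers, swa_ratio=5, window=1024):
        global_layers = layers // (swa_ratio + 1)
        swa_layers = layers - global_layers
        return global_layers * context + swa_layers * min(window, context)

    ctx, layers = 128_000, 48
    full = layers * ctx                        # every layer caches everything
    hybrid = hybrid_kv_entries(ctx, layers)
    print(f"full attention: {full:,} cached positions")
    print(f"hybrid: {hybrid:,} (~{full / hybrid:.1f}x smaller)")

Under these assumptions it comes out to roughly 5.8x; a more aggressive ratio of SWA to global layers is what pushes it toward the 10x figure.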
None of these reduce memory demand (Jevons paradox, etc.), though. Looking at my coding tools, I'm using about 10-15B cached tokens/mo currently (it was 5-8B a couple of months ago), and while I think I'm probably above average on the curve, I don't consider myself to be doing anything especially crazy. This year, between mainstream developers and more and more agents, I don't think there's really any limit to the number of tokens that people will want to consume.
For example, Gemma 4 32B, which you can run on an off-the-shelf laptop, is at around the same or even a higher intelligence level than the SOTA models from two years ago (e.g. gpt-4o). By the time memory prices come down, we will probably have something as smart as Opus 4.7 that can be run locally.
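Rough weight-memory arithmetic for why a 32B model fits on an off-the-shelf laptop (a sketch; real runtimes add KV cache and other overhead on top):

    # Weight memory is roughly parameter_count * bytes_per_weight.
    params = 32e9
    for name, bytes_per_w in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        print(f"{name}: ~{params * bytes_per_w / 1e9:.0f} GB of weights")
    # fp16: ~64 GB, 8-bit: ~32 GB, 4-bit: ~16 GB -- so a 4-bit quant
    # fits alongside a KV cache in a 24-32 GB machine.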
Bigger models of course have more embedded knowledge, but just knowing that they should make a tool call to do a web search can bypass a lot of that.
That is the sad reality of the future of memory.
I hate to mention Jevons paradox, as it has become cliché by now, but this is a textbook case of it [0]
[0] https://techwireasia.com/2026/04/chinese-memory-chips-ymtc-c...
Assuming China takes TSMC in one piece (unlikely without internal sabotage, even in the best-case scenario), it would still probably take years before it could produce another high-end GPU or CPU.
We would probably be stuck with the existing inventory of equipment for a long time…
The risk with China taking over Taiwan is that it would mostly just expedite their own production research by a couple of years.
Anyone trying to spin up a competitor to TSMC would have to first overcome a significant financial hurdle: the capital investment to build all the industrial equipment needed for fabrication.
Then they'd have to convince institutions to choose them over TSMC when they're unproven, and likely objectively worse than TSMC, given that they would not have its decades of experience and process optimization.
This would be mitigated somewhat if our institutions had common-sense rules in place requiring multiple vendors for every part of their supply chain—note, not just "multiple bids, leading to picking a single vendor" but "multiple vendors actively supplying them at all times". But our system prioritizes efficiency over resiliency.
A wealthy nation-state with a sufficiently motivated voter base could certainly build up a meaningful competitor to TSMC over the course of, say, a decade or two (or three...). But it would require sustained investment at all levels, and not just investment in the simple financial sense: it requires people investing their time in education and research, dedicating their lives to making the best chips in the world. And the only way that works is to defy our system - to invest in plants that won't be finished for years, and then pay for chips that you know are inferior in quality, because they're our chips, and paying for them while they're lower quality is the only way to get them to become the best chips in the world.
We have a RAM shortage now; we will have very cheap RAM tomorrow. It's not like production is bottlenecked by raw materials. Chip companies just need to assess whether the demand from AI companies will last, in which case it's better to scale up, or whether they should wait it out instead of oversupplying and cutting into their own profits.
The lawsuits in the past prove that statement to be not just "basically" true but actually true.
Think I will scrap my PC and sell its parts.
I wonder if there are any niche companies out there building decent rigs with DDR3 and 5th/6th-generation Intel CPUs - the parts are cheap, and it might be a business opportunity.