93 points by fcpguru 5 hours ago | 15 comments
maz1b 3 hours ago
Cerebras has been a true revelation when it comes to inference. I have a lot of respect for their founder, team, innovation, and technology. The colossal size of the WS3 chip, utilizing DRAM to mind-boggling scale, it's definitely ultra cool stuff.

I also wonder why they have not been acquired yet. Or is it intentional?

I will say, their pricing and deployment strategy is a bit murky and unclear. Paying $1500-$10,000 per month plus usage costs? I'm assuming that it has to do with chasing and optimizing for higher value contracts and deeper-pocketed customers, hence the minimum monthly spend that they require.

I'm not claiming to be an expert, but as a CEO/CTO I found other providers in the market with relatively comparable inference speed (obviously Cerebras is #1), easier onboarding, and better responsiveness from the people who work there (all of my interactions with Cerebras have been days/weeks late or simply ignored). IMHO, if Cerebras wants to gain more mindshare, they'll have to look into this aspect.

oceanplexian 3 hours ago
I’ve been using them as a customer and have been fairly impressed. The thing is, a lot of inference providers might seem better on paper but it turns out they’re not.

Recently there was a fiasco I saw posted on r/localllama where many of the OpenRouter providers were degraded on benchmarks compared to the base models, implying they are serving up quantized models to save costs but lying to customers about it. Unless you're actually auditing the tokens you're purchasing, you may not be getting what you're paying for, even if the T/s and $/token seem better.
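
To make that concrete, here's a minimal sketch of the kind of spot check I mean: send the same deterministic prompts to a reference endpoint and to the provider you're auditing, and compare answers. The endpoints, model id, and env vars are placeholders, and exact-match agreement at temperature 0 is only a crude proxy, not a real benchmark:

  # Sketch: compare a provider under audit against a reference endpoint.
  # Endpoints, model id, and env vars are placeholders -- substitute your own.
  import os
  from openai import OpenAI

  PROMPTS = [
      "What is 17 * 23? Answer with the number only.",
      "Name the capital of Australia in one word.",
  ]

  reference = OpenAI(base_url="https://reference-provider.example/v1",
                     api_key=os.environ["REFERENCE_API_KEY"])
  candidate = OpenAI(base_url="https://openrouter.ai/api/v1",
                     api_key=os.environ["OPENROUTER_API_KEY"])

  def ask(client, model, prompt):
      resp = client.chat.completions.create(
          model=model,
          messages=[{"role": "user", "content": prompt}],
          temperature=0,
          max_tokens=32,
      )
      return resp.choices[0].message.content.strip()

  matches = 0
  for prompt in PROMPTS:
      ref = ask(reference, "qwen/qwen3-coder", prompt)   # placeholder model id
      cand = ask(candidate, "qwen/qwen3-coder", prompt)
      matches += ref == cand
      print(f"{prompt!r}\n  reference: {ref!r}\n  candidate: {cand!r}")

  print(f"agreement: {matches}/{len(PROMPTS)}")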

dlojudice 1 hour ago
OpenRouter should be responsible for this quality control, right? It seems to me to be the right player in the chain, with both the duty and the scale to do so.
aurareturn 1 hour ago

> I also wonder why they have not been acquired yet. Or is it intentional?

A few issues:

1. To achieve high speeds, they put everything in SRAM. I estimated that they needed over $100M of chips just to run Qwen 3 at max context size (a rough version of that arithmetic is sketched at the end of this comment). You can run the same model with max context size on $1M of Blackwell chips, but at a slower speed. AnandTech had an article saying that Cerebras was selling a single chip for around $2-3M. https://news.ycombinator.com/item?id=44658198

2. SRAM has virtually stopped scaling in new nodes. Therefore, new generations of wafer scale chips won’t gain as much as traditional GPUs.

3. Cerebras was designed in the pre-ChatGPT era, when much smaller models were being trained. It is practically useless for training in 2025 because of how big LLMs have gotten. It can only do inference, but see the two problems above.

4. To inference very large LLMs economically, Cerebras would need to use external HBM. If it has to reach outside the wafer for memory, the benefits of a wafer-scale chip greatly diminish. Remember that the whole idea was to put the entire AI model inside the wafer so memory bandwidth is ultra fast.

5. Chip interconnect technology might make wafer-scale chips more redundant. TSMC has a roadmap for gluing more than 2 GPU dies together. Nvidia's Feynman GPUs might have 4 dies glued together. I.e., the sweet spot for large chips might not be wafer scale but perhaps 2, 4, or 8 dies together.

6. Nvidia seems to be moving much faster in terms of development and responding to market needs. For example, Blackwell is focused on FP4 inferencing now. I suppose the nature of designing and building a wafer scale chip is more complex than a GPU. Cerebras also needs to wait for new nodes to fully mature so that yields can be higher.

There exists a niche where some applications might need super fast token generation regardless of price. Hedge funds and Wall Street might be good use cases. But it won't challenge Nvidia in training or large-scale inference.
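
A rough sketch of the arithmetic behind point 1, for anyone who wants to poke at it. The 44 GB of on-chip SRAM per WSE-3 is Cerebras' published figure; the model size, 16-bit weights, KV-cache scenarios, and ~$2.5M-per-system price (the midpoint of the figure cited above) are my assumptions, so treat the output as order-of-magnitude only:

  # Back-of-envelope version of the estimate in point 1. 44 GB SRAM per WSE-3
  # is the published figure; everything else here is an assumption.
  SRAM_PER_WAFER_GB = 44
  COST_PER_WAFER_MUSD = 2.5        # assumed, midpoint of the $2-3M figure above

  def wafers_needed(weights_gb, kv_cache_gb):
      total = weights_gb + kv_cache_gb
      return -(-total // SRAM_PER_WAFER_GB)      # ceiling division

  weights_gb = 235 * 2             # ~235B-parameter Qwen3 at 16-bit weights (assumed)

  # KV cache grows with context length and concurrent requests, so the result is
  # dominated by that assumption; try a few scenarios.
  for kv_gb in (100, 1000, 3000):
      n = wafers_needed(weights_gb, kv_gb)
      print(f"KV cache {kv_gb:>4} GB -> {n:>3} wafers -> ~${n * COST_PER_WAFER_MUSD:.0f}M")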

addaon 47 minutes ago
> SRAM has virtually stopped scaling in new nodes.

But there are several 1T memories that are still scaling, more or less — eDRAM, MRAM, etc. Is there anything preventing their general architecture from moving to a 1T technology once the density advantages outweigh the need for pipelining to hide access time?

aurareturn 38 minutes ago
I’m pretty sure that HBM4 can be 20-30x faster in terms of bandwidth than eDRAM. That makes eDRAM not an option for AI workloads since bandwidth is the main bottleneck.
throw123890423 1 hour ago
> I will say, their pricing and deployment strategy is a bit murky and unclear. Paying $1500-$10,000 per month plus usage costs? I'm assuming that it has to do with chasing and optimizing for higher value contracts and deeper-pocketed customers, hence the minimum monthly spend that they require.

Yeah wait, why rent chips instead of sell them? Why wouldn't customers want to invest money in competition for cheaper inference hardware? It's not like Nvidia has a blacklist of companies that have bought chips from competitors, or anything. Now that would be crazy! That sure would make this market tough to compete in, wouldn't it. I'm so glad Nvidia is definitely not pressuring companies to not buy from competitors or anything.

aurareturn 51 minutes ago
Their chips weren’t selling because:

1. They’re useless for training in 2025. They were designed for training prior to the LLM explosion. They’re not practical for training anymore because they rely on SRAM, which is not scalable.

2. No one is going to spend the resources to optimize models to run on their SDK and hardware. Open source inference engines don’t optimize for Cerebras hardware.

Given the above two reasons, it makes a lot of sense that no one is investing in their hardware and they have switched to a cloud model selling speed as the differentiator.

It’s not always “Nvidia bad”.

OkayPhysicist 2 hours ago
The UAE has sunk a lot of money into them, and I suspect it's not purely a financial move. If that's the case, an acquisition might be more complicated than it would seem at first glance.
nsteel 1 hour ago
> utilizing DRAM to mind-boggling scale

I thought it was the SRAM scaling that was impressive, no?

liuliu 3 hours ago
They have been an acquisition target since 2017 (per the OpenAI internal emails), so the lack of an acquisition is not for lack of interest. It makes you wonder what happened during due diligence.
Shakahs 2 hours ago
Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast. Even if it takes more LLM calls to complete a task, those calls are all happening in a fraction of the time.
nerpderp82 2 hours ago
We must have very different workflows; I am curious about yours. What tools are you using, and how are you guiding Qwen3-Coder? When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.
CaptainOfCoit 2 hours ago
> When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.

Indirectly, it sounds like you are aware of the inference speed? Imagine if it took 2 minutes instead of 10; that's what the parent means.

yodon 51 minutes ago
2 minutes is the worst delay. With 10 minutes, I can and do context switch to something else and use the time productively. With 2 min, I wait and get frustrated and bored.
ripped_britches 1 hour ago
Do you use cursor or what? Interested in how you set this up
mythz 2 hours ago
Running Qwen3 Coder at speed is great, but I'd also want access to other leading OSS models like GLM 4.6, Kimi K2, and DeepSeek v3.2 before considering switching subs.

Groq also runs OSS models at speed, and it's my preferred way to access Kimi K2 on their free quotas.

fcpguru 5 hours ago
Their core product is the Wafer Scale Engine (WSE-3) — the largest single chip ever made for AI, designed to train and run models much faster and more efficiently than traditional GPUs.

Just tried https://cloud.cerebras.ai wow is it fast!

OGEnthusiast 3 hours ago
I'm surprised how under-the-radar Cerebras is. Being able to get near-instantaneous responses from Qwen3 and gpt-oss is pretty incredible.
JLO64 2 hours ago
My experience with Cerebras is pretty mixed. On the one hand, for simple and basic requests, it truly is mind-blowing how fast it is. That said, I’ve had nothing but issues and empty responses whenever I try to use them for coding tasks (Opencode via OpenRouter, GPT-OSS). It’s gotten to the point where I’ve disabled them as a provider on OpenRouter.
divmain 3 minutes ago
I experienced the same, but I think it is a limitation of OpenRouter. When I hit Cerebras' OpenAI-compatible endpoint directly, it works flawlessly.
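
For anyone who wants to try the same, here's a minimal sketch of calling an OpenAI-compatible endpoint directly instead of going through OpenRouter. The base URL, model id, and env var are assumptions on my part; check Cerebras' docs for the current values:

  # Sketch: call an OpenAI-compatible endpoint directly instead of via OpenRouter.
  # Base URL, model id, and env var are assumptions -- verify against the docs.
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.cerebras.ai/v1",     # assumed OpenAI-compatible endpoint
      api_key=os.environ["CEREBRAS_API_KEY"],
  )

  resp = client.chat.completions.create(
      model="gpt-oss-120b",                      # placeholder model id
      messages=[{"role": "user", "content": "Write a one-line hello world in Python."}],
      max_tokens=128,
  )
  print(resp.choices[0].message.content)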
lvl155 1 hour ago
Last I tried, their service was spotty and unreliable. I would wait maybe a year or so to retry.
arjie 2 hours ago
I just tried out Qwen-3-480B-Coder on them yesterday and to be honest it's not good enough. It's very fast but has trouble on lots of tasks that Claude Code just solves. Perhaps part of it is that I'm using Charm's Crush instead of Claude Code.
ramshanker 3 hours ago
I can't guess what is preventing Cerebras from replacing a few of the cores in the wafer-scale package with HBM memory. It seems the only constraint with their WSE-3 is memory capacity. Considering the size of NVDA chips, HBM on even a small subset of the wafer area should easily exceed the memory size of contemporary models.
xadhominemx 2 hours ago
I don’t think so. The reason why Cerebras is so fast for inference is that the KV cache sits in the SRAM.
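
For context, the standard KV-cache sizing arithmetic shows why that cache alone can approach a full wafer's 44 GB of SRAM at long context. The layer/head numbers below are illustrative assumptions (roughly 70B-class dense-model shapes), not any specific model's published config:

  # Sketch of the usual KV-cache size formula; all model dimensions are
  # illustrative assumptions, not a specific model's published config.
  def kv_cache_gib(layers, kv_heads, head_dim, context, batch=1, bytes_per_elem=2):
      # 2x for keys and values, stored per layer, per KV head, per token
      return 2 * layers * kv_heads * head_dim * context * batch * bytes_per_elem / 2**30

  for context in (8_192, 32_768, 131_072):
      print(f"{context:>7} tokens -> {kv_cache_gib(80, 8, 128, context):5.1f} GiB per request")
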
aurareturn 1 hour ago
If you replace some cores with HBM on package, you basically get the traditional GPU + HBM model.
reliabilityguy 2 hours ago
DRAM (the core of HBM memories) uses different technology nodes than logic and SRAM. Also, stacking that many DRAM dies on the wafer will complicate the packaging quite a bit, I think.
redwood 3 hours ago
Would be interesting if IBM were to acquire them. Seems like the big-iron approach to GPUs.
fcpguru 1 hour ago
does Guillaume Verdon from https://www.extropic.ai/ have thoughts on Cerebras?

(or other people that read the litepaper https://www.extropic.ai/future)

landl0rd 58 minutes ago
Beff has shipped zero chips and shitposted a lot. It is a cool idea, but he has made tons of promises and it's starting to seem more like vaporware. Don't get me wrong, I hope it works, but I doubt it will. Fewer podcasts, more building, please.

He reads to me like someone who markets better than he does things. I am disinclined to take him as an authority in this space.

How do you believe this is related to Cerebras?

allisdust 2 hours ago
If the idiots at AMZN have any brains left, they'll acquire this and make it the center of their inference offerings. But considering how lackluster their performance and strategy as a company have been of late, I doubt it.

Disappointed quite a bit with this fundraise. They were expected to IPO this year and give us poor retail investors a chance at investing in them.

onlyrealcuzzo 2 hours ago
It would be hard to beat designing their own in-house offering that is 50% as good, at 20% the cost.

That's the problem.

Unless the majority of the value is on the other end of the curve, it's a tough sell.

reliabilityguy 2 hours ago
Amazon has their own chips for inference and training: Trainium1/2.
allisdust 2 hours ago
Nothing (maybe except Groq?) comes even close to Cerebras in inference speed. I seriously don't get why these guys aren't more popular. The difference in using them as an inference provider vs anything else, for any use case, is like night and day. I hope more inference providers focus on speed. And this is where AMZN would benefit a lot, since their entire cloud model is to have something people would want anyway and mark it up by 3x. God forbid AVGO acquires this.
xadhominemx 2 hours ago
Cerebras hasn’t made any technical breakthroughs, they are just putting everything in SRAM. It’s a brute force approach to get very high inference throughput but comes at extremely high cost per token per second and is not useful for batched inferencing. Groq uses the same approach.

Memory hierarchy management across HBM/DDR/Flash is much more difficult but necessary to achieve practical inference economics.

twothreeone 1 hour ago
I don't think you realize the history of wafer-scale integration and what it means for the chip industry [1]. The approach was famously taken by Gene Amdahl's Trilogy Systems in the '80s, but it failed dramatically, leading (among other things) to the deployment of "accelerator cards" in the form of... the NVIDIA GeForce 256, the first GPU, in 1999. It's not like NVIDIA hasn't been trying to integrate multiple dies in the same package, but doing that successfully has been a huge technological hurdle so far.

[1] https://ieeexplore.ieee.org/abstract/document/9623424

averne_ 1 hour ago
The main reason a wafer-scale chip works there is that their cores are extremely tiny, so the silicon area that gets fused off in the event of a defect is much smaller than on NVIDIA chips, where a whole SM can get disabled. AFAIU this approach is not easily applicable to complex core designs.
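
To put toy numbers on the repair-granularity point, here's a sketch under a uniform-defect assumption. The defect density and block areas are illustrative guesses; the only published number used is the WSE-3's ~46,225 mm^2 area:

  # Toy repair-granularity model: expected silicon lost is roughly
  # (expected defect count) x (area disabled per defect). All numbers are
  # illustrative assumptions except the ~462 cm^2 wafer-scale die area.
  defect_density_per_cm2 = 0.1      # assumed random defect density
  wafer_area_cm2 = 462              # ~46,225 mm^2, the published WSE-3 area

  expected_defects = defect_density_per_cm2 * wafer_area_cm2   # ~46 defects

  for name, block_area_mm2 in [("tiny wafer-scale core", 0.05), ("GPU SM-sized block", 5.0)]:
      lost_cm2 = expected_defects * block_area_mm2 / 100       # mm^2 -> cm^2
      pct = 100 * lost_cm2 / wafer_area_cm2
      print(f"{name:>21}: ~{lost_cm2:.2f} cm^2 fused off (~{pct:.3f}% of the wafer)")
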
xadhominemx 1 hour ago
I understand that topic well. They stitched top metal layers across the reticle - not that challenging, and the foundational IP is not their own.

Everyone else went the CoWoS direction, which enables heterogeneous integration and much more cost effective inference.

dgfitz 1 hour ago
Valued at 8.1 billion dollars.

https://www.cerebras.ai/pricing

$50/month for one person for code (daily token limit), or pay per token, or $1500/month for small teams, or an enterprise agreement (contact for pricing).

Seems high.

tibbydudeza 2 hours ago
Damn, they are fast.
rvz 2 hours ago
Sooner or later, lots of competitors, including Cerebras, are going to take apart Nvidia's data center market share, and it will cause many AI model firms to question the unnecessary spend on and hoarding of GPUs.

OpenAI is still developing their own chips with Broadcom, but they are not operational yet. So for now, they're buying GPUs from Nvidia to build up their own revenue (to later spend on their own chips).

By 2030, eventually many companies will be looking for alternatives to Nvidia like Cerebras or Lightmatter for both training and inference use-cases.

For example, [0] Meta just acquired a chip startup for this exact reason: "An alternative to training AI systems" and "to cut infrastructure costs linked to its spending on advanced AI tools."

[0] https://www.reuters.com/business/meta-buy-chip-startup-rivos...

onlyrealcuzzo 2 hours ago
There's so much optimization to be made when co-developing the model and the hardware it runs on that most of the big players are likely to run a non-trivial percentage of their workloads on proprietary chips eventually.

If that's 5 years into the future, that looks bad for Nvidia; if it's >10 years into the future, it doesn't affect Nvidia's current stock price very much.