You could spend 80% of your time doing the work and 20% writing about it, or 99% doing the work and 1% copy-pasting Claude's writeup about it into a blog.
There is nothing wrong with writing if you are into it, and yes, you can probably do better than Claude, but I can relate to engineers who just want to build.
Looking at its size and its shared nature, it feels far more natural to compare it with the L2 cache, which is also shared across the entire GPU and is of the same order of size (40MB on the listed A100).
In this case, Google has already done it, and that will be true for well-resourced accelerator companies like Google working with the most popular operations like attention.
As long as you use those operations, you are okay. But if you do something different, you need to be prepared to do all of this yourself.
Many sentences are 3x as long as they normally would be, in subtle ways (to wit: "My flash attention is 35x slower than the fused standard at n=4096. Not a little worse. Catastrophically worse."), and it really wears on attention (pun intended). It brings literary voice to a technical blog post, and a very difficult, process-oriented technical blog post at that. I have to reallocate my unfortunately-limited brain cells from "maintaining state of where we are in the process" to "is this cutesy fluff or important", and I've never had to do that in 37 years with technical blog posts.
The Markdown gets bad. Bolding is used for important phrases (like a human would), then, all of a sudden, after the "Inside a TPU chip" header, it's being used every other sentence, on anything that is a proper noun or would have a Wikipedia article. It got so weird that at some point I was like "a human definitely didn't let this through...they must be links?" and tried clicking them.
It's doubly bad at that point, because Markdown tables start coming in hot and heavy too. So you're left with "It's pretty apparent the LLM did it from here, and I can't keep trying to keep the state of the process in my head while trying to figure out if the bolding is important," and a reflexive close tab.