Transformers don't scale down very well, in my experience. I used to train local models all the time as new ones were released, and as I recall, transformers were the first architecture I couldn't get better results out of with my limited training data and GPU.
Since it seems to produce only broken and nonsensical sentences (at least based on the one example given), I'm not sure whether it really works at this scale.
Anyway, as written this passage doesn't make a whole lot of sense (is the point that it produces broken sentences?), and given that it was almost certainly written by an AI, it demonstrates that the architecture doesn't work especially well at any scale (I kid, I kid).
On a plain vanilla C64, the Transformer can't really show what it's capable of. An implementation using 2 bits per weight (vectorized) could be slightly better, perhaps.
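For context, 2 bits per weight means packing four weights into every byte, with each 2-bit code indexing a tiny codebook of real values. Here is a minimal sketch of that idea (in Python rather than 6502 assembly, purely for illustration; the codebook values are made up, not from any actual C64 implementation):

```python
# Hypothetical sketch of 2-bit weight quantization: four codes per byte.
# CODEBOOK maps each 2-bit code (0..3) to a real-valued weight;
# these four levels are illustrative assumptions only.
CODEBOOK = [-1.0, -0.33, 0.33, 1.0]

def pack_weights(codes):
    """Pack a list of 2-bit codes (0..3) into bytes, 4 codes per byte."""
    packed = bytearray()
    for i in range(0, len(codes), 4):
        b = 0
        for j, code in enumerate(codes[i:i + 4]):
            b |= (code & 0b11) << (2 * j)  # low bits hold earlier codes
        packed.append(b)
    return bytes(packed)

def unpack_weight(packed, index):
    """Recover the real-valued weight at a given flat index."""
    byte = packed[index // 4]
    code = (byte >> (2 * (index % 4))) & 0b11
    return CODEBOOK[code]
```

At that density, a 25k-parameter model would occupy roughly 6.25 KB, comfortably inside the C64's 64 KB address space, though the per-weight codebook lookup adds work to every multiply.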
It (v3) mostly only says hello and bye, but I guess for 25k parameters you can't complain. (I think the rather exuberant copy is probably the product of Claude et al.)
YOU> HI! HOW ARE YOU DOING?
C64> BELON
YOU> WHAT DO YOU MEAN BY THAT?
C64> YOU DES
YOU> YOU TOO!
C64> YOU DESER! THIS THE SHIN
YOU> DESER? SHIN? THOSE AREN'T EVEN REAL WORDS, WHAT'S GOING ON HERE?
C64>
YOU> THIS THE SHIN
C64> YOU MAKE ME HAPPY TOO
YOU> YOU DESER
C64> YOU MADE MY HEART! W
YOU> DES, DESER, DESERST?
C64> YOU DES
YOU> BELON
C64> I LIKE OURON ME! WONDERFUL!YOU!
YOU> YOU BELON DES
C64> YOU MAKE ME HAPPY
If it turns out that LLM-like models can produce genuinely useful outputs on something as constrained as a Commodore 64—or even more convincingly, if someone manages to train a capable model within the limits of hardware from that era—it would suggest we may have left a lot of progress on the table. Not just in terms of efficiency, but in how we framed the problem space for decades.
YOU> hey
C64> HELLO! RE SOUNDS ME. MEFUL!
60s per token for that doesn't strike me as genuinely useful. Very, very cool project, though!
YOU> HELP I'M DROWNING
C64> YOU' HERE!
YOU> OH NO I'M ON FIRE
C64> IGLAY!
YOU> IM BEING SWALLOWED BY A SNAKE
C64>
YOU> BIRDS ARE NIPPING ON ME
C64> YOU
Maybe there is deeper wisdom in there that we have yet to unearth.
It does also make me wonder what you could do with somewhat more powerful retro hardware. I'd love to see what a transformer running on a PSX or an N64 could do.
(Or maybe not, if it doesn't perform better than random; I haven't actually tried it out yet. Some more examples would have been nice!)
I wonder how far you could push this while still staying period correct, e.g. by adding a REU (RAM Expansion Unit), or even a GeoRAM (basically a REU on steroids).
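To put rough numbers on the expansion idea, here's a back-of-the-envelope sketch of raw parameter capacity at different weight widths. The RAM sizes are the nominal capacities of the stock C64 and the common expansions; the weight widths are assumptions for illustration, not figures from this project, and the estimate ignores space needed for code and activations:

```python
# Back-of-the-envelope parameter capacity for period-correct C64 setups.
def max_params(ram_bytes, bits_per_weight):
    """How many weights fit in the given RAM (code/activations ignored)."""
    return ram_bytes * 8 // bits_per_weight

KB = 1024
configs = {
    "stock C64 (64 KB)":   64 * KB,
    "REU 1700 (128 KB)":  128 * KB,
    "REU 1750 (512 KB)":  512 * KB,
    "GeoRAM (512 KB)":    512 * KB,
}

for name, ram in configs.items():
    print(f"{name}: {max_params(ram, 8):,} weights @ 8-bit, "
          f"{max_params(ram, 2):,} @ 2-bit")
```

Even a 512 KB expansion gets you into the low millions of 2-bit weights on paper, although the REU is only reachable via DMA transfers rather than direct addressing, which would complicate the inner loop considerably.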
SuperCPU would also be an option, but for me it's always blurring the line of "what is a C64" a bit too much, and it likely just makes it faster anyway.
I haven't heard much about it since launch. Although, now that I look, it seems they are only just shipping now.
https://www.commodore.net/product-page/commodore-64-ultimate...
Even cooler would have been to have the 6502 code generated directly by the LLM.
meh
This is a giveaway for AI generation, from the docstring to the terrible opcode dispatch (Claude sucks at assembly or low-level optimization): https://github.com/gizmo64k/soulplayer-c64/blob/main/src/cpu...
A human would use a proper dispatch table and wouldn't make excuses for a sloppy implementation ("Python is fast enough").
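For what a "proper dispatch table" means here: instead of a long if/elif chain over opcode values, you index an array of handlers directly by the opcode byte, giving O(1) dispatch. A minimal sketch (my own toy example, not the linked repo's code; only three real 6502 opcodes are filled in, and flags/carry are ignored):

```python
# Minimal sketch of table-driven opcode dispatch for a toy 6502-style core.
# The handlers are illustrative; a real emulator fills all 256 entries
# with addressing-mode variants and maintains the status flags.
class CPU:
    def __init__(self):
        self.a = 0                       # accumulator
        self.pc = 0                      # program counter
        self.memory = bytearray(0x10000)
        # One handler per opcode byte: direct indexing, no comparison chain.
        self.dispatch = [self.op_illegal] * 256
        self.dispatch[0xA9] = self.op_lda_imm  # LDA #imm
        self.dispatch[0x69] = self.op_adc_imm  # ADC #imm (carry ignored)
        self.dispatch[0xEA] = self.op_nop      # NOP

    def fetch(self):
        byte = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def step(self):
        self.dispatch[self.fetch()]()

    def op_lda_imm(self):
        self.a = self.fetch()

    def op_adc_imm(self):
        self.a = (self.a + self.fetch()) & 0xFF

    def op_nop(self):
        pass

    def op_illegal(self):
        raise NotImplementedError("unimplemented opcode")
```

Loading `A9 05 69 03` (LDA #5; ADC #3) and stepping twice leaves 8 in the accumulator; the point is that adding the other 250-odd opcodes never slows dispatch down, unlike an ever-growing if/elif ladder.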
Besides, the author has an art and design background, which doesn't seem to match the deep knowledge of Transformers or assembly required for such a project.
ELIZA was not written in assembler; the original was in MAD-SLIP, with later versions in Lisp, BASIC, and other languages.
(Came here to say an update to Eliza could really mess with the last person still talking to her.)
Brings back memories
I'm not sure what the Venn diagram of knowledge needed to understand what that sentence is suggesting looks like; it's probably more crowded in the intersection than one might think.