FWIW, nothing beats ‘implement the game logic in full (a huge amount of work) and, with pruning on some heuristics, look 50 moves ahead’. This is how chess engines work and how all good turn-based game AI works.
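Once the game logic exists, the lookahead itself is short. A minimal sketch, with legal_moves, apply_move, is_terminal and evaluate standing in for the hand-built rules and heuristic:

    from math import inf

    # Depth-limited negamax with alpha-beta pruning. The game-specific parts
    # (legal_moves, apply_move, is_terminal, evaluate) are the hand-built
    # world model; evaluate() scores a position from the side to move.
    def negamax(state, depth, legal_moves, apply_move, is_terminal, evaluate,
                alpha=-inf, beta=inf):
        if depth == 0 or is_terminal(state):
            return evaluate(state)
        best = -inf
        for move in legal_moves(state):
            child = apply_move(state, move)
            score = -negamax(child, depth - 1, legal_moves, apply_move,
                             is_terminal, evaluate, -beta, -alpha)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:        # prune: the opponent already refutes this line
                break
        return best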
I’ve tried throwing masses of game state data at the latest models in PyTorch. Unusable. It makes really dumb moves. In fact, one big issue is that it often suggests invalid moves, and the best way to avoid this is to implement the board game logic in full to validate them. At which point, why don’t I just do the above and scan ahead X moves, since I have to do the hard part of manually building the world model anyway?
One area where current AI is helping is with the heuristics themselves for evaluating the best moves when scanning ahead. You can feed in various game states, together with whether the player ultimately won or not, to train the values of the heuristics. You still need to implement the world model and the lookahead to use those heuristics, though! When you hear of neural networks being used for Go or chess, this is where they are used. You still need to build the world model and brute-force scan ahead.
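A sketch of what that training might look like in PyTorch (the 256-feature state encoding and the network shape here are just placeholders):

    import torch
    import torch.nn as nn

    # Fit a value network on (encoded game state, final outcome) pairs, then
    # use it as the leaf evaluation inside the hand-built lookahead.
    value_net = nn.Sequential(
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 1), nn.Tanh(),     # output in [-1, 1]: loss .. win
    )
    optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train_step(states, outcomes):
        # states:   (batch, 256) float tensor of encoded positions
        # outcomes: (batch, 1) float tensor, +1.0 if that player won, -1.0 if not
        optimizer.zero_grad()
        loss = loss_fn(value_net(states), outcomes)
        loss.backward()
        optimizer.step()
        return loss.item()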
One path I do want to try more: in theory, coding assistants should be able to read rulebooks and dynamically generate code to represent those rules. If you can do that part, the rest should be easy. I.e. it could be possible to throw rulebooks at an AI and have it play the game. It would generate a world model from the rulebook via coding assistants and scan ahead more moves than humanly possible using that world model, evaluating against heuristics that would need to be trained through trial and error.
Of course coding assistants aren’t at a point where you can throw rulebooks at them to generate an internal representation of game states. I should know. I just spent weeks building the game model even with a coding assistant.
In Go, for instance, it does not help much to look 50 moves ahead. The complexity is way too high for this to be feasible, and determining who's ahead is far from trivial. It's in these situations where modern AI (reinforcement learning, deep neural networks) helps tremendously.
Also note that nobody said that using AI is easy.
The big fundamental blocker to a generic ‘can play any game’ AI is the manual implementation of the world model. If you read the AlphaGo paper you’ll see ‘we started with nothing but an implementation of the game rules’. That’s the part we’re missing. It’s done by humans.
> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.
In any case, for Go - with a mild amount of expert knowledge - this limitation is most likely quite irrelevant, except in very rare endgame situations or special superko setups, where a lack of moves or solutions pushes some probability onto moves that look like wishful thinking.
I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions; it’s just “planning with a learned model”, without access to the simulation environment.
---
What I am less convinced by is the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on? Iso-FLOPs, iso-search-depth, iso-self-play games, iso-wall-clock time? What would make sense here?
Each AlphaGo variant was trained on some sort of embarrassingly parallel compute cluster, but every paper included the punchline for general audiences that “in just 30 hours” some performance level was reached.
After a few moves they get hopelessly lost and just start wandering back and forth in a loop. Even when I prompt them explicitly to serialize a state representation of the maze after each step, and even if I prune the old context so they don't get tripped up on old state representations, they still get flustered and corrupt the state or lose track of things eventually.
They get the concept: if I explain the challenge and ask them to write a program to solve such a maze step by step like that, they can do it successfully first try! But maintaining it internally, they still seem to struggle.
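For reference, the program they write first try is essentially just breadth-first search over the grid. A sketch, assuming 0 marks open cells and 1 marks walls:

    from collections import deque

    # Breadth-first search over a grid maze, returning the cell-by-cell path.
    def solve_maze(grid, start, goal):
        queue = deque([start])
        came_from = {start: None}
        while queue:
            cell = queue.popleft()
            if cell == goal:
                break
            r, c = cell
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                        and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                    came_from[nxt] = cell
                    queue.append(nxt)
        if goal not in came_from:
            return None                  # unreachable
        path, cell = [], goal
        while cell is not None:
            path.append(cell)
            cell = came_from[cell]
        return path[::-1]                # start -> goal, one step at a time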
* https://www.sciencedirect.com/science/article/pii/S009286742...
Presuming these are 'typical' mazes (like you find in a garden or local corn field in late fall), why not have the bot run the known-correct solving algorithm (or its mirror)?
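Meaning the wall follower: keep one hand on a wall and you will eventually walk out of any simply-connected maze. A rough sketch of the right-hand version (the grid and heading encoding here is just an assumption):

    # Right-hand rule (its mirror is the left-hand rule). Works on
    # simply-connected mazes; grid uses 0 for open cells, 1 for walls.
    RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}
    LEFT_OF = {v: k for k, v in RIGHT_OF.items()}
    STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

    def is_open(grid, pos, heading):
        dr, dc = STEP[heading]
        r, c = pos[0] + dr, pos[1] + dc
        return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0

    def wall_follow(grid, pos, heading, goal, max_steps=10_000):
        for _ in range(max_steps):
            if pos == goal:
                return True
            if is_open(grid, pos, RIGHT_OF[heading]):
                heading = RIGHT_OF[heading]      # keep the right hand on the wall
            elif not is_open(grid, pos, heading):
                heading = LEFT_OF[heading]       # blocked ahead: turn left, re-check
                continue
            dr, dc = STEP[heading]
            pos = (pos[0] + dr, pos[1] + dc)
        return False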
Similarly, if you ask them to write a Sudoku solver, they have no problem. And if you ask an online model to solve a Sudoku, it'll write a Sudoku solver in the background and use that to solve it. But (at least the last time I tried, a year ago) if you ask them to solve one step by step using pure reasoning, without writing a program, they start spewing all kinds of nonsense (though, humorously, they cheat: they'll still spit out the correct answer at the end).
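The solver they produce is the textbook backtracking one, roughly like this (a sketch, assuming the board is a 9x9 list of lists with 0 for empty cells):

    # Backtracking Sudoku solver: fill the first empty cell with each legal
    # digit in turn, recurse, and undo on failure.
    def solve(board):
        for r in range(9):
            for c in range(9):
                if board[r][c] == 0:
                    for digit in range(1, 10):
                        if allowed(board, r, c, digit):
                            board[r][c] = digit
                            if solve(board):
                                return True
                            board[r][c] = 0      # undo and try the next digit
                    return False                 # no digit fits here: backtrack
        return True                              # no empty cells left: solved

    def allowed(board, r, c, digit):
        if digit in board[r] or any(board[i][c] == digit for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)      # top-left of the 3x3 box
        return all(board[br + i][bc + j] != digit
                   for i in range(3) for j in range(3))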
I make Claude do that on every project. I call them Notes for Future Claude and have it write notes for itself because of how quickly context accuracy erodes. It tends to write rather amusing notes to itself in my experience.
But yeah, that's one of the things I tried. "Your turn is over. Please summarize everything you have learned about the maze so someone else can pick up where you left off". It did okay, but it often included superfluous information, it sometimes forgot to include current orientation (the maze action options were "move forward", "turn right", "turn left", so knowing the current orientation was important), and it always forgot to include instructions on how to interpret the state: in particular, which absolute direction corresponded to an increase or decrease of which grid index.
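For concreteness, the kind of self-contained state block I was hoping for looked something like this (values made up):

    # Hypothetical example of a complete state block: position, orientation,
    # the partial map, and the key explaining how directions map to indices.
    state = {
        "position": [3, 5],             # [row, col]
        "orientation": "N",             # one of N / E / S / W
        "index_convention": "N decreases row, S increases row, "
                            "W decreases col, E increases col",
        "known_map": {                  # only the cells observed so far
            "(3,5)": {"N": "wall", "E": "open", "S": "open", "W": "wall"},
        },
        "actions": ["move forward", "turn left", "turn right"],
    }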
I even tried to coax it into defining a formal state representation and "instructions for an LLM to use it" up-front, to see if it would remember to include the direction/index correspondence, but it never did. It was amusing actually; it was apparent it was just doing whatever I told it and not thinking for itself. Something like
"Do you think you should include a map in the state representation? Would that be useful?"
"Yes, great idea! Here is a field for a map, and an algorithm to build it"
"Do you think a map would be too much information?"
"Yes, great consideration! I have removed the map field"
"No, I'm asking you. You're the one that's going to use this. Do you want a map or not?"
"It's up to you! I can implement it however you like!"
Just wondering: would it help to ask it to write to someone else? Since the model itself wasn't in its training set, addressing notes to itself may be confusing.
[1]: https://entropicthoughts.com/getting-an-llm-to-play-text-adv...
You have a tiny, completely known, deterministic rule-based 'world'. 'Reasoning' forwards through that is trivial.
Now try your approach in much more 'fuzzy', incompletely specified and ill-defined environments, e.g. natural language production, and watch it go down in flames.
Different problems need different solutions. While current frontier LLMs show surprising results in emergent shallow and linguistic reasoning, they are far away from deep abstract logical reasoning. A SOTA theorem prover, on the other hand, can excel at that, but can still struggle to produce a coherent sentence.
I think most have always agreed that for certain tasks, an abstraction over which one can 'reason' is required. People differ in opinion over whether this faculty has to be 'crafted' in or whether it is possible to have it emerge implicitly, and more robustly, from observations and interactions.
There's a lisp variant involved, and IIRC even a parser that reads the card text to auto-generate the rules code for most of the cards.
Handicapping traditional tree search produces really terrible results, IMO. It's common for weak chess engines to be weak for stupid reasons (they just hang pieces, make random unnatural moves, miss blatant threats, etc.). Playing weak versions of Leela Chess really "feels" like a (bad) human opponent by contrast.
Maybe the juice isn't worth the squeeze. It's definitely a ton of work to get right.
It sounds like you need RL. You could try setting up some reward functions with evaluators. I’m not sure what your architecture is, but it’s something to try.
To have true thinking, you need an internal adversary challenging thoughts and beliefs. To look 50 moves ahead, you need to simulate the adversary's moves... Duality.
All the strongest chess engines, including Stockfish, have at least one neural network to evaluate positions, and this affects the search window.
> how all good turn-based game AI works
That's not really true; just think of Go.
Just read the parent comment.
Some new ideas in world models are beginning to work. Using Gaussian splatting as a world model has had some recent success.[1] It's a representation that's somewhat tolerant of areas where there's not enough information. Some of the systems that generate video from images work this way.
Obviously you can't actually use this feature as a true world model. There's just too much stuff you have to codify, and basing such a system on tokens is inherently limiting.
The basic principle sounds like what we're looking for, though: a strict automaton or rule set that steers the model's output reliably and provably. Perhaps a similar kind of thing that operates on neurons, rather than tokens? Hmm.
In the examples I've seen, it's not something you can define an entire world model in, but you can sure constrain the immediate action space so the model does something sensible.
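At its simplest that's just masking: let the rules engine list the legal actions and zero out everything else before sampling. A sketch (the function and names here are made up):

    import torch

    # Renormalize the model's action distribution over only the legal actions.
    # `legal_ids` comes from the hand-built rules engine, not from the model.
    def mask_illegal(logits, legal_ids):
        mask = torch.full_like(logits, float("-inf"))
        mask[legal_ids] = 0.0
        return torch.softmax(logits + mask, dim=-1)   # illegal actions get prob 0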
As a complete amateur who works in embedded: I imagine the restriction to a linear, ordered input stream is fundamentally limiting as well, even with the use of attention layers.
But I don't think we have a hint of a proposal for how to incorporate even the first part of that into our current systems.
Humans don't exactly have a full representation of board space in their heads either. Notably, chess masters and amateurs can memorize completely random board positions about equally well. I'd think neither could memorize 64 chess pieces in random positions on a board.
Notably this is also true for MuZero, though at that scale the heuristics become "dense" enough that an apparent causal understanding seems to emerge. But it is quite brittle: my favorite example involves the arcade game Breakout, where MuZero can attain superhuman performance on Level 1 and still be unable to do Level 2. Healthy human children are not like this - they figure out "the trick" in Level 1 and quickly generalize.
Evaluating AI's World Models (https://www.youtube.com/watch?v=hguIUmMsvA4)
Goes into detail about several of the challenges discussed.
The challenge comes from the problem of finding a set of axioms that tell you how to make predictions about what changes a particular action will cause in the world. Naively, we might suppose that the laws of physics would be suitable axioms but this immediately turns out to be computationally intractable. So then we're stuck trying to find a set of heuristics, as alluded to in the article.
Without being a neuroscientist, I think it's likely that at least some of the axioms of our own world models (as human beings) are built into the structure of our brains, rather than being knowledge that we learn as we grow up. We know, for example, that our visual systems have a great deal of built-in assumptions about the way light works and how objects appear under different lighting conditions, a fact revealed to us by optical illusions such as the checker shadow illusion [2]. Building a complete set of heuristics such as this does not sound impossible, just somewhat obscure and unexplored as an engineering problem, and does not seem to be related whatsoever to currently popular means of building and training AI models.
Of course, it is being optimized. People are working on increasing the sample efficiency. A simple search on Google Scholar will confirm it.
https://garymarcus.substack.com/p/how-o3-and-grok-4-accident...
I could see this being the domain of fleets of robots, many different styles, compositions, materials, etc. Send ten robots in to survey a room - drones, crawlers, dogs, rollers - and they'll bang against things, knock things off shelves, illuminate corners, and so on. The aggregate of their observations is the useful output, kinda like networked toddlers.
And yeah, unfortunately, sometimes this means you just need to send a swarm of robots to attack a city bus... or a bank... to "learn how things work." Or an internment camp. Don't get upset, guy, we're building a world model.
Anybody wanna give me VC money to work on this?
A Harry Potter book doesn't ruin an AI's world model by contaminating reality with fantasy. It gives it valuable data points on human culture and imagination and fiction tropes and commercially successful creative works. All of which is a part of the broader "reality" the AI is trying to grasp the shape of as it learns from the vast unstructured dataset.
The AI is trying to grasp nothing.
If you think that what your own brain is doing isn't fancy statistics plugged into a prediction engine, I have some news for you.
Biological systems (or other physical agents), on the other hand, do need to model the world around them to be able to operate.
People didn't give the later seasons enough credit, even if they didn't achieve the same dramatic effect as the first.