I watched the video, and it seemed like everything it said could have just been pre-programmed for the very limited state space of Pong. It reminded me a little of the stock John Madden and Pat Summerall sound bites that played during 90s / early 2000s Madden games.
Could you apply the same idea to chess or Texas Hold 'Em? I feel like the additional complexity of those games could lead to more interesting commentary.
Overall, the simplicity of this project has helped me test the waters before diving into more complex territories. The underlying pipeline isn't bad - the approach of collecting events, periodically generating metrics from them, prioritizing them, generating commentary text, queuing those outputs, and then synthesizing speech should serve as the core for similar work.
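A minimal sketch of the pipeline shape described above: collect events, periodically turn them into metrics, prioritize, generate text, queue it, then synthesize speech. Every name here (`compute_metrics`, `prioritize`, `generate_text`, `synthesize`, `CommentaryPipeline`) and the event format are hypothetical, the scoring rules are made up, and the LLM and TTS steps are stubs. This is not the xpong code.

```python
import heapq
import itertools
from collections import Counter


def compute_metrics(events):
    """Hypothetical metric step: summarize the raw events collected since the last tick."""
    counts = Counter(e["type"] for e in events)
    return {"rally_hits": counts["paddle_hit"], "points": counts["point"]}


def prioritize(metrics):
    """Hypothetical scoring: return (priority, topic) pairs; higher priority = more urgent."""
    candidates = []
    if metrics["points"]:
        candidates.append((10, "point scored"))
    if metrics["rally_hits"] >= 5:
        candidates.append((5, f"long rally of {metrics['rally_hits']} hits"))
    return candidates


def generate_text(topic):
    """Stand-in for the LLM call that turns a topic into a line of commentary."""
    return f"And there's a {topic}!"


def synthesize(text):
    """Stand-in for text-to-speech; here it just returns the text it would speak."""
    return text


class CommentaryPipeline:
    def __init__(self):
        self.events = []
        self.queue = []  # min-heap of (-priority, seq, text)
        self.seq = itertools.count()  # tie-breaker keeps equal priorities in FIFO order

    def record(self, event):
        self.events.append(event)

    def tick(self):
        """One periodic cycle: events -> metrics -> priorities -> text -> queue."""
        metrics = compute_metrics(self.events)
        self.events.clear()
        for priority, topic in prioritize(metrics):
            # Negate the priority so the min-heap pops the most important line first.
            heapq.heappush(self.queue, (-priority, next(self.seq), generate_text(topic)))

    def speak_next(self):
        """Pop the most important queued line and hand it to TTS, or return None if empty."""
        if not self.queue:
            return None
        _, _, text = heapq.heappop(self.queue)
        return synthesize(text)
```

Recording six paddle hits and one point, then calling `tick()`, queues two lines; `speak_next()` returns the point call before the rally call. Keeping generation and speech on opposite sides of a priority queue means a slow LLM response never blocks event collection.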
It's also given me some intuition on how I can construct an "ecosystem" of data surrounding live action, to add a layer of realism to the narratives.
https://m.youtube.com/@JellesMarbleRuns
Greg Woods' commentary really brings this world of marble racing to life.
I’m curious to know more about how you retrieve from this ecosystem of data to add color. You mentioned nearest neighbor search, is that over game state? How is the data stored and queried?
The code starts by simulating 15 tournament years (e.g., 2010 to 2024), with each year containing four Grand Slam tournaments, each held in a knockout format. There are 64 players in the pool, all starting with an initial Elo rating.
These players compete in the tournaments, with outcomes predicted from their Elo ratings, and Elo is updated after every match. We rank players solely by Elo. Once the simulation completes, it has generated a wealth of data: for each game, details such as points scored, points allowed, fastest ball speed, number of aces, point-by-point results, and more are simulated.
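A rough sketch of that tournament loop, assuming the standard Elo expectation and update formulas. The K-factor of 32, the starting rating of 1500, the random bracket seeding, and drawing each winner at random with the Elo-expected probability are all assumptions, since the comment doesn't say how "predicted" outcomes become results. Player names are placeholders, and per-point stats (aces, ball speed) are left out.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_elo(r_winner, r_loser, k=32):
    """Return the new (winner, loser) ratings after one match; k=32 is an assumed K-factor."""
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta


def play_knockout(players, ratings, rng):
    """Single-elimination bracket; the number of players must be a power of two."""
    round_players = list(players)
    rng.shuffle(round_players)  # assumed: random seeding each tournament
    while len(round_players) > 1:
        next_round = []
        for a, b in zip(round_players[::2], round_players[1::2]):
            # Draw the winner with the Elo-expected probability.
            if rng.random() < expected_score(ratings[a], ratings[b]):
                winner, loser = a, b
            else:
                winner, loser = b, a
            ratings[winner], ratings[loser] = update_elo(ratings[winner], ratings[loser])
            next_round.append(winner)
        round_players = next_round
    return round_players[0]


def simulate(years=range(2010, 2025), slams_per_year=4, n_players=64, initial=1500, seed=0):
    rng = random.Random(seed)
    players = [f"P{i:02d}" for i in range(n_players)]  # placeholder names
    ratings = {p: float(initial) for p in players}
    champions = []
    for year in years:
        for slam in range(slams_per_year):
            champions.append((year, slam, play_knockout(players, ratings, rng)))
    # Rank purely by final rating.
    ranking = sorted(players, key=ratings.get, reverse=True)
    return champions, ranking, ratings
```

Because every update moves the same `delta` from loser to winner, the total rating in the pool stays constant, which is a handy sanity check on the simulation.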
We can then cache and use this information for a ton of color commentary. For example, we can identify the GOATs of the game, highlight players who are performing exceptionally well, pinpoint underdogs, find matches similar to the one currently being played, etc.
However, I am just scratching the surface. Imagine having a function that considers "age" alongside Elo. Then you could simulate performance based on age as well, and show things like the younger generation overtaking older players, or veterans still competing despite being past their prime. With a function like this, you could simulate matches spanning the past 75-100 years, generating a ton of nice data to analyze.
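One possible shape for such an age-aware function, purely as an illustration: a bell-shaped curve that applies no penalty at a peak age and up to `max_penalty` rating points far from it. The peak age of 27, the width of 6 years, and the 200-point cap are all invented parameters, not anything from the project.

```python
import math


def age_factor(age, peak_age=27.0, width=6.0):
    """Hypothetical bell curve: 1.0 at peak_age, approaching 0.0 far from it."""
    return math.exp(-((age - peak_age) / width) ** 2)


def effective_rating(elo, age, max_penalty=200.0, peak_age=27.0, width=6.0):
    """Subtract up to max_penalty points depending on distance from the (assumed) peak age."""
    return elo - max_penalty * (1.0 - age_factor(age, peak_age, width))


def win_probability(elo_a, age_a, elo_b, age_b):
    """Standard Elo expectation, computed on age-adjusted ratings."""
    ra = effective_rating(elo_a, age_a)
    rb = effective_rating(elo_b, age_b)
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))
```

With equal Elo, a 25-year-old beats a 38-year-old more often than not, which is exactly the "younger generation overtaking older players" storyline; a veteran with a large enough Elo lead can still win.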
Data itself is not fun; you need nice metrics too, for fun correlations! See https://en.wikipedia.org/wiki/Baseball_statistics. The metrics don’t have to be perfect; after all, humans aren’t perfect. The key is engagement.
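Two simple metrics in that spirit, tied to the "underdogs" and "players performing exceptionally well" examples mentioned earlier: an upset detector and a longest-win-streak counter. The match-record fields (`winner`, `loser`, `winner_elo`, `loser_elo`) and the 100-point upset threshold are hypothetical choices for illustration.

```python
from collections import defaultdict


def find_upsets(matches, min_gap=100):
    """Wins where the winner's pre-match rating trailed the loser's by at least min_gap."""
    return [m for m in matches if m["loser_elo"] - m["winner_elo"] >= min_gap]


def longest_win_streaks(matches):
    """Longest run of consecutive wins per player; matches must be in chronological order."""
    current = defaultdict(int)
    best = defaultdict(int)
    for m in matches:
        current[m["winner"]] += 1
        best[m["winner"]] = max(best[m["winner"]], current[m["winner"]])
        current[m["loser"]] = 0  # a loss resets the streak
    return dict(best)  # players who never won are absent
```

Metrics like these are cheap to precompute over the whole simulated history and cache, so the commentary step only has to look them up.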
To find similar games, I store and cache all historical matches in a k-d tree, then run a nearest-neighbor (NN) search over it. That's quite fast!
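A self-contained sketch of a k-d tree with nearest-neighbor lookup, in plain Python so it runs without dependencies (in practice a library k-d tree would be the usual choice). Each stored item is a `(feature_vector, payload)` pair; which match features go into the vector (score margin, rally length, aces, and so on) is an assumption, and features on very different scales would need normalizing first.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree from (vector, payload) pairs; returns nested dicts."""
    if not points:
        return None
    k = len(points[0][0])
    axis = depth % k  # cycle through the dimensions level by level
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (distance, (vector, payload)) for the stored point closest to target."""
    if node is None:
        return best
    vec, _ = node["point"]
    dist = math.dist(vec, target)
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    axis = node["axis"]
    diff = target[axis] - vec[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Only search the other side if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best
```

For example, with `tree = build_kdtree([((6, 3, 12), "final A"), ((2, 9, 4), "R16 B"), ((5, 4, 10), "SF C")])`, the query `nearest(tree, (5, 4, 11))` returns distance `1.0` and the `"SF C"` match.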
Some commentary can also be dynamically generated at runtime - for example, locker-room whispers. It is important to provide GPT with a decent historical window to avoid generating contradictory info in such cases.
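A minimal sketch of keeping such a historical window: remember the last N generated lines and include them in every new prompt so the model can stay consistent with what it already said. The class name, the window size of 20, and the prompt wording are assumptions, and the actual GPT call is left out.

```python
from collections import deque


class CommentaryMemory:
    """Keeps the last max_lines generated lines so each new prompt sees recent history."""

    def __init__(self, max_lines=20):
        self.lines = deque(maxlen=max_lines)  # oldest lines fall off automatically

    def add(self, line):
        self.lines.append(line)

    def build_prompt(self, new_event):
        """Hypothetical prompt template; the history block is what prevents contradictions."""
        history = "\n".join(f"- {line}" for line in self.lines) or "- (nothing yet)"
        return (
            "You are a sports commentator. Earlier commentary in this broadcast:\n"
            f"{history}\n"
            "Do not contradict anything above.\n"
            f"New event: {new_event}\n"
            "Next line of commentary:"
        )
```

A fixed-size window keeps prompt size (and latency) bounded; the trade-off is that a rumor mentioned more than N lines ago can be forgotten and contradicted.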
The additional complexity in something like hold 'em lends itself extremely well to LLM generated commentary.
I couldn’t get it under 250ms though (for Rocket League), but the tech should be better now than it was in 2024.
"Here we see Peter copying and pasting in some generic quick sort algorithm from.. somewhere. Stack Overflow? ChatGPT? Who knows. And he goes for the compile without writing any tests! Let's see if it compiles first time. And it's a noooooo! Bad luck, let's see how he gets out of this pickle. (I told you he should have written some tests.)"
I wonder if NotebookLM's podcast feature could be used for this, commenting on code in the spirit of a Latin American soccer commentator. Having it comment on code is already pretty useful if you don't want to explain to others what you have been doing; it does that pretty well for you.
dolphin mistral output:
"In the digital ecosystem, where binary code intertwines with human cognition, there exists an important ritual known as the Coding Review. This intricate dance is not dissimilar to how our ancestors gathered around a communal fire, sharing stories and experiences in order to pass on wisdom and understanding of their world.
The coding review takes place in a carefully-crafted digital habitat - often referred to as a development team's workspace. Here, the code, akin to DNA that carries the blueprint for all life forms, is meticulously examined by a group of highly specialized creatures known as developers and quality assurance analysts."
Though tbh I found it still pretty annoying. Maybe just the tone of voice though, and it's clearly not actually connected to what's happening in the game.
I imagine the major sports game players are working on this.
https://github.com/pncnmnp/xpong/blob/main/main.py#L289:
"- **Shot Angles:** Derive each shot's angle from the (vx, vy) vector:\n"
" • Steep angles (>45°) become daring corner lobs or sharp cross-courts.\n"
" • Moderate angles (15°-45°) look like graceful arcs that test court coverage.\n"
" • Shallow angles (<15°) play out as direct, flat drives down the line.\n"
Didn't find where the ball's motion is communicated to the LLM.
It does need some pointless anecdotes about past statistics, history of the game, training regimes, new managers and so on!
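The prompt excerpt quoted above asks the model to derive each shot's angle from (vx, vy). One way to do that classification locally, before prompting, is sketched below, assuming the angle is measured from the horizontal (paddle-to-paddle) axis and folded into 0-90°. `shot_angle_degrees` and `describe_shot` are hypothetical helpers, not functions from main.py; the thresholds follow the quoted prompt, with exactly 15° and 45° treated as moderate.

```python
import math


def shot_angle_degrees(vx, vy):
    """Angle of travel from the horizontal axis, folded into 0-90 degrees (assumed convention)."""
    return math.degrees(math.atan2(abs(vy), abs(vx)))


def describe_shot(vx, vy):
    """Map the angle onto the three bands used in the quoted prompt."""
    angle = shot_angle_degrees(vx, vy)
    if angle > 45:
        label = "steep: daring corner lob or sharp cross-court"
    elif angle >= 15:
        label = "moderate: graceful arc that tests court coverage"
    else:
        label = "shallow: direct, flat drive down the line"
    return f"{angle:.0f}° ({label})"
```

For example, `describe_shot(3, 4)` gives `"53° (steep: ...)"`. Sending the LLM a pre-labeled string like this, rather than raw velocities, removes one arithmetic step the model could get wrong.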
Hah, my next startup is an AI-Assist Pick-Up Artist. But that's the "Lamborghini-desiring Crypto-Bro" package at 49.95 USD/month; the entry-level feature would encourage you to go to the gym and eat your vegetables.
AI voices talking to you... now the hallucinations are actually in your head!
Seriously though, the entire graphics display is much higher resolution than the original, and it’s not trying to emulate the original resolution. So a slightly more serious answer to the question is that all the graphics are higher resolution; you just notice it more with the ball.
If you're jokingly imitating filler from bad commentary, I understand, but I think I'd like more play-by-play and less color. Of course, Pong has a limited number of inputs to work with for that commentary.
One thing that could very well work for the latency issue some commenters mention is to send the events and receive the commentary outside of the rendering and playback loop, so that, within some maximum delay, the commentary can look more immediate and in sync.
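A minimal asyncio sketch of that decoupling: the game loop only enqueues timestamped events and never waits on the model, while a background worker generates commentary and drops any line that comes back later than `max_delay` seconds after its event. `fake_llm` is a hypothetical stand-in with an invented 50 ms latency, and the frame timing is arbitrary.

```python
import asyncio
import time


async def fake_llm(event):
    """Stand-in for a slow LLM + TTS round trip (hypothetical 50 ms latency)."""
    await asyncio.sleep(0.05)
    return f"Commentary on {event['name']}"


async def commentary_worker(events, spoken, max_delay):
    """Consume events off the game loop; drop lines that arrive too late to feel live."""
    while True:
        event = await events.get()
        text = await fake_llm(event)
        if time.monotonic() - event["t"] <= max_delay:
            spoken.append(text)
        events.task_done()


async def game_loop(events, n_frames=5):
    """The render loop only enqueues events; it never blocks on commentary."""
    for frame in range(n_frames):
        events.put_nowait({"name": f"hit {frame}", "t": time.monotonic()})
        await asyncio.sleep(0.01)  # one simulated frame


async def main(max_delay=1.0):
    events = asyncio.Queue()
    spoken = []
    worker = asyncio.create_task(commentary_worker(events, spoken, max_delay))
    await game_loop(events)
    await events.join()  # wait until every queued event has been handled
    worker.cancel()
    return spoken
```

`asyncio.run(main(1.0))` returns all five lines in order, while `asyncio.run(main(0.0))` returns an empty list because every line is stale by the time it arrives. Dropping stale lines is one way to keep audio in sync; a real system might instead summarize what it missed.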
Very fun idea. Hope to see it with more complex things with more inputs.
Commentator 2 (Marsha “Two Coats” Hernandez): Greg, I still remember the way you wept in aisle 7. But let’s talk about today’s masterpiece—Disney Princess Pink, the shade officially inspired by the collective inner glow of Aurora, Cinderella, and, dare I say, Ariel's clam-bikini energy.
Greg: Absolutely, Marsha. And look at that glorious semi-damp sheen—like a freshly glazed donut at sunrise. It’s got a dreamy undertone of "your niece’s birthday party at 10 a.m. with a bouncy castle and too much Capri Sun."
Marsha: Oh-ho, what’s this? Is that… yes, I think the lower left quadrant is beginning to matte. Ladies and gentlemen, we may be witnessing the first signs of Stage 3: The Settling of the Pigment.
Greg (choked up): My god… I haven’t seen a transition like this since Elsa’s Let It Go phase. Remember that? How she emotionally dried her entire personality over a solo in under three minutes? Iconic.
Marsha: Speaking of queens, this paint owes everything to Belle’s bedroom in the lost “Live Laugh Library” deleted scene. That’s the shade they were going to use until someone spilled tea on the concept art. Literally. It was Chip. That kid is a menace.
Greg: I’m sorry but—hold on—this is huge. That patch near the window just tightened. We are witnessing micro-shrinkage. It’s subtle, it’s refined, it’s got the attitude of Mulan at a dim sum buffet. She came hungry, and this paint came to DRY.
Marsha: Greg, if this drying pace keeps up, we’re on track for a Suburban First-Timer Finish Time. I haven’t seen Disney Pink behave like this since the infamous 2017 "Frozen Themed Daycare Hallway Incident." They had to repaint in Tiana Teal—the shame.
Greg: And oh! There it is! That final middle patch—she’s going matte, folks. This wall is becoming a canvas of completion, a poetic stillness in a chaotic world. I feel like I just watched Cinderella get her slipper and a Roth IRA.
Marsha (tearfully): This… is why I do this job. For moments like this. For the shimmerless silence. For the slow, glorious commitment to finality.
Greg: And so we leave you, dear viewers, staring into a flat, fully-dry future. The room has changed… and so have we.
According to Google, you’re only the second person in recorded human history to use these two words together.
> Marsha (tearfully): This… is why I do this job. For moments like this. For the shimmerless silence. For the slow, glorious commitment to finality.
> Greg: And so we leave you, dear viewers, staring into a flat, fully-dry future. The room has changed… and so have we.
I’m getting major Broomshakalaka vibes in the best possible way.