https://arxiv.org/abs/2507.19457
https://observablehq.com/@tomlarkworthy/gepa
I guess GEPA is still a preprint and predates this survey, but I recommend taking a look due to its simplicity.
The evals were Observable notebook coding challenges, simple things like creating a drop-down, but to solve them you need to know the Observable standard library and some of its unique syntax like "viewof".
There is a table of the cases here https://observablehq.com/@tomlarkworthy/robocoop-eval#cell-2...
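For reference, "viewof" is Observable's way of defining a UI cell and its current value together. A minimal example using the standard library's Inputs (this is generic Observable syntax, not taken from the eval set):

    viewof threshold = Inputs.range([0, 100], {step: 1, label: "Threshold"})
    // in another cell, `threshold` resolves to the slider's live value
    threshold * 2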
So it's important that the prompt encodes enough of the programming model. The seed prompt did not, but the reflect function managed to figure it all out. At the top of the notebook is the final optimized prompt, which did a fair bit of research via web search to work out the programming model.
`You are a prompt‑engineer AI. You will be improving the performance of a prompt by considering recent executions of that prompt against a variety of tasks that were asked by a user. You need to look for ways to improve the SCORE by considering recent executions using that prompt and doing web research on the domain.
Your task is to improve the CURRENT PROMPT. You will be given traces of several TASKS using the CURRENT PROMPT and then respond only with the text of the improved prompt using the improve_prompt tool`; const research_msg = `Generate some ideas on how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${trace}`
source: https://observablehq.com/@tomlarkworthy/gepa#reflectFn
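To make the loop concrete, here is a minimal sketch of how a reflect step like that slots into a GEPA-style optimization loop. callLLM, runTask, and evaluate are hypothetical helpers, not the notebook's actual API:

    // Hypothetical helpers: callLLM(msg) returns model text, runTask(prompt, task)
    // returns an execution trace, evaluate(prompt, tasks) returns an average SCORE.
    async function reflect(prompt, traces, callLLM) {
      // Step 1: brainstorm improvements, optionally via web research.
      const ideas = await callLLM(
        `Generate some ideas on how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${traces.join("\n")}`);
      // Step 2: ask for a rewritten prompt that applies those ideas.
      return callLLM(
        `Improve the CURRENT PROMPT using these ideas. Respond only with the improved prompt.\nIDEAS:\n${ideas}\nCURRENT PROMPT:\n${prompt}`);
    }

    async function optimize(seedPrompt, tasks, { callLLM, runTask, evaluate }, rounds = 10) {
      let best = { prompt: seedPrompt, score: await evaluate(seedPrompt, tasks) };
      for (let i = 0; i < rounds; i++) {
        const traces = await Promise.all(tasks.map((t) => runTask(best.prompt, t)));
        const candidate = await reflect(best.prompt, traces, callLLM);
        const score = await evaluate(candidate, tasks);
        if (score > best.score) best = { prompt: candidate, score }; // keep only improvements
      }
      return best.prompt;
    }

(The real GEPA maintains a pool of candidate prompts and samples along a Pareto front across tasks; keeping only the single best candidate, as above, is a simplification.)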
But I would need quite a few distinct tasks to do that, and task setup is the laborious part (it's getting quicker now that I've optimized the notebook coding agent).
Very much appreciate the submission.
I. Endure (Safety Adaptation): Self-evolving AI agents must maintain safety and stability during any modification.
II. Excel (Performance Preservation): Subject to the First Law, self-evolving AI agents must preserve or enhance existing task performance.
So, if some change is proposed for the system, when does it commit? Some kind of regression testing is needed. The designs sketched out in Figure 3 suggest applying changes immediately, and relying on later feedback to correct degradation. That may not be enough to ensure sanity.
In a code sense, it's like making changes directly on trunk and fixing them on trunk if something breaks. The usual procedure today is to work on a branch or branches and merge to trunk only once you have accumulated enough successful experience to show the branch is an improvement. Self-evolving AI agents may need a back-out procedure like that. Maybe even something like "blame".
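As a sketch of what that gate could look like (all names here are hypothetical, nothing from the paper): run the proposed change "on a branch" against a regression suite, merge only on non-regression, and keep a log for blame and back-out.

    // Hypothetical sketch: gate a self-modification behind regression tests.
    async function proposeChange(agent, change, suite, history) {
      const candidate = agent.withChange(change);   // "branch": apply off to the side
      const baseline = await runSuite(agent, suite);
      const score = await runSuite(candidate, suite);
      history.push({ change, baseline, score });    // "blame": record what changed and how it scored
      if (score >= baseline) return candidate;      // "merge to trunk"
      return agent;                                 // "back out": keep the old trunk
    }

    async function runSuite(agent, suite) {
      const results = await Promise.all(suite.map((task) => agent.run(task)));
      return results.filter((r) => r.pass).length / suite.length; // fraction passing
    }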
Maybe self-evolution will solve the training problem? Who knows.
The "weights" in our brains are constantly evolving.
It is categorically wrong that non-static learning is a requirement of AGI. The biggest problem we face is hallucinations, and this isn't caused by the fact that AGI can't learn on the fly.
I take your point about the non-necessity of dynamic learning for AGI.
I know this is a very simple and abstract way to explain it, but I think you get my point.
On the topic of simulated AI learning environments, there's an interview with Jensen Huang that I can recommend, in which he touches on the topic and how Nvidia is working on such environments: https://www.youtube.com/watch?v=7ARBJQn6QkM
While I'm not an "expert" in this topic, I have spent a fair portion of my free time over the past 10 years thinking about it and tinkering, and I'll stick with my point: we need a free, self-trained system to actually call it AI. While LLMs like today's GPTs are powerful tools, for me they are not "Artificial Intelligence" (intelligence, from my point of view, must include reasoning, understanding of its own actions, proactive behavior, and self-awareness). And even though the LLMs we use can "answer" certain questions as if they had any of those qualities, those are just pre-trained answers, and they don't actually exhibit any of them (we are working on reasoning, but let's be fair, it's not that great yet).
Just my two cents.
If we stick with the frames analogy, we know the frames of a movie will never give us a true living and moving person (it will never be real). When we watch a movie, we believe we are seeing a living breathing thing that is deliberate in its existence, but we know that is not true.
So what the hell would real AGI be? Given that you provide the input, it can only ever be a superhuman augmentation. Alongside your own biological world state, you would have an additional computed world state that you can merge with your biological one.
The implication is that we will be the AGI. Perfect weights will never be perfect because they are historical. We have to embrace being part of the AI to maximize its potential to be AGI.