Also, shoutout SkyPilot! It's been a huge help for going multi-cloud with our training and inference jobs (getting GPUs is still a nightmare...)!
Does the agent have access to arxiv (a brief skim of the README didn't have an answer)? If not, it could be that the current approach of relying on the model's weights only is resulting in the perceived local optimum of hyperparameter tuning.
Anecdotally, we built a little MCP for arxiv to help with our internal research, noticed a significant boost in the diversity of methods (architecture or otherwise) Claude and friends were able to reference.
Re: OpenCogPrime:EconomicAttentionAllocation https://news.ycombinator.com/item?id=45518074 and something about eWASM (edit) https://news.ycombinator.com/item?id=47171887 .. from https://news.ycombinator.com/item?id=46825026 re: eWASM and costed opcodes for agent efficiency
Most people are optimizing for terrible benchmarks and then don't really understand what the model did anyone and just assume it did something good. It's the blind leading the blind basically, and a lot of people with an AI-psychosis or delusion.
Probably would cut the number of runs down by a significant number (as far as I can tell it's doing a grid search once it decides to mess with a knob or section of the architecture).
The next step are: - give the agent the whole deep learning literature research and do tree search over the various ideas that have been proposed in the past. - have some distributed notepad that any of these agents can read and improve upon.
It's really hard to imagine that they __won't__ exceed the human value for that efficiency parameter rather soon given that 1. there are plenty of scalar value functions that can represent research efficiency, of which a subset will result in robust training, and 2. that AI labs have a massive incentive to increase their research efficiency overall, along with billions of dollars and really good human researchers working on the problem.
All of science is "gather inputs, make hypothesis, test, analyse" on repeat.
There's plenty to critique in the particular guidance approach, but the overall method is the same.
People have been doing this for a year or more, Ralph loops etc.
I hate the weird strange Twitter world of hero-worship for folks that seems to arise just out of large followings.
Joe no-followers does this six months ago, nobody cares. Karpathy writes a really basic loop and it's now a kind of AI miracle prompting tons of grifters, copy-cats, weird hype.
I do wonder if LLMs have just made everyone seriously, seriously dumber all of a sudden. Most of the "Autoresearch" posts I see are completely rubbish, with AI optimizing for nonsense benchmarks and people failing to understand the graphs they are looking at. So yes, the AI made itself better at a useless benchmark while also making the code worse in 10 other ways you don't actually understand.