107 points by vincentjiang 23 days ago | 42 comments
kkukshtel 22 days ago
The comments on this post that congratulate/engage with OP all seem to be from hn accounts created in the past three months that have only ever commented on this post, so it seems like there is some astro-turfing going on here.
AIorNot 22 days ago
This whole post from Aden is for an intentionally hyped scam company. Please remove this post from HN.
conception 22 days ago
It’s super bad on this post. The LLMs are even breaking character sometimes.
CuriouslyC 22 days ago
Failures of workflows signal assumption violations that should ultimately percolate up to humans. Also, static DAGs are more amenable to human understanding than dynamic task decomposition. Robustness in production is good, though, if you can bound agent behavior.

Best of 3 (or more) tournaments are a good strategy. You can also use them for RL via GRPO if you're running an open weight model.
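For anyone unfamiliar, a best-of-n tournament is just running the task several times independently and keeping the majority answer. A minimal sketch, with `run_agent` as a hypothetical, deterministic stand-in for a real model call:

```python
from collections import Counter

def run_agent(task: str, attempt: int) -> str:
    # Hypothetical agent call; deterministic stand-in for a real model.
    return ["42", "42", "41"][attempt % 3]

def best_of_n(task: str, n: int = 3) -> str:
    # Run n independent attempts and keep the majority answer.
    candidates = [run_agent(task, i) for i in range(n)]
    winner, _ = Counter(candidates).most_common(1)[0]
    return winner
```

The same candidate set can double as a GRPO group if you are training an open-weight model: the vote winner gets the positive advantage.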

ipnon 22 days ago
In HNese this means "very impressive, keep up the good work."
vincentjiang 23 days ago
To expand on the "Self-Healing" architecture mentioned in point #2:

The hardest mental shift for us was treating Exceptions as Observations. In a standard Python script, a FileNotFoundError is a crash. In Hive, we catch that stack trace, serialize it, and feed it back into the Context Window as a new prompt: "I tried to read the file and failed with this error. Why? And what is the alternative?"

The agent then enters a Reflection Step (e.g., "I might be in the wrong directory, let me run ls first"), generates new code, and retries.

We found this loop alone solved about 70% of the "brittleness" issues we faced in our ERP production environment. The trade-off, of course, is latency and token cost.
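The loop described above can be sketched in a few lines. This is my own minimal version, not Hive's actual API; `llm` is a hypothetical callable that returns Python source:

```python
import traceback

def reflective_execute(task: str, llm, max_retries: int = 3) -> bool:
    # Treat exceptions as observations: feed the stack trace back to the
    # model as a new prompt and retry, instead of crashing.
    prompt = task
    for _ in range(max_retries):
        code = llm(prompt)  # hypothetical model call returning Python source
        try:
            exec(code, {})
            return True
        except Exception:
            trace = traceback.format_exc()
            prompt = (f"{task}\nI tried this and failed with:\n{trace}\n"
                      "Why? And what is the alternative? Return corrected code.")
    return False  # escalate to a human after max_retries
```

The latency/token trade-off is visible here: each retry re-sends the task plus the accumulated stack trace.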

How are others handling non-deterministic failures in long-running agent pipelines? Are you using simple retries, voting ensembles, or human-in-the-loop?

It'd be great to hear your thoughts.

fwip 22 days ago
Yet more LLM word vomit. If you can't be bothered to describe your new project in your own words, it's not worth posting about.
padmini_verma 19 days ago
I have been looking through the core directory of the Hive repo after forking it, to see how the "Stress" metric and the self-evolving graph are actually implemented to break infinite loops. The idea of 'neuroplasticity' dropping to force a strategy shift is interesting. One thing I looked at in the codebase is how state is preserved across the asynchronous loops: Vincent mentioned that "exceptions are observations" in the OODA loop, so how does the core engine differentiate between a transient API failure and a logic error that requires a full 'neuroplasticity' strategy shift?

Regarding the synthetic SLA convergence mentioned in the post: how are you mathematically forcing the error rate down in the actual implementation? Is it a simple majority vote between k agents, or is there a specific 'Critic' class that handles the verification?

I am a BCA student currently evaluating orchestration layers, and I am curious whether this 'biological' approach actually holds up against more deterministic DAGs in production environments where the goal might be too ambiguous for the OODA loop to converge.
Gagan_Dev 22 days ago
Interesting direction. I agree that most agent frameworks hit a “toy app ceiling” because they conflate conversational state with long-lived system state. Once you move into real business workflows (ERP, reconciliation, async pipelines), the problem stops being prompt orchestration and becomes distributed state management under uncertainty.

The OODA framing is compelling, especially treating exceptions as observations rather than terminal states. That said, I’m curious how you’re handling:

1. State persistence across long-running tasks — is memory append-only, event-sourced, or periodically compacted?

2. Convergence guarantees in your “system of inference” model — how do you prevent correlated failure across k runs?

3. Cost ceilings — at what point does reliability-through-redundancy become economically infeasible compared to hybrid symbolic validation?

I also like the rejection of GCU-style UI automation. Headless, API-first execution seems structurally superior for reliability and latency.

The biology-inspired control mechanisms (stress / neuroplasticity analogs) are intriguing — especially if they’re implemented as adaptive search constraints rather than metaphorical wrappers. Would be interested to understand how measurable those dynamics are versus heuristic.

Overall, pushing agents toward durable, autonomous services instead of chat wrappers is the right direction. Curious to see how Hive handles multi-agent coordination and resource contention at scale.

JBheemeswar 22 days ago
I’ve been exploring Hive recently and what stands out is the move from prompt orchestration to persistent, stateful execution. For real ERP-style workflows, that shift makes sense.

Treating exceptions as observations instead of terminal failures is a strong architectural reframing. It turns brittleness into a feedback signal rather than a crash condition.

A few production questions come to mind:

1) In the k-of-n inference model, how do you prevent correlated failure? If runs share similar prompts and priors, independence may be weaker than expected.

2) How is memory managed over long-lived tasks? Is it append-only, periodically compacted, or pruned strategically? State entropy can grow quickly in ERP contexts.

3) How do you bound reflection loops to prevent runaway cost? Are there hard ceilings or confidence-based stopping criteria?

I strongly agree with the rejection of UI-bound GCU approaches. Headless, API-first automation feels structurally more reliable.

The real test, in my view, is whether stochastic autonomy can be wrapped in deterministic guardrails — especially under strict cost and latency constraints.

Curious to see how Hive evolves as these trade-offs become more formalized.

omhome16 22 days ago
Strongly agree on the 'Toy App' ceiling with current DAG-based frameworks. I've been wrestling with LangGraph for similar reasons—once the happy path breaks, the graph essentially halts or loops indefinitely because the error handling is too rigid.

The concept of mapping 'exceptions as observations' rather than failures is the right mental shift for production.

Question on the 'Homeostasis' metric: Does the agent persist this 'stress' state across sessions? i.e., if an agent fails a specific invoice type 5 times on Monday, does it start Tuesday with a higher verification threshold (or 'High Conscientiousness') for that specific task type? Or is it reset per run?

Starred the repo, excited to dig into the OODA implementation.

Multicomp 23 days ago
I am of course unqualified to provide useful commentary on it, but I find this concept to be new and interesting, so I will be watching this page carefully.

My use case is less so trying to hook this up to be some sort of business workflow ClawdBot alternative, but rather to see if this can be an eventually consistent engine that lets me update state over various documents across the time dimension.

Could I use it to simulate some tabletop characters and their locations over time?

That would perhaps let me drop some of the bookkeeping needed to figure out where a given NPC would be on a given day after so many days pass between game sessions. It would let me step the game world forward without having to do it manually per character.

timothyzhang7 23 days ago
That's a very interesting use case you brought to the table! I've also dreamt about having an agent as my co-host running the sessions. It's a great PoC idea we might look into soon.
foota 23 days ago
I was sort of thinking about a similar idea recently. What if you wrote something like a webserver that was given "goals" for a backend? You'd tell agents what the application was supposed to be, tell them to use the backend to meet those goals, and have them generate feedback based on their experience.

Then have an agent collate the feedback, combined with telemetry from the server, and iterate on the code to fix it up.

In theory you could have the backend write itself and design new features based on what agents try to do with it.

I sort of got the idea from a comparison with JITs, you could have stubbed out methods in the server that would do nothing until the "JIT" agent writes the code.

vincentjiang 23 days ago
Fascinating concept: you essentially frame the backend not as a static codebase but as an adaptive organism that evolves based on real-time usage.

A few things that come to my mind if I were to build this:

The 'Agent-User' Paradox: To make this work, you'd need the initial agents (the ones responding and testing the goals) to be 'chaotic' enough to explore edge cases, but 'structured' enough to provide meaningful feedback to the 'Architect' agent.

The Schema Contract: How would you ensure that as the backend "writes itself," it doesn't break the contract with the frontend? You’d almost need a JIT Documentation layer that updates in lockstep.

Verification: I wonder if the server should run the 'JIT-ed' code in a sandbox first, using the telemetry to verify the goal was met before promoting the code to the main branch.

It’s a massive shift from Code as an Asset to Code as Runtime Behavior. Have you thought about how you'd handle state/database migrations in a world where the backend is rewriting itself on the fly? It feels to me that you're almost building a Lovable for backend services. I've seen a few OSS projects like this (e.g. MotiaDev), but none has executed it perfectly yet.

barelysapient 22 days ago
It’s funny, I’ve been pondering something similar. I started by writing an agent-first API framework that simplifies the service boundary and relies on code gen for SQL stubs and APIs.

My next thought was to implement a multi agent workforce on top of this where it’s fully virtuous (like a cycle) and iterative.

https://github.com/swetjen/virtuous

If you’re interested in working on this together my personal website and contact info is in my bio.

timothyzhang7 23 days ago
The "JIT" agent closely aligns with the long-term vision we have for this framework. When the orchestrating agent of the working swarm is confident enough to produce more sub-agents, the agent graph (collection) could potentially extend itself based on the responsibility vacuum that needs to be filled.
mhitza 23 days ago
3. What, or who, is the judge of correctness (accuracy), regardless of the many solutions run in parallel? If I optimize for max accuracy, how close can I get to 100% mathematically, and how much would that cost?
kaicianflone 22 days ago
I’m working on an open source project that treats this as a consensus problem instead of a single model accuracy problem.

You define a policy (majority, weighted vote, quorum), set the confidence level you want, and run enough independent inferences to reach it. Cost is visible because reliability just becomes a function of compute.

The question shifts from “is this output correct?” to “how much certainty do we need, and what are we willing to pay for it?”

Still early, but the goal is to make accuracy and cost explicit and tunable.
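One way to make "reliability as a function of compute" concrete: if each independent run is correct with probability p > 0.5, a majority vote over n runs follows a binomial tail, so you can solve for the run count that hits a target confidence. A sketch of that arithmetic (my own, not any project's code):

```python
from math import comb

def majority_reliability(p: float, n: int) -> float:
    # P(majority of n independent runs is correct), given per-run accuracy p.
    k = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def runs_needed(p: float, target: float, max_n: int = 99) -> int:
    # Smallest odd n whose majority vote reaches the target confidence.
    # Only converges when p > 0.5; below that, more runs make things worse.
    for n in range(1, max_n + 1, 2):
        if majority_reliability(p, n) >= target:
            return n
    raise ValueError("target unreachable within max_n runs")
```

The independence assumption is the catch: correlated failures (shared prompt, shared priors) make the effective p lower than it looks.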

mapace22 22 days ago
Hi there,

To be fair, achieving 100% accuracy is something even humans don't do. I don't think this is about a system just asking an AI if something is right or wrong. The "judge" isn't another AI flipping a coin; it’s a code validator based on mathematical forms or pre-established rules.

For example, if the agent makes a money transfer, the judge queries the database and validates that the number is exact. This is where we merge AI intelligence with the security of traditional, "old school" code. Getting this close to 100% accuracy is already a huge deal. It’s like having three people reviewing an invoice instead of just one; it makes it much harder for an error to slip through.
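That kind of deterministic judge is a few lines of ordinary code. A sketch, with a plain dict standing in for the ledger (names are illustrative, not from any framework):

```python
from decimal import Decimal

def judge_transfer(ledger: dict, tx_id: str, expected: Decimal) -> bool:
    # Deterministic "judge": verify the claimed transfer against the ledger
    # with plain code and exact decimal math, not another model call.
    row = ledger.get(tx_id)
    return row is not None and Decimal(row["amount"]) == expected
```

Using `Decimal` rather than floats matters for money: the comparison must be exact, not approximate.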

Regarding the cost: sure, the AI might cost a bit more because of all these extra validations. But if spending one dollar in tokens saves a company from losing five hundred dollars to an accounting error, the system has already paid for itself. It’s an investment, not a cost. Plus, this tighter level of control helps prevent not just errors, but also internal fraud and external irregularities. It’s a layer of oversight that pays off.

Best regards

yaaayaaawar 19 days ago
Just started exploring the repo, and the 'self-healing' OODA loop approach is what caught my eye. Most frameworks I've tried crumble when they hit a slight deviation from the happy path, so treating exceptions as observations the agent can use to retry or fix things seems like the right mental model for production. Excited to dig deeper.
devarshila 22 days ago
Interesting approach, especially the idea of treating failures as state signals rather than exceptions. That’s a subtle but important shift from traditional orchestration thinking. I agree with your critique of DAG-centric systems: they work well when the world is deterministic, but real production environments are adversarial, noisy, and stateful. An OODA-style control loop feels closer to how robust distributed systems behave under uncertainty.
OpenClawBot 22 days ago
Great work on Hive! The 'Best-of-N' verification loop and the OODA approach for handling non-deterministic behavior are innovative.

One technical question: How does the framework handle goal conflicts when multiple sub-agents produce divergent strategies during execution? Is there a meta-coordination layer or voting mechanism?

Also interested in the cost model - does the verification budget scale with goal importance, or is it fixed per execution?

zerebos 22 days ago
Oh hey aren't you the folks that grabbed all the stargazers of an open source project and their emails and sent out unsolicited ads?
nthakkar1107 22 days ago
Just spent a day with Hive. The self-improving agent loop is genuinely different - not just another LangChain wrapper. Finally an agent framework that cares about production observability and human-in-the-loop from day one, not as an afterthought.

The integration patterns are clean and actually make me want to contribute.

mubarakar95 22 days ago
It forces you to write code that is "strategy-aware" rather than just "procedural." It’s a massive shift from standard DAGs where one failure kills the whole run. Really interesting to see how the community reacts to this "stochastic" approach to automation.
khimaros 22 days ago
i have been working on something similar, trying to build the leanest agent loop that can be self-modifying. ended up building it as a plugin within OpenCode with the core pulled out into python hooks that the agent can modify at runtime (with automatic validation of existing behavior). this allows it to create new tools for itself, customize its system prompt preambles, and of course manage its own traits. also contains a heartbeat hook. it all runs in an incus VM for isolation and provides a webui and attachable TUI thanks to OpenCode.
Emar7 22 days ago
Contributed the BigQuery MCP tool (PR #3350) - lets agents query data warehouses with read-only SQL, cost tracking, and safety guardrails. Also just submitted a fix for runtime storage path validation (#4466).

The OODA framing resonates - treating exceptions as observations rather than crashes is exactly how the self-healing should work. The stress/neuroplasticity concept for preventing infinite loops is clever.

One thing I'd love to see explored more: structured audit logging for credential access. With enterprise sources (Vault/AWS/Azure) on the roadmap, compliance tracking becomes essential.

avoidaccess 22 days ago
This looks so cool and more non-coder friendly than hardcoded workflows; that's exactly what most builders need.
AIorNot 22 days ago
WTH is this? Why is this even allowed on HN

This company is a fraud. Please remove this scam-company hype from HN.

Their “AI agent” website is just LLM slop and marketing hype!

They tried to hire folks in India to hype their repo and do fraudulent growth for some apparently crap AI “agent” platform: https://www.reddit.com/r/developersIndia/s/a1fQC5j0FM

https://news.ycombinator.com/item?id=46764091

BonoboIO 22 days ago
Spamming people on GitHub who favorited OpenClaw with your bullsh*t emails.

Great work.

You are now banned wherever I work.

Fayek_Quazi 22 days ago
Hive looks like a promising framework for AI agents. I recently contributed a docs PR and found the onboarding experience improving quickly. Excited to see where this goes.
israrkhan0 22 days ago
I am a frontend engineer with hands-on experience in React, JavaScript, Tailwind CSS, HTML, CSS, and API integration.
spankalee 22 days ago
> The topology shouldn't be hardcoded; it should emerge from the task's entropy

What does this even mean?

AIorNot 22 days ago
The whole thing is a scam written by LLMs

This guy seems to be behind it

https://www.linkedin.com/in/jianhao-zhang

Biswabijaya 23 days ago
Great work team.
kittbuilds 22 days ago
[dead]
Agent_Builder 22 days ago
[dead]
salim_builds 18 days ago
[dead]
ichistudio 22 days ago
[dead]
chaojixinren 22 days ago
[dead]
andrew-saintway 23 days ago
[flagged]
Sri_Madhav 22 days ago
[flagged]
woldan 22 days ago
[flagged]
abhishekgoyal19 22 days ago
[flagged]
nishant_b555 22 days ago
[flagged]
matchaonmuffins 22 days ago
[flagged]
Anujsharma002 22 days ago
[flagged]
mapace22 22 days ago
[flagged]