I can already see how this evolves into something where you're basically managing a team of specialized agents rather than doing the actual coding. You set up some high-level goals, maybe break them down into chunks, and then different agents pick up different pieces and coordinate with each other; the human becomes more like a project manager, making decisions when the agents get stuck or need direction. Imho tools like omnara are just the first step toward that: right now it's one agent that needs your input occasionally, but eventually it'll probably be orchestrating multiple agents working in parallel, which beats sitting there watching progress bars for 10 minutes.
Can't wait til I'm coding on the beach (by managing a team of agents that notify me when they need me), but it might take a few more model releases before we get there lol
I actually think there's a chance it will shift away from that, because the emphasis will move to fast feedback loops, which means you spend more of your time interacting with stakeholders, gathering feedback, etc. Manual coding is more the sort of task you can do for hours on end without interruption ("at the beach").
Jesus Christ, I really need to speed up development of my product. If this shifts to more meetings at my wageslave job, I'm going to kill myself.
That must be a nice situ on the beach.
If their livelihood is solving difficult problems, and writing code is just the implementation detail they gotta deal with, then this isn't gonna do much to threaten their livelihood. I'm not aware of any serious SWE (one who actually designs complex systems and implements them) who is genuinely worried about their livelihood after trying out AI agents. If anything, it makes them feel more excited about their work.
But if someone's just purely codemonkeying trivial stuff for their livelihood, then yeah, they should feel threatened. I have a feeling that isn't what the grandparent commenter does for a living tho.
Maybe I'll just call it a day and chill with the fam
This is exactly what I have been working on for the past year and a half. A system for managing agents where you get to work at a higher abstraction level, explaining (literally with your voice) the concepts & providing feedback. All the agent-agent-human communication is on a shared markdown tree.
I haven't posted it anywhere yet, but your comment describes the vision so well that I guess it's time to start sharing it :D See https://voicetree.io for a demo video. I have been using it every day for engineering work, and it really does feel like what you describe: my job is now more about organizing tasks, explaining them well, and providing critique, just by talking to the computer. For example, when going through the git diffs of what the agents wrote, I will speak out loud any problems I notice; the voice becomes text, the text becomes markdown tree updates, and those updates send hook notifications to claude code so the agents automatically address the feedback.
Luckily the other side of this project doesn't require any user behavioural changes. The idea is to convert chat histories into a tree format with the same core algorithm, and then send only the relevant sub-tree to the LLM, reducing input tokens and context bloat and thereby also improving accuracy. This would also unlock almost infinite-length LLM chats. I have been running this context retrieval algorithm against a few benchmarks (GSM-infinite, NoLiMa, and LongBench-v2), and the early results are very promising: ~60-90% fewer tokens with increased accuracy against SOTA, though only on a subset of the full benchmark datasets.
Love the idea of "coding" while walking/running outside. For me those outside activities help me clear my mind and think about tough problems or higher level stuff. The thought of directing agents to help persist and refine fleeting thoughts/ideas/insights, flesh out design/code, etc. is intriguing.
Same with project org: if you organize the project for LLM efficiency instead of human efficiency, you simplify some of the parts the LLM has issues with.
Wouldn't it be better if you asked for it and, rather than having to manage workers, it was just... done?
Truly — this is an excellent and accessible idea (bravo!), but if I can whittle away at a free and open source version, why should I ever consider paying for this?
1. an AI email drafter which used my product docs and email templates as context (eventually I plan to add "tools" so the AI can look up info in our database)
2. a simple email client with a sidebar with customer contextual info (their billing plan, etc.) and a few simple buttons for the operator to take actions on their account
3. A few basic team collaboration features, notes, assigning tickets to operators, escalating tickets...
It took about 2 days to build the initial version, and about 2 weeks to iron out a number of annoying AI-slop bugs in the beginning. But after a month of use it's now pretty stable; my customer support hire is using it and she's happy.
* LLMs are lousy at bugs
* Apps are a bit like making a baby. Fun in the moment, but a lifetime support commitment
* Supporting software isn't fun, even with an LLM. Burnout is common in open source.
* At the end of the day, it is still a lot of work, even guiding an LLM
* Anything hosted is a chore. Uptime, monitoring, patching, backing up, upgrading, security, legal, compliance, vulnerabilities
I think we'll see GitHub littered with buggy, unsupported, vibe-coded one-offs for every conceivable purpose. Now, though, you literally have no idea what you're looking at or whether it is decent.
Claude made four different message-passing implementations in my vibe-coded app. I realized this once it was trying to modify the wrong one during a fix. In other words, Claude was falling over trying to support what it made, and only a dev could bail it out. I am perfectly capable of coding this myself, but you have two choices at the moment--invest the labor, or get crap. But then we come to "maybe I should just pay for this instead of burning my time and tokens."
One technique which appears to combat this is to do “red team / blue team Claude”
Red team Claude is hypercritical and tries to find weaknesses in the code. Blue team Claude is your partner, who you collaborate with to set up PRs.
While this has definitely helped me find "issues" that blue team Claude will lie to you about, hallucinations are still a bit of a problem. I mostly put red team Claude into ultrathink + task mode to improve the veracity of its critiques.
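For anyone who wants to try it, here's a rough sketch of how the red-team pass can be scripted non-interactively, using the claude CLI's --print mode (quoted further down in this thread). The prompt wording and file names are just illustrative, not the parent's actual setup:

    # Hypothetical red-team pass: give a fresh, non-interactive Claude the
    # current diff plus an adversarial prompt, and save the critique.
    git diff main...HEAD | claude -p \
      "You are a hypercritical code reviewer (red team). Find bugs, race \
       conditions, missing error handling, and unsupported claims in this \
       diff. Do not praise anything. Rank findings by severity." \
      > red-team-review.md

    # Feed the findings back to the interactive blue-team session and ask
    # it to address (or argue against) each one.

Keeping the red team in a separate invocation also means it never sees the blue team's justifications, which helps keep it adversarial.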
I’ve been using Tailscale ssh to a raspberry pi.
With Termix on iOS.
I can do all the same stuff on my own. Termix is awesome (I’m not affiliated)
I don't know how it is implemented, but if I can press 'continue' from my phone, someone else could enter other commands… like export the database…
Maybe that is more for a general engineer than a hacker, though - hacker to me implies some sort of joy in doing it yourself rather than optimizing.
Probably a bad habit.
Not very enlightening: just because Dropbox became big in one environment, doesn't mean the same questions aren't important in new spaces.
So every time someone comes around with a sentence like 'but if I can whittle away at a free and open source version, why should I ever consider paying for this?', the answer will be that Dropbox thread ;-)
When I just set Claude loose for long periods, I get incomprehensible, brittle code.
I don't do maintenance so maybe that's the difference but I have not had good results from big, unsupervised changes.
The reason isn't that AI models have gotten better, although they clearly have, but that using subagents (1) keeps the context clear of false starts and errors that otherwise poison the AI's view of the project, and (2) with directives to run subagents that keep the main agent aligned (e.g. code review agents), the main agent gets nudged back on course a surprisingly high percentage of the time.
First of all, there's a lot of collections of subagent definitions out there. I rolled my own, then later found others that worked better. I'm currently using this curated collection: https://github.com/VoltAgent/awesome-claude-code-subagents
CLAUDE.md has instructions to list `.agents/agents/**/*.md` to find the available agents, and to check the frontmatter YAML for a one-line description of what each does. These agents are really just (1) role definitions that prompt the LLM to bias its thinking in a particular way ("You are a senior Rust engineer with deep expertise in ..." -- this actually works really well), and (2) a bunch of rules and guidelines for that role, e.g. in the Rust case to use the thiserror and strum crates to avoid boilerplate in Error enums, rules for how to satisfy the linter, etc. Basic project guidelines as they relate to Rust dev.
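To make the format concrete, one of those agent definition files might look something like this; it's a hypothetical example I'm writing for illustration (path and wording made up), not a file from the linked collection:

    ---
    name: rust-engineer
    description: Senior Rust engineer; implements features and fixes to project standards.
    ---
    You are a senior Rust engineer with deep expertise in systems programming,
    async Rust, and API design.

    Rules and guidelines for this role:
    - Use the thiserror and strum crates to avoid boilerplate in Error enums.
    - No unsafe blocks and no #[allow(...)] directives to silence the linter.
    - All changes must pass ./check.sh (fmt, clippy with warnings denied, tests).
    - Prefer small, focused functions and typed errors over stringly-typed ones.

The frontmatter gives the top-level agent its one-line description; everything below it is the role prompt the subagent runs with.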
Secondly, my CLAUDE.md for the project has very specific instructions about how the top-level agent should operate, with callouts to specific procedure files to follow. These live in `.agent/action/**/*.md`. For example, I have a git-commit.md protocol definition file, and instructions in CLAUDE.md that "when the user prompts with 'commit' or 'git commit', load git-commit action and follow the directions contained within precisely." Within git-commit.md, there is a clear workflow specification in text or pseudocode. The [text] is my in-line comments to you and not in the original file:
""" You are tasked with committing the currently staged changes to the currently active branch of this git repository. You are not authorized to make any changes beyond what has already been staged for commit. You are to follow these procedures exactly.
1. Check that the output of `git diff --staged` is not empty. If it is empty, report to the user that there are no currently staged changes and await further instructions from the user.
2. Stash any unstaged changes, so that the worktree only contains the changes that are to be committed.
3. Run `./check.sh` [a bash script that runs the full CI test suite locally] and verify that no warnings or errors are generated with just the currently staged changes applied.
- If the check script doesn't pass, summarize the errors and ask the user if they wish to launch the rust-engineer agent to fix these issues. Then follow the directions given by the user.
4. Run `git diff --staged | cat` and summarize the changes in a git commit message written in the style of the Linux kernel mailing list [I find this to be much better than Claude's default commit message summaries].
5. Display the output of `git diff --staged --stat` and your suggested git commit message to the user and await feedback. For each response by the user, address any concerns brought up and then generate a new commit message, as needed or instructed, and explicitly ask again for further feedback or confirmation to continue.
6. Only when the user has explicitly given permission to proceed with the commit, without any accompanying actionable feedback, should you proceed to making the commit. Execute `git commit` with the exact text for the commit message that the user approved.
7. Unstash the non-staged changes that were previously stashed in step 2.
8. Report completion to the user.
You are not authorized to deviate from these instructions in any way. """
This one doesn't employ subagents very much, and it is implicitly interactive, but it is smaller and easier to explain. It is, essentially, a call center script for the main agent to follow. In my experience, it does a very good job of following these instructions. This particular one addresses a pet peeve of mine: I hate the auto-commit anti-feature of basically all coding assistants. I'm old-school and want a nice, cleanly curated git history with comprehensible commits that take some refining to get right. It's not just OCD -- my workflow involves being able to git bisect effectively to find bugs, which requires a good git history.
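For reference, the check.sh referenced in step 3 isn't shown above; for a Rust project it would presumably be something along these lines (my assumption, not the author's actual script):

    #!/usr/bin/env bash
    # Hypothetical check.sh: run the full local CI suite and fail on the
    # first warning or error, so the commit workflow refuses to proceed.
    set -euo pipefail

    cargo fmt --all -- --check                                  # formatting
    cargo clippy --all-targets --all-features -- -D warnings    # lints as errors
    cargo test --all-features                                   # full test suite

The important part is that it exits non-zero on any warning, so step 3's "no warnings or errors" gate is mechanical rather than a judgment call by the agent.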
...continued in part 2
I also have a task.md workflow that I'm actively iterating on; it's the one I let run autonomously for half an hour to an hour, and I'm often surprised to find very good results (but sometimes very terrible results) at the end of it. I'm not going to release this one because, frankly, I'm starting to realize there might be a product around this and I may move on that (although this is already a crowded space). But I don't mind outlining in broad strokes how it works (hand-summarized, very briefly):
""" You are a senior software engineer in a leadership role, directing junior engineers and research specialists (your subagents) to perform the task specified by the user.
1. If PLAN.md exists, read its contents and skip to step 4.
2. Without making any tool calls, consider the task as given and extrapolate the underlying intent of the user. [A bunch of rules and conditions related to this first part -- clarify the intent of the user without polluting the context window too much]
3. Call the software-architect agent with the reformulated user prompt, and with clear instructions to investigate how the request would be implemented on the current code base. The agent is to fill its context window with the portions of the codebase and developer documentation in this repo relevant to its task. It should then generate and report a plan of action. [Elided steps involving iterating on that plan of action with the user, and various subagents to call out to in order to make sure the plan is appropriately sequenced in terms of dependent parts, chunked into small development steps, etc. The plan of action is saved in PLAN.md in the root of the repository.]
4. While there are unfinished todos in the PLAN.md document, repeat the following steps:
a) Call rust-engineer to implement the next todo and/or verify completion of the todo.
b) Call each of the following agents with instructions to focus on the current changes in the workspace. If any actionable items are found in the generated report that are within the scope of the requested task, call rust-engineer to address these items and then repeat:
- rust-nit-checker [checks for things I find Claude gets consistently wrong in Rust code]
- test-completeness-checker [checks for missing edge cases or functionality not tested]
- code-smell-checker [a variant of the software architect agent that reports when things are generally sus]
- [... a handful of other custom agents; I'm constantly adjusting this list]
- dirty-file-checker [reports any test files or other files accidentally left and visible to git]
c) Repeat from step a until you run through the entire list of agents without any actionable, in-scope issues identified in any of the reports & rust-engineer still reports the task as fully implemented.
d) Run git-commit-auto agent [A variation of the earlier git commit script that is non-interactive.]
e) Mark the current todo as done in PLAN.md
5. If there are any unfinished todos in PLAN.md, return to step 4. Otherwise call software-architect agent with the original task description as approved by the user, and request it to assess whether the task is complete, and if not to generate a new PLAN.md document.
6. If a new PLAN.md document is generated, return to step 4. Otherwise, report completion to the user. """
That's my current task workflow, albeit with a number of items and agent definitions elided. I have lots of ideas for expanding it further, but I'm basically taking an iterative and incremental approach: every time Claude fumbles the ball in an embarrassing way (which does happen!), I add or tweak a rule to avoid that outcome. There are a couple of key points:
1) Using Rust is a superpower. With guidance to the agent about what crates to use, and with very strict linting tools and code-checking subagents (e.g. no unsafe code blocks, no #[allow(...)] directives to override the linter, an entire subagent dedicated to finding and calling out string-based typing and error handling, etc.), this process produces good code that largely works and does what was requested. You don't have to load the whole project into context to avoid pointer or use-after-free issues and the other things that cause vibe-coded projects to fail at a certain complexity. I don't see this working in a dynamic language, for example, even though LLMs are honestly not as good at Rust as they are at more prominent languages.
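To illustrate the thiserror/strum rule mentioned above, this is roughly the pattern such a rule pushes the agent toward; it's a generic example, not code from my project:

    use thiserror::Error;
    use strum::{Display, EnumString};

    // thiserror derives std::error::Error and Display from the #[error]
    // attributes, so no hand-written impl blocks are needed.
    #[derive(Debug, Error)]
    pub enum ConfigError {
        #[error("failed to read config file {path}")]
        Io {
            path: String,
            #[source]
            source: std::io::Error,
        },
        #[error("invalid value `{value}` for key `{key}`")]
        InvalidValue { key: String, value: String },
    }

    // strum derives FromStr/Display for plain enums, avoiding hand-rolled
    // match statements for string <-> enum conversions.
    #[derive(Debug, EnumString, Display)]
    #[strum(serialize_all = "kebab-case")]
    pub enum Profile {
        Dev,
        Release,
    }

Rules like these are easy for a lint-checking subagent to verify mechanically, which is part of why the iteration loop converges.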
2) The key part of the task workflow is the long list of analysts run against the changes, and the assumption, which works well in practice, that you can just keep iterating and fixing reported issues (with some of the elided secret sauce having to do with subagents that evaluate whether an issue is in scope and needs to be fixed or can be safely ignored, and with keeping an eye out for deviations from the requested task). This eventual-completeness assumption does work pretty well.
3) At some point the main agent's context window gets poisoned, or it reaches the full context window and compacts. Either way this kills any chance of simply continuing. In the first case (poisoning) it loses track of the task and ends up caught in some yak-shaving rabbit hole. It's usually obvious when you check in that this is going on, and I just nuke it and start over. In the latter case (full context window) the auto-compaction also pretty thoroughly destroys the workflow, but it usually results in the agent asking a variation of "I see you are in the middle of ... What do you want to do next?" before taking any bad action on the repo itself. Clearing the now-poisoned context window with "/reset" and then providing just "task: continue" gets it back on track. I have a todo item to automate this, but the Claude Code API doesn't make it easy.
4) You have to be very explicit about what the main agent can and cannot do. It is trained and fine-tuned to be an interactive, helpful assistant; you are using it to delegate autonomous tasks, and that requires explicit and repeated instructions. This is made somewhat easier by the fact that subagents are not given access to the user -- they simply run and generate reports for the calling agent. So I try to pack as much as I can into the subagents and keep the main agent's role very well defined and clear. It does mean that you have to manage out-of-band communication between agents (e.g. the PLAN.md document) to conserve context tokens.
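To make that out-of-band state concrete, a PLAN.md in this kind of setup might look something like the following (purely illustrative; this isn't one of my real plans):

    Task: add retry with exponential backoff to the HTTP client

    - [x] 1. Add a RetryPolicy type (max_attempts, base_delay) with unit tests
    - [x] 2. Wire RetryPolicy into the client's send path without changing the public API
    - [ ] 3. Add tests for retry exhaustion and backoff jitter bounds
    - [ ] 4. Update the developer docs for the new configuration options

Because every agent can read and update the same checklist, progress survives context resets and subagent boundaries without replaying the whole conversation.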
If you try this out, please let me know how it goes :)
Open-sourced my own duct-taped way* of doing this with free/open-source stuff a few weeks ago, recommend you give this kind of Claude on the go workflow a try during your next flight delay / train ride / etc.
Codex already works from your phone; I imagine Anthropic is well on its way to shipping Claude Code across devices/apps too.
There are plenty of opportunities for building a good product even if the big platform copies you. For example in this case I can think of an easy differentiator: make it work with other agents and IDEs, not just Claude Code. Plenty of other ways to specialize by adding features not included in vanilla big company products.
That said, I don't think that comparison makes sense anyway. The barrier to entry for AI apps (and for competitors cloning apps with AI) is low enough these days that you can guarantee anything minimally viable will be cloned immediately.
Plus some of these features are quite literally on the roadmap of OpenAI/Google/Anthropic. Competing with giants on the exact product they're actively building rarely works. Anthropic isn't "copying you" - they're literally building this.
Sure, Anthropic might have this on the roadmap and release it next week. But apps like this can literally make hundreds of thousands of dollars in a few weeks — well worth it for a few months' work, I would say.
It’s a very, very different story.
I’m obviously not saying anyone should stop building apps or dismissing this or any other app from being potentially successful. It’s just a fundamentally different scenario than the early days of mobile, particularly for thin LLM wrappers
I can't seem to authenticate because it had a localhost URL in the address when I was first authenticating. Now my work laptop has blocked the site completely (no signing in or anything at all) because the rapid attempts to reach a site with no certificate via that hand-off looked like something trying to bypass security, or something. Bummer, this computer has some more juice vs mine!
Might be an issue with "Sign In With Apple" if no one else has reported.
This is the link I got:
https://omnara.com/cli-auth?callback=http%3A//localhost%3A58...
Once you start running coding agents async you realize that prototyping becomes much cheaper and it is easier to test out product ideas and land on the right solution to a problem much quicker.
I've been coding like this for the past few months and can't imagine life without being able to invoke a coding agent from anywhere. I got so excited by it we started building https://www.terragonlabs.com so we could do this for any coding agent that crops up.
The way I use LLMs is, I enter a very specific query, and then I check the output, meticulously reviewing both the visual output and the code before I proceed. Component by component, piece by piece.
Otherwise, if you just "let it rip", I find that errors compound, and what you get isn't reliable, isn't what you intended, increases technical debt, or is just straight up dangerous.
So how can you do this from a smartphone? Don't you need your editor, and a way to run your code to review it carefully, before you can provide further input to Claude Code? Basically, how can you iterate from a phone?
I hacked a tiny web client around the claude CLI that I use daily:
https://github.com/sunpix/claude-code-web
- works on the go via PWA
- voice input (Whisper), auto-reading of messages via TTS
- drag and drop images with preview/remove
- hotkeys
I've been looking for this for some time now. This is amazing if it delivers.
This looks like exactly what I was envisioning, so congrats on getting it out there first! LMK if you want to add voice controls to this.
[0]: https://github.com/robdmac/talkito
[1]: https://talkito.com
I can't get through onboarding. The main omnara command just exits complaining about a missing session ID, and running "serve" asks you to set up sessions on a page that doesn't exist?
Nit: why is pipx required if you’re using uv?
Yet these fancy terminal applications have become really trendy. I've been using Crush lately and I quite like it, but it's annoying that I can't copy and paste from the terminal while it's running, and scrolling the buffer doesn't work as expected either; it somehow scrolls some kind of internal "window" instead, again making copy & paste annoyingly difficult.
Is anyone making a good agent system that is just "> Output", "$ input" without trying to get crazy with ANSI escape sequences? Some color is nice, but I think that should be the end of it.
--output-format <format> Output format (only works with --print): "text"
(default), "json" (single result), or
"stream-json" (realtime streaming) (choices:
"text", "json", "stream-json")
--input-format <format> Input format (only works with --print): "text"
(default), or "stream-json" (realtime streaming
input) (choices: "text", "stream-json")
Anyway, that's how software should be written in an ideal world: a protocol to communicate with weakly-connected user interfaces (plural!). For Claude specifically, how exactly does the --ide flag work? That might be useful too.
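For what it's worth, the flags above do seem to be enough to build exactly that kind of protocol layer. A rough sketch, based only on the help text quoted here (I haven't verified the event schema, and flag requirements may vary by version):

    # Drive claude headlessly and treat each emitted JSON line as an event
    # for whatever loosely-coupled UI you prefer (web page, phone, logs...).
    # Note: some versions may also require --verbose with stream-json output.
    claude --print --output-format stream-json \
      "Summarize the failing tests in this repo" \
    | while IFS= read -r event; do
        # Each line should be one JSON object; forward, filter, or log it.
        printf '%s\n' "$event" | jq -c '.' >> agent-events.log
      done

A thin server wrapping something like this is presumably how the web/phone clients mentioned elsewhere in the thread work.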
Main question I have since your backend is open source, is there a way to self host and point the mobile app at our own servers?
But the more I worked with Claude, the more I felt like *I am the bottleneck*, not the waiting times. Also, waiting for more than 5 minutes (really, that's the max) just doesn't happen for my features.
I think remote claude code is nice if you start a completely new app with loads of features that will take a long time, OR for checking pull requests (the remote execution is more important there).
Start an agent, receive a call when a response from the user is needed, provide instruction, repeat.
Use case would be to continue the work hands-free while biking or driving.
* All of these are trivially installable via `npm install -g ...`
Basically, tunnelling to my Mac so I can run my local mistral workflow/git/project builds, but with a GUI like yours.
I've been having a lot of success with Google's Jules (https://jules.google.com/) which has the added benefit of running the agent on their VMs and being able to execute scripts (such as unit tests, linting, playwright, etc). The website works great on mobile and has notification support.
With the Google AI Pro subscription you get 100 tasks a day(!) included; it's a fantastic deal.
Currently trying it, and the output from claude code doesn't appear on my phone though? Sometimes it outputs nothing, sometimes it outputs what appears to be a bunch of XML tags for tool calls that I assume are meant to be parsed. But the notifications are working well, which is nice.
(* though I have some security concerns about this as juicy target vs just rolling my own)
Just a random remark: what's annoying and a definite pain point in my workflow is proper development environments for agents -- not just runtimes but also managing secrets, etc. Maybe an avenue to explore and use in marketing copy.
For now I'll just stick with a VNC solution for my macbook.
This would be a killer product in this setting as copilot is quite “chatty”
But yes, this is a good point; it's a big reason we open sourced our backend. We've thought about doing client-side encryption before sending messages to our servers, but that probably won't be implemented in the near term.
https://github.com/sst/opencode/issues/176
I recall watching a stream where the authors imagined instructing the agent to do a piece of work and then getting notified on your phone when it is done or being able to ask it to iterate on your phone.
Is anyone working on collaborative Claude Code-ing with coworkers in Slack/Discord?
...but as I will now say at the end of every claude-centric post I make until a CSR gets back to me: I'm now approaching a week of zero CSR responses to a very valid question about a $200.00 USD/mo account -- so I hope Omnara eventually matures to the point of supporting many different AI provider options, even if claude-code is the soup du jour.
Having fantastic tooling and effort built around a company that is uninterested in its userbase produces a lot of the same mental anguish I feel when I consider all of the tools and systems that were at one point reliant on now-gone Google services. What a waste. I'm sure the people involved learned plenty and that it was a personal growth experience for them, but boy do I hate seeing good code thrown away so routinely.
I have a bit of a feeling like that around claude-code/cursor-specific things right now. It reminds me of the work I put into Google Wave a hundred years ago.
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
https://news.ycombinator.com/newsguidelines.html
Ask questions out of curiosity. Don't cross-examine.