OpenCode was the first open source agent I used, and my main workhorse after experimenting briefly with Claude Code and realizing the potential of agentic coding. Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
> and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
I think you’re confusing capital c Claude Code, the desktop Electron app, and lowercase c `claude`, the command line tool with an interactive TUI. They’re both TypeScript under the hood, but the latter is React + Ink rendered into the terminal.
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
Codex is using its app server protocol to build a nice client/server separation that I enjoy on top of the predictable Rust performance.
You can run a codex instance on machine A and connect the TUI to it from machine B. The same open source core and protocol is shared between the Codex app, VS Code and Xcode.
I had a nasty slow claude code startup time at one point something like 8s, a clean install sorts it all out. Back up your mcp config and skills and you're good.
I think Go might be a better choice but not for that reason at all.
Go could implement something like this with no dependencies outside the standard library. It would make sense to take on a few, but a comparable Rust project would have at least several dozens.
Also, Go can deliver a single binary that works on every Linux distribution right out of the box. In Rust, its possible but you have to static compile with muslc and that is a far less well-trodden path with some significant differences to the glibc that most Rust libraries have been tested with.
My personal opinion is that I like Rust much more than Go, but I can’t deny that Rust is a big, and more dauntingly to newcomers, pretty unopinionated language compared to Go.
There are more syntax features, more and more complex semantics, and while rustc and clippy do a great job of explaining like 90% of errors, the remaining 10% suuuuuck.
There’s also some choices imposed by the build system (like cargo allowing multiple versions of the same dep in a workspace) and by the macro system (axum has some unintuitive extractor ordering needs that you won’t find unless you know to look for them), and those things and the hurdles they present become intuitive after a time but just while getting started? Oof
Frankly I don't think one even needs to learn it, if you know a bunch of other languages and the codebase is good. I was able to just make a useful change to an open source project by just doing it, without having written any lines of Go before. Granted the MR needed some revisions.
Rust is my favorite, though. There are values beyond ease of contribution. I can't replicate the experience with a Rust project anymore, but I suspect it would have been tougher.
agents don't really care and they're doing anywhere between 90-100% of the work on CC. if anything, rust is better as it has more built-in verification out of the box.
Rust is accessible to everyone now that Claude Code and Opus can emit it at a high proficiency level.
Rust is designed so the error handling is ergonomic and fits into the flow of the language and the type system. Rust code will be lower defect rate by default.
Plus it's faster and doesn't have a GC.
You can use Rust now even if you don't know the language. It's the best way to start learning Rust.
The learning curve is not as bad as people say. It's really gentle.
Java (incl. Scala, Closure, Groovy, Jython, etc.) is better suited to running as a server. Let agents write clean readable code and leave performance concerns to the JIT compiler. If you really want you can let agents rewrite components at runtime without losing context.
Erlang would offer similar benefits, because what we're doing with these things is more message passing than processing.
Rust is what I'd want agents writing for edge devices, things I don't want to have to monitor. Granted, our devices are edge devices to Anthropic, but they're more tightly coupled to their services.
I run many instances of Claude Code simultaneously and have not experienced what you are seeing. It sounds like you have a bias of Rust over Typescript.
No, they are describing a typical experience with the two apps. Just open both apps, run a few queries, and take a look at the difference in resource management yourself. It sounds like you have a bias of Claude Code over Codex.
Uh, it sounds like you're having trouble understanding that people in this thread are talking about two wildly different "claude code" applications. Those who are claiming the resources issues don't apply to them are referring to the cli application, ie: `claude` and those are saying things like "Just open both apps..." are surely referring to their GUI versions.
No, I've never used the GUI version. I literally just had to close and reopen the terminal running the Claude Code CLI on my Mac yesterday because it was taking too many resources. It generally happens when I ask Claude to use multiple sub agents. It's an obvious memory leak.
I am more concerned about their, umm, gallant approach to security. Not only that OpenCode is permissive by default in what it is allowed to do, but that it apparently tries to pull its config from the web (provider-based URL) by default [1]. There is also this open GitHub issue [2], which I find quite concerning (worst case, it's an RCE vulnerability).
It also sends all of your prompts to Grok's free tier by default, and the free tier trains on your submitted information, X AI can do whatever they want with that, including building ad profiles, etc.
You need to set an explicit "small model" in OpenCode to disable that.
This. I work on projects that warrant a self hosted model to ensure nothing is leaked to the cloud. Imagine my surprise when I discovered that even though the only configured model is local, all my prompts are sent to the cloud to... generate a session title. Fortunately caught during testing phase.
If you're using software someone else wrote, you'd have to repeat this testing phase any time an update is installed, right?
(I do mean this as a general principle, but also it was pointed out elsewhere in the thread that this is a particularly "high velocity" project as far as unexpected changes go.)
I’m curious if there’s a reason you’re not just coding in a container without access to the internet, or some similar setup? If I was worried about things in my dev chain accessing any cloud service, I’d be worried about IDE plugins, libraries included in imports, etc. and probably not want internet access at all.
The small_model option configures a separate model for lightweight tasks like title generation. By default, OpenCode tries to use a cheaper model if one is available from your provider, otherwise it falls back to your main model.
I would expect that if you set a local model it would just use the same model. Or if for example you set GPT as main model, it would use something else from OpenAI. I see no mentions of Grok as default
i ran it through mitmproxy, i am using pinned version 1.2.20, 6 march 2026, set up with local chat completions.
on that version, it does not fall back to the main model. it silently calls opencode zen and uses gpt-5-nano, which is listed as having 30 day retention plus openai policy, which is plain text human review by openai AND 3rd party contractors.
They're talking about before it's configured by the user. It defaults to 'free' models so that the user can ask a question immediately on startup. Once you configure a provider, the default models aren't used.
I liked the apple II, and the TRS 80 as I rather like basic. And then I didn’t hate DOS, and then I actively hated the graphical shell of Windows 3, but could not afford a Macintosh -so suffered through it where I had to, but mainly used DOS. Then I discovered UNIX, and did almost all of my work on a timeshare - in the early 90s!
Then Windows 95 came out and I actively hated it, but did think it was amazingly pretty - somehow this was the impetus for me to get a pc again, which I put Windows NT on. Which was profitable for freelance gigs in college. Soon after that, I dual booted it to Linux and spent most of my time in Slackware.
After that, I graduated and had enough money to buy a second rig, which I installed OS/2 warp on - which was good for side gigs. And I really liked. A lot. But my day job required that I have a Windows NT box to shell into the Solaris servers as we ran. Then I got a better class of employer and the next several let me run a Linux box to connect to our solaris (or Aix) servers.
Next my girlfriend at the time got a PowerBook G4 and installed OS X on it. It was obviously amazing. Windows XP came out, and it was once again so much worse than Windows NT - and crashed so much more - which was odd as it was based on Windows NT. (yes 98 was before this but it was really bad). Anyhow, right about here the Linux box I was running at home, died. And it was obvious that I was not going to buy an XP box, so I bought my first Mac.
And it’s been the same for the last 25 years - every time I look at a Windows box it’s horrible. I pretty much always have a Linux box headless somewhere in the house, and one rented in the cloud, and a Mac for interacting with the world.
And like the parent I actively dislike windows. And that’s interesting because I’ve liked most other operating systems I’ve used in my life, including MS-DOS. Modern windows is uniquely bad.
I use windows and absolutely hate the mac UI. Having the current window title bar always at the top of the screen doesn't make any sense when you have a very big monitor. It only made sense with the tiny monitors available when the mac UI was originally created.
No, it is still configurable. You can specify in your opencode.json config that it should be able to run everything. I think they just argued that it shouldn't be the default. Which I agree with.
No, the problem is that when logging in, the provider's website can provide an authentication shell command that OpenCode will send to the shell sight unseen, even if it is "rm -rf /home". This "feature" is completely unnecessary for the agent to function as an agent, or even for authentication. It's not about it being the default, it's about it being there at all and being designed that way.
> The problem for me is that the development practices of the people that are working on it are suboptimal at best; they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things (or even build a proper list of changes for each release), and they add, remove, refine, change, fix, and break features constantly at that accelerated pace.
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commidity, everyone thinks that shipping fast is the moat but at the expense of suboptimality since they know a fix can be implemented quickly on the next release.
Openclaw has 20k commits, almost 700k lines of code, and it is only four months old. I feel confident that that sort of code base would have a no coherent architecture at all, and also that no human has a good mental model of how the various subsystems interact.
I’m sure we’ll all learn a lot from these early days of agentic coding.
> I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
> So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I think what we're seeing is a phase transition. In the early days of any paradigm shift, velocity trumps stability because the market rewards first movers.
But as agents move from prototypes to production, the calculus changes. Production systems need:
- Memory continuity across sessions
- Predictable behavior across updates
- Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
This makes sense. Development velocity is bought by having a short product life with few users. As you gain users that depend on your product, velocity must drop by definition.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
20 for me, and let's not exaggerate. We've given lip service to it this entire time. Hell look at any of the corps we're talking about (including where I work) and they're demanding "velocity without lowering the quality bar", but it's a lie: they don't care about the quality bar in the slightest.
One of my main lessons after a decent long while in security, is that most orgs care about security, *as long as it doesn't get in the way of other priorities* like shipping new features. So when we get something like Agentic LLM tooling where everything moves super fast, security is inevitably going to suffer.
I’m learning that projects, developed with the help of agents, even when developers claim that they review and steer everything, ultimately are not fully understood or owned by the developers, and very soon turns into a thousand reinvented wheels strapped together by tape.
> very soon turns into a thousand reinvented wheels strapped together by tape.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
Claude Code breaks production features and doesn't say anything about it. The product has just shifted gears with little to no ceremony.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
We're still in the very early days of generative AI, and people and markets are already prioritizing quality over quantity. Quantity is irrelevant when it comes value.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
and once you've got your wish: ugly code without tests or a way to comprehend it, but cheap!
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anectodal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
What you wrote aligns with my experience so far.
It's fast and easy to get something working, but in a number of cases it (Opus) just gets stuck 'spinning' and no number of prompts is going to fix that.
Moreover - when creating things from scratch it tends to use average/insecure/ inefficient approaches that later take a lot of time to fix.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
I'm of the opinion that the diligence of experts is part of what makes code valuable assets, and that the market does an alright job of eventually differentiating between reliable products/brands and operations that are just winging it with AI[1].
I would think that the better the code is designed and factored and refactored, the easier it is to maintain and evolve, detect and remove bugs and security vulnerabilties from it. The ease of maintenance helps both AI and humans.
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So, easier it is to maintain and evolve a piece of software, the cheaper it will be to the owners of that application.
It's understandable and even desirable that a new piece of code rapidly evolves as they iterate and fix bugs. I'd only be concerned if they keep this pattern for too long. In the early phases, I like keeping up with all the cutting edge developments. Projects where dev get afraid to ship because of breaking things end up becoming bloated with unnecessary backward compatibility.
> Due to that, and because it's a popular an open source alternative, I want to be able to recommend it and be enthusiastic about it. The problem for me is that the development practices of the people that are working on it are suboptimal at best;
This is my experience with most AI tools that I spend more than a few weeks with. It's happening so often it's making me question my own judgement: "if everything smells of shit, check your own shoes." I left professional software engineering a couple of years ago, and I don't know how much of this is also just me losing touch with the profession, or being an old man moaning about how we used to do it better.
It reminds me of social media: there was a time where social media platforms were defined by their features, Vine was short video, snapchat was disappearing pictures, twitter was short status posts etc. but now they're all bloated messes that try do everything.
The same looks to be happening with AI and agent software. They start off as defined by one features, and then become messes trying to implement the latest AI approach (skills, or tools, or functions, or RAG, or AGENTS.md, or claws etc. etc.)
I recently listened to this episode from the Claude Code creator (here is the video version: https://www.youtube.com/watch?v=PQU9o_5rHC4) and it sounded like their development process was somewhat similar - he said something like their entire codebase has 100% churn every 6 months. But I would assume they have a more professional software delivery process.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
I feel like our industry goes through these phases where there's an obvious thought leader that everyone's copying because they are revolutionary.
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like a MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
I use Claude Code because Anthropic requires me to in order to get the generous subscription tokens. But better tools exist. If I was allowed to use Cursor with my Claude sub I would in a heartbeat.
I mean, I'm slowly trying to learn lightweight formal methods (i.e. what stuff like Alloy or Quint do), behavior driven development, more advanced testing systems for UIs, red-green TDD, etc, which I never bothered to learn as much before, precisely because they can handle the boilerplate aspects of these things, so I can focus on specifying the core features or properties I need for the system, or thinking through the behavior, information flow, and architecture of the system, and it can translate that into machine-verifiable stuff, so that my code is more reliable! I'm very early on that path, though. It's hard!
I heard from somebody inside Anthropic that it's really two companies, one which are using AI for everything and the other which spends all their time putting out fires.
OpenCode's creator acknowledged that the ease of shipping has let them ship prototype features that probably weren't worth shipping and that they need to invest more time cleaning up and fixing things.
Uff. This is exactly what Casey Muratori and his friend was talking about in of their more recent podcast. Features that would never get implemented because of time constraints now do thanks to LLMs and now they have a huge codebase to maintain
I'm still trying to figure out how "open" it really is; There are reports that it phones home a lot[0], and there is even a fork that claims to remove this behavior[1]:
I think there’s a conflict between “open” as in “open source”, and “open” as in “open about the practice” paired with the fact we usually don’t review software’s source scrupulously enough to spot unwanted behaviors”.
so how is telemetry not open? If you don't like telemetry for dogmatic reasons then don't use it. Find the alternative magical product whose dev team is able to improve the software blindfolded
> Find the alternative magical product whose dev team is able to improve the software blindfolded
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
> Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
+the time to find volunteers doubled that, so for $1000 an hour x 10 user interviews, a free software can have feedback from 0.001% of their users. I dislike telemetry, but it’s a lie to say it’s optional.
—a company with no telemetry on neither of our downloadable or cloud product.
> We all know it’s extremely, extremely hard to interact with your userbase.
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
Probably all describe problems stem from the developers using agent coding; including using TypeScript, since these tools are usually more familiar with Js/Js adjacent web development languages.
Perhaps the use of coding agents may have encouraged this behavior, but it is perfectly possible to do the opposite with agents as well — for instance, to use agents to make it easier to set up and maintain a good testing scaffold for TUI stuff, a comprehensive test suite top to bottom, in a way maintainers may not have had the time/energy/interest to do before, or to rewrite in a faster and more resource efficient language that you may find more verbose, be less familiar with, or find annoying to write — and nothing is forcing them to release as often as they are, instead of just having a high commit velocity. I've personally found AIs to be just as good at Go or Rust as TypeScript, perhaps better, as well, so I don't think there was anything forcing them to go with TypeScript. I think they're just somewhat irresponsible devs.
> I think they're just somewhat irresponsible devs.
Before coding agents it took quite a lot more experience before most people could develop and ship a successful product. The average years of experience of both core team and contributors was higher and this reflected in product and architecture choices that really have an impact, especially on non-functional requirements.
They could have had better design and architecture in this project if they had asked the AI for more help with it, but they did not even know what to ask or how to validate the responses.
Of course, lots of devs with more years of experience would do just as badly or worse. What we are seeing here though is a filter removed that means a lot of projects now are the first real product everyone the team has ever developed.
You must never rely on AI itself for authorization… don’t let it run on an environment where it can do that. I can’t believe this needs to be said but everyone seems to have lost their mind and decided to give all their permissions away to a non deterministic thing that when prompted correctly will send it all out to whoever asks it nicely.
The value of having (and executing) a coherent product vision is extremely undervalued in FOSS, and IMO the difference between a successful project in the long-term and the kind of sploogeware that just snowballs with low-value features.
> The value of having (and executing) a coherent product vision is extremely undervalued in FOSS
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
Is there a name for these types of "overbearing" and visually busy "TUIs"? It seems like all the other agents have the same aesthetic and it is unlike traditional nurses or plain text interfaces in a bad way IMO. The constant spinners, sidebars and needless margins are a nuisance to me. Especially over an ssh connection in a tmux session it feels wrong.
I agree that Opencodr is using a lot of RAM, but regarding the features, I am ak only using the built in features and I wouldn't say they are too many, they are just enough for a complete workflow. If you need more you can install plugins, which I haven't done yet and it's my daily driver for four months.
I’m a little surprised by your description of constant releases and instability. That matches how I would describe Claude Code, and has been one of the main reasons I tend to use OpenCode more than Claude Code.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
I use Droid specifically because Claude Code breaks too often for me. And then Droid broke too (but rarely), and I just stuck to not upgrading (like I don't upgrade WebStorm. Dev tools are so fragile)
I’ve been testing opencode and it feels TUI in appearance only. I prefer commandline and TUIs and in my mind TUI idea is to be low level, extremely portable interface and to get out of the way. Opencode does not have low color, standard terminal theme so had to switch to a proper terminal program. Copy paste is hijacked so I need to write code out to file in order to get a snippet. The enter key (as in the return by the keypad) does not work for sending a line. I have not tested but don’t think this would work over SSH even. I have been googling around to find if I am holding it wrong but it feels to break expectations of a terminal app in a way that I wish they would have made it a gui. Makes me sad because I think the goods are there and it’s otherwise good.
FWIW, in Kitty on Linux, SHIFT + mouse-select copies and SHIFT + middle-mouse-button pastes. This use of SHIFT and otherwise using standard Unix style copy/paste is common in a lot of TUIs (eg, weechat).
I don’t think good TUI’s are the same as good command line programs. Great tui apps would to me be things like Norton/midnight commander, borlands turbo pascal, vim, eMacs and things like that
Yes cli and tui are not the same, but I expect TUI to work decent in general terminal emulator and not acitvely block copying and pasting. Having to install supported terminal emulator goes against the vibe.
> they're constantly releasing at an extremely high cadence, where they don't even spend the time to test or fix things
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
Yeah every time I want to like it, scrolling is glitched vs codex and Claude. And other various things like: why is this giant model list hard coded for ollama or other local methods vs loading what I actually have...
On top of that. Open code go was a complete scam. It was not advertised as having lower quality models when I paid and glm5 was broken vs another provider, returning gibberish and very dumb on the same prompt
The biggest reason is I don't like being locked into an ecosystem. I can use whatever I want with OpenCode, not so much with Codex and Claude Code. Right now I'm only using GPT with it, but I like the option.
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fined for code tasks but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
That is very disappointing coz I've been wanting to try an alternative to Gemini CLI for exactly these reasons. The AI is great but the actual software is a buggy, slow, bloated blob of TypeScript (on a custom Node runtime IIUC!) that I really hate running. It takes multiple seconds to start, requires restarting to apply settings, constantly fucks up the terminal, often crashes due to JS heap overflows, doesn't respect my home dir (~/.gemini? Come on folks are we serious?), has an utterly unusable permission system, etc etc. Yet they had plenty of energy to inject silly terminal graphics and have dumb jokes and tips scroll across the screen.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
Claude will also happily write a huge pile of junk into your home directory, I am sad to report. The permissions are idiotic as well, but I always use it in a container anyway. But I have not had it crash and it hasn't been slow starting for me.
This is why I'm taking a wait-and-see approach to these tools on HN myself. My month with Claude Code (the TUI, not the GUI) was amazing from an IT POV, just slop-generating niche tools I could quickly implement and audit (not giant-ass projects), but I ain't outsourcing that to another company when Qwen et al are right there for running on my M1 Pro or RTX 3090.
I'm looking forward to more folks building these kinds of tools with a stronger focus on portability via API or loading local models, as means of having a genuinely useful assistant or co-programmer rather than paying some big corp way too much money (and letting them use my data) for roughly the same experience.
Yeah I tried using it when oh-my-opencode (now oh-my-openagent) started popping off and found it had highly unstable. I just stick with internal tooling now.
For serious coding work I use the Zed Agent; for everything else I use pi with a few skills. Overall, though, I'd recommend Pi plus a few extensions for any features you miss extremely highly. It's also TypeScript, but doesn't suffer from the other problems OC has IME. It's a beautiful little program.
Big +1 to Pi[1]. The simplicity makes it really easy to extend yourself too, so at this point I have a pretty nice little setup that's very specific to my personal workflows. The monorepo for the project also has other nice utilities like a solid agent SDK. I also use other tools like Claude Code for "serious" work, but I do find myself reaching for Pi more consistently as I've gotten more confident with my setup.
I've been building VT Code (https://github.com/vinhnx/vtcode), a Rust-based semantic coding agent. Just landed Codex OAuth with PKCE exchange, credentials go into the system keyring.
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early but I confident it usable. I hope you'll give it a try.
pi.dev is worth checking out. The basic idea is they provide a minimalist coding agent that's designed to be easy to extend, so you can tailor the harness to suit your needs without any bloat.
One of the best features is they haven't been noticed by Anthropic yet so you can still use your Claude subscription.
I tried it briefly and the practice - argued for strategy for operation actually - to override my working folder seelction and altering to the parent root git folder is a no go.
Isn't this pretty much the standard across projects that make heavy use of AI code generation?
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
I don't think that's true at all. As I said, in a response to another person blaming it on agentic coding above, there are a very large number of ways to use coding agents to make your programs faster, more efficient, more reliable, and more refined that also benefit from agents making the code writing research, data piping, and refactoring process quicker and less exhausting. For instance, by helping you set up testing scaffolding, handling the boilerplate around tests while you specify some example features or properties you want to test and expands them, rewriting into a more efficient language, large-scale refactors to use better data structures or architectures, or allowing you to use a more efficient or reliable language that you don't know as well or find to have too much boilerplate or compiler annoyance to otherwise deal with yourself. Then there are sort of higher level more phenomenological or subjective benefits, such as helping you focus on the system architecture and data flow, and only zoom in on particular algorithms or areas of the code base that are specifically relevant, instead of forever getting lost in the weeds of thinking about specific syntax and compiler errors or looking up a bunch of API documentation that isn't super important for the core of what you're trying to do and so on.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and inoraticable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether it's functionality and design a satisfactory and then to iterate on that. So coding is actually the upfront costs that you need to pay in order to and even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
You're both right. AI can be used to do either fast releases or well designed code. Don't say both, as you're not making time, you're moving time between those two.
Which one you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
> AI can be used to do either fast releases or well designed code
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
> It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever — whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to male changes without breaking other things, and struggle to debug, c) those things often eventually effect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very close minded.
Show me an example then. I want to see an example of quality software that makes heavy use of AI generated code (as in, basically written entirely by AI similar to OpenCode), led by developer(s) who care deeply about software quality but still choose to not write code themselves.
I tried running Opencode on my 7$/yr 512mb vps but it had the OOM issue and yes it needs 1GB of ram or more.
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments/I read this thread, I tried crush by charmbracelet again and it gives the UI/UX that I want.
I am definitely impressed by crush/ the charm team. They are on HN and they work great for me, highly recommended if you want something which can work on low constrained devices
I do feel like Charm's TUI's are too beautiful in the sense that running a connection over SSH can delay so when I tried to copy some things, the delay made things less copy-able but overall, I think that I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush took all the Free requests from Openrouter that I get for free so it might be a bit of minor issue but overall its not much of an issue from Crush side, so still overall, my point is that Crush is worth checking out
Kudos to the CharmBracelet team for making awesome golang applications!
This is my main problem I have with it: It sends data and loads code left and right by default. For instance, the latest plugin packages are automatically installed on every startup. Their “Zen” provider is enabled by default so you might accidentally upload your code base to their servers. Better yet: The web UI has a button that just uploads the entire session to their servers WITH A SINGLE CLICK for sharing.
The situation is ... pretty bad. But I don’t think this is particularly malicious or even a really well considered stance, but just a compromise in order to move fast and ship useful features.
To make it easily adoptable by anyone privacy conscious without hours of tweaking, there should be an effort to massively improve this situation. Luckily, unlike Claude Code, the project is open source and can he changed!
There is some kind of fitting irony around agentic coding harnesses mainly being maintained by coding agents themselves, and as a result they are all a chaotic mess.
Fwiw this got changed about a week ago, where they changed the logic to match the documentation rather than default to sending your prompts to their servers. This is why so many people have noticed this happening but if you ask an AI about it right now it will say this is not true.
Personally I think it's necessary to run opencode itself inside a sandbox, and if you do that you can see all of the rejected network calls it's trying to make even in local mode. I use srt and it was pretty straightforward to set up
The model selection for title generation works as follows (prompt.ts:1956-1960):
1. If the title agent has an explicit model configured — that model is used.
2. Otherwise, it tries Provider.getSmallModel(providerID) — which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402):
- claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku
- gemini-3-flash / gemini-2.5-flash
- gpt-5-nano
- (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano)
3. If no small model is found — it falls back to the same model currently being used for the session.
So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
When I did this, I used a single local llama.cpp server instance as my main model without setting a small model and it did not use it for chat titles while I used it for prompts.
Chat titles would work even when the local llama.cpp server hadn't started, and it was never in the the llama.cpp logs, it used an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
Also, even when using local models in ollama or lmstudio, prompts are proxied via their domain, so never put anything sensitive even when using local setup
To be clear, that seems to be about the webui only, the TUI doesn't seem affected. I haven't fully investigated this myself, but when I run opencode (1.2.27-a6ef9e9-dirty) + mitmproxy and using LM Studio as the backend, when starting opencode + executing a prompt, I only see two requests, both to my LM Studio instance, both normal inference requests (one for the chat itself + one for generating the title).
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLllama which is a former shadow of itself. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors tools/models.
Geez there should be a big warning on the tin about this. They’re so neatly integrated with copilot that I assumed (and told others) that they had all the privacy guarantees of copilot :(
I can tell that you’re doing all of this in the name of first-use UX. It’s working: The out of the box experience is really seamless.
But for serious (“grown up”) use, stuff like this just doesn’t fly. At all. We have to know and be able to control exactly where data gets sent. You can’t just exfiltrate our data to random unvetted endpoints.
Given the hurt trust of the past, there also needs to be a communication campaign (“actually we’re secure now”), because otherwise people will keep going around claiming that OpenCode sends all of your data to Grok. This would really unnecessarily hurt the project in the long run.
More importantly, the current dev branch source for packages/opencode/src/session/summary.ts shows summarizeMessage() now only computes diffs and updates the message summary object; it does not make an LLM call there anymore. The current code path calls summarizeSession() and summarizeMessage(), and summarizeMessage() just filters messages, computes diffs, sets userMsg.summary.diffs, and saves the message.
Yikes... sending prompts to a third party by default with no disclosure in the setup flow is a rough look for a tool that positions itself as the open sources alternative. "Open" loses meaning fast if the defaults work against the user.
To the provider you select in the UI, I agree. But OpenCode automatically sends prompts to their free "Zen" proxy, even without choosing it in the UI.
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
This is exactly me considering what I might have leaked to god knows who via grok. I was hyped by opencode but now I’m thinking of alternatives. A huge red flag… at best irresponsible?
Are you using Grok for the coding? Because I have Copilot connected and I can see the request to Copilot for the summaries - with no "small model" setting even visible in my settings.
I found out about OpenCode through the Anthropic feud. I now spend most of my AI time in it, both at work and at home. It turns out to be pretty great for general chat too, with the ability to easily integrate various tools you might need (search being the top one of course).
I have things to criticize about it, their approach to security and pulling in code being my main one, but over all it’s the most complete solution I’ve found.
They have a server/client architecture, a client SDK, a pretty good web UI and use pretty standard technologies.
The extensibility story is good and just seems like the right paradigms mostly, with agents, skills, plugins and providers.
They also ship very fast, both for good and bad, I’ve personally enjoyed the rapid improvements (~2 days from criticizing not being able to disable the default provider in the web ui to being able to).
I think OpenCode has a pretty bright future and so far I think that my issues with it should be pretty fixable. The amount of tasteful choices they’ve made dwarfs the few untasteful ones for me so far.
The team also is not breathlessly talking about how coding is dead. They have pretty sane takes on AI coding including trying to help people who care about code quality.
One thing that makes OpenCode stand out to me is the web UI. I host it on my rPi 4B, serving as my AI assistant and remote mobile access to my homelab.
Since the homelab doesn't really have access to any risky data, I just gave OpenCode full Docker access and connect to it through Tailscale on my iPhone https://github.com/pprotas/homelab
The Agent that is blacklisted from Anthropic AI, soon more to come.
I really like how their subagents work, as a bonus I get to choose which model is in which agent. Sadly I have to resort to the mess that Anthropic calls Claude Code
They are not blacklisted. You are allowed to use the API at commercial usage pricing. You are just not allowed to use your Claude Code subscription with OpenCode (or any other third‑party harness for the record).
If you're not paying full-fat API prices, then probably.
From what I've heard, the metrics used by Anthropic to detect unauthorized clients is pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
The highest in in the industry for API pricing right now is GPT-5.4-Pro, OpenRouter adding that as an option in their Auto Router was when I had to go customise the routing settings because it was not even close to providing $30/m input tokens and $180/m output tokens of value (for context Opus 4.6 is $5/m input and $25/m output)
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
Sometimes people want to be real pedants about licensing terms when it comes to OSS, assuming such terms are completely bulletproof, other times people don't think the terms of their agreement with a service provider should have any force at all.
With Anthropic, you either pay per token with an API key (expensive), or use their subscription, but only with the tools that they provide you - Claude, Claude Cowork and Claude Code (both GUI and CLI variants). Individuals generally get to use the subscriptions, companies, especially the ones building services on top of their models, are expected to pay per token. Same applies to various third party tools.
The belief is that the subscriptions are subsidized by them (or just heavily cut into profit margins) so for whatever reason they're trying to maintain control over the harness - maybe to gather more usage analytics and gain an edge over competitors and improve their models better to work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
>or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Sounds pretty sane, the same way how OpenWebUI and probably other software out there also has a concept of “tool models”, something you use for all the lower priority stuff.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
The sota models are largely undifferentiated from each other in performance right now. And it’s possible open weight models will get “good enough” relatively soonish. This creates a classic case where inference becomes a commodity. Commodities have very low margins. Training puts them in an economic hole where low margins will kill them.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.
Anthropic's model deployments for Claude Code are likely optimized for Claude Code. I wouldn't be surprised if they had optimizations like sharing of system prompt KV-cache across users, or a speculative execution model specifically fine-tuned for the way Claude Code does tool calls.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
It’s probably a mixture of things including direct control over how the api is called and used as pointed out above and giving a discount for using their ecosystem. They are in fact a business so it should not surprise anyone they act as one.
It might well be a mixture, but 95% of that mixture is vendor lock in. Same reason they don't support AGENTS.md, they want to add friction in switching.
It's very straightforward to instrument CC under tmux with send-keys and capturep. You could easily use that for distillation, IMO. There are also detailed I/O logs.
Yup. And right now I'm straight-up breaking Claude's TOS by modifying OpenCode to still accept tokens. But I only have a few days left and don't care if they ban me. I'm using what I paid for.
Anthropic has an API, you can use any client but they charge per input/output/cache token.
One-price-per-month subscriptions (Claude Code Pro/MAX @ $20/$100/$200 a month) use a different authentication mechanism, OAUTH. The useful difference is you get a lot more inference than you can for the same cost using the API but they require you to use Claude Code as a client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
I pay $100/mo to Anthropic. Yesterday I coded one small feature via an API key by accident and it cost $6. At this rate, it will cost me $1000/mo to develop with Opus. I might as well code by hand, or switch to the $20 Codex plan, which will probably be more than enough.
I'd rather switch to OpenAI than give up my favorite harness.
My monthly "connection fee" is more than that (no solar, just EV). Your cartel needs to step it up!
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
Yeah I had a similar experience one time. Which is why I laugh when people suggest Anthropic is profitable. Sure, maybe if everyone does API pricing. Which they won’t because it’s so damn expensive. Another way to think about it is API pricing is a glimpse into the future when everyone is dependent on these services and the subscription model price increases start.
Wait - are you missing all the context on this? Anthropic pushed back against this hard, there was a whole back and forth. I'm on mobile and can't look it up for you atm but if you google about this scenario, Anthropic definitely come out of this looking a lot better than OpenAI and xAI
If you evaluate fascism in terms of donation, yes.
But it is more about the political opinions, IMHO, and Anthropic doesn't sound more attractive than the competitors. Anthropic is very much to the right of the transhumanism spectrum (even if xAI and OpenAI are even farther).
IMO, OpenAI have either implicitly committed to becoming the IT service for Trump's secret police, or they've willingly signed up for the harsh retaliation Anthropic's getting, knowing that the Trump administration will inevitably try to push OpenAI around in the same way, if they meaningfully refuse to assist in domestic mass surveillance efforts.
You can argue a moral equivalence, I guess, but on a practical level, OpenAI's decision is more dangerous for everyone, because it will help to secure Trump as a dictator.
probably more agents to be blocked by anthropic. i've seen theo from t3.gg go through a bunch of loopholes to support claude in his t3code app just so anthropic doesn't sue their asses.
There are boards starting in the $1500-$2000 range, and complete systems in the $2500-$2700 range. I actually don't know of any Strix Halo mini PCs that cost $3000, do you?
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
JS is not something that was developed with CLI in mind and on top of that that language does not lend itself to be good for LLM generation as it has pretty weak validation compared to e.g. Rust, or event C, even python.
It’s simply one of the most productive languages. It actually has a very strong type system, while still being a dynamic language that doesn’t have to be compiled, leading to very fast iteration. It’s also THE language you use when writing UIs. Execution is actually pretty fast through the runtimes we have available nowadays.
The only other interpreted language is Python and that thoroughly feels like a toy in comparison (typing situation still very much in progress, very weak ORM situation, not even a usable package manger until recently!).
I'm unsure that I agree with this, for my smaller tools with a UI I have been using rust for business logic code and then platform native languages, mostly swift/C#.
I feel like with a modern agentic workflow it is actually trivial to generate UIs that just call into an agnostic layer, and keeping time small and composable has been crucial for this.
That way I get platform native integration where possible and actual on the metal performance.
If Python has a "very weak ORM situation", what is it about the TS ORM scene that makes it stronger by comparison? Is there one library in particular that stands out?
pnpm is amazing for speed and everybody should use it! but even with npm before it, at least it was correct. I had very few (none?) mysterious issues with it that could only be solved by nuking the entire environment. That is more than I can say about the python package managers before uv.
For a TUI agent, runtime performance is not the bottleneck, not by far. Hackability is the USP. Pi has extensions hotreloading which comes almost for free with jiti. The fact that the source is the shipped artifact (unlike Go/Rust) also helps the agent seeing its own code and the ability to write and load its own extensions based on that. A fact that OpenClaw’s success is in part based on IMO.
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
OK, make sense, but there are also claw clones that are in Rust (and self modifying).
Also python ones would also allow self modifying. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased as I find JS/TS rather ugly language compared to anything other basically (PHP is close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
In pi’s case there is a plugin system. It’s much easier to make a self extending agent work with Python or JavaScript than most other languages. JavaScript has the benefit that it has a great typing system on top with TypeScript.
Pi is refreshingly minimal in terms of system prompts, but still works really well and that makes me wonder whether other harnesses are overdoing. Look at OpenCode's prompts, for instance - long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have) but that isn't possible today (without maintaining a custom fork)
I just found out about pi yesterday. It's the only agent that I was able to run on RISC-V. It's quite scary that it runs commands without asking though.
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini) - but now I don’t have to anymore (except if I want to use my anthropic sub, which I will now cancel for that exact reason)
Pi is good stuff and refreshingly simple and malleable.
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
opencode stands out as one of the few agents with a proper client server architecture that allows something like openchambers great vscode extension so its possible to seamlessly switch between tui, vscode, webapp, desktop app. i think there is hardly a usable alternative for most coding agent usecases (assuming agents from model providers are a no go, they cannot be allowed to own the tools AND the models). But its also far from perfect: the webui is secretly served from their servers instead of locally for no reason. worse the fallback route gets also sent to their servers so any unknown request to opencode api ends up being sent to opencode servers potentially leaking data. the security defaults are horrific, its impossible to use it safely outside a controlled container. it will just serve your whole hard drive via rest endpoint and not constrain to project folders. the share feature uploading your conversations to their servers is also so weirdly communicated and implemented that it leaves a bad taste. I dont think this will become much better until the agent ecosystem is more modular and less monolith, acp, a2a and mcp need to become good enough so tools, prompts, skills, subagent setups and workflow engines and UIs are completely swappable and the agent core has to only focus on the essentials like runtime and glue architecture. i really hope we dont see all of these grow into full agent oses with artificial lock in effects and big effort buy in.
I love OpenCode! I wrote a plugin that adds two tools: prune and retrieve. Prune lets the LLM select messages to remove from the conversation and replace with a summary and key terms. The retrieve tool lets it get those original messages back in case they're needed. I've been livestreaming the development and using it on side projects to make sure it's actually effective... And it turns out it really is! It feels like working with an infinite context window.
Long tool outputs/command outputs everything in my harness is spilled over to the filesystem. Context messages are truncated and split to filesystem with a breadcrumb for retrieving the full message.
The infinite context window framing is the right way to think about it. Running inside Claude Code continuously, the prune step matters more than retrieve in practice — most of what gets dropped stays dropped. More useful is being deliberate about what goes in at the start of each loop iteration rather than managing what comes out at the end.
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
Yes I use the $200 per month plan for Claude Code and it's amazing
I assume the usage varies based on prompt caching, but I could be wrong. Why would you assume prompt caching would have zero effect on the subscription usage?
The $20-per-change problem is a workflow problem, not a pricing problem. Batching work into larger well-scoped sessions rather than interactive back-and-forth changes the unit economics significantly. Most people use these tools like a terminal — one command at a time — which is the worst possible cost profile.
I’ve been extraordinarily productive with this, their $10 Go plan, and a rigorous spec-driven workflow. Haven’t touched Claude in 2 months.
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
The moat is having researchers that can produce frontier models. When OpenCode starts building frontier models, then I'd be worried; otherwise they're just another wrapper
"OpenCode Go" (a subscription) lets you use lots of hosted open-weights frontier AI models, such as GLM-5 (currently right up there in the frontier model leaderboards) for $10 per month.
Can you talk more about how you leverage higher quality models for the stuff that counts? Anywhere I can read more on the philosophy of when to use each?
Sure happy to share. It’s been trial and error, but I’ve learned that for agents to reliably ship a large feature or refactor, I need a good spec (functional acceptance criteria) and I need a good plan for sequencing the work.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking, you can find in my comment history (aiming to open source it this week).
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate or a todo / gh issue.
I don't use it for coding but as an agent backend. Maybe opencode was thought for coding mainly, but for me, it's incredibly good as an agent, especially when paired with skills, a fastapi server, and opencode go(minimax) is just so much intelligence at an incredibly cheap price. Plus, you can talk to it via channels if you use a claw.
I'd really like to get more clarification on offline mode and privacy. The github issues related to privacy did not leave a good feeling, despite being initially excited. Is offline mode a thing yet? I want to use this, but I don't want my code to leave my device.
I do like OpenCode, and have been using it in and off since last July. But I feel like they’re trying to stuff too much GUI into a TUI? Due to this I find myself using Codex and Pi more often. But am still glad OpenCode and their Zen product exist.
The only thing I'm wondering is if they have eval frameworks (for lack of a better word). Their prompts don't seem to have changed for a while and I find greater success after testing and writing my own system prompts + modification to the harness to have the smallest most concise system prompt + dynamic prompt snippets per project.
I feel that if you want to build a coding agent / harness the first thing you should do is to build an evaluation framework to track performance for coding by having your internal metrics and task performance, instead I see most coding agents just fiddle with adding features that don't improve the core ability of a coding agent.
Now I just started looking into OpenCode yesterday, but seems you can override the system prompts by basically overloading the templates used in for example `~/.opencode/agents/build.md`, then that'd be used instead of the default "Build" system prompt.
At least from what I gathered skimming the docs earlier, might not actually work in practice, or not override all of it, but seems to be the way it works.
I've forked it locally, to be honest I haven't merged upstream in a while as I haven't seen any commits that I found relevant and would improve my usage, they seem to work on the web and desktop version which I don't use.
The changes I've made locally are:
- Added a discuss mode with almost on tools except read file, ask tool, web search only based no heuristics + being able to switch from discuss to plan mode.
Experiments:
- hashline: it doesn't bring that much benefit over the default with gpt-5.4.
- tried scribe [0]: It seems worth it as it saves context space but in worst case scenarios it fails by reading the whole file, probably worth it but I would need to experiment more with it and probably rewrite some parts.
The nice thing about opencode is that it uses sqlite and you can do experiments and then go through past conversation through code, replay and compare.
i've been using this as my primary harness for llama.cpp models, Claude, and Gemini for a few months now. the LSP integration is great. i also built a plugin to enable a very minimal OpenClaw alternative as a self modifying hook system over IPC as a plugin for OpenCode: https://github.com/khimaros/opencode-evolve -- and here's a deployment ready example making use of it which runs in an Incus container/VM: https://github.com/khimaros/persona
Very cool! I have been using opencode, as almost everybody else in the lab is using codex. I found the tools thing inside your own repo amazing but somehow I could not get it to reliably get opencode to write its own tools. Seems also a bit scary as there is pretty much not much security by default. I am using it in a NixOS WSL2 VM
I'm actually moving to containerised isolation. I realised the agents waste too much time trying to correctly install dependencies, not unlike a normal nixos user.
What would be the advantage using this over say VSCode with Copilot or Roo Code? I need to make some time to compare, but just curious if others have a good insight on things.
I started out using VSCode with their Claude plugin; it seemed like a totally unnecessary integration. A better workflow seems to just run Claude Code directly on my machine where there are fewer restrictions - it just opens a lot more possibilities on what it can do
Ok I get it now, same with the vim comment above, it seems VSCode has the more IDE setup while OpenCode is giving the vim nerdtree vibe? I'll have to take a look, it makes sense to possibly have both for different use cases I guess.
The security concerns here are real but not unique to OpenCode. Most AI coding agents have the same fundamental problem: they need broad file system access to be useful, but that access surface is also the attack surface. The config-from-web issue is particularly bad because it's essentially remote code execution through prompt injection.
What I'd want to see from any of these tools is a clear permissions model — which files the agent can read vs write, whether it can execute commands, and an audit log of what it actually did. Claude Code's hooks system at least gives you deterministic guardrails before/after agent actions, but it's still early days for this whole category.
This is another one of OpenCode’s current weak points in the security complex: They consider permissions a “UX feature” rather than actual guardrails. The reasoning is that you’re giving the agent access to the shell, so it’ll be able to sidestep everything.
This is of course a cop-out: They’re not considering the case in which you’re not blindly doing that.
Fun fact: In the default setup, the agent can fully edit all of the harnesses files, including permissions and session history. So it’s pretty trivial for it to a) escalate privileges and then even b) delete evidence of something nefarious happening.
It’s pretty reckless and even pretty easy to solve with chroot and user permissions. There just has been (from what I see currently) relatively little interest from the project in solving this issue.
Same thoughts - I wanted a "permission manager" that defines a set of policies agnostic to coding agents. It also comes with "monitor mode" that shows operations blocked, but not quite an audit log yet though.
Granted, I just started playing around with OpenCode (but been using Codex and Claude Code since they were initially available, so not first time with agents), but anyways:
> they need broad file system access to be useful, but that access surface is also the attack surface
Do they? You give them access to one directory typically (my way is to create a temporary docker container that literally only has that directory available, copied into the container on boot, copied back to the host once the agent completed), and I don't think I've needed them to have "broad file system access" at any point, to be useful or otherwise.
So that leads me to think I'm misunderstanding either what you're saying, or what you're doing?
This is the way. If you’re not running your agent harness/framework in a container with explicit bind mounts or copy-on-build then you’re doing it wrong. Whenever I see someone complain about filesystem access and sequirity risk it’s a clear signal of incompetence imo.
Someone correct me if I'm wrong, but if you're doing bind-mounts, ensure you do read-only, if you're doing bi-directional bind mounts with docker, the agent could (and most likely know how to) create a symlink that allows them to browse outside the bind mount.
That's why I explicitly made my tooling do "Create container, copy over $PWD, once agent completes, copy back to $PWD" rather than the bind-mount stuff.
> create a symlink that allows them to browse outside the bind mount
Could you reproduce that? iiuc the symlink that the agent creates should follow to the path that's still inside the container.
I built a product solving this problem about a year ago, basically a serverless, container-based, NATed VScode where you can eg "run Claude Code" (or this) in your browser on a remote container.
There's a reason I basically stopped marketing it, Cursor took off so much then, and now people are running Claude/Codex locally. First, this is something people only actually start to care about once they've been bitten by it hard enough to remember how much it hurt, and most people haven't got there yet (but it will happen more as the models get better).
Also, the people who simultaneously care a lot about security and systems work AND are AI enthusiasts AND generally highly capable are potentially building in the space, but not really customers. The people who care a lot about security and systems work aren't generally decision makers or enthusiastic adopters of AI products (only just now are they starting to do so) and the people who are super enthusiastic about AI generally aren't interested in spending a lot of time on security stuff. To the extent they do care about security, they want it to Just Work and let them keep building super fast. The people who are decision makers but less on the security/AI trains need to this happen more, and hear about the problem from other executives, before they're willing to spend on it.
To the extent most people actualy care about this, they still want to Just Work like they do now and either keep building super fast or not thinking about AI at all. It's actually extremely difficult to give granular access to agents because the entire point is them acting autonomously or keeping you in a flow state. You either need to have a really compatible threat model to doing so (eg open source work, developer credentials only used for development and kept separate from production/corp/customer data), spend a lot of time setting things up so that agents can work within your constraints (which also requires a willingness to commit serious amounts of time or resources to security, and understanding of it), or spend a lot of time approving things and nannying it.
So right now everybody is just saying, fuck it, I trust Anthropic or Microsoft or OpenAI or Cursor enough to just take my chances with them. And people who care about security are of course appalled at the idea of just giving another company full filesystem access and developer credentials in enterprises where the lack of development velocity and high process/overhead culture was actually of load-bearing importance. But really it's just that secure agentic development requires significant upfront investment in changing the way developers work, which nobody is willing to pay for yet, and has no perfect solutions yet. Dev containers were always a good idea and not that much adopted either, btw.
It takes a lot more investment in actually providing good permissions/security for agent development environments still too, which even the big companies are still working on. And I am still working on it as well. There's just not that much demand for it, but I think it's close.
I wish the team would be more responsive to popular issues - like inability to provide a dynamic api key helper like claude has. This one even has a PR open: https://github.com/anomalyco/opencode/issues/1302
Stupid question, but are there models worth using that specialize in a particular programming language? For instance, I'd love to be able to run a local model on my GPU that is specific to C/C++ or Python. If such a thing exists, is it worth it vs one of the cloud-based frontier models?
I'm guessing that a model which only covers a single language might be more compact and efficient vs a model trained across many languages and non-programming data.
Months ago I tested a concept revolving this issue and made a weird MCP-LSP-LocalLLM hybrid thing that attempts to enhance unlucky, fast changing, or unpopular languages (mine attempts with Zig)
I'm currently experimenting with (trying to) fine tune Qwen3.5 to make it better at a given language (Nim in this case); but I am quite bad at this, and honestly am unsure if it's even really fully feasible at the scale I have access to. Certainly been fun so far though, and I have a little Asus GX10 box on the way to experiment some more!
Been playing around with fine-tuning models for specific languages as well (Clojure and Rust mostly), but the persistent problem is high quality data sets, mostly I've been generating my own based on my own repositories and chat sessions, what approach are you taking for gathering the data?
My own experience trying many different models is that general intelligence of the model is more important.
If you want it to stick to better practices you have to write skills, provide references (example code it can read), and provide it with harnessing tools (linters, debuggers, etc) so the agent can iterate on its own output.
I've used both. I stuck with Claude Code, the ergonomics are better and the internals are clearly optimized for Opus which I use daily, you can feel it. That said OpenCode is still a very good alternative, well above Codex, Gemini CLI or Mistral Vibe in my experience.
OpenCode works awesome for me. The BigPickle model is all I want. I do not throw some large work at the agent that requires lot of reasoning, thinking or decision making. It's my role to chop the work down to bite-size and ask the fantastic BigPickle to just do the damn coding or bit of explaining. It works very well with interactive sessions with small tasks. Not giving something to work over night.
I used Claude with paid subscription and codex as well and settled to OpenCode with free models.
Can someone explain how Claude Code can instantly determine what file I have open and what lines I have selected in VS Code even if it's just running in a VS Code terminal instance, yet I cannot for the life of me get OpenCode to come anywhere close to that same experience?
The OpenCode docs suggest its possible, but it only works with their extension (not in an already open VS Code terminal) with a very specific keyboard shortcut and only barely at that.
Since this is blowing up, gonna plug my opencode/claude-code plugin that allows you to annotate LLMs plans like a Google doc with strikethroughs, comments, etc. and loop with your agent until you're happy with the plan.
What does well: helps context switching by using one window to control many repos with many worktrees each.
What can do better?
It's putting AI too much in control? What if I want to edit a function myself in the workspace I'm working on? or select a snippet and refer that in the promp? without that I feel it's missing a non-negotiable feature.
Do you think the design direction of “chat first” is compatible with editor first? I don’t know if any tools do both well. Seems like a fork in the road, design wise.
I think we already need to flow back and forth in both modes.
Because you steer from the chat more ambitious changes (zoom out) but then you need to still have the power to go full high res and zoom in in whatever you need.
From architecture to system programming smoothly. We need to nail that.
I use it with Qwen 3.5 running locally when my daily limits run out on my other subscriptions.
The harness is great. Local models are just slow enough that the subscription models are easier to use. For most of my tasks these days, the model's capability is sufficient; it is just not as snappy.
Could you say more about the differences between Aider and OpenCode?
I briefly dabbled with Aider some months back but never got any real work done with it. Without installing each one of these new tools I'm having trouble grokking what is changing about them that moves the LLM-assisted software dev experience forward.
One thing I like with Aider is the fact that I can control the context by using /add explicitly on a subset of files. Can you achieve the same wit OpenCode ?
I'm curious: I'venever touched cloud models beyond a few seconds. I run a AMD395+ with the new qwen coder. Is there any intelligence difference, or is it just speed and context? At 128GB, it takes quite awhile before getting context wall.
There's a difference in intelligence. However for 90% of what I'm doing I don't really need it. The online models are just faster.
I just did a one hour vibe session today, ripping out a library dependency and replacing it with another and pushing the library to pypi. I should take my task list and let the local model replicate the work and see how it works out.
The decision to build this as a TUI rather than a web app is interesting. Terminal-native tools tend to get out of the way and let you stay in flow -- curious how the context management works when you have a large codebase, do you chunk by file or do something smarter?
It’s both! The core is implemented as a server and any UI (the TUI being one) can connect to it.
It’s actually “dumber” than any of your suggestions - they just let the agent explore to build up context on its own. “ls” and “grep” are among the most used discovery tools. This works extraordinarily well and is pretty much the standard nowadays because it lets the agent be pretty smart about what context it pulls in.
That's my favorite CLI agent, over codex, claude, copilot and qwen-code.
It has beautified markdown output, much more subagents, and access to free models. Unlike claude and codex. Best is opencode with GitHub opus 4.6, but the fun only lasts for a day, then you're out of tokens for a month.
What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "Use a superior model for setting a standard, then fine-tuning a cheaper one to do that same work".
So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over documentation' of the choices), then after that, use Opus 4.6 only for planning, then "throw a load of MiniMax M2.5s at the problem".
They tend to do 90% of the job well, then I sometimes do a final pass with Opus 4.6 again to mop up any issues, this saves me a lot of tokens/money.
This pattern wasn't possible with Claude Code, thus my move to Open Code.
I tried to use it but OpenCode won't even open for me on Wayland (Ubuntu 24.04), whichever terminal emulator I use. I wasn't even aware TUI could have compatibility issues with Wayland
Definitely not Wayland related, or so I doubt. I'm on wayland and never had any issues, and it's a TUI, where the terminal emulator does or does not do GPU work. What led you to that conclusion?
> On Linux, some Wayland setups can cause blank windows or compositor errors.
> If you’re on Wayland and the app is blank/crashing, try launching with OC_ALLOW_WAYLAND=1.
> If that makes things worse, remove it and try launching under an X11 session instead.
OC_ALLOW_WAYLAND=1 didn't work for me (Ubuntu 24.04)
Suggesting to use a different display server to use a TUI (!!) seems a bit wild to me. I didn't put a lot of time into investigating this so maybe there is another reason than Wayland. Anyway I'm using Pi now
That issue points out that it is probably a dependency problem.
The other problem is that they let a package manager block the UI and either swallow hard errors or unable to progress on soft errors. The errors are probably (hopefully) in some logs.
A dev oriented TUI should report unrecoverable errors on screen or at least direct you to the logs. It's not easy to get right, but if you dare to do it isn't rocket science either. They didn't dare.
I had to abandon it because of the memory leak, it would fill up all my memory in a matter of minutes. The devs don't seem to pay it much attention: https://github.com/anomalyco/opencode/issues/5363
I've used it but recently moved back to plain claude code. We use claude at the company and weirdly the experience has become less and less productive using opencode. I'm a bit sad about it as it was the first experience that really clicked and got great results out of. I'm actually curious if Anthropic knows which client is used and if they negatively influence the experience on purpose. It's very difficult to prove because nothing about this is exact science.
I think Anthropic just highly RL’s their model to work best with it’s Claude Code’s particular ways of going about things.
All the background capability Claude code now has makes things way more complex and I saw a meaningful improvement with 4.6 versus 4.5, so imagine other harnesses will take time to catch up.
I've been using opencode for a few months and really like it, both from a UX and a results perspective.
It started getting increasingly flaky with Anthropic's API recently, so I switched back to Claude Code for a couple of days. Oh my, what a night and day difference. Tokens, MCP use, everything.
For anyone reading at OpenAI, your support for OpenCode is the reason I now pay you 200 bucks a month instead.
I've been paying OpenAI 200 bucks a month for what feels like forever by now, but used OpenCode for the first time yesterday, been using Codex (and Claude Code from time to time, to see if they've caught up with Codex) since then.
But I don't use MCP, don't need anything complicated, and not sure what OpenCode actually offers on top. The UI is slightly nicer (but oh so much heavier resource usage), both projects source code seems vibecoded and the architecture is held together with hopes and dreams, but in reality, minor difference really.
Also, didn't find a way in OpenCode to do the "Fast Mode" that Codex has available, is that just not possible or am I missing some setting? Not Codex-Spark but the mode that toggles faster inference.
If it was a somewhat unique name, then yeah maybe. But "opencode" is probably as generic as you could make it, hard to claim to be "squatting" something so well used already... Earliest project on GitHub named "opencode" seems to date back to 2010, but I'm sure there are even earlier projects too: https://github.com/search?q=opencode&type=repositories&s=upd...
you'll be surprised the name was actually a controversy on x/twitter since opencode was originally another dev's idea who joined the charmcli team. they wanted to keep that name but dax somehow (?) ended up squatting it. the charmcli team has renamed their tool to "crush" which matches their other tools a lot better than "opencode"
I wish they would add back support for anthropic max/pro plans via calling the claude cli in -p mode. As I understand thats still very much allowed usage of claude code cli (as you are still using claude cli as it was intended anyway and fixes the issue of cache hits which I believe were the primary reason anthropic sent them the c&d). I love the UX from OpenCode (I loved setting it up in web mode on my home server and code from the web browser vs doing claude code over ssh) but until I can use my pro/max subscription I can't go back, the API pricing is way too much for my third world country wallet.
They had that?! I saw that some people wrote skills and plugins to call claude cli and gemini cli to still be able to use the subscription.
I would also wish that this was supported out of the box, something similar to goose cli providers or acp providers (https://block.github.io/goose/docs/guides/acp-providers).
But I don't want to spend testing yet another agent harness or change the workflow when I somewhat got used to one way of working on things (the churn is real).
I'd love for all these tools to standardise on the structure of plugins / skills / commands / hooks etc., so I can swap between them to compare without feeling handicapped!
- GH copilot API is a first class citizen with access to multiple providers’ models at a very good price with a pro plan
- no terminal flicker
- it seems really good with subagents
- I can’t see any terminal history inside my emacs vterm :(
Question: How do we use Agents to Schedule and Orchestrate Farming and Agricultural production, or Manufacturing assembly machines, or Train rail transportation, or mineral and energy deposit discovery and extraction or interplanetary terraforming and mining, or nuclear reactor modulation, or water desalination automation, or plutonium electric fuel cell production with a 24,000 year half-life radiation decay, or interplanetary colonization, or physics equation creation and solving for faster-than-light travel?
Yeah, support the company that promised to help your government illegally mass surveil and mass kill people, because they support a use case slightly better than the non-mass-murdering option.
You are absolutely correct that both are evil ... as are most corporations.
Still, I feel like "will commit illegal mass murder against their own citizens" is a significant enough degree more evil. I think lots of corporations will help their government murder citizens of other countries, but very few would go so far as to agree to murder their own (fellow) citizens ... just to get a juicy contract.
I see your viewpoint but, to me, "both will happily murder you but one is better because they won't murder ME!" isn't very compelling. Like, I get it, but also it changes nothing for me. They're both bad.
It's not about "won't murder me" it's about "won't murder their own tribe". Humans are very tribal creatures, and we have all sorts of built-in societal taboos about betraying our tribe.
We also have taboos against betraying/murdering/whatever people of other tribes, but those taboos are much weaker and get relaxed sometimes (eg. in war). My point is, it takes significantly more anti-social (ie. evil) behavior to betray your own tribe, in the deepest way possible, than it does to do horrible things to other tribes.
This is just as much true for Russians murdering Ukranians as Ukranians murdering Russians, or any other conflict group: almost all Russians would consider a Russian who helps kill Russians to be more evil than a Russian who kills Ukranians (and vice versa).
Right, but I consider someone who'll murder exclusively other tribes to be infinitely closer to someone who'll murder their own tribe than to someone who won't murder anyone.
That a gross exaggeration. But to your point, I could say the same for almost any product I use from Big Tech, every laptop company I buy my hardware from, etc. I'm sure the same applies to you. I can't fight every vendor all the time. For now I pick what works best for my use case.
You're right, Anthropic shouldn't have even taken a moral stance here at all. They should have just gone full send and allowed everything, because there will never be satisfying some people. Why even try?
Many folks from other tools are only getting exposed to the same functionality they got used to, but it offers much more than other harnesses, especially for remote coding.
You can start a service via `opencode serve`, it can be accessed from anywhere and has great experience on mobile except a few bugs. It's a really good way to work with your agents remotely, goes really well with TailScale.
The WebUI that they have can connect to multiple OpenCode backends at once, so you may use multiple VPS-es for various projects you have and control all of them from a single place.
Lastly, there's a desktop app, but TBH I find it redundant when WebUI has everything needed.
Make no mistakes though, it's not a perfect tool, my gripes with it:
- There are random bugs with loading/restoring state of the session
- Model/Provider selection switch across sessions/projects is often annoying
- I had a bug making Sonnet/Opus unusable from mobile phone because phone's clock was 150ms ahead of laptop's (ID generation)
- Sometimes agent get randomly stuck. It especially sucks for long/nested sessions
- WebUI on laptop just completely forgot all the projects at
one day
- `opencode serve` doesn't pick up new skills automatically, it needs to be restarted
Interesting timing — I've been building on Cloudflare Workers
with edge-first constraints, and the resource footprint of most
AI coding tools is striking by comparison. A TypeScript agent
that uses 1GB+ RAM for a TUI feels like the wrong abstraction.
The edge computing model forces you to think differently about
state, memory, and execution — maybe that's where lighter
agentic tools will emerge.
Being able to assign different models to subagents is the feature I've been wanting. I use Claude Code daily and burning the same expensive model on simple file lookups hurts. Any way to set default model routing rules, or is it manual per task?
With OpenCode, I've found that I can do this by defining agents, assigning each agent a specifically model to use. Then K manually flip to that agent when I want it or define some might rules in my global AGENTS.nd file to gives some direction and OpenCode will automatically subtask out to the agent, which then forces the use of the defined model.
The maintaining team is incredibly petty though. Tantrums when they weren't allowed to abuse Claude subscriptions and had to use the API instead. They just removed API support entirely.
Anthropic has zero problems with API billing, there's no chance they told him to rip that out.
Reading through his X comments and GitHub comments he is behaving immaturely. I don't trust what he's saying here. Ripping out Claude API support was just throwing a tantrum. Weird given his age - he's old enough to be more mature.
‘abuse’. The same rate limits apply, the requests still go to the same endpoints.
Even as a CC user I’m glad someone is forcing the discussion.
My prediction: within two years ‘model neutrality’ will be a topic of debate. Creating lock-in through discount pricing is anti-competitive. The model provider is the ISP; the tool, the website.
> The same rate limits apply, the requests still go to the same endpoints.
That is not the point. That is a mere technicality.
You signed a contract. If you don't ignore the terms of the contract to use the product in a way that is explicitly prohibited, you're abusing the product. It is as simple as that.
They offer a separate product (API) if you don't like the terms of the contract.
Also, if you really want to get technical: the limits are under the assumption that caching works as intended, which requires control of the client. 3P clients suck at caching and increase costs. But that is not the overarching point.
> Creating lock-in through discount pricing is anti-competitive.
Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is. It's not great but players much bigger than Anthropic are using discount pricing to create an anti-competitive advantage.
> Because that could be easily resolved by factoring % cache hits into the usage limits.
Absolutely not, you are not thinking from a product perspective at all.
You might not want to capture cache % hits in usage limits because there may be some edge cases you want to support that have low hits even with an optimized client. Maybe your caching strategy isn't perfect yet, so you don't count hits to keep a good product experience going.
OSS clients that freeload on the subscription break your ability to support these use cases entirely. Now you have to count cache hits at the expense of everyone else. It is a classic case of some people ruining the experience for everyone.
> Why is the 'Apple electric company' selling cheaper electricity to households with Apple devices?
Why does Netflix not let you use your OSS hacked client of choice with your subscription?
> Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is.
And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package. So how are they being anti-competitive "far more" than Anthropic?
I had been using open code and admire they effort to create something huge and help a lot of developers around the world, connecting LLM our daily work without use a browser!
The MCP (Model Context Protocol) support is what makes this interesting to me. Most coding agents treat the file system and shell as the only surfaces — MCP opens up the possibility of connecting to any structured data source or API as a first-class tool without custom integration work each time.
Curious how the context window management works in practice. With large repos, the "what files to include" problem tends to dominate — does it have a strategy beyond embedding-based retrieval, or is that the main approach here?
I want to love this, but the "just install it globally, what could go wrong?" is simply not happening for an AI-written codebase. Open Source was never truly "you can trust it because everyone can vet it", so you had to do your due diligence. Now with AI code bases, that's "it might be open source, but no one actually knows how it works and only other AIs can check if it's safe because no one can read the code". Who's getting the data? No idea. How would you find out? I guess you can wireshark your network? This is not a great feeling.
Things that make an an OpenCode fanboy
1. OpenCode source code is even more awesome. I have learned so much from the way they have organized tools, agents, settings and prompts.
2. models.dev is an amazing free resource of LLM endpoints these guys have put together
3. OpenCode Zen almost always has a FREE coding model that you can use for all kinds of work. I recently used the free tier to organize and rename all my documents.
I use bubblewrap. This ensures it only has access to the current working directory and its own configuration. No ability to commit or push (since it doesn't have access to ssh keys) or try to run aws commands (no access to awscli configuration) and so on. It can read anything from my .envrc, since it doesn't have access to direnv or the parent directory. You could lock down the network even further if you wanted to limit web searches.
Honestly I was a Claude code only guy for a while. I switched to opencode and I’m not going back.
IMO, the web UI is a killer feature - it’s got just enough to be an agent manager - without any fluff. I run it on my remote VMs and connect over HTTP.
This is very interesting. This could allow custom harnesses to be used economically with Opus. Depending on the usage limits, this may be cheaper than their API.
You can scroll down literally two messages in the Github issue you linked:
> there isnt any telemetry, the open telemetry thing is if you want to get spans like the ai sdk has spans to track tokens and stuff but we dont send them anywhere and they arent enabled either
> most likely these requests are for models.dev (our models api which allows us to update the models list without needing new releases)
> There is currently no option to change this behavior, no startup flag, nothing. You do not have the option to serve the web app locally, using `opencode web` just automatically opens the browser with the proxied web app, not a true locally served UI.
That is the address of their hosted WebUI which connects to an OpenCode server on your localhost. Would be nice if there was an option to selfhost it, but it is nowhere near as bad as "proxying all requests".
I used Codex for a long time. It's definitely better than Claude Code due to being open source, but opencode is nicer to use. Good hotkeys, plan/build modes, fast and easy model switching, good mcp support. Supports skills, is not the fastest but good enough.
Just a data point, I would need to use it for my workflows. I do have a monorepo with a root level claude.md, and project level claude.md files for backend/frontend.
I use this. I run it in a sandbox[0]. I run it inside Emacs vterm so it's really quick for me to jump back and forth between this and magit, which I use to review what it's done.
I really should look into more "native" Emacs options as I find using vterm a bit of a clunky hack. But I'm just not that excited about this stuff right now. I use it because I'm lazy, that's all. Right now I'm actually getting into woodwork.
I started with Codex, then switched to OpenCode, then switched to Codex.
OpenCode just has more bugs, it's incredibly derivative so it doesn't really do anything else than Codex.
The advantage of OpenCode is that it can use any underlying model, but that's a disadvantage because it breaks the native integration. If you use Opus + Claude Code, or Gpt-Codex + Codex App, you are using it the way it was designed to be used.
If you don't actually use different models, or plan to switch, or somehow value vendor neutrality strategically, you are paying a large cost without much reward.
This is in general a rule, vendor neutrality is often seen as a generic positive, but it is actually a tradeoff. If you just build on top of AWS for example, you make use of it's features and build much faster and simpler than if you use Terraform.
You do not "write" code. Stop these euphemisms. It is an intellectual prosthetic for feeble minded people that plagiarizes code by written by others. And it connects to the currently "free" providers who own the means of plagiarizing.
There is nothing open about it. Please do not abuse the term "open" like in OpenBSD.
What I don't understand is that, if coding agents are making coding obsolete, why do these vibe coders not choose a language that doesn't set their users' compute resources on fire? Just vibe rust or golang for their cli tools, no one reviews code slop nowadays anyway /s.
I do not understand the insistence on using JavaScript for command line tools. I don't use rust at all, but if I'm making a vibe coded cli I'm picking rust or golang. Not zig because coding agents can't handle the breaking changes. What better test of agentic coders' conviction in their belief in AI than to vibe a language they can't read.
The topic at hand seems to shift the quality of the discussion greatly these days. Many people have thoughts on coding agents because they are aimed at the lower quartiles of coders. Far less have detailed opinions on other ways they could wield a Markov model.
Just remember, OpenCode is sending telemetry to their own servers, even when you're using your own locally hosted models. There are no environment variables, flags, or other configuration options to disable this behavior.¹
At least you can easily turn off telemetry in Claude Code - just set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC to 1.
You can use Claude Code with llama.cpp and vLLM, too right out of the box with no additional software necessary, just point ANTHROPIC_BASE_URL at your inference server of choice, with any value in ANTHROPIC_API_KEY.
Some people think that Anthropic could disable this at any time, but that's not really true - you can disable automatic updates and back up and reuse native Claude Code binaries, ensuring Anthropic cannot change your existing local Claude Code binary's behavior.
With that said, I like the idea of an open source TUI agent that won't spy on me without my consent and no way to disable it much better than a closed source TUI agent that I can effectively neuter telemetry on, but sadly, OpenCode is not the former. It's just another piece of VC-funded spyware that's destined for enshittification.
Are you sure that endpoint is sending all traffic to opencode? I'm not familiar with Hono but it looks like a catch all route if none of the above match and is used to serve the front-end web interface?