For lightweight sandboxing on Linux you can use bubblewrap or firejail instead of Docker. They are faster and _simpler_. Here is (roughly) the bwrap script I wrote an hour ago to run Claude in a minimal sandbox:
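(It assumes `claude` is installed system-wide and gives the sandbox write access only to the current project - adjust the binds, e.g. your real ~/.claude for auth state, to taste.)

```bash
#!/usr/bin/env bash
# Read-only system dirs, read-write access only to the current project,
# a throwaway $HOME and /tmp, and network left on so the model API works.
exec bwrap \
  --ro-bind /usr /usr \
  --ro-bind /etc /etc \
  --symlink usr/bin /bin \
  --symlink usr/lib /lib \
  --symlink usr/lib64 /lib64 \
  --proc /proc \
  --dev /dev \
  --tmpfs /tmp \
  --tmpfs /sandbox-home \
  --bind "$PWD" /sandbox-home/project \
  --setenv HOME /sandbox-home \
  --chdir /sandbox-home/project \
  --unshare-all \
  --share-net \
  --die-with-parent \
  claude "$@"
```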
Nice, thanks for sharing. The lack of an equivalent on macOS (sandbox-exec is similar but mostly undocumented and described as "deprecated" by Apple) is really frustrating.
There is an equivalent. I played with it for a while before switching to containers. You can sign an app with sandbox entitlements that starts a subshell, and use security-scoped bookmarks to expose folders to it. It's all fully supported by Apple.
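The signing side looks roughly like this (a sketch from memory; you may need a proper wrapper app with a bundle ID rather than a bare re-signed shell, and the security-scoped bookmark plumbing needs native code that isn't shown):

```bash
# Entitle and ad-hoc sign a copy of a shell so it launches sandboxed.
cat > sandbox.entitlements <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.files.bookmarks.app-scope</key>
    <true/>
</dict>
</plist>
EOF

cp /bin/zsh ./sandboxed-shell
codesign --force --sign - --entitlements sandbox.entitlements ./sandboxed-shell
./sandboxed-shell   # writes to $HOME now land in the shadow container
```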
The main issues I had are that most dev tools aren't sandbox-compatible out of the box, and that it's Apple-specific tech. You can add SBPL exceptions to make more stuff work, but why bother? Containers/Linux VMs work everywhere.
You don't need bind mounts; you can pass access rights to directories into the sandbox directly. Also, sandboxed apps run inside a (filesystem) container, so file writes to $HOME are transparently redirected to a shadow home.
I had been planning to explore Lima tonight as a mechanism to shackle CC on macOS.
The trouble with sandbox-exec is that its control over network access is not fine-grained enough, and I found its filesystem controls insufficient.
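For a sense of the granularity you do get (a sketch, not a working profile):

```bash
# You can allow/deny classes of operations and filter file access by path,
# but there's no per-domain network policy, and a profile this small won't
# actually run a real toolchain without a pile of extra allowances.
cat > agent.sb <<'EOF'
(version 1)
(deny default)
(allow process-fork)
(allow process-exec)
(allow file-read* (subpath "/usr") (subpath "/System") (subpath "/Library"))
(allow file-read* file-write* (subpath (param "PROJECT")))
(allow network-outbound)
EOF
sandbox-exec -f agent.sb -D PROJECT="$PWD" /bin/zsh
```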
Also, I recently had some bad experiences which led me to believe the tool MUST be run with strict CPU and memory resource limits, which is tricky on macOS.
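For what it's worth, that's one more reason the container route is tempting - there the limits are just flags (image name is hypothetical):

```bash
# Standard docker limit flags: CPU, memory, and a cap on process count.
docker run --rm -it --cpus=2 --memory=4g --pids-limit=256 agent-image
```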
Wait, does Lima do isolation in a macOS context too?
It looks like Linux VMs, which Apple's container CLI (among others) covers at a basic level.
I'd like Apple to start providing macOS images that aren't the whole OS... unless sandbox-exec/libsandbox have an affordance for something close enough?
You can basically ask Claude/ChatGPT to write its own jail (a Dockerfile) and then run that via `container`, without installing anything on macOS outside the container it builds (IIRC). Even the container CLI will use a container to build your container.
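From memory the loop is roughly this, though the flags may differ slightly - `container --help` is authoritative:

```bash
# Build the model-written Dockerfile and run the resulting jail.
container build --tag agent-jail --file Dockerfile .
container run --rm agent-jail
```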
I recently built my own coding agent, due to dissatisfaction with the ones that are out there (though the Claude Code UI is very nice). It works as suggested in the article. It starts a custom Docker container and asks the model, GPT-5 in this case, to send shell scripts down the wire which are then run in the container. The container is augmented with some extra CLI tools to make the agent's life easier.
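The container side is nothing exotic - roughly this shape, with hypothetical image/container names:

```bash
# Build an image with the extra CLI tools, keep one long-lived container
# per mission, and run each script the model sends as a separate exec.
docker build -t agent-env - <<'EOF'
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y git ripgrep jq curl build-essential
EOF

docker run -d --name mission-1 -v "$PWD":/work -w /work agent-env sleep infinity

# For every script the model sends back (saved by the harness as step.sh):
docker exec -i mission-1 bash -s < step.sh
```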
My agent has a few other tricks up its sleeve and it's very new, so I'm still experimenting with lots of different ideas, but there are a few things I noticed.
One is that GPT-5 is extremely willing to speculate. This is partly because of how I prompt it, but it's willing to write scripts that try five or six things at once in a single script, including things like reading files that might not exist. This level of speculative execution speeds things up dramatically, especially as GPT-5 is otherwise a very slow model that likes to think about things a lot.
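A made-up but representative example of the kind of script it emits:

```bash
# Probe the repo from several angles at once and let harmless failures
# fall through, rather than asking one question per turn.
cat package.json pyproject.toml Cargo.toml go.mod 2>/dev/null
ls src lib app tests 2>/dev/null
git log --oneline -5
grep -rn "def main\|__main__" --include='*.py' . | head -20
python -m pytest -x -q 2>&1 | tail -20
```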
Another is that you can give it very complex "missions" and it will drive things to completion using tactics that I've not seen from other agents. For example, if it needs to check something that's buried in a library dependency, it'll just clone the upstream repository into its home directory and explore that to find what it needs before going back to working on the user's project.
None of this triggers any user interaction, because it's all running in the container. In fact, no user interaction is possible. You set it going and do something else until it finishes. The model is very much that you queue up "missions" that run in parallel and merge them together at the end. The agent also has a mode where it takes the mission, writes a spec, reviews the spec, updates the spec given the review, codes, reviews the code, etc.
Even though it's early days, I've given this agent missions on which it spent 20 minutes of continuous, uninterrupted inference time and succeeded excellently. I think this UI paradigm is the way to go. You can't scale up AI-assisted coding if you constantly need to interact with the agent. Getting the most out of models requires maximally exploiting parallelism, so sandboxing is a must.
> Getting the most out of models requires maximally exploiting parallelism, so sandboxing is a must.
What are your thoughts on checkpointing as a refinement of sandboxing? For tight human/LLM loops I find automatic checkpoints (of both model context and filesystem state) and easy rolling back to any checkpoint to be the most important tool. It's just so much faster to roll back on major mistakes and try again with the proper context than to try to get the LLM to fix a mistake, since by then the broken code and invalid assumptions are contaminating the context.
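Even something this crude gets you most of the way (a sketch; `$TRANSCRIPT_FILE` is a stand-in for wherever the harness keeps the conversation):

```bash
# Filesystem state is just a git commit; context is a file copy alongside it.
checkpoint() {
  local id; id=$(date +%s)
  mkdir -p .checkpoints
  git add -A && git commit -q --allow-empty -m "checkpoint $id"
  cp "$TRANSCRIPT_FILE" ".checkpoints/context-$id.json"
  echo "$id"
}

rollback() {  # usage: rollback <id>
  git reset --hard "$(git log --grep="checkpoint $1" --format=%H -1)"
  cp ".checkpoints/context-$1.json" "$TRANSCRIPT_FILE"
}
```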
But that relies on the human in the loop deciding when to undo. Are you giving some layer of your system the power of resetting sub-agents to previous checkpoints, do you do a full mind-wipe of the sub-agents if they get stuck and try again, or is the context rot just not a problem in practice?
I want to minimize the human in the loop. At the moment my agent allows user interaction in one place: after a spec is written, reviewed, and updated to reflect the review, it stops. You can then edit the spec before asking for an implementation. It helps to catch cases where the instructions were ambiguous.
At the moment my agent is pretty basic. It doesn't detect endless loops; the model is allowed to bail at any time when it feels it's done or stuck, so loop detection doesn't seem to be needed. It doesn't checkpoint currently. If it does the wrong thing you just roll it all back and improve the AGENTS.md or the mission text. That way you're less likely to encounter problems next time.
The downside is that it's an expensive way to do things but for various reasons that's not a concern for this agent. One of the things I'm experimenting with is how very large token budgets affect agent design.
This also feels like the structure that Sketch.dev uses: it's asynchronous, running in YOLO mode in a container on a cloud instance, with very little interaction (the expectation is that you give it tasks and walk away). I have friends who queue up lots of tasks in the morning and prune down to just a couple of successes in the afternoon. I'd do this too, but at the scale I work at, merge conflicts are too problematic.
I'm working on my own dumb agent for my own dumb problems (engineering, but not software development) and I'd love to hear more about the tricks you're spotting.
Yes, it's a bit like that, except not in the cloud. It runs locally and doesn't make PRs; it just leaves your worktree in an uncommitted state so you can do any final tests, code fixes, etc. before committing.
That's really interesting. I've noticed something similar - I've tried frontend tasks against GPT-5-Codex and seen it guess the URL of the underlying library (on jsdelivr or GitHub) and attempt to fetch the original source code, often trying several different URLs, in order to dig through the source and figure out how to use an undocumented API feature.
Yes. It made me realize how much intelligence is in these models that isn't being exploited due to minor details of the harness. I've been doing this as a side project and it took nearly no effort to get something that I felt worked better than every other agent I tried, even if the UI is rougher. We're really in the stone age with this stuff. The models are not the limiting factor.
> If anything goes wrong it’s a Microsoft Azure machine somewhere that’s burning CPU and the worst that can happen is code you checked out into the environment might be exfiltrated by an attacker, or bad code might be pushed to the attached GitHub repository.
Isn't that risking getting banned from Azure? The compromised agent might not accomplish anything useful, but its attempts might get (correctly!) flagged by the cloud provider.
My guess is that most cloud providers have procedures in place to help avoid banning legitimate customers because one of their instances got infected with malware (effectively what a rogue agent would be).
I'm surprised to see so many people using containers when setting up a KVM VM is so easy, gives the most robust environment possible, and to my knowledge has much better isolation. A vanilla build of Linux plus your IDE of choice and you're off to the races.
Wrong takeaway, but Claude Code was released in February of this year??? I swear people have been glazing it for way longer... my memory isn't that bad, right?
This is great. I feel like most of the oxygen is going to go to the sandboxing question (fair enough). But I'm kind of obsessed with what agent loops for engineering tasks that aren't coding look like, and also with the tweaks you need for agent loops that handle large amounts of anything (source code lines, raw metrics or OTel span data, whatever).
There was an interval where the notion of "context engineering" came into fashion, and we quickly dunked all over it (I don't blame anybody for that; "prompt engineering" seemed pretty cringe-y to me), but there's definitely something to the engineering problem of managing a fixed-size context window while iterating indefinitely through a complex problem, and there are all sorts of tricks for handling it.
On one hand, folks are complaining that even basic tasks are impossible with LLMs.
On the other hand, we have the Cursed language [0], which was fully driven by AI and seems to be functional. BTW, how much did that cost?
I feel like I've been hugely successful with tools like Claude Code, Aider, and OpenCode, especially when I can define custom tools. "You" have to be a part of the loop, in some capacity, to provide guidance and/or direction. I'm puzzled by the fact that people are surprised by this. When I'm working with other entities (people) who are not artificially intelligent, the majority of the time is spent clarifying requirements and aligning on goals. Why would it be different with LLMs?
One important issue with agentic loops is that agents are lazy, so you need some sort of retrigger mechanism. Claude Code supports hooks: you can wire the Stop hook to a local LLM, feed the context in, and ask it to prompt Claude to continue if needed. It works pretty well; Claude can override retriggers if it's REALLY sure it's done.
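The stop hook itself is just a small script (a sketch - the local model and the exact hook I/O contract here are assumptions, so check the hooks docs):

```bash
#!/usr/bin/env bash
# Register this under the "Stop" event in .claude/settings.json.
payload=$(cat)                                   # hook payload arrives on stdin as JSON
transcript=$(echo "$payload" | jq -r '.transcript_path')
verdict=$(tail -c 20000 "$transcript" |
  ollama run llama3.1 "Does this transcript tail show the task is finished? Answer only DONE or CONTINUE.")
if echo "$verdict" | grep -q "CONTINUE"; then
  # Blocking the stop sends the reason back to Claude as a nudge to keep going.
  echo '{"decision": "block", "reason": "The work looks unfinished - keep going."}'
fi
```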
Regarding sandboxing, VMs are the way. Prompt-injected agents WILL be able to escape containers, 100%.
My mental model of container escapes is that they are security bugs which get patched when they are reported, and none of the mainstream, actively maintained container platforms currently have a known open escape bug.
That's going a bit far; a good mental model is that every kernel LPE is a sandbox escape (that's not precisely true but is to a first approximation), and kernel LPEs are pretty routine and rarely widely reported.
A good heuristic would be that unless you have reason to think you're a target, containers are a safe bet. A motivated attacker probably can pop most container configurations. Also, it can be counterintuitive what makes you a target:
* Large-scale cotenant work? Your target quotient is the sum of those of all your clients.
* Sharing infrastructure (including code supply chains) with people who are targeted? Similar story.
But for people just using Claude in YOLO mode, "security" is not really a top-of-mind concern for me, so much as "wrecking my dev machine".
This is not speculative; it's happened plenty already. People put mitigations in place, patch libraries, and move on. The difference is that agents will find new zero-days you've never heard of, in stuff on your system nobody has scrutinized adequately. There will be zero advance notice, and unlike human attackers who need to lie low until they can plan an exit, they'll be able to exploit you heavily right away.
Do not take the security impact of agents lightly!
I feel like my bona fides on this topic are pretty solid (without getting into my background on container vs. VM vs. runtime isolation) and: "the agents will find new zero days" also seems "big if true". I point `claude` at a shell inside a container and tell it "go find a zero day that breaks me out of this container", and you think I'm going to succeed at that?
I had assumed you were saying something more like "any attacker that prompt-injects you probably has a container escape in their back pocket they'll just stage through the prompt injection vector", but you apparently meant something way further out.
I know at least one person who supplements their income finding bounties with Claude Code.
Right now you can prompt-inject an obfuscated payload that tricks Claude into trying to root a system, under the premise that you're trying to identify an attack vector on a test system to understand how you were compromised. It's not good enough to do much, but with the right prompts, better models, and the ability to smuggle extra code in, you could get quite far.
Lots of people find zero days with Claude Code. That is not the same thing as Claude Code autonomously finding zero days without direction, which was what you implied. This seems like a pretty simple thing to go empirically verify for yourself. Just boot up Claude and tell it to break out of a container shell. I'll wait here for your zero day! :)
Edit: to say more about my opinions, "agentic loop" could mean a few things -- it could mean the thing you say, or it could mean calling multiple individual agents in a loop ... whereas "agentic harness" evokes a sort of interface between the LLM and the digital outside world which mediates how the LLM embodies itself in that world. That latter thing is exactly what you're describing, as far as I can tell.
I like "agentic harness" too, but that's not the name of a skill.
"Designing agentic loops" describes a skill people need to develop. "Designing agentic harnesses" sounds more to me like you're designing a tool like Claude Code from scratch.
Plus "designing agentic loops" includes a reference to my preferred definition of the term "agent" itself - a thing that runs tools in a loop to achieve a goal.
Context engineering is about making sure you've stuffed the context with all of the necessary information - relevant library documentation and examples and suchlike.
Designing the agentic loop is about picking the right tools to provide to the model. The tool descriptions may go in the context, but you also need to provide the right implementations of them.
The reason I feel they are closely connected is that when designing tools for, let's say, coding agents, you have to be thoughtful about context engineering.
E.g. the Linear MCP is notorious for returning large JSON blobs which quickly fill up the context and are hard for the model to understand. So tools need to be designed slightly differently for agents, with context engineering in mind, compared to how you design them for humans.
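Something as simple as a wrapper that strips the response down before the model ever sees it goes a long way (field names below are made up, not Linear's actual schema):

```bash
# Hand the model a trimmed list, not the raw API/MCP response.
curl -s "$ISSUES_URL" |
  jq '[ .issues[] | {id, title, state, assignee, updatedAt} ]'
```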
Context engineering feels like the more central, first-principles approach to designing tools and agent loops.
They feel pretty closely connected. For instance: in an agent loop over a series of tool calls, which tool results should stay resident in the context, which should be summarized, which should be committed to a tool-searchable "memory", and which should be discarded? All context engineering questions and all kind of fundamental to the agent loop.
One thing I'm really fuzzy on is, if you're building a multi-model agent thingy (like, can drive with GPT5 or Sonnet), should you be thinking about context management tools like memory and autoediting as tools the agent provides, or should you be wrapping capabilities the underlying models offer? Memory is really easy to do in the agent code! But presumably Sonnet is better trained to use its own builtins.
It boils down to information loss in compaction driven by LLMs. Either you carefully design tools that return only compacted output with high information density, so the model has to auto-compact or reorganize information only once in a while, which is eventually going to be lossy anyway.
Or you just give it loads of information without thinking much about it, assume the model will have to do frequent compaction and memory organization, and hope it's not super lossy.
Right, just so I'm clear here: assume you decide your design should be using a memory tool. Should you make your own with a tool call interface or should you rely on a model feature for it, and how much of a difference does it make?
I wouldn't be surprised if agents start getting managed by a distributed agentic system - think about it. Right now you get Codex/Claude/etc., and its system prompt and various other internally managed prompts are locked down to the version you downloaded. What if a distributed system ran experimental prompts, monitored the success rate (what code makes it into a commit), and provided feedback to the agent manager? That could help it automatically fine-tune its own prompts.
This is what Anthropic does for their "High Compute" SWE benchmarking:
"
For our "high compute" numbers we adopt additional complexity and parallel test-time compute as follows:
- We sample multiple parallel attempts.
- We discard patches that break the visible regression tests in the repository, similar to the rejection sampling approach adopted by Agentless (Xia et al. 2024); note no hidden test information is used.
- We then use an internal scoring model to select the best candidate from the remaining attempts.
- This results in a score of 82.0% for Sonnet 4.5.
"
But running in parallel means you use more compute, meaning that the cost is higher. Good results are worth paying for, but if a super recursive approach costs more and takes longer than a human...