If the machine can fork itself, it could allow for some really neat auto-forking workflows, where you fuzz a website's UI by forking at every decision point. I forget the name of the recent model that used only video as its latent space to control computers and cars, but they had an impressive demo where they fuzzed a bank interface this way and ended up with a large number of distinct reachable UI states.
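A minimal sketch of forking at every decision point, written as a breadth-first search. It assumes a toy UI modeled as a Python object, with `copy.deepcopy` standing in for a VM fork; `ToyBankUI`, its actions, and `explore` are hypothetical and not any provider's API. With real VMs, each `deepcopy` would be a fork call and `key()` would be something like a screenshot hash or DOM digest.

```python
import copy
from collections import deque


class ToyBankUI:
    """Hypothetical stand-in for a UI session; a real setup would fork a VM."""

    def __init__(self):
        self.page = "login"
        self.logged_in = False
        self.balance_visible = False

    def actions(self):
        # The decision points available from the current page.
        if self.page == "login":
            return ["submit_good_password", "submit_bad_password"]
        if self.page == "dashboard":
            return ["view_balance", "logout"]
        return []  # "error" and "balance" are terminal pages in this toy

    def apply(self, action):
        if action == "submit_good_password":
            self.page, self.logged_in = "dashboard", True
        elif action == "submit_bad_password":
            self.page = "error"
        elif action == "view_balance":
            self.page, self.balance_visible = "balance", True
        elif action == "logout":
            self.page, self.logged_in = "login", False

    def key(self):
        # Summary used to deduplicate reachable states.
        return (self.page, self.logged_in, self.balance_visible)


def explore(root, max_depth=5):
    """Breadth-first search that forks the session at every decision point."""
    seen = {root.key()}
    queue = deque([(root, 0)])
    while queue:
        state, depth = queue.popleft()
        if depth == max_depth:
            continue
        for action in state.actions():
            child = copy.deepcopy(state)  # the "fork": an isolated exact copy
            child.apply(action)
            if child.key() not in seen:
                seen.add(child.key())
                queue.append((child, depth + 1))
    return seen


if __name__ == "__main__":
    for state in sorted(explore(ToyBankUI())):
        print(state)
```

Because each branch runs in its own copy, a half-completed flow (logged in, on the dashboard) never has to be replayed from the start to try the next action.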
Self-hosting is a valuable feature, but our technology is unfriendly to small nodes: it will not work on consumer hardware. Many of the optimizations we spend our time on only really kick in above 2TB of storage and 500GB of RAM.
Daytona runs on Sysbox (https://github.com/nestybox/sysbox), which is VM-like but has issues when you run low-level things.
Modal is the only provider with GPU support.
I haven't played around with Blaxel personally yet.
E2B and Vercel are both great hardware-virtualized "sandboxes".
Freestyle VMs were built from our users' feedback that things they expected to be able to do on existing sandboxes didn't work. A good example: of the providers above (I haven't tested Blaxel), Freestyle is the only one that gives users access to the boot disk or the ability to reboot a VM.
The big pros of Sprites over us are their advanced networking stack and the Fly.io ecosystem. The big cons are that Sprites are incredibly bare-bones: they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites are forking, advanced templating, and, IMO, a better debugging experience because of how we're structured.
You can hand-roll a lot with: https://github.com/nestybox/sysbox?tab=readme-ov-file https://gvisor.dev https://github.com/containers/bubblewrap?tab=readme-ov-file
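As one example of hand-rolling with the last of these, here is a sketch that builds a bubblewrap command line: a read-only `/usr`, fresh namespaces, and a single writable work directory. The bwrap flags used are standard bubblewrap options; the merged-`/usr` layout (`/bin` and `/lib` as symlinks into `/usr`), the `/work` mount point, and the helper names are assumptions for illustration. It covers bubblewrap only, not Sysbox or gVisor.

```python
import shlex
import shutil
import subprocess


def bwrap_argv(command, workdir, allow_network=False):
    """Build a bwrap argv: read-only system files, one writable host directory."""
    argv = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",       # system binaries and libraries, read-only
        "--symlink", "usr/bin", "/bin",    # assumes a merged-/usr distro layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", workdir, "/work",        # the only writable host path
        "--chdir", "/work",
        "--unshare-all",                   # new ipc/pid/net/uts (+ user/cgroup if possible)
        "--die-with-parent",               # kill the sandbox if this process dies
        "--new-session",                   # detach from the controlling terminal
    ]
    if allow_network:
        argv.append("--share-net")         # re-enable networking after --unshare-all
    return argv + list(command)


def run_sandboxed(command, workdir, timeout=30):
    """Run `command` inside the sandbox and capture its output."""
    if shutil.which("bwrap") is None:
        raise RuntimeError("bubblewrap (bwrap) is not installed")
    return subprocess.run(bwrap_argv(command, workdir),
                          capture_output=True, text=True, timeout=timeout)


if __name__ == "__main__":
    print(shlex.join(bwrap_argv(["python3", "job.py"], "/tmp/agent-job")))
```

This is process-level isolation sharing the host kernel, which is exactly the gap the hardware-virtualized options close.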
For hardware-virtualized machines it's much harder, but you can do it via: https://github.com/firecracker-microvm/firecracker/ https://github.com/cloud-hypervisor/cloud-hypervisor
Freestyle and other providers will likely offer a better debugging experience, but that's something you can probably get past for a lot of workloads.
The time to think about Freestyle (or any provider) is when load spikes and you need to create hundreds of VMs in short bursts, or when you're looking for some of the more complex features a given provider has built out (forks, GPUs, network boundaries, etc.).
I also highly recommend running anything you self-host outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it's a feature of ours that we're not in your cloud: if we mess up security, your app is still fine.
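To give a sense of what hand-rolling on Firecracker involves, here is a sketch that drives its REST API over the Unix socket passed to `firecracker --api-sock`. The endpoint paths and JSON fields follow Firecracker's API documentation as I understand it; check them against the OpenAPI spec for the version you run. The kernel/rootfs paths, socket path, and helper names are placeholders. Note that a `Full` snapshot writes all of guest memory to disk, so that step gets slower as the VM grows.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over the Unix domain socket Firecracker listens on."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def boot_requests(kernel, rootfs, vcpus=2, mem_mib=1024):
    """The calls that configure and start a microVM, in order."""
    return [
        ("PUT", "/boot-source", {"kernel_image_path": kernel,
                                 "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def snapshot_requests(snapshot_path, mem_path):
    """Pause, write VM state plus guest memory to disk, then resume."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {"snapshot_type": "Full",
                                     "snapshot_path": snapshot_path,
                                     "mem_file_path": mem_path}),
        ("PATCH", "/vm", {"state": "Resumed"}),
    ]


def send_all(socket_path, requests):
    """Send each request in order; fail loudly on any non-2xx response."""
    for method, path, body in requests:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body).encode(),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        detail = resp.read().decode()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} -> {resp.status}: {detail}")
```

Usage, assuming `firecracker --api-sock /tmp/fc.sock` is already running: `send_all("/tmp/fc.sock", boot_requests("vmlinux", "rootfs.ext4"))`, and later `send_all("/tmp/fc.sock", snapshot_requests("vm.snap", "vm.mem"))`.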
https://GitHub.com/jgbrwn/vibebin
Also, I'm a huge proponent of exe.dev.
Obviously your service/approach is different from exe: it looks more like Sprites but, as you said, more targeted at and opinionated about AI coding/sandboxing tasks. Interesting space for sure!
Still WIP, but the core works: three rootfs tiers (minimal Ubuntu, headless Chromium with CDP, Docker-in-VM), OCI image support (pull any Docker image), automatic thermal management (idle VMs pause, then snapshot to disk, and wake transparently on the next API call), per-user bridge networking with L2 isolation, named checkpoints, persistent volumes, and preview URLs with auto-wake.
Fair warning: the website is too technical and the docs are mostly AI-generated; both are being actively reworked. But I've been running it daily on a Hetzner server for my AI agents' browser automation and deploy previews.
I'd love any feedback if you want to try it yourself.
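The pause-then-snapshot behavior can be sketched as a three-state machine driven by an idle timer. This is a guess at the general shape, not this project's code: `ThermalVM`, its thresholds, and the injected clock values are hypothetical, and `_transition` is where a real system would call its VMM.

```python
from enum import Enum


class Temp(Enum):
    HOT = "running"        # vCPUs executing
    WARM = "paused"        # vCPUs stopped, guest memory still in RAM
    COLD = "snapshotted"   # guest memory written to disk, RAM released


class ThermalVM:
    """Hypothetical idle policy; thresholds are illustrative, not from the project."""

    def __init__(self, now, pause_after=60.0, snapshot_after=600.0):
        self.state = Temp.HOT
        self.last_used = now
        self.pause_after = pause_after
        self.snapshot_after = snapshot_after
        self.log = []

    def tick(self, now):
        """Called periodically by a reaper loop; cools the VM as it sits idle."""
        idle = now - self.last_used
        if self.state is Temp.HOT and idle >= self.pause_after:
            self._transition(Temp.WARM)
        if self.state is Temp.WARM and idle >= self.snapshot_after:
            self._transition(Temp.COLD)

    def handle_api_call(self, now):
        """Wake transparently (resume or restore), then serve the call."""
        if self.state is not Temp.HOT:
            self._transition(Temp.HOT)
        self.last_used = now

    def _transition(self, new_state):
        # A real implementation would call the VMM here (pause, snapshot, restore).
        self.log.append((self.state.value, new_state.value))
        self.state = new_state
```

The two thresholds trade wake latency against resources: a paused VM resumes almost instantly but still holds RAM, while a snapshotted one frees RAM at the cost of a restore from disk.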
Is the experience similar? Can I just get a console on one machine, work for a bit, log out, come back later, and continue?
How does cost work if I log into a machine and do nothing on it, just holding the connection open?
We auto-suspend based on your configured timeout: we'll pause your VM, and when you come back the processes will be in exactly the same state as when you left.
That's why our pricing is usage-based and we have a much larger API surface.
When I’m thinking of sandboxes, I’m thinking of isolated execution environments.
What does forking sandboxes bring me? What do your sandboxes in general bring me?
Please take this in the best possible way: I’m missing a use case example that’s not abstract and/or small. What’s the end goal here?
When your coding agent has 10 ideas for what to do, to evaluate them correctly it needs to be able to evaluate them in isolation.
If you're building a website-testing agent and it's halfway through a website, with a form half filled out, a session ongoing, etc., and it realizes it wants to test two things in isolation, forking is the only way.
We also envision this powering the next generation of dev cycles: "AI agent, go try these 10 things and tell me which works best." The AI forks the environment 10 times, gets 10 exact copies, does the thing in each of them, evaluates the results, and takes the best option.
Currently you have to change the branch on each fork individually, and that's unlikely to change in the short term due to the complexity of git internals, but it's not hard to do yourself: `git checkout -b fork-{whateverDiscriminator}`.
The work of a developer is open-ended, so we use a computer for it. We don't try to box developers into a small, granular screwdriver for each small task.
That's what's coming to all agents: they might want to run some analysis in Python, generate a website/document in TypeScript, and store data in Markdown files or in MongoDB. I expect them to get much more autonomous and, with that, to end up just needing computers like ours.
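The fork-10-times workflow is a best-of-N search. A minimal sketch, with `copy.deepcopy` standing in for a VM fork; `best_of_n`, the toy `env` dict, and the scoring function are hypothetical.

```python
import copy


def best_of_n(env, candidates, apply, score):
    """Fork env once per candidate, try each in isolation, keep the best.

    `apply` mutates a fork in place; `score` returns a number (higher is better).
    """
    results = []
    for candidate in candidates:
        fork = copy.deepcopy(env)  # stand-in for a VM fork: an exact, isolated copy
        apply(fork, candidate)
        results.append((score(fork), candidate, fork))
    best_score, best_candidate, best_env = max(results, key=lambda r: r[0])
    return best_candidate, best_score, best_env


if __name__ == "__main__":
    env = {"workers": 1, "notes": []}

    def apply(fork, workers):
        fork["workers"] = workers
        fork["notes"].append(f"tried {workers}")

    def score(fork):
        # Hypothetical benchmark: pretend 5 workers is the sweet spot.
        return -abs(fork["workers"] - 5)

    best, best_score, _ = best_of_n(env, [1, 2, 4, 8, 16], apply, score)
    print(best, best_score, env)  # prints: 4 -1 {'workers': 1, 'notes': []}
```

The original environment is never touched, so a bad candidate can't leave side effects that skew the evaluation of the next one.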
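If you drive the forks from a script, the per-fork branch step can be generated per fork. A sketch assuming each fork's checkout is reachable at a path the script can use (in practice you would run the command inside each fork with whatever exec mechanism your provider offers); `fork_branch_command` and `branch_each_fork` are hypothetical helpers, while `git -C <dir>` and `git checkout -b` are standard git.

```python
import subprocess


def fork_branch_command(repo_dir, discriminator):
    """git argv that puts one fork's checkout on its own branch."""
    # `git -C <dir>` runs git as if started in <dir>;
    # `checkout -b` creates the branch and switches to it.
    return ["git", "-C", repo_dir, "checkout", "-b", f"fork-{discriminator}"]


def branch_each_fork(repo_dirs):
    """Give every fork a distinct branch so their commits never collide."""
    for i, repo_dir in enumerate(repo_dirs):
        subprocess.run(fork_branch_command(repo_dir, i), check=True)
```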
The cost argument for owning the hardware for this specific use case also makes sense, considering the scale these agent environments will demand. Also worth noting, sandboxes are effectively an open attack surface; architecting them not to be in your main VPC is a sound security decision from the start.
We ended up creating localsandbox [0] with that in mind by using AgentFS for filesystem snapshotting, but our solution is meant for a different use case than Freestyle: simpler FS + code execution for agents, all done locally. Since we're not running a full OS, it's much less capable, but also simpler for lots of use cases where we want the agent execution to happen locally.
The ability to fork is really interesting - the main use case I could imagine is for conversations that the user forks or parallel sub-agents. Have you seen other use cases?
The memory forking seems like a cool technical achievement, but I don't understand how it benefits me as a user. If I'm delegating the whole thing to the AI anyway, I care more about deterministic builds so that the AI can tackle the problem.
Memory forking was originally built because, for AI app builders and first-response-driven applications, it's extremely important that things are instant (the difference between running `bun dev` and the dev server already running).
However, it's much more generally applicable; Postgres is a great example. You can't fork the filesystem under Postgres and get consistency. The same goes for browser state, a weird server state, or anything else that exists in memory. Memory forking gives a huge performance boost while snapshotting what's actually going on at a single instant.
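A toy model of why a filesystem-only fork loses state that a memory fork keeps. `ToyProcess` and both fork helpers are hypothetical: the model stands for any state that lives only in memory (a browser session, a dev server, unflushed buffers) and does not model Postgres's WAL or crash recovery.

```python
import copy


class ToyProcess:
    """Hypothetical process with some state persisted and some only in RAM."""

    def __init__(self):
        self.disk = {}    # persisted state: what a filesystem-level fork copies
        self.memory = {}  # state not yet on disk, visible only to the live process

    def write(self, key, value):
        self.memory[key] = value

    def flush(self):
        self.disk.update(self.memory)
        self.memory.clear()

    def read(self, key):
        return self.memory.get(key, self.disk.get(key))


def filesystem_fork(proc):
    # Copies only on-disk state; the new process starts with empty memory.
    fork = ToyProcess()
    fork.disk = copy.deepcopy(proc.disk)
    return fork


def memory_fork(proc):
    # Copies disk *and* memory as of the same instant.
    return copy.deepcopy(proc)


if __name__ == "__main__":
    p = ToyProcess()
    p.write("a", 1)
    p.flush()
    p.write("b", 2)
    print(filesystem_fork(p).read("b"))  # None: the in-memory write is gone
    print(memory_fork(p).read("b"))      # 2
```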
> we mean forking the whole memory of it
How does this work? Are you copying the entire snapshot, or is this something fancy like copy-on-write memory? If it's the former, doesn't the fork time depend on the size of the machine?

Creating snapshots causes a 2-4 second interruption in the VM due to the sheer IO, which we didn't want here.
What's especially cool about this approach is that fork time is O(1) not only with respect to machine size but also with respect to the number of forks.
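The thread doesn't say how Freestyle implements this. One textbook way to get forks that are O(1) in both machine size and fork count is copy-on-write memory built from layered overlays: a fork freezes the current layer and gives parent and child each a fresh, empty overlay on top of it. A sketch with hypothetical class names, using Python dicts as page tables:

```python
PAGE_SIZE = 4096


class Layer:
    """Pages written since this layer was created; never modified once frozen."""

    def __init__(self, parent=None):
        self.pages = {}
        self.parent = parent


class CowMemory:
    """Copy-on-write guest memory built from a chain of overlay layers."""

    def __init__(self, layer=None):
        self.top = layer if layer is not None else Layer()

    def read(self, page_no):
        layer = self.top
        while layer is not None:          # walk newest -> oldest
            if page_no in layer.pages:
                return layer.pages[page_no]
            layer = layer.parent
        return bytes(PAGE_SIZE)           # never-written pages read as zeros

    def write(self, page_no, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("writes are whole pages in this sketch")
        self.top.pages[page_no] = data    # only the writer's own overlay changes

    def fork(self):
        """O(1): no pages are copied, regardless of memory size or fork count."""
        frozen = self.top
        self.top = Layer(frozen)          # parent keeps writing into a fresh overlay
        return CowMemory(Layer(frozen))   # child gets its own fresh overlay
```

The trade-off is that reads walk the layer chain, so a long-lived system would need to compact or flatten layers periodically.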
That said, our $50 a month plan can be used as an individual for your coding agents, but I wouldn't recommend it.
And you can go even lower than that by self-hosting on a very cheap $2 or $5 Hetzner box.
Congrats on the launch. This is cool tech.
> Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management. On top of that, Freestyle VMs are full Linux virtual machines with nested virtualization, systemd, and a complete networking stack, not containers.
It makes me think of the git automation around rigs in Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Edit: I realize the Loom is a way to look at it. Loom interrupted me twice and I almost skipped it. However, it gave me a better idea of what it does: it "invents" snapshotting and restoring of VMs in a way that appears faster. That actually makes sense; I know it isn't that hard to do given how VMs work, and that it benefits greatly from having only part of the VM writable and little memory in use (maybe it has read-only memory too?).
Git is useful because it gives you branching rather than just forking (i.e., you can't merge two VM forks back together), but all the tech I showed in the Loom exists independently of Git.
The hard part of it was making the VM large and powerful while making snapshotting/forking instant, which required a lot of custom VMM work.
We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks and boots in under 20 ms, and it's live, serving customers! We have a lot of great tools available under the MIT license on our GitHub repo [2] as well!
However, if you keep your administrative credentials out of the VM and treat it as an unsafe environment, you can safely give it minimal permissions to access the specific things it needs, and with that access it can perform complex tasks.
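One way to apply this is to keep the real credentials on the host and have a small broker check each request the VM makes against an allowlist before forwarding it with those credentials attached. A minimal sketch of that check; the `Grant` format, `is_allowed`, and the `api.example.com` routes are hypothetical, the forwarding side is left out, and it is not a complete URL canonicalizer (for example, it does not decode percent-encoded paths).

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Grant:
    """One narrowly scoped permission handed to the sandbox (hypothetical format)."""
    method: str       # e.g. "GET"
    host: str         # exact hostname, no wildcards
    path_prefix: str  # e.g. "/repos/acme/app"


def _under(path, prefix):
    # Normalize "a/../b" so traversal can't escape the prefix, then match
    # whole path segments so "/app" does not also allow "/app-secret".
    path = posixpath.normpath(path or "/")
    prefix = prefix.rstrip("/") or "/"
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def is_allowed(method, url, grants):
    """Host-side check run before forwarding a sandbox request."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    return any(
        g.method == method.upper()
        and g.host == parts.hostname
        and _under(parts.path, g.path_prefix)
        for g in grants
    )
```

If the VM is compromised, the worst it can do is what the grants already allow, because the credential that would let it do more never enters the VM.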
https://simonwillison.net/2024/Mar/5/prompt-injection-jailbr...
It is a very necessary building block for many common features that can be steered in a more deterministic way, e.g., the "code interpreter" feature for data analysis or file creation commonly seen in chat web UIs.
But I see multiple sandbox-for-agents products a week. Way too saturated a market.
With respect to the market, every single sandbox sucks. I'm not gonna shit talk competitors, but there is not a good sandboxing platform out there yet (including mine) compared to where we'll be in 6 months.
We've heard that all the platforms have consistent uptime, feature-completeness, networking, and debugging issues. And on our own platform, we're not a tenth of the way through solving the requests we've gotten.
The next generation of agents needs computers, and those computers are gonna look really different from today's "sandboxes".
This means that while connections with complex protocols, like remote Postgres, can break in the forks, stuff like WebSockets just reconnects automatically.
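Automatic reconnection usually comes from the client library retrying rather than from the protocol itself. A minimal sketch of such a retry loop with capped exponential backoff and full jitter; `connect_with_backoff` is a hypothetical helper, and `connect` is any callable that opens a connection or raises `ConnectionError`.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=6, base=0.1, cap=5.0, sleep=time.sleep):
    """Retry `connect` with capped exponential backoff and full jitter.

    Forks whose inherited connection was dropped simply dial a fresh one.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                              # out of attempts: surface the error
            delay = min(cap, base * 2 ** attempt)  # 0.1, 0.2, 0.4, ... capped at `cap`
            sleep(random.uniform(0, delay))        # jitter so forks don't redial in lockstep
```

The jitter matters when many forks wake at once and would otherwise all hit the server at the same moment.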