We, at NonBioS.ai [AI Software Dev], built something like this from scratch for Linux VMs, and it was a heavy lift. Could have used you guys if we had known about it. But we can see this being immediately useful in a ton of places.
We’re currently focused on macOS but planning to support Linux soon, so I’d love to hear more about your use case. Feel free to reach out at founders@trycua.com - always great to learn from others building in this space.
We covered this a fair bit on our blogs:
- https://www.nonbios.ai/post/why-nonbios-chose-cloud-vms-for-...
- https://www.nonbios.ai/post/private-linux-vms-for-every-nonb...
This is like an OS developer who has never heard of Linux.
I don’t know if this is a problem you’ve faced, but I’m curious: how do LLM tool devs handle authn/authz? Do host apps normally forward a token or something? Is there a standard commonly used? What if the tool needs some permissions to act on the user’s behalf?
I'm also working on a blog post that touches on this - particularly in the context of giving agents long-term and episodic memory. Should be out next week!
First time: it opened a macOS VM and started to do stuff, but it got ahead of itself and started typing things in the wrong place. So now that VM has a Finder window open, with a recent file that's called
plt.ylabel('Price(USD)').sh
The second and third times, it launched the VM but failed to do anything, showing these errors:
INFO:cua:VM run response: None
INFO:cua:Waiting for VM to be ready...
INFO:cua:Waiting for VM macos-sequoia-cua_latest to be ready (timeout: 600s)...
INFO:cua:VM status changed to: stopped (after 0.0s)
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
INFO:cua:VM status changed to: running (after 12.4s)
INFO:cua:VM macos-sequoia-cua_latest got IP address: 192.168.64.2 (after 12.4s)
INFO:cua:VM is ready with IP: 192.168.64.2
INFO:cua:Initializing interface for macos at 192.168.64.2
INFO:cua.interface:Logger set to INFO level
INFO:cua.interface.macos:Logger set to INFO level
INFO:cua:Connecting to WebSocket interface...
INFO:cua.interface.macos:Waiting for Computer API Server to be ready (timeout: 60s)...
INFO:cua.interface.macos:Attempting WebSocket connection to ws://192.168.64.2:8000/ws
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 10.0s, attempts: 11)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 20.0s, attempts: 21)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 30.0s, attempts: 31)
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 40.0s, attempts: 41)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 50.1s, attempts: 51)
ERROR:cua.interface.macos:Could not connect to 192.168.64.2 after 60 seconds
ERROR:cua:Failed to connect to WebSocket interface
DEBUG:cua:Computer initialization took 76856.09ms
ERROR:agent.core.agent:Error in agent run method: Could not connect to WebSocket interface at 192.168.64.2:8000/ws: Could not connect to 192.168.64.2 after 60 seconds
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
This was using the Gradio interface, with the agent loop provider set to OMNI and the model set to gemma3:4b-it-q4_K_M. Package versions:
cua-agent==0.1.29
cua-computer==0.1.23
cua-core==0.1.5
cua-som==0.1.3
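For anyone hitting the same timeout, one way to narrow it down is to check whether the port from the log accepts TCP connections at all. That separates "the VM is unreachable" from "the server inside the VM never came up". This is only a minimal sketch using the Python standard library; `port_is_open` is a hypothetical helper, not part of cua, and the address and port are simply the ones that appear in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `port_is_open("192.168.64.2", 8000)` from the host while the VM is running would show whether anything is listening on the port the WebSocket client is trying to reach.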
Stay tuned - we're also releasing support for UI-Tars-1.5 7B this week! It offers excellent speed and accuracy, and best of all, it doesn't require bounding box detection (Omni) since it's a pixel-native model.
Feel free to ping me on Discord (I'm francesco there) - happy to hop on a quick call to help debug: https://discord.com/invite/mVnXXpdE85
I reckon I could run this for buying fashion drops, is this a use case y'all have seen?
I wanted to look at a Docker alternative to e2b
The LLM interacts with the VM through a structured virtual computer interface (cua-computer and cua-agent). It’s a high-level abstraction that lets the agent act (e.g., “open Terminal”, “type a command”, “focus an app”) and observe (e.g., current window, file system, OCR of the screen, active processes) in a way that feels a lot more like using a real computer than parsing raw data.
So under the hood, yes, screen+metadata are used (especially with the Omni loop and visual grounding), but what the model sees is a clean interface designed for agentic workflows - closer to how a human would think about using a computer.
If you're curious, the agent loops (OpenAI, Anthropic, Omni, UI-Tars) offer different ways of reasoning and grounding actions, depending on whether you're using cloud or local models.
https://github.com/trycua/cua/tree/main/libs/agent#agent-loo...
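To make the act/observe idea concrete, here is a rough sketch of that kind of loop. Every name in it (`Observation`, `Action`, `FakeComputer`, `run_loop`, `scripted_policy`) is made up for illustration and is not the cua-computer or cua-agent API; the "computer" is an in-memory fake and the "model" is a hard-coded policy, so it runs without a VM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Observation:
    """What the agent sees at each step (hypothetical fields)."""
    active_window: str
    screen_text: str  # stands in for OCR of the current screen


@dataclass
class Action:
    """A high-level action such as 'open', 'focus', or 'type' (hypothetical)."""
    kind: str
    argument: str


@dataclass
class FakeComputer:
    """In-memory stand-in for a VM so the loop can run without one."""
    active_window: str = "Finder"
    typed: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(self.active_window, " ".join(self.typed))

    def act(self, action: Action) -> None:
        if action.kind in ("open", "focus"):
            self.active_window = action.argument
        elif action.kind == "type":
            self.typed.append(action.argument)
        else:
            raise ValueError(f"unsupported action: {action.kind}")


Policy = Callable[[Observation], Optional[Action]]


def run_loop(computer: FakeComputer, policy: Policy, max_steps: int = 10) -> int:
    """Alternate observe -> decide -> act until the policy returns None.

    Returns the number of actions that were executed.
    """
    for step in range(max_steps):
        action = policy(computer.observe())
        if action is None:
            return step
        computer.act(action)
    return max_steps


def scripted_policy(obs: Observation) -> Optional[Action]:
    """Stands in for the model: open Terminal, type one command, then stop."""
    if obs.active_window != "Terminal":
        return Action("open", "Terminal")
    if "ls" not in obs.screen_text:
        return Action("type", "ls")
    return None
```

In a real setup the policy would be an LLM call and the computer would be the VM; the point of the sketch is only that the model deals in observations and high-level actions rather than raw pixels or bytes.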
Second, as a user, you’d want to handle the case where some or all of these have been fully compromised. Surreptitiously, super-intelligently, and partially or fully autonomously, one container or many may have access to otherwise isolated networks within homes, corporate networks, or some device in a high-security area with access to nuclear weapons, biological weapons, the electrical grid, our water supply, our food supplies, manufacturing, or even some other key vulnerability we’ve discounted, like a toy.
While providing more isolation is good, there is no amount of caution that can prevent calamity when you give everyone a Pandora’s box. It’s like giving someone a bulletproof jacket to protect them from fox tapeworm cancer or hyper-intelligent, time-traveling, timespace-manipulating super-Ebola.
That said, it’s the world we live in now, where we’re in a race to our demise. So, thanks for the bulletproof jacket.
Agents seem exciting to us. Have you ever tried getting an 80-year-old man to figure out how to pay his town taxes online? Or how to register for some obscure permit?
We hope agents will be able to guide these users to some degree. So many users struggle with basic information and interfaces.
Picture this:
User walks up to kiosk. Wants to pay property tax bill. They have to study the kiosk/website homepage, sift through dozens or hundreds of options/menus/pages (or go through "wizards") to get to the right page for their issue. Then they have to figure out how to use that page!
These kiosks/websites usually support many functions, not just paying property tax.
So the user gets frustrated and says, "I just want to pay my property tax."
Enter the agent.
Anything that "improves access to public services" is what our customers are paying for. And we def see this as a viable option.
Thank you, and go Cua!
We're designing with that in mind: think fine-grained permissioning, auditability, and minimizing surface area. But it’s still early, and a lot of it depends on how teams end up using CUAs in practice.
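As a rough illustration of what fine-grained permissioning plus auditability could look like, the sketch below gates each requested action against an allowlist and records every attempt. `GatedExecutor` and its fields are hypothetical names chosen for this example; nothing here reflects how Cua actually implements or plans to implement it.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class GatedExecutor:
    """Hypothetical wrapper: only allowlisted action kinds are permitted, and
    every attempt (allowed or denied) is appended to an audit trail."""
    allowed: FrozenSet[str]
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def request(self, kind: str, argument: str) -> bool:
        """Record the attempt and return whether it is permitted."""
        permitted = kind in self.allowed
        self.audit_log.append((kind, argument, permitted))
        return permitted
```

Keeping the allowlist small is one way to read "minimizing surface area": an agent that only ever needs to click and type never gets a shell action in the first place.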
- Open-source from the start. Cua’s built under an MIT license with the goal of making Computer-Use agents easy and accessible to build. Cua's Lume CLI was our first step - we needed fast, reproducible VMs with near-native performance to even make this possible.
- Native macOS support. As far as we know, we’re the only ones offering macOS VMs out of the box, built specifically for Computer-Use workflows. And you can control them with a PyAutoGUI-compatible SDK (cua-computer) - so things like click, type, scroll just work, without needing to deal with any inter-process communication.
- Not just the computer/sandbox, but the agent too. We’re also shipping an Agent SDK (cua-agent) that helps you build and run these workflows without having to stitch everything together yourself. It works out of the box with OpenAI and Anthropic models, UI-Tars, and basically any VLM if you’re using the OmniParser agent loop.
- Not limited to Linux. The hosted version we’re working on won’t be Linux-only - we’re going to support macOS and Windows too.
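For readers who haven't used PyAutoGUI, the call style being referred to looks roughly like the sketch below. `RecordingComputer` is a hypothetical stand-in that borrows PyAutoGUI's method names (`click`, `write`, `scroll`) and just records calls; it is not the cua-computer API, and the coordinates are made up.

```python
from typing import List, Tuple


class RecordingComputer:
    """Hypothetical stand-in that mirrors PyAutoGUI-style method names and
    records each call instead of driving a real VM."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", (x, y)))

    def write(self, message: str) -> None:
        self.calls.append(("write", (message,)))

    def scroll(self, clicks: int) -> None:
        self.calls.append(("scroll", (clicks,)))


def search_for(computer: RecordingComputer, query: str) -> None:
    """Click a (made-up) search box, type a query, and scroll the results."""
    computer.click(640, 80)  # coordinates are illustrative only
    computer.write(query)
    computer.scroll(-5)      # negative values scroll down in PyAutoGUI's convention
```

The appeal of this style is that a script written against these three verbs reads the same whether the target is a local desktop or a sandboxed VM.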
In the meantime, I’ll give this a shot on macOS tonight. Congrats!
Also, let us know on Discord once you’ve tried out c/ua locally on macOS: https://discord.com/invite/mVnXXpdE85
(I am not affiliated)
Also, is the project still active? No commits for 2 months is odd for a YC startup in current batch :)
https://news.ycombinator.com/threads?id=SkylerJi
https://news.ycombinator.com/threads?id=zwenbo
Here's what you guys need to understand:
(1) Not everyone spends hours on Hacker News—many casual users have no idea about the culture of this place re voting rings, booster comments, and so on.
(2) Many people enjoy congratulating their friends when they reach a major milestone.
(3) Other sites have a culture where this kind of thing is fine.
HN is different, of course, and we tell founders to stop this from happening. In fact, I basically yell it at them in the Launch HN guide: https://news.ycombinator.com/yli.html#noboost. I also yell it at them in person every chance I get—I do my best to scare them! But if you think that including something in a list of rules plus repeating it over and over in person is sufficient to get a message across, may I introduce you to the Measure Zero Effect: no matter how often you repeat something, the set of users who receive the message has measure zero (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...)
As it happens, I saw those comments in the thread (mostly the same ones you listed), marked them offtopic, and emailed the founders as soon as I could:
"Btw, did you send a message to batchmates/friends about this thread? I'm seeing a lot of booster comments in there now. This is not good for you! (See https://news.ycombinator.com/yli.html.)
Fortunately though, there are a lot of organic comments as well so I can just move the booster ones lower down and they shouldn't harm anything. Still, if you have a way to tell your friends not to do that, it would be good. Send them to https://news.ycombinator.com/yli.html as well, if you like :) - the text about that is repeated and in a bold font for a reason!"
They replied that their Discord was probably spreading word of the launch and they'd add a message asking people to stop. After that, it mostly stopped.
Seriously though, this kind of behavior should be considered a violation of the social contract.
Would love to chat sometime!
Feel free to join our Discord so we can chat more: https://discord.com/invite/mVnXXpdE85
Also built something on top of Browser Use (Nanobrowser) and Docker.
https://github.com/reindent/nanomachine
Just finished planning and shell capabilities
Let's chat: @reindentai (X)