The Best AI Sandbox for WebSocket Apps
The easiest AI sandbox to build is a command runner. Send code in, get stdout back, maybe collect a file artifact, then destroy the environment.
That shape is fine for snippets. It breaks down fast when the agent is building or operating a realtime app.
WebSocket apps are not just code that returns a value. They are live services with connected clients, ports, process state, logs, background workers, reconnect behavior, and usually a browser sitting on the other side. A coding agent that edits a chat app, collaborative editor, multiplayer prototype, live dashboard, terminal UI, or MCP-style gateway needs the sandbox to behave like a real machine. It has to run the server, expose the port, keep the connection path stable, and leave enough state behind for the next agent turn to make sense.
That is why the best AI sandbox for WebSocket apps is not a narrow interpreter. It is a VM. Freestyle VMs are the most powerful VMs for AI agents: hardware-virtualized real Linux machines that can run indefinitely when configured without an idle timeout, while still giving your product API control over exec, files, PTYs, domains, stop, start, resize, fork, and delete.
The WebSocket sandbox test
If you want to know whether a sandbox is ready for serious agent work, ask it to run a WebSocket app and keep it useful after the first demo.
The test is simple:
- start a backend that listens on a port
- expose it on a real HTTPS origin
- connect a browser client over wss://
- keep the process alive while the agent edits files
- stream logs while clients are connected
- restart the service without losing the workspace
- debug the same machine when the app misbehaves
- preserve enough state that the next turn does not begin from zero
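The checklist can be turned into a repeatable harness by giving each step a name and an async probe. The sketch below is hypothetical: SandboxCheck and runSandboxTest are illustrative names, and each run function is a placeholder for a provider-specific probe such as an HTTP request to the preview URL, a WebSocket connect, or a command sent through the sandbox's exec API.

```typescript
// Hypothetical harness: each check wraps a provider-specific probe.
interface SandboxCheck {
  name: string;
  run: () => Promise<boolean>;
}

interface CheckResult {
  name: string;
  passed: boolean;
  error?: string;
}

// Runs every check in order so later steps (restart, debugging) are still
// reported when an earlier step fails; a thrown error counts as a failure.
async function runSandboxTest(checks: SandboxCheck[]): Promise<CheckResult[]> {
  const results: CheckResult[] = [];
  for (const check of checks) {
    try {
      results.push({ name: check.name, passed: await check.run() });
    } catch (err) {
      results.push({
        name: check.name,
        passed: false,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return results;
}

// Placeholder probe; replace with a real check against the sandbox.
const todo = async (): Promise<boolean> => false;

const websocketSandboxTest: SandboxCheck[] = [
  { name: "backend listens on a port", run: todo },
  { name: "port exposed on a real HTTPS origin", run: todo },
  { name: "browser client connects over wss://", run: todo },
  { name: "process stays alive during file edits", run: todo },
  { name: "logs stream while clients are connected", run: todo },
  { name: "service restarts without losing the workspace", run: todo },
  { name: "same machine is debuggable", run: todo },
  { name: "state carries into the next turn", run: todo },
];
```

Running every check instead of stopping at the first failure is deliberate: a sandbox that passes the first three steps and fails the restart step is exactly the case this test is meant to expose.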
This is the normal loop for realtime software. The agent changes a handler, refreshes a browser, watches connection errors, tails logs, checks environment variables, restarts the server, and tries again.
A one-shot sandbox can run node server.js. A real agent sandbox has to support the operational loop around that server.
Realtime apps are services, not results
The key distinction is that a WebSocket server is a service. It does not fit neatly into a request-response execution model.
When an agent runs a Python script that calculates a chart, the result can be a file. When it runs a test, the result can be a status code. When it runs a WebSocket app, the result is a running process that accepts future traffic. The process is the thing the user cares about.
That means the sandbox needs normal Linux behavior:
- long-lived processes
- ports that can receive public traffic
- logs that can be tailed while the process runs
- package managers and native dependencies
- environment files and config
- restart behavior under systemd or another supervisor
- a terminal that can stay attached to the work
- a filesystem that the agent can inspect and mutate
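For the "restart behavior under systemd" item, a plain service unit is usually enough. The helper below renders one as a string. The unit contents, paths, and the HOST and PORT environment variables are assumptions about a typical Node app, not something Freestyle requires; the directives themselves (WorkingDirectory, Environment, ExecStart, Restart) are standard systemd.

```typescript
interface UnitOptions {
  description: string;
  workingDirectory: string;
  execStart: string; // absolute command, e.g. "/usr/bin/node server.js"
  port: number;
  env?: Record<string, string>; // values must not contain spaces in this sketch
}

// Renders a minimal systemd service unit for a long-lived WebSocket server.
function renderServiceUnit(opts: UnitOptions): string {
  const env: Record<string, string> = {
    HOST: "0.0.0.0", // bind all interfaces, not just loopback
    PORT: String(opts.port),
    ...opts.env,
  };
  const envLines = Object.entries(env).map(([k, v]) => `Environment=${k}=${v}`);
  return [
    "[Unit]",
    `Description=${opts.description}`,
    "After=network.target",
    "",
    "[Service]",
    `WorkingDirectory=${opts.workingDirectory}`,
    ...envLines,
    `ExecStart=${opts.execStart}`,
    "Restart=always", // restart the server whenever it exits
    "RestartSec=1",
    "",
    "[Install]",
    "WantedBy=multi-user.target",
    "",
  ].join("\n");
}
```

Written to a path such as /etc/systemd/system/chat.service, the unit is activated with systemctl daemon-reload followed by systemctl enable --now chat, restarted with systemctl restart chat, and its logs can be followed with journalctl -u chat -f.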
Freestyle's VM docs start from that model. A VM is a full Linux virtual machine for long-running, complex tasks. You can create one with the SDK, run commands with vm.exec(), read and write files through vm.fs, resize it for heavier workloads, and route web traffic to a service running on a VM port.
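Configuration edits are a routine part of that loop, for example changing an allowed origin before restarting the server. The sketch below keeps the edit itself pure so it can be tested. VmFiles is a hypothetical stand-in for file access such as vm.fs; its readTextFile and writeTextFile method names are assumptions, not the SDK's documented signatures.

```typescript
// Hypothetical file API; map these methods to whatever your VM SDK exposes.
interface VmFiles {
  readTextFile(path: string): Promise<string>;
  writeTextFile(path: string, content: string): Promise<void>;
}

// Sets KEY=value in dotenv-style content: replaces an existing assignment
// or appends a new line. Comments and other keys are left untouched.
function upsertEnvVar(content: string, key: string, value: string): string {
  const lines = content.split("\n");
  const idx = lines.findIndex((line) => line.trimStart().startsWith(`${key}=`));
  if (idx >= 0) {
    lines[idx] = `${key}=${value}`;
    return lines.join("\n");
  }
  // Make sure existing content ends with a newline before appending.
  const body = content.length === 0 || content.endsWith("\n") ? content : content + "\n";
  return `${body}${key}=${value}\n`;
}

// Read-modify-write of an env file inside the VM.
async function setEnv(files: VmFiles, path: string, key: string, value: string): Promise<void> {
  const current = await files.readTextFile(path);
  await files.writeTextFile(path, upsertEnvVar(current, key, value));
}
```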
That is the right primitive for WebSocket apps because the app is not an artifact. It is a live program inside an environment.
Ports should become real URLs
For realtime products, previews are not optional. The agent needs to hand a user, browser automation tool, webhook source, or test harness a URL that reaches the app.
Freestyle VM domains route public HTTPS traffic from a hostname to a port inside a VM. The documented flow is ordinary infrastructure: verify the domain, point DNS at Freestyle, map the hostname to the VM port, and run a service inside the VM that listens on that port. For previews, every account can use *.style.dev subdomains without DNS setup or verification, and HTTPS is provisioned automatically.
That matters for WebSocket apps because a wss:// connection starts as an HTTPS request that the server upgrades to a WebSocket, so the same public HTTPS origin can serve both the page and its socket. The Freestyle docs show exposed ports being turned into HTTPS URLs and WebSocket URLs, and the OpenClaw guide describes an agent gateway served through a Freestyle domain with HTTP and WebSocket traffic proxied directly.
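In client code, the WebSocket URL can be derived from the page's origin instead of being hard-coded, so the same build works on any mapped domain. A minimal sketch; the chat-demo.style.dev hostname and the /ws path are placeholders.

```typescript
// Converts an http(s) origin plus a path into the matching ws(s) URL.
// https: becomes wss: (TLS); http: becomes ws: (plain, local development only).
function toWebSocketUrl(origin: string, path = "/"): string {
  const url = new URL(path, origin);
  if (url.protocol === "https:") url.protocol = "wss:";
  else if (url.protocol === "http:") url.protocol = "ws:";
  else throw new Error(`unsupported origin protocol: ${url.protocol}`);
  return url.toString();
}
```

In a browser this would typically be called as toWebSocketUrl(window.location.origin, "/ws"), which yields wss://chat-demo.style.dev/ws on the preview domain and ws://localhost:3000/ws during local development.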
The agent should not have to invent a tunnel every time it starts a dev server. It should bind the service to 0.0.0.0 so it accepts connections beyond loopback, your product should map a domain to the VM port, and the client should connect to the resulting origin.
That keeps the preview model simple: if the app runs in the VM, the URL points at the VM.
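A frequent reason a preview URL returns nothing is a dev server bound to 127.0.0.1 instead of 0.0.0.0. The helper below parses the output of ss -ltn, which an agent could collect through the VM's command execution API, and reports whether a port is listening on all interfaces. It assumes the default iproute2 column layout, so treat it as a debugging sketch rather than a robust parser.

```typescript
interface Listener {
  host: string;
  port: number;
}

// Parses `ss -ltn` output into listening sockets. Assumes the default columns:
// State Recv-Q Send-Q Local-Address:Port Peer-Address:Port ...
function parseListeners(ssOutput: string): Listener[] {
  const listeners: Listener[] = [];
  for (const line of ssOutput.split("\n")) {
    const cols = line.trim().split(/\s+/);
    if (cols[0] !== "LISTEN" || cols.length < 4) continue; // skip header and blanks
    const local = cols[3];
    const sep = local.lastIndexOf(":"); // last colon, so IPv6 like [::]:5000 works
    listeners.push({ host: local.slice(0, sep), port: Number(local.slice(sep + 1)) });
  }
  return listeners;
}

// Wildcard addresses accept connections on every interface, not only loopback.
const WILDCARD_HOSTS = new Set(["0.0.0.0", "*", "[::]"]);

function isPubliclyBound(ssOutput: string, port: number): boolean {
  return parseListeners(ssOutput).some(
    (l) => l.port === port && WILDCARD_HOSTS.has(l.host),
  );
}
```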
Terminals matter more when sockets stay open
WebSocket bugs are often timing bugs, lifecycle bugs, or process bugs. The failure is not always visible in a final command result.
The client connects and immediately disconnects. The server accepts the connection but never emits the first event. A hot reload closes every socket. A reverse proxy sends the wrong upgrade headers. A package manager leaves the dev server in a bad state. A browser tab is connected to an old build. A database listener works locally but fails in the sandbox.
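When a proxy mangles the upgrade, it helps to check the handshake directly. Per RFC 6455, the client sends Upgrade: websocket, a Connection header containing Upgrade, Sec-WebSocket-Version: 13, and a Sec-WebSocket-Key; the server answers 101 Switching Protocols with Sec-WebSocket-Accept set to the base64-encoded SHA-1 of the key concatenated with a fixed GUID. The sketch below validates incoming headers and computes the expected accept value; checkUpgrade and computeAccept are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID defined by RFC 6455.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Value the server must return in Sec-WebSocket-Accept for a given key.
function computeAccept(secWebSocketKey: string): string {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Lists problems with an upgrade request's headers (lowercase names, as Node
// exposes them). An empty list means the handshake headers look valid.
function checkUpgrade(headers: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  // Header values may be comma-separated token lists, e.g. "keep-alive, Upgrade".
  const tokens = (v?: string) =>
    (v ?? "").split(",").map((t) => t.trim().toLowerCase());
  if (!tokens(headers["upgrade"]).includes("websocket")) {
    problems.push("missing Upgrade: websocket (often stripped by a proxy)");
  }
  if (!tokens(headers["connection"]).includes("upgrade")) {
    problems.push("Connection header does not contain Upgrade");
  }
  if (headers["sec-websocket-version"] !== "13") {
    problems.push("Sec-WebSocket-Version must be 13");
  }
  if (!headers["sec-websocket-key"]) {
    problems.push("missing Sec-WebSocket-Key");
  }
  return problems;
}
```

Logging checkUpgrade results at the app, behind the proxy, separates "the proxy dropped the upgrade headers" from "the app handled a valid upgrade badly".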
Agents need a terminal for this work. Not just command history, but a real interactive process they can return to while the app is still running.
Freestyle PTY sessions are long-lived interactive shells inside a VM. The docs describe sessions that can be opened, detached, and reattached over a WebSocket. They survive client disconnects, VM suspends, and VM forks. They are backed by a real pseudo-terminal, so shells, REPLs, package managers, debuggers, terminal UIs, file watchers, and log tails behave like they do on Linux.
That is especially useful for realtime apps. One PTY can run npm run dev. Another can follow logs with journalctl -f. Another can run a small WebSocket client. If the browser disconnects from your product UI, the underlying terminal session does not have to disappear with it.
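The key property is that the session, not the viewer, owns the output. The toy model below illustrates that separation: output produced while no viewer is attached is kept in a bounded scrollback and replayed on reattach. It models the behavior described above and is not Freestyle's implementation or API; TerminalSession and its methods are hypothetical.

```typescript
type Viewer = (chunk: string) => void;

// Toy model of a detachable terminal session: process output lives in the
// session, and viewers come and go without affecting it.
class TerminalSession {
  private scrollback: string[] = [];
  private viewer: Viewer | null = null;
  private readonly maxChunks: number;

  constructor(maxChunks = 1000) {
    this.maxChunks = maxChunks;
  }

  // Called whenever the underlying process writes output.
  write(chunk: string): void {
    this.scrollback.push(chunk);
    if (this.scrollback.length > this.maxChunks) this.scrollback.shift();
    this.viewer?.(chunk);
  }

  // Attaches a viewer and replays the retained scrollback to it.
  attach(viewer: Viewer): void {
    this.viewer = viewer;
    for (const chunk of this.scrollback) viewer(chunk);
  }

  // Detaching only drops the viewer; the session keeps recording.
  detach(): void {
    this.viewer = null;
  }
}
```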
The runtime should keep the work alive. The frontend should only be one view into it.
State is part of the protocol
Realtime apps are stateful even when the code looks simple.
A WebSocket connection has identity. It has subscription state. It may hold a cursor, room membership, auth context, buffered messages, or a negotiated protocol version. The server may depend on Redis, Postgres, a local queue, a file watcher, or a background worker. The useful debugging evidence is distributed across all of that state.
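One common way to keep that state recoverable is to number messages per room and let a reconnecting client send the last sequence number it saw. The sketch below shows the server side of such a resume protocol in memory. RoomLog, handleResume, and the lastSeq field are illustrative names, and a real app would keep the log in something that outlives the process, such as Redis or Postgres.

```typescript
interface RoomMessage {
  seq: number;
  body: string;
}

// In-memory per-room message log with monotonically increasing sequence numbers.
class RoomLog {
  private messages: RoomMessage[] = [];
  private nextSeq = 1;

  append(body: string): RoomMessage {
    const msg = { seq: this.nextSeq++, body };
    this.messages.push(msg);
    return msg;
  }

  // Messages newer than lastSeq; 0 means "send everything".
  since(lastSeq: number): RoomMessage[] {
    return this.messages.filter((m) => m.seq > lastSeq);
  }
}

// On reconnect the client sends its lastSeq; the server replays the gap before
// resuming live delivery, so a dropped socket does not silently lose messages.
function handleResume(log: RoomLog, lastSeq: number, send: (m: RoomMessage) => void): void {
  for (const m of log.since(lastSeq)) send(m);
}
```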
This is where fake computers get expensive. If every agent turn starts from a clean process, the product has to reconstruct the world around the bug. If the sandbox only preserves files, the agent loses the running context. If logs are only attached to a single command invocation, the user cannot inspect what happened after the command returned.
A VM gives you the simpler rule: keep the machine as the unit of state.
Freestyle's lifecycle docs describe VMs as durable runtime objects. Your application can start work, stop it, start it later, fork it for parallel exploration, and delete it when the workspace is finished. Disk state survives a stop. A running VM can stay running when the workload should continue. Forking creates a new VM from the current running state, which is valuable when an agent wants to try a risky protocol change without destroying the working connection path.
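The lifecycle rules in that paragraph can be summarized as a small state model of what each operation allows. This is an illustration of the described behavior (disk survives a stop, a fork copies a running VM, delete is final), not Freestyle's state names or API; VmState, VmAction, and transition are hypothetical.

```typescript
type VmState = "running" | "stopped" | "deleted";
type VmAction = "stop" | "start" | "fork" | "delete";

// Actions this sketch allows from each state. "fork" leaves the source
// running and yields a second running VM with a copy of its state.
const allowed: Record<VmState, readonly VmAction[]> = {
  running: ["stop", "fork", "delete"],
  stopped: ["start", "delete"],
  deleted: [],
};

// Returns the next state of the VM the action was applied to.
function transition(state: VmState, action: VmAction): VmState {
  if (!allowed[state].includes(action)) {
    throw new Error(`cannot ${action} a ${state} VM`);
  }
  switch (action) {
    case "stop":
      return "stopped"; // processes end; disk state is kept
    case "start":
      return "running"; // boots again from the preserved disk
    case "fork":
      return "running"; // source is unchanged; the fork is a new running VM
    case "delete":
      return "deleted"; // the workspace is gone
  }
}
```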
Important source code should still live outside the runtime. When an agent changes application code, keep branches, diffs, repos, and review in Freestyle Git or another source-control system. The VM is where the realtime app runs. Git is where the code becomes reviewable.
A practical WebSocket agent architecture
The architecture does not need to be exotic.
Create a VM for the workspace. Clone or write the app into it. Install dependencies with the project's normal tools. Run the WebSocket server under systemd, a process manager, or a PTY depending on whether it is production-like or actively being developed. Map a *.style.dev preview domain to the server port. Give the browser agent or test harness that URL. Keep a terminal session open for logs and interactive debugging.
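As a sketch, those setup steps map onto a short sequence of calls. The AgentVm interface is hypothetical: exec, writeFile, and mapDomain stand in for the SDK's command execution, file access, and domain mapping, whose real names and signatures should come from the Freestyle docs. The repo URL, paths, unit name, and preview hostname are placeholders.

```typescript
// Hypothetical surface; map each method to the real SDK calls.
interface AgentVm {
  exec(command: string): Promise<{ exitCode: number; stdout: string }>;
  writeFile(path: string, content: string): Promise<void>;
  mapDomain(hostname: string, port: number): Promise<void>;
}

interface PreviewConfig {
  repoUrl: string;
  appDir: string;
  port: number;
  previewHost: string; // e.g. a *.style.dev subdomain
  unitFile: string; // rendered systemd unit for the server
}

// Runs a command and fails loudly so the agent sees which step broke.
async function run(vm: AgentVm, command: string): Promise<void> {
  const { exitCode } = await vm.exec(command);
  if (exitCode !== 0) throw new Error(`"${command}" exited with ${exitCode}`);
}

// Clone, install, supervise, and expose a WebSocket app; returns its wss URL.
// Assumes trusted config values: nothing here shell-quotes them.
async function launchPreview(vm: AgentVm, cfg: PreviewConfig): Promise<string> {
  await run(vm, `git clone ${cfg.repoUrl} ${cfg.appDir}`);
  await run(vm, `cd ${cfg.appDir} && npm ci`);
  await vm.writeFile("/etc/systemd/system/ws-app.service", cfg.unitFile);
  await run(vm, "systemctl daemon-reload && systemctl enable --now ws-app");
  await vm.mapDomain(cfg.previewHost, cfg.port);
  return `wss://${cfg.previewHost}/`;
}
```

The returned URL is what gets handed to the browser agent or test harness, while a separate terminal session can keep journalctl -u ws-app -f open for live logs.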
For development loops, the agent can edit files, restart the service, and reconnect the client. For user-facing previews, the app keeps the same URL while the underlying VM holds the project state. For long-running sessions, set the idle timeout according to the product: reclaim idle machines when the session is disposable, or configure the VM to stay running when connected clients and background work matter.
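That policy can be written as a small function the product consults when it creates or reconfigures a VM. The SessionInfo fields, the 300-second default, and the convention that null means "no idle timeout" are assumptions for this sketch; how the idle timeout is actually set is defined by the Freestyle API.

```typescript
interface SessionInfo {
  disposable: boolean; // throwaway session vs. a workspace users return to
  connectedClients: number; // open WebSocket clients right now
  hasBackgroundWork: boolean; // workers, queues, or jobs that must keep running
}

// Returns an idle timeout in seconds, or null to keep the VM running.
function chooseIdleTimeoutSeconds(s: SessionInfo, reclaimAfter = 300): number | null {
  // Live clients or background work mean the machine is not really idle.
  if (!s.disposable || s.connectedClients > 0 || s.hasBackgroundWork) return null;
  return reclaimAfter;
}
```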
For risky changes, fork before the agent edits protocol code. One fork can try a new message schema. Another can change the persistence layer. Another can update dependencies. Each fork starts from the same running context, but its processes and files diverge after the fork.
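The fan-out itself can be sketched with Promise.all. ForkableVm, fork(), and exec() are hypothetical stand-ins for the SDK's fork and command APIs, and each variant's apply command and the shared test command are placeholders.

```typescript
interface ExecResult {
  exitCode: number;
  stdout: string;
}

// Hypothetical surface: fork() is assumed to return a new VM copied from the
// source's current running state.
interface ForkableVm {
  fork(): Promise<ForkableVm>;
  exec(command: string): Promise<ExecResult>;
}

interface Variant {
  name: string;
  apply: string; // command that applies this variant's change
}

interface VariantResult {
  name: string;
  passed: boolean;
  vm: ForkableVm; // kept so the winning fork can be inspected or promoted
}

// Forks once per variant, applies each change in its own fork, and runs the
// same test command in each. The source VM is never modified.
async function exploreVariants(
  source: ForkableVm,
  variants: Variant[],
  testCommand: string,
): Promise<VariantResult[]> {
  return Promise.all(
    variants.map(async (v) => {
      const vm = await source.fork();
      const applied = await vm.exec(v.apply);
      // Skip the test when the change itself failed to apply.
      const passed = applied.exitCode === 0 && (await vm.exec(testCommand)).exitCode === 0;
      return { name: v.name, passed, vm };
    }),
  );
}
```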
That is the agent-native version of a preview environment. It is not just a build artifact. It is a full computer with a URL.
What to look for
If you are choosing an AI sandbox for WebSocket apps, evaluate it against the parts of the workflow that happen after the first command succeeds.
Can it run a real Linux service that listens on a port? Can the service bind to 0.0.0.0 and receive public HTTPS traffic? Can the same origin support browser clients that expect secure WebSockets? Can the agent keep terminals and log tails alive across frontend reconnects? Can you restart, resize, stop, start, and delete the environment through an API? Can you fork the running workspace before a risky change? Can a human debug the same environment when the agent gets stuck?
Those questions are more important than whether the first snippet runs quickly. WebSocket apps are built out of ongoing behavior. The sandbox has to preserve enough of that behavior to be useful.
The bottom line
The best AI sandbox for WebSocket apps is a real VM because realtime software needs a real runtime. It needs processes, ports, terminals, logs, package managers, files, service restarts, previews, and durable state.
Freestyle VMs give agent products that runtime as an API-controlled Linux machine. Start the app inside the VM, expose the port on a real domain, keep the terminal and logs available, and let the agent work in the same environment the user can inspect.
For code snippets, a command runner can be enough. For WebSocket apps, choose the computer.

