Product Jun 5, 2026 9 min read

~ / freestyle-team ❯

The Best AI Agent Sandbox Has apt install

If an AI agent sandbox cannot run apt install, it is probably too small.

That sounds like a narrow test. It is not. apt install is a proxy for a much larger question: does the agent have a real operating system, or does it have a carefully shaped imitation of one?

Agents are very good at finding the edge of the abstraction you gave them. A short Python cell works until the package needs native headers. A command runner works until the task needs a daemon. A file API works until the build tool expects normal paths, symlinks, permissions, sockets, and background processes. A preview API works until the app needs a database on another port.

The best AI agent sandbox is not the one with the prettiest runCode() call. It is the one that survives the boring parts of software: package managers, native dependencies, services, logs, terminals, ports, users, and cleanup.

That is a VM-shaped problem.

The apt install test

The apt install test is simple. Give the agent a task that needs software the platform did not preload.

Maybe it needs ffmpeg to inspect a video. Maybe it needs libvips for image processing. Maybe it needs postgresql-client, redis-tools, poppler-utils, build-essential, a browser dependency, or a language runtime the product team did not predict. Maybe it just needs to follow a README that starts with:

sudo apt-get update
sudo apt-get install -y ...

This is not an exotic workflow. It is normal software development.

The sandbox either lets the agent operate the system like a Linux machine, or it forces you to turn every missing package into a product backlog item. That is the trap: narrow sandboxes look simpler until your users start asking agents to do real work.

An agent with a real Linux machine can try the obvious thing. It can install the dependency, run the command, read the error, install the missing header, rerun the test, start the service, inspect the port, tail the log, and keep going. That loop is where useful agent work happens.
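One way to picture that loop is a small retry helper: run the command, look for a "not found" error, install the matching package, and run the command again. This is a minimal sketch, not Freestyle's implementation. The `ExecResult` shape (including `stderr` and `exitCode`), the `Exec` function type, and the command-to-package table are all assumptions made for illustration; in practice `exec` would wrap whatever command runner the sandbox provides.

```typescript
// Hypothetical result shape: the field names beyond `stdout` are assumptions.
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Any function that runs a shell command inside the sandbox.
type Exec = (command: string) => Promise<ExecResult>;

// Illustrative mapping from a missing binary to the Debian package that ships it.
const PACKAGE_FOR_COMMAND: Record<string, string> = {
  ffmpeg: "ffmpeg",
  psql: "postgresql-client",
  "redis-cli": "redis-tools",
  pdftotext: "poppler-utils",
  gcc: "build-essential",
};

// Pull the missing binary name out of a typical shell "not found" message,
// e.g. "bash: line 1: ffmpeg: command not found" or "sh: 1: psql: not found".
function missingCommand(stderr: string): string | null {
  const match = stderr.match(/(?:^|\s)([\w.+-]+): (?:command )?not found/m);
  return match?.[1] ?? null;
}

// Run a command; when it fails because a known binary is missing,
// install the package and retry, up to `maxInstalls` times.
async function runWithInstall(
  exec: Exec,
  command: string,
  maxInstalls = 3,
): Promise<ExecResult> {
  let result = await exec(command);
  for (let i = 0; i < maxInstalls && result.exitCode !== 0; i++) {
    const missing = missingCommand(result.stderr);
    const pkg = missing ? PACKAGE_FOR_COMMAND[missing] : undefined;
    if (!pkg) break; // Unknown failure: hand the error back to the agent.
    await exec(`apt-get update && apt-get install -y ${pkg}`);
    result = await exec(command);
  }
  return result;
}
```

A real agent would usually let the model read the error and pick the package itself. The fixed table only keeps the sketch deterministic.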

Code execution is not the same as a workspace

Many agent platforms begin with code execution because it is the easiest demo to understand. The user asks a question, the model writes code, the sandbox runs it, and the answer comes back.

That is a good primitive, but it is not a complete workspace.

A workspace has state. It has a filesystem the agent can inspect over time. It has processes that outlive a single command. It has ports. It has package managers. It has users. It has a terminal. It has logs from things that are still running. It has enough normal Linux behavior that the agent can use the same commands developers use every day.

That distinction matters for search pages and for product architecture, and for the same reason: "AI code execution" is too broad a requirement. What people actually search for is usually more specific:

AI agent sandbox with apt install
Linux sandbox for AI agents
sandbox for native dependencies
AI coding agent VM
persistent sandbox for AI agents
isolated environment for running untrusted code

Those are all versions of the same buyer question: can this environment handle the messy dependency graph of real software?

If the answer is no, the sandbox becomes a ceiling. The agent can only solve tasks that fit inside the provider's prebuilt world.

Native dependencies are where fake computers fail

Native dependencies are a useful forcing function because they cut through marketing language.

Most agent tasks do not fail because the model cannot write code. They fail because the environment is missing something mundane:

a compiler toolchain
a system library
a font package
a browser runtime dependency
a database client
a long-lived server process
a writable cache directory
a login shell with normal job control
a port that can receive traffic

The wrong abstraction turns each missing piece into a special API. Need packages? Add a package API. Need files? Add a file API. Need background work? Add a job API. Need previews? Add a preview API. Need debugging? Add a logs API. Need interactivity? Add a terminal API.

Eventually you have rebuilt a partial computer, but with unfamiliar semantics and provider-specific edges. Agents are trained on Linux, shells, package managers, files, ports, and logs. The more your sandbox looks like those things, the less glue you need between the model and the work.

This is why apt install is a serious benchmark. It asks whether the sandbox lets the agent use the operating system as the interface.

The right primitive is a real Linux VM

Freestyle VMs are built around that answer. They are designed to be the most powerful VMs for AI agents: they are hardware-virtualized, they can run indefinitely when configured to stay running, and they run real Linux instead of a narrow command sandbox.

The Freestyle docs describe VMs as full Linux virtual machines for long-running, complex tasks. A VM can be created with the SDK, commanded with vm.exec(), written to through vm.fs.writeTextFile(), read from with vm.fs.readTextFile(), resized after creation with vm.resize(), stopped and started through lifecycle calls, and deleted when the workspace is done.

That looks simple in code:

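import { freestyle } from "freestyle";

// Create a VM with default settings.
const { vm } = await freestyle.vms.create();

// Install a package the image did not preload, then confirm the binary runs.
await vm.exec("apt-get update && apt-get install -y ffmpeg");
const result = await vm.exec("ffmpeg -version");

console.log(result.stdout);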

The important detail is not the syntax. The important detail is that the command runs inside a real Linux environment. The agent is not asking a platform-specific dependency service for permission to become useful. It is using the machine.

Freestyle's lifecycle model also fits how agents actually behave. A VM can stop when idle and be started again later by API calls, SSH, or network activity. Disk state is preserved across stop/start. If the workload should stay running, idleTimeoutSeconds can be set to null so the VM keeps running until you stop or delete it. If the agent reaches a decision point, Freestyle can fork the running VM so each copy explores from the same environment.
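The forking idea can be sketched as a fan-out: fork the environment once per candidate command, run each candidate in its own copy, and keep the ones that succeed. This is a sketch under stated assumptions. The `ForkableVm` interface, its `fork()` method, and the `exitCode` field are hypothetical stand-ins, and the real SDK's fork call may have a different name and shape.

```typescript
// Hypothetical result shape: `exitCode` is an assumption.
interface ExecResult {
  exitCode: number;
  stdout: string;
}

// Hypothetical interface standing in for a VM handle that can be forked.
interface ForkableVm {
  exec(command: string): Promise<ExecResult>;
  fork(): Promise<ForkableVm>;
}

interface Attempt {
  command: string;
  result: ExecResult;
}

// Fork the base environment once per candidate, run the candidates in
// parallel, and return only the attempts that exited successfully.
async function tryAlternatives(
  base: ForkableVm,
  commands: string[],
): Promise<Attempt[]> {
  const attempts = await Promise.all(
    commands.map(async (command) => {
      const copy = await base.fork(); // Each copy starts from the same environment.
      return { command, result: await copy.exec(command) };
    }),
  );
  return attempts.filter((attempt) => attempt.result.exitCode === 0);
}
```

Because every candidate runs in its own copy, a failed attempt cannot leave half-installed state behind for the next one. Cleaning up the copies afterwards is left out of the sketch.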

That combination is what most "AI sandbox" requirements eventually become: install dependencies, run a long task, keep the workspace around, and branch when the agent needs to try alternatives.

Interactive package managers need real terminals

Package installation is not always a clean one-shot command.

Some programs print progress forever. Some ask questions. Some run post-install scripts. Some fail halfway through and leave state the agent needs to inspect. Some are not package installs at all: they are REPLs, debuggers, editors, shell scripts, file watchers, dev servers, and test runners.

For those cases, exec is too coarse. The Freestyle PTY docs describe persistent interactive shell sessions that live inside the VM and can be attached, detached, and reattached over WebSocket. They survive client disconnects, VM suspends, and VM forks. The session is backed by a real PTY, so the shell has prompts, line editing, job control, and interactive behavior.

That matters when the agent is not just running code, but operating a workspace. A terminal session can keep scrollback, accept input later, stream output from a long-running server, and be reattached by another process. The product does not have to translate every terminal-shaped workflow into a new API.

The agent can just use the terminal.

Services and ports are part of the sandbox

Real tasks do not stop at installing packages. The agent often needs to run something.

An app builder needs a dev server. A data app may need a local database. A coding agent may need to start the user's test service. A documentation agent may need to render a site. A workflow agent may need to expose an HTTP callback.

Freestyle VMs can run services inside the VM and route traffic to them. The VM domains docs show the pattern: create a VM, start an HTTP server inside it, and create a domain mapping from a public hostname to a VM port. HTTPS is provisioned automatically, and the service inside the VM listens on the mapped vmPort.

That is a different class of sandbox from "run this snippet and return stdout." It means the sandbox can host part of the product experience. The agent can build, run, inspect, and expose software from the same environment.
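The pattern described above (start a server, then map a hostname to its port) can be sketched with two injected functions. Everything here is an assumption made for illustration: `Exec` stands in for a command runner, `MapDomain` stands in for the platform's domain-mapping call, and the readiness check assumes `curl` is installed in the VM and that commands run through a shell. It is not the Freestyle domains API.

```typescript
// Hypothetical result shape: `exitCode` and `stderr` are assumptions.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

type Exec = (command: string) => Promise<ExecResult>;

// Hypothetical callback standing in for the platform's domain-mapping call.
type MapDomain = (domain: string, vmPort: number) => Promise<void>;

// Start a server in the background, wait until the port accepts
// connections, then map the public hostname to that port.
async function servePreview(
  exec: Exec,
  mapDomain: MapDomain,
  startCommand: string,
  domain: string,
  vmPort: number,
  maxChecks = 10,
): Promise<boolean> {
  // Background the server so this exec call returns immediately.
  await exec(`nohup ${startCommand} > /tmp/preview.log 2>&1 &`);

  for (let i = 0; i < maxChecks; i++) {
    // curl exits non-zero until something accepts connections on the port.
    const probe = await exec(`curl -s -o /dev/null http://127.0.0.1:${vmPort}/`);
    if (probe.exitCode === 0) {
      await mapDomain(domain, vmPort);
      return true;
    }
    await exec("sleep 1");
  }
  return false; // The server never came up; /tmp/preview.log has the reason.
}
```

Waiting for the port before mapping the domain is a design choice in this sketch, so the first public request does not land on a server that is still starting.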

For generated apps and coding work, the VM should usually be the workbench, not the permanent source of truth. When the agent is changing source code, repositories, or reviewable work product, use Freestyle Git alongside the VM. Git stores the durable artifact; the VM installs dependencies, runs tests, serves previews, and gives the agent a real place to work.

What to look for in an AI agent sandbox

If you are choosing infrastructure for agents, ask concrete operating-system questions instead of only SDK questions.

Can the sandbox install system packages at runtime? Can it run native toolchains? Can it keep a process alive after the first command returns? Can you attach an interactive terminal? Can you SSH in when the agent gets stuck? Can you resize the machine when the workload grows? Can you preserve state between turns? Can you fork a running environment when the agent needs parallel attempts? Can you expose a service on a port? Can you delete the workspace cleanly when the task is done?

These are not edge cases. They are the normal lifecycle of serious agent work.
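Several of the questions above can be turned into a small probe script and run against any candidate sandbox. The sketch below assumes a generic `Exec` function that runs each command through a shell and reports an `exitCode`. Both the function type and the specific probe commands are illustrative choices, not part of any provider's SDK.

```typescript
// Hypothetical result shape: `exitCode` and `stderr` are assumptions.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

type Exec = (command: string) => Promise<ExecResult>;

// A question about the sandbox plus the shell steps that answer it.
// Each step is a separate exec call, so state must survive between calls.
interface Probe {
  question: string;
  steps: string[];
}

const PROBES: Probe[] = [
  {
    question: "Can it install system packages at runtime?",
    steps: ["apt-get update -qq && apt-get install -y -qq jq", "jq --version"],
  },
  {
    question: "Can it run native toolchains?",
    steps: [
      "apt-get install -y -qq build-essential",
      "echo 'int main(void){return 0;}' | gcc -x c - -o /tmp/probe-bin && /tmp/probe-bin",
    ],
  },
  {
    question: "Can a process outlive the command that started it?",
    steps: [
      "nohup sleep 300 > /dev/null 2>&1 & echo $! > /tmp/probe.pid",
      'kill -0 "$(cat /tmp/probe.pid)"',
    ],
  },
  {
    question: "Does filesystem state persist between commands?",
    steps: ["echo ok > ~/probe.txt", "grep -q ok ~/probe.txt"],
  },
];

// Run every probe; a probe passes only if all of its steps exit with 0.
async function runProbes(
  exec: Exec,
  probes: Probe[] = PROBES,
): Promise<Record<string, boolean>> {
  const report: Record<string, boolean> = {};
  for (const probe of probes) {
    let passed = true;
    for (const step of probe.steps) {
      const { exitCode } = await exec(step);
      if (exitCode !== 0) {
        passed = false;
        break;
      }
    }
    report[probe.question] = passed;
  }
  return report;
}
```

Questions about terminals, SSH, resizing, forking, and public ports depend on provider-specific APIs, so they are left out of this sketch.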

A narrow sandbox may be enough for short, stateless code execution. If your product only needs to run a trusted snippet and throw the result away, a smaller abstraction can be fine.

But if you are building coding agents, app builders, browser automation, data tools, eval harnesses, internal workflow agents, or long-running user workspaces, the dependency surface will expand. The agent will need packages you did not preload. It will need processes you did not model. It will need debugging paths you did not design.

At that point, the best sandbox is the one that was a real computer from the beginning.

The bottom line

The best AI agent sandbox has apt install because the best AI agent sandbox is a real Linux machine.

Not every task needs the full machine all the time. But serious agent products need the option. They need an environment that can grow from code execution into a workspace, from a workspace into a running service, and from a running service into a user-visible product surface.

Freestyle VMs give agents that headroom. They are hardware-virtualized Linux VMs with normal system behavior, programmable lifecycle controls, interactive terminals, SSH access, domain routing, resizing, forking, and the ability to run indefinitely when the workload calls for it.

That is the infrastructure shape agents keep asking for: a safe place to run code they did not write, with enough of a real computer inside to finish the job.