Product Jun 5, 2026 8 min read

VMs vs Containers for AI Agents

Containers are great infrastructure. They package software cleanly, start quickly, and fit the way most teams already build and deploy services.

But an AI agent is not always a service.

Sometimes the agent is a short-lived worker that runs one command, returns a result, and disappears. A container is often the right primitive for that. Sometimes the agent is a developer, analyst, tester, browser user, or background operator that needs to keep a real working environment alive across many turns. That is a different workload.

The useful question is not "are VMs better than containers?" The useful question is: what kind of computer does the agent need?

If the answer is "a packaged process," use a container. If the answer is "a durable machine the agent can inspect, mutate, pause, fork, debug, and return to later," use a VM.

The core difference

Docker's overview docs describe containers as isolated environments that use Linux kernel features for isolation and resource control. Its explanation of what a container is adds that containers virtualize the operating system rather than the hardware: multiple containers share the host's OS kernel, each running as isolated processes in user space.

That is exactly why containers are efficient. They are not booting a full guest machine for every workload. They package an application with its dependencies and run it under the host kernel with namespaces, cgroups, filesystems, and runtime policy around it.

A VM draws the boundary lower. It gives the workload a guest machine abstraction: its own kernel and operating system environment, process tree, memory, disk, services, users, and hardware-like boundary. That is more machine than many jobs need, but it is the right shape when the environment itself becomes the product.

Agents push infrastructure toward the second case because they do not just execute code. They accumulate context in the environment. They leave terminals open. They start servers. They install packages. They debug stateful failures. They generate artifacts. They need to be interrupted, resumed, and sometimes split into parallel attempts.

Containers can do parts of that. VMs make it the default model.

When containers are enough

Use containers when the agent workload is predictable, bounded, and process-shaped.

Good container-shaped workloads include:

run this Python script against this input
execute a test suite from a known image
process a document with preinstalled tools
evaluate generated code with a short timeout
run one tool call in a clean environment
batch many independent tasks where state is external

In those cases, the clean image boundary is an advantage. Build an image, run a command, collect output, destroy the container. The agent does not need to treat the environment as home. It needs a fast, disposable process wrapper.
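
As a rough illustration, a one-shot run like this maps onto a single docker run invocation. The sketch below is a minimal, hypothetical helper that only builds the argument list. OneShotJob and dockerRunArgs are illustrative names, not part of any library. The flags --rm, --network none, --memory, --cpus, and -v are standard Docker CLI options. Enforcing the job's timeout is assumed to happen in whatever process spawns the command.

```typescript
// Hypothetical description of a bounded, process-shaped agent job.
interface OneShotJob {
  image: string;          // prebuilt image that already contains the tools
  command: string[];      // the single command to run
  inputDir?: string;      // host directory mounted read-only at /input
  memory?: string;        // memory limit, e.g. "512m"
  cpus?: number;          // CPU limit, e.g. 1
  allowNetwork?: boolean; // defaults to no network access
}

// Build the arguments for `docker run`. --rm removes the container when the
// command exits, so nothing inside it outlives the job.
function dockerRunArgs(job: OneShotJob): string[] {
  const args = ["run", "--rm"];
  if (!job.allowNetwork) args.push("--network", "none");
  if (job.memory) args.push("--memory", job.memory);
  if (job.cpus !== undefined) args.push("--cpus", String(job.cpus));
  if (job.inputDir) args.push("-v", `${job.inputDir}:/input:ro`);
  args.push(job.image, ...job.command);
  return args;
}

const runArgs = dockerRunArgs({
  image: "python:3.12-slim",
  command: ["python", "/input/script.py"],
  inputDir: "/tmp/job-42",
  memory: "512m",
  cpus: 1,
});
console.log(["docker", ...runArgs].join(" "));
```

Because of --rm, the whole lifecycle is spawning this command, reading its output, and waiting for it to exit. Nothing inside the container outlives the job, which is the property you want when all important state is external.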

This is also a good fit when your product already has strong conventions around container images. If every workload can be expressed as "start this image with these inputs," containers keep the operational model simple.

The problems start when the agent's real job is not the command. The real job is the evolving environment around the command.

Where containers bend

An agent working on real software often crosses the boundary between "run a process" and "operate a machine."

It might need a package manager with system dependencies. It might need a browser, a database, a file watcher, a dev server, and a test runner alive at the same time. It might need a terminal session that survives reconnects. It might need SSH because the product abstraction failed and an engineer needs to inspect the machine directly. It might need to fork the current state before trying a risky migration.

You can add features around containers to support these cases. You can attach volumes for persistence. You can run sidecars. You can add a process supervisor. You can expose ports. You can write APIs for files, commands, logs, previews, and sessions.

At some point, though, you are reconstructing a computer out of special cases.

That reconstruction is fragile for agents because agents discover requirements late. A human engineer might know in advance that a workload needs Postgres, Chrome, fonts, native build tools, and a long-running worker. An agent often finds those requirements while doing the task. Infrastructure that expects the whole shape up front makes the agent less capable exactly when the work becomes real.

Why VMs fit agents

Freestyle VMs are built for the workload where the environment is part of the agent's memory and control surface. The Freestyle VM docs describe VMs as full Linux virtual machines for long-running, complex tasks. They can stop when idle and start again on API calls, SSH connections, or network activity.

That matters because the VM can be treated as a durable runtime object. The lifecycle docs describe a VM that can run work, stop, start again later, fork for parallel exploration, resize for workload needs, and be deleted when the workspace is finished. PTY sessions are long-lived interactive shells that can be detached and reattached over WebSocket, and the PTY docs describe behavior across VM suspend and fork. SSH access is also a normal part of the model, with scoped Freestyle identities and tokens controlling access to Linux users inside the guest.
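
The lifecycle above can be pictured as a small state machine. This is a hypothetical model, not Freestyle's SDK. The VmState and VmOp names and the transition table are assumptions that mirror the operations the docs list. Which transitions the real service accepts should be checked against the lifecycle docs, for example whether resizing needs a stopped VM.

```typescript
// Hypothetical lifecycle model for illustration; not the Freestyle SDK.
type VmState = "running" | "stopped" | "deleted";
type VmOp = "stop" | "start" | "fork" | "resize" | "delete";

// Operations this model accepts in each state (an assumption).
const allowed: Record<VmState, VmOp[]> = {
  running: ["stop", "fork", "resize", "delete"],
  stopped: ["start", "resize", "delete"],
  deleted: [],
};

// State of the VM an operation is applied to. Fork creates new VMs and
// leaves the source running; resize keeps the current state.
const transition: Record<VmOp, (s: VmState) => VmState> = {
  stop: () => "stopped",
  start: () => "running",
  fork: (s) => s,
  resize: (s) => s,
  delete: () => "deleted",
};

function applyOp(state: VmState, op: VmOp): VmState {
  if (allowed[state].indexOf(op) === -1) {
    throw new Error(`cannot ${op} a ${state} VM`);
  }
  return transition[op](state);
}

// A workspace that pauses overnight, resumes, forks, and is cleaned up.
let vmState: VmState = "running";
const ops: VmOp[] = ["stop", "start", "fork", "delete"];
for (const op of ops) {
  vmState = applyOp(vmState, op);
  console.log(`${op} -> ${vmState}`);
}
```

Forking is modeled as leaving the source VM running, because the docs describe forks as new VMs created from the current running state.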

Freestyle VMs are the most powerful VMs for AI agents: they are hardware virtualized, they can run forever when you set idleTimeoutSeconds to null, and they give agents real Linux instead of a narrow sandbox.
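
The idle behavior can be sketched as a single check. This is a simplified model under stated assumptions, not Freestyle's implementation. It reads idleTimeoutSeconds as the number of seconds without activity before the VM stops, with null meaning never stop. The IdleState type and the tick and recordActivity helpers are hypothetical.

```typescript
// Hypothetical model of idle suspension; not Freestyle's implementation.
type ActivityKind = "api" | "ssh" | "network";

interface IdlePolicy {
  idleTimeoutSeconds: number | null; // null: never stop for idleness
}

interface IdleState {
  running: boolean;
  lastActivityMs: number;
}

function shouldStop(policy: IdlePolicy, lastActivityMs: number, nowMs: number): boolean {
  if (policy.idleTimeoutSeconds === null) return false; // long-running agent
  return nowMs - lastActivityMs >= policy.idleTimeoutSeconds * 1000;
}

// Periodic check: stop a running VM once it has been idle long enough.
function tick(state: IdleState, policy: IdlePolicy, nowMs: number): IdleState {
  if (state.running && shouldStop(policy, state.lastActivityMs, nowMs)) {
    return { running: false, lastActivityMs: state.lastActivityMs };
  }
  return state;
}

// In this model, any API call, SSH connection, or network activity resets
// the idle clock and starts a stopped VM again.
function recordActivity(kind: ActivityKind, nowMs: number): IdleState {
  console.log(`${kind} activity at ${nowMs} ms: VM running`);
  return { running: true, lastActivityMs: nowMs };
}

const tenMinutes: IdlePolicy = { idleTimeoutSeconds: 600 };
const neverIdle: IdlePolicy = { idleTimeoutSeconds: null };
const oneHourMs = 60 * 60 * 1000;

let worker: IdleState = { running: true, lastActivityMs: 0 };
worker = tick(worker, tenMinutes, oneHourMs);  // idle for an hour: stopped
worker = recordActivity("ssh", oneHourMs + 1); // SSH connection: running again

const agent = tick({ running: true, lastActivityMs: 0 }, neverIdle, oneHourMs);
console.log(agent.running); // no idle timeout, so still running
```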

That does not mean every agent needs the biggest possible machine. It means the primitive has enough headroom. The same VM can start as a simple command runner and later become a workspace with services, terminals, users, network policy, custom domains, and human debugging access.

State is the decision point

The fastest way to choose is to ask where state should live.

If all important state lives outside the runtime, containers are usually fine. The container reads inputs, writes outputs, and disappears. The next run gets a fresh environment from the image.

If important state lives inside the runtime, prefer a VM. That includes:

installed packages the agent discovered it needed
dirty working files
running dev servers
in-memory REPL state
database contents
browser sessions
open terminal scrollback
logs from a failure that has not been understood yet
a half-finished task the user will resume tomorrow
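
The rule can be made mechanical with a state inventory. The helper below is a hypothetical sketch, not a Freestyle API. The StateItem shape and its important flag are assumptions. It recommends a VM as soon as any important state lives only inside the runtime.

```typescript
// "external": a Git remote, database service, object store, or the caller.
// "runtime": exists only inside the machine doing the work.
type StateLocation = "external" | "runtime";

interface StateItem {
  name: string;
  location: StateLocation;
  important: boolean; // would losing it cost real work or context?
}

type Primitive = "container" | "vm";

function choosePrimitive(items: StateItem[]): { primitive: Primitive; reasons: string[] } {
  // Every important item that would vanish with the runtime is a reason for a VM.
  const reasons = items
    .filter((i) => i.important && i.location === "runtime")
    .map((i) => i.name);
  return { primitive: reasons.length > 0 ? "vm" : "container", reasons };
}

const batchJob: StateItem[] = [
  { name: "input document", location: "external", important: true },
  { name: "scratch files", location: "runtime", important: false },
];

const codingSession: StateItem[] = [
  { name: "repository", location: "external", important: true },
  { name: "running dev server", location: "runtime", important: true },
  { name: "unexplained failure logs", location: "runtime", important: true },
];

console.log(choosePrimitive(batchJob).primitive);      // "container"
console.log(choosePrimitive(codingSession).primitive); // "vm"
```

The repository is marked external on purpose: its durable record belongs in a Git remote, not on the machine.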

For source code, the durable source of truth should still be a real repository. A VM is the execution environment; Git is the reviewable state layer. If your agent edits files, branches work, or needs reviewable history, pair the VM with Freestyle Git or another Git system instead of treating a raw disk as the only record.

The distinction is simple: Git should remember what changed. The VM should preserve the working computer that makes those changes possible.
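
One way to keep those roles separate is to checkpoint the repository before any lifecycle step that duplicates or discards the machine, such as forking, stopping, or deleting it. The sketch below only builds the commands, all of which are standard Git CLI. The checkpointCommands helper, the agent/<task> branch naming, and running it inside the VM before a lifecycle call are illustrative assumptions, not a documented Freestyle workflow.

```typescript
// Hypothetical helper: Git commands an agent could run inside the VM before
// forking, stopping, or deleting it, so the reviewable record of what
// changed lives in a repository rather than only on the VM's disk.
function checkpointCommands(taskId: string, message: string, remote = "origin"): string[][] {
  const branch = `agent/${taskId}`;
  return [
    ["git", "checkout", "-B", branch],                 // create or reset the task branch at HEAD
    ["git", "add", "--all"],                           // stage edits, additions, and deletions
    ["git", "commit", "--allow-empty", "-m", message], // record a checkpoint even if nothing changed
    ["git", "push", "--set-upstream", remote, branch], // publish it outside the VM
  ];
}

const cmds = checkpointCommands("fix-login-1", "WIP: before forking to try two migrations");
for (const c of cmds) console.log(c.join(" "));
```

Passing each command to a process API as an argument array, not through a shell, avoids quoting problems when commit messages contain spaces or quotes.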

Forking changes the architecture

Agents often need to try multiple paths from the same starting point.

With a container workflow, that usually means reconstructing the environment several times: start from an image, restore files, reinstall or hydrate dependencies, restart services, and replay enough steps to reach the decision point.

With a VM workflow, the decision point can be the machine. Freestyle's lifecycle docs describe vm.fork({ count }) as creating new VMs from the current running state so an agent can explore multiple branches of work from the same environment.
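
The shape of that workflow can be sketched as follows. Only the call pattern vm.fork({ count }) comes from Freestyle's docs. Everything else is hypothetical: the ForkableVm interface, the in-memory FakeVm stand-in, and the assumption that fork returns the new VMs as an array. A real fork would also copy memory and running processes, and would be asynchronous.

```typescript
// Hypothetical stand-in for a forkable VM; not the Freestyle SDK.
interface ForkableVm {
  id: string;
  files: { [path: string]: string };
  fork(opts: { count: number }): ForkableVm[];
}

let nextId = 0;

class FakeVm implements ForkableVm {
  id = `vm-${nextId++}`;
  constructor(public files: { [path: string]: string } = {}) {}

  // Copy the current state into `count` independent VMs. Only files are
  // modeled here; a real fork also carries memory and running processes.
  fork(opts: { count: number }): FakeVm[] {
    const forks: FakeVm[] = [];
    for (let i = 0; i < opts.count; i++) {
      const copy: { [path: string]: string } = {};
      for (const p of Object.keys(this.files)) copy[p] = this.files[p];
      forks.push(new FakeVm(copy));
    }
    return forks;
  }
}

// 1. Expensive setup happens once, on the base VM.
const base = new FakeVm();
base.files["node_modules/.installed"] = "yes";
base.files["db/schema.sql"] = "v1";

// 2. Fork the warm machine, one copy per strategy.
const strategies = ["add-column", "new-table", "backfill-job"];
const forks = base.fork({ count: strategies.length });

// 3. Each fork diverges independently from the same starting state.
forks.forEach((vm, i) => {
  vm.files["db/schema.sql"] = `v2 via ${strategies[i]}`;
});

for (const vm of forks) console.log(vm.id, vm.files["db/schema.sql"]);
console.log(base.files["db/schema.sql"]); // "v1": the base is untouched
```

The setup runs once. Each strategy then starts from the same warm state without replaying the steps that produced it.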

That is a different architecture. The agent can warm up the project once, reach an interesting state, then split the whole computer. Each fork gets the same starting process state and can diverge independently.

This is useful for more than coding. It applies to evals, data analysis, migrations, UI experiments, research tasks, and any workload where setup is expensive but exploration is parallel.
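
A back-of-the-envelope model shows where the savings come from. Let S be the time to reach the decision point, E the work per branch after it, F the time to fork one VM, and n the number of branches. Rebuilding each branch from an image costs n(S + E) of machine time. Warming once and forking costs S + n(F + E). The numbers below are illustrative assumptions, not Freestyle measurements.

```typescript
// Illustrative cost model; all inputs are assumed values in seconds.
interface BranchCosts {
  setup: number;    // S: install deps, seed data, start services, reach the decision point
  explore: number;  // E: work done on each branch after the decision point
  fork: number;     // F: time to fork one VM from the running state
  branches: number; // n: number of parallel attempts
}

// Total machine time when every branch rebuilds the environment from an image.
function rebuildTotal(c: BranchCosts): number {
  return c.branches * (c.setup + c.explore);
}

// Total machine time when setup happens once and the warm VM is forked.
function forkTotal(c: BranchCosts): number {
  return c.setup + c.branches * (c.fork + c.explore);
}

const example: BranchCosts = { setup: 600, explore: 120, fork: 5, branches: 4 };
console.log(rebuildTotal(example)); // 2880
console.log(forkTotal(example));    // 1100
```

With these inputs, forking uses 1100 machine-seconds instead of 2880. With a single branch it costs slightly more (725 versus 720), so the benefit grows with both n and S. If rebuilt branches run in parallel, wall-clock time is similar either way. The gain is mainly in compute, and in not having to replay the path to the decision point.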

Debugging is part of the product

Agent infrastructure fails in boring ways. A package install hangs. A process binds the wrong port. A browser gets into a bad state. A test watcher is still running but has stopped printing output. The model claims it fixed something, but the server logs disagree.

Containers can expose logs and exec APIs, but VM-shaped debugging is more familiar. SSH into the machine. Inspect processes. Check files. Restart a service. Attach to a terminal. See the same environment the agent sees.
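
That first look is usually a handful of standard Linux commands run over SSH. The sketch below assembles them as ssh argument lists. The destination and the log path are placeholders. It does not cover how Freestyle issues identities or tokens, which its SSH docs describe. ss comes from iproute2 and may be missing on minimal images.

```typescript
// Standard Linux commands for a first look at a misbehaving agent machine.
const triage: { [label: string]: string } = {
  processes: "ps -eo pid,ppid,etime,stat,%cpu,cmd --sort=-%cpu | head -n 20",
  listeningPorts: "ss -ltnp", // which process is bound to which port
  diskSpace: "df -h",
  memory: "free -m",
  appLog: "tail -n 100 /var/log/agent/app.log", // hypothetical log path
};

// Build argv for one remote command over ssh. BatchMode=yes makes ssh fail
// instead of prompting for a password, which suits automated callers.
function sshArgs(destination: string, remoteCommand: string): string[] {
  return ["ssh", "-o", "BatchMode=yes", destination, remoteCommand];
}

const dest = "agent@vm.example.invalid"; // placeholder destination
for (const label of Object.keys(triage)) {
  console.log(`${label}: ${sshArgs(dest, triage[label]).join(" ")}`);
}
```

Each remote command is passed as one argument, so the pipe in the ps line runs on the remote machine rather than locally.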

That familiarity matters in production. The first version of an agent product can hide the machine. The production version needs operators who can understand it when it misbehaves.

Freestyle's SSH access is built around scoped identities and tokens, as its SSH docs describe, so access can be granted without copying long-lived secrets into the guest. PTY sessions give products an interactive shell abstraction that survives disconnects and can be reattached later. Those are not decorative features. They are what make the runtime operable when an agent is doing non-trivial work.
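
Detach and reattach are useful because output produced while nobody is attached is not lost. The model below illustrates that with a bounded scrollback buffer. It is a hypothetical sketch, not Freestyle's PTY protocol. The PtySession class and its line-based buffer are assumptions; a real PTY streams bytes over the WebSocket.

```typescript
// Hypothetical model of a detachable terminal session; not Freestyle's PTY API.
type Sink = (line: string) => void;

class PtySession {
  private scrollback: string[] = [];
  private client: Sink | null = null;

  constructor(private maxLines: number) {}

  // Shell output is always recorded, whether or not a client is attached.
  write(line: string): void {
    this.scrollback.push(line);
    if (this.scrollback.length > this.maxLines) this.scrollback.shift(); // drop oldest line
    if (this.client) this.client(line);
  }

  // Attaching replays the retained scrollback, then streams live output.
  attach(client: Sink): void {
    for (const line of this.scrollback) client(line);
    this.client = client;
  }

  detach(): void {
    this.client = null;
  }
}

const session = new PtySession(1000);
const firstView: string[] = [];
session.attach((l) => firstView.push(l));
session.write("$ npm run dev");
session.detach();                // tab closed or network dropped
session.write("ready on :3000"); // output keeps arriving while detached

const secondView: string[] = [];
session.attach((l) => secondView.push(l));
console.log(secondView); // both lines, replayed on reattach
```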

A practical rule

Choose containers when the runtime is an implementation detail.

Choose VMs when the runtime is the workspace.

That rule avoids most confusion. A container is excellent when the agent needs a clean, repeatable process. A VM is better when the agent needs a real Linux environment with its own lifecycle.

Use a container for a one-shot tool. Use a VM for a long-running agent.

Use a container when the task can be fully described before it starts. Use a VM when the agent will discover what it needs along the way.

Use a container when restart is cheap. Use a VM when preserving state is cheaper than rebuilding it.

Use a container when debugging means reading logs. Use a VM when debugging means entering the machine.
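
Taken together, the rules read like a checklist. The encoding below is a sketch of this article's heuristic, not a formal decision procedure. The WorkloadProfile field names are illustrative. Treating any single VM-leaning answer as decisive is an assumption.

```typescript
// Illustrative encoding of the rules of thumb above; field names are hypothetical.
interface WorkloadProfile {
  longRunning: boolean;               // long-running agent rather than a one-shot tool
  fullySpecifiedUpFront: boolean;     // can the task be described before it starts?
  restartIsCheap: boolean;            // is rebuilding cheaper than preserving state?
  needsInteractiveDebugging: boolean; // does debugging mean entering the machine?
}

function recommend(p: WorkloadProfile): { primitive: "container" | "vm"; why: string[] } {
  const why: string[] = [];
  if (p.longRunning) why.push("long-running agent");
  if (!p.fullySpecifiedUpFront) why.push("discovers what it needs along the way");
  if (!p.restartIsCheap) why.push("preserving state is cheaper than rebuilding it");
  if (p.needsInteractiveDebugging) why.push("debugging means entering the machine");
  // Any single VM-leaning answer is treated as decisive (an assumption).
  return { primitive: why.length > 0 ? "vm" : "container", why };
}

const codeEvaluator = recommend({
  longRunning: false,
  fullySpecifiedUpFront: true,
  restartIsCheap: true,
  needsInteractiveDebugging: false,
});
const codingAgent = recommend({
  longRunning: true,
  fullySpecifiedUpFront: false,
  restartIsCheap: false,
  needsInteractiveDebugging: true,
});
console.log(codeEvaluator.primitive, codingAgent.primitive, codingAgent.why);
```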

The bottom line

Containers are not obsolete. They are one of the best deployment primitives ever built. They are also the wrong abstraction for agent workloads that need a durable computer.

The mistake is forcing every agent into a process-shaped runtime because containers are familiar. The better approach is to match the primitive to the job.

For short, stateless, predictable execution, containers are the right answer.

For open-ended AI agents that need real Linux, long-running processes, interactive terminals, SSH, forkable state, and a workspace that can survive between turns, use VMs.

That is where agent infrastructure is heading: not away from isolation, but toward stronger isolation around more capable computers.