How to Give Your OpenAI Agent a Sandbox

The OpenAI Agents SDK runs an agent’s shell commands inside a sandbox: the isolated machine the agent drives through tools like exec_command. The SDK ships sandbox clients for a handful of providers; this guide builds one for Freestyle. Implement the SDK’s SandboxClient against a Freestyle VM and your agent gets a fresh, disposable Linux box. Because it is a real VM, any port it serves can be mapped to a real HTTPS domain.

Requirements

A Freestyle API key — to create the VM and map domains.
An OpenAI API key — the agent loop calls a model (OPENAI_API_KEY, read from the environment). The sandbox client itself works without one; the run() loop doesn’t.
Node.js 20+.

Install

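Install both packages with whichever package manager your project already uses:

pnpm add @openai/agents freestyle

bun add @openai/agents freestyle

npm install @openai/agents freestyle

yarn add @openai/agents freestyle

Then export both API keys in your shell:

export FREESTYLE_API_KEY="your-api-key"
export OPENAI_API_KEY="sk-..."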

Implement a Freestyle Sandbox Client

The SDK’s SandboxClient is tiny: a backendId and a create() that returns a session. The session is where the work happens. For a shell-driven agent the runtime only needs three things from it: a state whose manifest it reads each turn, an execCommand() that the exec_command tool calls (it returns the string the model sees), and one teardown method (close() here) to dispose the VM. Everything else on the session interface is optional.

We add one more: resolveExposedPort(), which turns a VM port into a Freestyle domain. The SDK’s urlForExposedPort rebuilds the URL from { host, port, tls }, so returning { host: "<sub>.style.dev", port: 443, tls: true } yields exactly https://<sub>.style.dev/.

import { freestyle } from "freestyle";
import {
  Manifest,
  normalizeSandboxClientCreateArgs,
  getRecordedExposedPortEndpoint,
  recordExposedPortEndpoint,
} from "@openai/agents/sandbox";
import type {
  SandboxClient,
  SandboxClientCreateArgs,
  SandboxSession,
  SandboxSessionState,
  ExecCommandArgs,
  ExposedPortEndpoint,
} from "@openai/agents/sandbox";

type Vm = Awaited<ReturnType<typeof freestyle.vms.create>>["vm"];

// POSIX single-quote so paths with spaces survive the shell.
const sh = (s: string) => `'${s.replace(/'/g, `'\\''`)}'`;

// How long a single agent command may run before we give up on it. Freestyle's
// exec is blocking, so we don't implement the SDK's background-session model.
const EXEC_TIMEOUT_MS = 120_000;

export class FreestyleSandboxClient implements SandboxClient {
  readonly backendId = "freestyle";

  constructor(private readonly options: { snapshotId?: string } = {}) {}

  // The runtime calls create() with { manifest, options, ... }; users may also call
  // create(manifest). normalizeSandboxClientCreateArgs accepts both and always
  // hands back a real Manifest (the runtime dereferences session.state.manifest).
  async create(
    args?: SandboxClientCreateArgs | Manifest,
    manifestOptions?: Record<string, unknown>,
  ): Promise<SandboxSession> {
    const { manifest } = normalizeSandboxClientCreateArgs(args, manifestOptions);

    const { vm, vmId } = await freestyle.vms.create({ slug: "openai-sandbox", snapshotId: this.options.snapshotId });
    // Commands default to running in the manifest root (/workspace) — make sure it exists.
    await vm.exec(`mkdir -p ${sh(manifest.root)}`);

    return new FreestyleSandboxSession(vm, vmId, { manifest });
  }
}

class FreestyleSandboxSession implements SandboxSession {
  private closed = false;

  constructor(
    private readonly vm: Vm,
    private readonly vmId: string,
    readonly state: SandboxSessionState,
  ) {}

  // The shell capability's exec_command tool calls this; the returned string is
  // what the model sees. Run the command in the VM and report its exit code.
  async execCommand(args: ExecCommandArgs): Promise<string> {
    const workdir = args.workdir ?? this.state.manifest.root;
    const startedAt = Date.now();

    let exitCode = 0;
    let output = "";
    try {
      const res = await this.vm.exec({
        command: `cd ${sh(workdir)} && ${args.cmd}`,
        timeoutMs: EXEC_TIMEOUT_MS,
      });
      exitCode = res.statusCode ?? 0;
      output = [res.stdout, res.stderr].filter(Boolean).join("");
    } catch (err) {
      exitCode = 1;
      output = err instanceof Error ? err.message : String(err);
    }

    const wallTimeSeconds = (Date.now() - startedAt) / 1000;
    return [
      `Wall time: ${wallTimeSeconds.toFixed(4)} seconds`,
      `Process exited with code ${exitCode}`,
      "Output:",
      output,
    ].join("\n");
  }

  // The standout Freestyle hook: expose a port as a real HTTPS domain. The SDK's
  // urlForExposedPort turns { host, port: 443, tls: true } into https://<domain>/.
  async resolveExposedPort(port: number): Promise<ExposedPortEndpoint> {
    const cached = getRecordedExposedPortEndpoint(this.state, port);
    if (cached) return cached;

    const domain = `sandbox-${crypto.randomUUID().slice(0, 8)}.style.dev`;
    await freestyle.domains.mappings.create({ domain, vmId: this.vmId, vmPort: port });
    // Key the cache by the requested port (the endpoint's own port is 443).
    return recordExposedPortEndpoint(this.state, { host: domain, port: 443, tls: true }, port);
  }

  // The runtime tears a session down via stop/shutdown/delete/close — delete the VM.
  async close(): Promise<void> {
    if (this.closed) return;
    this.closed = true;
    await freestyle.vms.delete({ vmId: this.vmId });
  }
}

execCommand returns a string, not a structured result — the shell tool surfaces that text straight to the model, so we format the exit code and output into it. We don’t implement the SDK’s still-running/sessionId streaming model: vm.exec is blocking, so each command runs to completion. For long-lived processes (a dev server), start them with systemd-run and stream the journal over the PTY API, as in the other guides.
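Because the result is plain text, your own tests or logs have to read the exit code back out of it. The following is a minimal sketch of that round trip, assuming the four-line layout that execCommand builds in this guide. The names ExecReport, formatExecReport, and parseExecReport are hypothetical helpers for illustration, not part of the OpenAI Agents SDK or Freestyle.

```typescript
// Hypothetical helpers (not SDK or Freestyle APIs). They mirror the layout
// that this guide's execCommand() returns: wall time, exit code, "Output:", text.
interface ExecReport {
  wallTimeSeconds: number;
  exitCode: number;
  output: string;
}

// Build the same text layout as execCommand() in the client above.
function formatExecReport(r: ExecReport): string {
  return [
    `Wall time: ${r.wallTimeSeconds.toFixed(4)} seconds`,
    `Process exited with code ${r.exitCode}`,
    "Output:",
    r.output,
  ].join("\n");
}

// Recover the fields from the text. Returns null if the layout does not match.
function parseExecReport(text: string): ExecReport | null {
  const pattern =
    /^Wall time: ([\d.]+) seconds\nProcess exited with code (-?\d+)\nOutput:\n([\s\S]*)$/;
  const match = pattern.exec(text);
  if (!match) return null;
  return {
    wallTimeSeconds: Number(match[1]),
    exitCode: Number(match[2]),
    output: match[3] ?? "",
  };
}

const report = formatExecReport({ wallTimeSeconds: 0.0312, exitCode: 0, output: "45\n" });
const parsed = parseExecReport(report);
console.log(report);
console.log(parsed?.exitCode);
```

A parser like this is only useful on your side of the wire (smoke tests, logging). The model still receives the unparsed string.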

Try It Without an Agent

Drive the session directly — no LLM, no token spend — to confirm the Freestyle wiring before you run a model:

import { FreestyleSandboxClient } from "./freestyle-sandbox";
import { urlForExposedPort } from "@openai/agents/sandbox";

const client = new FreestyleSandboxClient();
const session = await client.create(); // boots a fresh VM

// Run a command — exactly what the agent's exec_command tool does.
console.log(await session.execCommand!({ cmd: "uname -a" }));

// Write and run a script in the workspace (/workspace is the default cwd).
await session.execCommand!({ cmd: "echo 'print(sum(range(10)))' > sum.py" });
console.log(await session.execCommand!({ cmd: "python3 sum.py" })); // Output section shows 45, after the wall-time and exit-code lines

// Start something on a port, then expose it as a real domain.
await session.execCommand!({
  cmd: "systemd-run --unit=app --working-directory=/workspace python3 -m http.server 8000",
});
const endpoint = await session.resolveExposedPort!(8000);
const url = urlForExposedPort(endpoint, "http"); // https://sandbox-xxxx.style.dev/
console.log(url, (await fetch(url)).status); // → https://… 200

await session.close!(); // deletes the VM

These methods are marked optional on the SandboxSession interface, so each call uses TypeScript’s non-null assertion (!) to tell the compiler that our client always defines them.
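One caveat with the smoke test above: systemd-run returns as soon as the unit has been started, so the first fetch can race the server binding its port (and, possibly, the new domain mapping becoming reachable). The following is a minimal sketch of a polling helper you could put in front of the fetch. The names waitUntil and httpOk are hypothetical, and the attempt count and delay are arbitrary defaults, not values recommended by Freestyle or the SDK.

```typescript
// Hypothetical helper (not an SDK or Freestyle API): call probe() until it
// reports true, and return the number of attempts that took.
async function waitUntil(
  probe: () => Promise<boolean>,
  { attempts = 20, delayMs = 500 }: { attempts?: number; delayMs?: number } = {},
): Promise<number> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // A rejected probe (connection refused, DNS not ready) counts as "not yet".
    const ok = await probe().catch(() => false);
    if (ok) return attempt;
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`not ready after ${attempts} attempts`);
}

// Probe that reports true once the URL answers with a 2xx status.
const httpOk = (url: string) => async () => (await fetch(url)).ok;

// Demo with a fake probe that fails twice, so the sketch runs without a VM.
let calls = 0;
const flakyProbe = async (): Promise<boolean> => {
  calls += 1;
  if (calls < 3) throw new Error("connection refused");
  return true;
};

waitUntil(flakyProbe, { delayMs: 10 }).then((n) => {
  console.log(`ready after ${n} attempts`); // ready after 3 attempts
});
```

In the smoke test you would call `await waitUntil(httpOk(url))` between resolveExposedPort and the first fetch.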

Build the Computer-Use Agent

Give the client to a SandboxAgent and run() it. The shell() capability exposes the exec_command tool, so the model drives the VM — read the system, write a script, run it — to satisfy the prompt. The SDK creates the VM on the first turn and tears it down (calling close()) when the run ends.

import { run } from "@openai/agents";
import { SandboxAgent, shell } from "@openai/agents/sandbox";
import { FreestyleSandboxClient } from "./freestyle-sandbox";

const agent = new SandboxAgent({
  name: "computer-use",
  model: "gpt-5.5", // any model the SDK supports; set that provider's key
  instructions:
    "You operate a fresh Linux VM. Use exec_command to run shell commands — " +
    "inspect the system, install tools, write and run code — to accomplish the task.",
  capabilities: [shell()],
});

const result = await run(
  agent,
  "What OS, kernel, and CPU count does your machine have? Then write a Python " +
    "script that prints the first 10 primes and run it.",
  { sandbox: { client: new FreestyleSandboxClient() } },
);

console.log(result.finalOutput);

The model calls exec_command itself — uname/nproc, then writes a script and runs python3 — all inside the Freestyle VM, and summarizes what it found. Set OPENAI_API_KEY first; the SDK reads it from the environment.

Serve a Port on a Real Domain

resolveExposedPort is the part of this client that no local sandbox can offer: when the agent starts something listening on a port, you get a public HTTPS URL for it. Map the port, then build the URL with the SDK’s urlForExposedPort:

import { urlForExposedPort } from "@openai/agents/sandbox";

const session = await new FreestyleSandboxClient().create();

// …the agent (or your own code) starts a server on 5173…
await session.execCommand!({
  cmd: "systemd-run --unit=app --working-directory=/workspace python3 -m http.server 5173",
});

const endpoint = await session.resolveExposedPort!(5173);
console.log(urlForExposedPort(endpoint, "http")); // https://sandbox-xxxx.style.dev/
console.log(urlForExposedPort(endpoint, "ws"));   // wss://sandbox-xxxx.style.dev/

A *.style.dev subdomain needs no DNS or verification, and the mapping is cached per port — call resolveExposedPort(5173) again and you get the same domain. Expose a tool that hands the URL to the model and your agent can build a web app and link you straight to it.
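To make the two moving parts concrete, here is a minimal sketch of how an endpoint record could be turned into a URL and how a per-port cache avoids creating a second mapping. It is a stand-in written for illustration, not the source of the SDK’s urlForExposedPort or recordExposedPortEndpoint. Endpoint, endpointToUrl, resolveOnce, and fakeMapDomain are hypothetical names, and the rule of omitting the scheme’s default port is an assumption.

```typescript
// Illustrative stand-ins only; these are not the SDK's implementations.
interface Endpoint {
  host: string;
  port: number;
  tls: boolean;
}

// Rebuild a URL from { host, port, tls }. "http" becomes "https" and "ws"
// becomes "wss" when tls is true.
function endpointToUrl(endpoint: Endpoint, kind: "http" | "ws"): string {
  const scheme = endpoint.tls ? `${kind}s` : kind;
  const defaultPort = endpoint.tls ? 443 : 80;
  // Assumption: the port is omitted when it is the scheme default.
  const authority =
    endpoint.port === defaultPort ? endpoint.host : `${endpoint.host}:${endpoint.port}`;
  return `${scheme}://${authority}/`;
}

// Cache keyed by the VM port that was requested, not by the endpoint's port 443.
const endpointsByVmPort = new Map<number, Endpoint>();

function resolveOnce(vmPort: number, mapDomain: (vmPort: number) => Endpoint): Endpoint {
  const cached = endpointsByVmPort.get(vmPort);
  if (cached) return cached;
  const endpoint = mapDomain(vmPort);
  endpointsByVmPort.set(vmPort, endpoint);
  return endpoint;
}

// Fake domain mapper that counts how often a mapping would be created.
let mappingsCreated = 0;
const fakeMapDomain = (): Endpoint => {
  mappingsCreated += 1;
  return { host: `sandbox-${mappingsCreated}.style.dev`, port: 443, tls: true };
};

const first = resolveOnce(5173, fakeMapDomain);
const second = resolveOnce(5173, fakeMapDomain);
console.log(endpointToUrl(first, "http")); // https://sandbox-1.style.dev/
console.log(endpointToUrl(first, "ws")); // wss://sandbox-1.style.dev/
console.log(first === second, mappingsCreated); // true 1
```

Keying the cache by the requested VM port matters because every Freestyle endpoint reports port 443. Keying by the endpoint’s own port would make all exposed ports collide.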

Beyond the Basics

The session holds a real Freestyle VM, so capabilities beyond the SDK’s shell interface are a method away:

File edits via apply_patch — add a createEditor() to the session that returns an Editor (createFile / updateFile / deleteFile) and include filesystem() in capabilities. The model then edits files with the structured apply_patch tool instead of shell here-docs.
Snapshots & resume — implement the client’s serializeSessionState / resume backed by vm.snapshot() to persist an agent’s workspace across runs.
Fork & VPC — fork the prepared VM into parallel agent sandboxes, or boot it on a private VPC, exactly as in the Mastra guide. Both are plain Freestyle SDK calls on the VM your create() returns.
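For the file-edit item above, whatever Editor you write still has to move file contents into the VM, and vm.exec takes a shell command string. The following is a minimal sketch of one way a createFile or updateFile could build that command, base64-encoding the content so quotes, newlines, and $ never reach the shell. writeFileCommand is a hypothetical helper, not an SDK or Freestyle API. The sketch assumes the VM image ships the coreutils base64 binary and that the parent directory already exists.

```typescript
// POSIX single-quote helper, copied from the client earlier in this guide.
const sh = (s: string) => `'${s.replace(/'/g, `'\\''`)}'`;

// Hypothetical helper: build a shell command that writes `content` to `path`.
// Assumes `base64` (coreutils) is available inside the VM.
function writeFileCommand(path: string, content: string): string {
  // Encode as UTF-8 bytes first so non-ASCII text survives btoa().
  const bytes = new TextEncoder().encode(content);
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  const encoded = btoa(binary);
  // printf %s avoids the trailing newline that echo would add.
  return `printf %s ${sh(encoded)} | base64 -d > ${sh(path)}`;
}

console.log(writeFileCommand("/workspace/my app/it's.txt", 'echo "$HOME"\n'));
```

The returned string could be passed to vm.exec inside the session. Very large files would be better sent in chunks, since the whole payload travels on one command line.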