This guide explains how a sandbox progresses from creation to shutdown, including the states it passes through, how to start and wait for it, and how to stop it cleanly. Use this guide when you need to choose between creation patterns, control startup timing, or handle shutdown reliably in your SDK code.

Lifecycle states

Every sandbox passes through a series of states as it starts, runs, and shuts down. The SDK represents these as SandboxStatus values.
| State | Meaning | Terminal? | How wait() handles it |
| --- | --- | --- | --- |
| PENDING | Start accepted, waiting for backend scheduling | No | Polls through |
| CREATING | Backend is provisioning the container | No | Polls through |
| RUNNING | Container is up and accepting operations | No | Returns normally |
| TERMINATING | Sandbox is draining through its grace period before exit | No | Returns |
| COMPLETED | Command exited (check returncode) | Yes | Returns normally |
| FAILED | Startup or runtime error | Yes | Raises SandboxFailedError |
| UNSPECIFIED | Backend no longer tracking this sandbox | Yes | Treated as completed, returns normally |
PENDING and CREATING are transient. The SDK polls through them automatically. TERMINATING is a transient backend state that wait_until_complete() and stop().result() drive through to a terminal state. RUNNING is the stable operational state. COMPLETED and FAILED are terminal. UNSPECIFIED is mapped to COMPLETED by the SDK at poll time.
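The state rules above can be modeled in a few lines. This is a minimal sketch, not the SDK's implementation: Status, normalize, and wait_outcome are hypothetical names used only for illustration, and the logic simply restates the table.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical stand-in for the SDK's SandboxStatus values."""
    PENDING = "PENDING"
    CREATING = "CREATING"
    RUNNING = "RUNNING"
    TERMINATING = "TERMINATING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    UNSPECIFIED = "UNSPECIFIED"


def normalize(status: Status) -> Status:
    """Apply the UNSPECIFIED -> COMPLETED mapping described above."""
    return Status.COMPLETED if status is Status.UNSPECIFIED else status


def wait_outcome(status: Status) -> str:
    """Return how wait() reacts to a polled status: 'poll', 'return', or 'raise'."""
    status = normalize(status)
    if status in (Status.PENDING, Status.CREATING):
        return "poll"    # transient: keep polling
    if status is Status.FAILED:
        return "raise"   # corresponds to SandboxFailedError in the table
    return "return"      # RUNNING, TERMINATING, or COMPLETED


print(wait_outcome(Status.CREATING))     # poll
print(wait_outcome(Status.UNSPECIFIED))  # return
```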

Creation patterns

You can create a sandbox in two ways, differing in when the start RPC fires. The following sections describe each pattern and when to choose it.

Sandbox.run(): immediate start

Sandbox.run() creates a sandbox and calls start().result() internally, blocking until the backend accepts the request:
from cwsandbox import Sandbox

# Returns after backend accepts the start request
sandbox = Sandbox.run("echo", "hello")

# sandbox_id is set immediately
print(sandbox.sandbox_id)  # "sandbox-abc123"

# But status may still be PENDING, not necessarily RUNNING yet
The first positional argument is the command; the remaining positional arguments are passed to it as arguments:
# These are equivalent
Sandbox.run("echo", "hello", "world")
Sandbox.run(command="echo", args=["hello", "world"])

session.sandbox(): deferred start

session.sandbox() creates a sandbox object without making any network call. The start RPC fires on first use:
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults()) as session:
    sandbox = session.sandbox()

    # No network call yet, sandbox_id is None
    print(sandbox.sandbox_id)  # None

    # First operation triggers start automatically
    result = sandbox.exec(["echo", "hello"]).result()

Main command lifetime

The command passed to run() or session.sandbox() is the sandbox’s main process. When it exits, the sandbox transitions to COMPLETED:
# This sandbox completes almost immediately because echo exits right away
sandbox = Sandbox.run("echo", "hi")
sandbox.wait()  # May already be COMPLETED

# For exec-based workflows, use a long-running main command
# ("sleep infinity" is an illustrative choice, not a required value)
sandbox = Sandbox.run("sleep", "infinity")
# Or rely on the default
sandbox = Sandbox.run()
If you need to run a short command and capture its output, use exec() on a long-running sandbox rather than making the short command the main process.

Start a sandbox

After a sandbox object exists, you can start it explicitly or rely on the SDK to start it on first use. The following sections describe both approaches and how context managers interact with them.

Explicit start

start() sends the start RPC and returns an OperationRef[None]:
sandbox = session.sandbox()

# Explicit start for error control
sandbox.start().result()  # Blocks until backend accepts
print(sandbox.sandbox_id)  # Now set
This is useful when you want to control timing or handle start errors separately from operation errors.

Auto-start

Most operations auto-start the sandbox if it hasn’t been started yet. In the common case, you create a sandbox and start using it.
| Method | Auto-starts? | Notes |
| --- | --- | --- |
| exec() | Yes | Also waits for RUNNING before executing |
| read_file() | Yes | Also waits for RUNNING before reading |
| write_file() | Yes | Also waits for RUNNING before writing |
| wait() | Yes | Starts, then polls until RUNNING, TERMINATING, or terminal |
| wait_until_complete() | Yes | Starts, then polls until terminal |
| await sandbox | Yes | Starts and waits for RUNNING |
| get_status() | No | Requires sandbox_id. Call start() first |
| stop() | No | Nothing to stop if not started |
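The auto-start rule can be pictured as a guard that runs before each operation. The sketch below is an assumption-laden illustration: LazySandbox and its methods are hypothetical stand-ins that make no network calls, and the real SDK also waits for RUNNING, which is omitted here.

```python
import itertools
from typing import List, Optional

_ids = itertools.count(1)  # fake ID source for the illustration


class LazySandbox:
    """Hypothetical stand-in that starts itself on first use."""

    def __init__(self) -> None:
        self.sandbox_id: Optional[str] = None
        self.start_calls = 0

    def start(self) -> None:
        # Simulated start RPC: assigns an ID instead of contacting a backend.
        self.start_calls += 1
        self.sandbox_id = f"sandbox-{next(_ids)}"

    def _ensure_started(self) -> None:
        # The guard: only start when no ID has been assigned yet.
        if self.sandbox_id is None:
            self.start()

    def exec(self, command: List[str]) -> str:
        self._ensure_started()
        return f"{self.sandbox_id} ran {' '.join(command)}"

    def get_status(self) -> str:
        # Mirrors the table: get_status() does not auto-start.
        if self.sandbox_id is None:
            raise RuntimeError("call start() first")
        return "RUNNING"


sb = LazySandbox()
sb.exec(["echo", "one"])
sb.exec(["echo", "two"])
print(sb.start_calls)  # 1
```

Repeated operations trigger only one start because the guard checks the ID each time.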

Context managers and start

Context managers (with/async with) call start() on entry but do not call wait(). If you need the sandbox to be RUNNING before your first operation, call wait() explicitly:
with Sandbox.run() as sandbox:
    # start() has been called, but sandbox may still be PENDING/CREATING
    sandbox.wait()  # Blocks until RUNNING (or a terminal state); raises SandboxFailedError on failure
    result = sandbox.exec(["echo", "ready"]).result()
In practice, this rarely matters because exec() waits for RUNNING internally. Explicit wait() is useful when you want to separate startup failures from operation failures.

Wait for a sandbox

The SDK provides two waiting methods depending on whether you need the sandbox to be ready for operations or to finish its main command. The following sections describe each method and how startup and operation timeouts relate.

wait(): block until RUNNING

wait() polls until the sandbox reaches RUNNING, TERMINATING, or a terminal state:
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.run()
sandbox.wait()  # Blocks until RUNNING, TERMINATING, or a terminal state

# Check status, could be RUNNING or a terminal state like COMPLETED
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "hello"]).result()
The polling uses exponential backoff: starting at 0.2s intervals, scaling by 1.5x, capping at 2.0s. wait() returns self for method chaining:
result = Sandbox.run().wait().exec(["echo", "hello"]).result()
If the sandbox reaches a terminal state during startup, wait() handles it:
| Terminal state | wait() behavior |
| --- | --- |
| COMPLETED | Returns normally (sandbox finished before RUNNING) |
| UNSPECIFIED | Returns normally (treated as completed) |
| FAILED | Raises SandboxFailedError |
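To make the polling schedule described in this section concrete, the following sketch generates intervals from the stated parameters (0.2s start, 1.5x growth, 2.0s cap). The function name backoff_intervals is hypothetical and the SDK's internal loop may differ in detail.

```python
from itertools import islice
from typing import Iterator


def backoff_intervals(
    initial: float = 0.2, factor: float = 1.5, cap: float = 2.0
) -> Iterator[float]:
    """Yield poll intervals that grow geometrically until they reach the cap."""
    interval = initial
    while True:
        yield min(interval, cap)
        interval = min(interval * factor, cap)


# First eight intervals: roughly 0.2, 0.3, 0.45, 0.675, 1.01, 1.52, then 2.0 onward.
for seconds in islice(backoff_intervals(), 8):
    print(f"{seconds:.2f}")
```

Under these parameters the cap is reached on the seventh poll, so a slow startup settles into one status check every two seconds.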

wait_until_complete(): block until terminal

wait_until_complete() blocks until the sandbox reaches a terminal state, polling through TERMINATING automatically. Use this for sandboxes where the main command does the work:
sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=3600.0).result()
print(sandbox.returncode)  # 0 if training succeeded
The raise_on_termination parameter controls whether wait_until_complete() raises SandboxTerminatedError after this client called stop(). With the default raise_on_termination=True, the SDK raises. External stops and lifetime-exceeded events surface as COMPLETED without a distinct error, because the backend doesn’t yet provide termination reason metadata.
from cwsandbox import SandboxTerminatedError

# Default: raises after this client called stop()
try:
    sandbox.wait_until_complete().result()
except SandboxTerminatedError:
    print("This client stopped the sandbox")

# Suppress for graceful handling
sandbox.wait_until_complete(raise_on_termination=False).result()
print(f"Exit code: {sandbox.returncode}")

Timeout phases

Startup wait time and operation timeouts are separate phases:
start() --> [startup wait] --> RUNNING --> exec() --> [operation timeout]
  • Startup wait: Time spent in PENDING/CREATING before reaching RUNNING. Controlled by the timeout parameter on wait() or wait_until_complete(). Typically 30 to 60 seconds depending on backend scheduling.
  • Operation timeout: Time for an individual exec, read, or write. Controlled by timeout_seconds on exec(), or request_timeout_seconds in SandboxDefaults. Doesn’t include startup wait.
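The separation of the two budgets can be checked with simple arithmetic. The helper below is a hypothetical illustration (exceeded_phase is not an SDK function): each phase is compared only against its own limit.

```python
from typing import Optional


def exceeded_phase(
    startup_seconds: float,
    operation_seconds: float,
    startup_timeout: float,
    operation_timeout: float,
) -> Optional[str]:
    """Return the phase that exceeded its own budget, or None if both fit."""
    if startup_seconds > startup_timeout:
        return "startup"
    if operation_seconds > operation_timeout:
        return "operation"
    return None


# 45 s of scheduling plus an 8 s exec fits a 60 s startup budget and a 10 s
# operation budget, even though the total (53 s) is well above 10 s.
print(exceeded_phase(45.0, 8.0, startup_timeout=60.0, operation_timeout=10.0))  # None
```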

Operations and lifecycle

Operations like exec(), read_file(), and write_file() auto-start the sandbox if needed, then wait for RUNNING before proceeding:
# Session sandbox, not started yet
sandbox = session.sandbox()

# exec() handles everything: start, wait for RUNNING, then execute command
result = sandbox.exec(["echo", "hello"]).result()
The operation timeout (timeout_seconds) applies only after the sandbox is RUNNING. Startup time is not counted against it.

Stop a sandbox and end of life

This section covers how to shut down a sandbox, what happens after a stop, and when to use stop() versus delete().

stop()

stop() sends a stop request and returns OperationRef[None]. The sandbox transitions through TERMINATING (grace period draining) before reaching a terminal state. The returned OperationRef resolves when the backend confirms the terminal state, not only when the stop RPC succeeds:
sandbox.stop().result()  # Blocks until terminal
Parameters:
| Parameter | Default | Purpose |
| --- | --- | --- |
| graceful_shutdown_seconds | 10.0 | Grace period for processes to exit before force-stop |
| snapshot_on_stop | False | Capture sandbox filesystem state before shutdown |
| missing_ok | False | Return normally if sandbox already gone (instead of raising) |
# Capture state for debugging
sandbox.stop(snapshot_on_stop=True).result()

# Idempotent cleanup: safe to call even if sandbox is already gone
sandbox.stop(missing_ok=True).result()

# Give processes more time to shut down
sandbox.stop(graceful_shutdown_seconds=30.0).result()
stop() handles in-flight starts: if a start is still being processed, it waits for start to complete before stopping. Concurrent or repeated calls to stop() share one stop operation and don’t issue duplicate stop RPCs. This makes repeated stop() calls safe and cheap.
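One way to get this "shared stop operation" behavior is to cache a single future behind a lock. This is a sketch under assumptions, not the SDK's code: StopOnce, _send_stop_rpc, and rpc_calls are hypothetical, and the backend confirmation is simulated as immediate.

```python
import threading
from concurrent.futures import Future
from typing import Optional


class StopOnce:
    """Hypothetical sketch: every stop() call returns the same Future."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stop_future: Optional[Future] = None
        self.rpc_calls = 0  # counts simulated stop RPCs

    def _send_stop_rpc(self) -> None:
        # Placeholder for the network call.
        self.rpc_calls += 1

    def stop(self) -> Future:
        with self._lock:
            if self._stop_future is None:
                # First caller creates the operation and issues the RPC.
                self._stop_future = Future()
                self._send_stop_rpc()
                # Simulated: the backend confirms the terminal state at once.
                self._stop_future.set_result(None)
            # Later callers reuse the same operation.
            return self._stop_future


sandbox = StopOnce()
first = sandbox.stop()
second = sandbox.stop()
print(first is second, sandbox.rpc_calls)  # True 1
```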

Post-stop behavior

After stop() is called, the sandbox passes through TERMINATING (the grace-period draining state) and then reaches a terminal state (COMPLETED or FAILED). From the moment stop() is called, the sandbox is unusable, and further operations raise SandboxNotRunningError:
sandbox.stop().result()

# These will raise SandboxNotRunningError
sandbox.exec(["echo", "hello"])  # Raises
sandbox.read_file("/path")       # Raises
The status property is cached from the last API call. For fresh data before stopping, use get_status():
fresh_status = sandbox.get_status()
if fresh_status == SandboxStatus.RUNNING:
    sandbox.stop().result()

Context manager exit

Context managers call stop() automatically on exit:
with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
# stop() called here, even if an exception occurred inside the block
If an exception is in-flight, the context manager suppresses stop errors to avoid masking the original exception.
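The suppression rule follows a common Python pattern for __exit__. The sketch below uses a hypothetical ManagedSandbox class whose stop() can be forced to fail; it shows the pattern only and is not taken from the SDK source.

```python
class ManagedSandbox:
    """Hypothetical stand-in whose stop() can be made to fail."""

    def __init__(self, stop_fails: bool = False) -> None:
        self.stop_fails = stop_fails
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True
        if self.stop_fails:
            raise RuntimeError("stop failed")

    def __enter__(self) -> "ManagedSandbox":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            self.stop()
        except Exception:
            if exc_type is None:
                raise  # no original exception, so surface the stop error
            # An exception is already in flight: swallow the stop error.
        return False  # never suppress the original exception


try:
    with ManagedSandbox(stop_fails=True):
        raise ValueError("original problem")
except ValueError as err:
    print(err)  # original problem
```

The caller sees the ValueError from the block body rather than the RuntimeError from cleanup.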

stop() compared to delete()

| | stop() | delete() |
| --- | --- | --- |
| Target | Live sandbox instance | Sandbox by ID (class method) |
| Purpose | Graceful shutdown | Permanent removal or orphan cleanup |
| Requires instance? | Yes | No. Use Sandbox.delete(sandbox_id) |
| missing_ok | Yes | Yes |
Use stop() for sandboxes you’re actively using. Use delete() for cleanup of sandboxes discovered through Sandbox.list() or Sandbox.from_id():
# Stop a sandbox you created
sandbox.stop().result()

# Delete an orphan by ID
Sandbox.delete("sandbox-abc123", missing_ok=True).result()
See the Cleanup patterns guide for orphan management and batch cleanup strategies.

Under the hood

The SDK runs all gRPC operations on a background daemon thread with its own asyncio event loop. This design means:
  • The sync API (.result()) blocks the calling thread while the background loop handles the network call.
  • The async API (await) bridges to the same background loop, so both patterns use the same underlying implementation.
  • The background loop starts lazily on first use. gRPC channels are also created lazily.
  • Auto-start works by checking if sandbox_id is None before each operation and triggering start() if so.
  • On process exit, cleanup handlers (atexit + signal handlers) stop all sandboxes in registered sessions. A second Ctrl+C during cleanup forces immediate exit.
This architecture avoids cross-event-loop issues and works in Jupyter notebooks without nest_asyncio. See the Sync compared to async patterns guide for usage patterns.
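The background-loop design can be reproduced with the standard library alone. The following is a minimal sketch of the general technique, assuming nothing about the SDK's internals: get_background_loop, fake_rpc, and submit are hypothetical names, and fake_rpc stands in for a gRPC call.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Optional

_loop: Optional[asyncio.AbstractEventLoop] = None
_loop_lock = threading.Lock()


def get_background_loop() -> asyncio.AbstractEventLoop:
    """Create the background loop lazily on first use and reuse it afterwards."""
    global _loop
    with _loop_lock:
        if _loop is None:
            loop = asyncio.new_event_loop()
            # Daemon thread: it does not keep the process alive at exit.
            threading.Thread(target=loop.run_forever, daemon=True).start()
            _loop = loop
        return _loop


async def fake_rpc(payload: str) -> str:
    """Placeholder for a network call."""
    await asyncio.sleep(0.01)
    return payload.upper()


def submit(payload: str) -> Future:
    """Schedule the coroutine on the background loop and return a Future."""
    return asyncio.run_coroutine_threadsafe(fake_rpc(payload), get_background_loop())


# Sync style: .result() blocks the calling thread while the loop does the work.
print(submit("hello").result(timeout=5))  # HELLO
```

An async caller can await the same operation with asyncio.wrap_future(submit("hello")), which is one way both styles can share a single implementation.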

Common patterns

The following examples show end-to-end patterns that combine the lifecycle steps described above.

Quick one-off

Run a command and get the result. Context manager handles cleanup:
from cwsandbox import Sandbox

with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
    print(result.stdout)

Controlled startup

Separate start errors from operation errors:
from cwsandbox import Sandbox, SandboxFailedError

sandbox = Sandbox.run()
try:
    sandbox.wait()
except SandboxFailedError:
    print("Failed to start. Check container image and resources")
    raise

result = sandbox.exec(["echo", "ready"]).result()
sandbox.stop().result()

Long-running sandbox

Wait for the main command to complete:
from cwsandbox import Sandbox

sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=7200.0).result()

if sandbox.returncode == 0:
    data = sandbox.read_file("/output/model.pt").result()

Reconnection

Reattach to a sandbox from a previous session or process:
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.from_id("sandbox-abc123").result()

# from_id() fetches current status but does not start or verify the sandbox
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "reconnected"]).result()
else:
    print(f"Sandbox is {sandbox.status}, not RUNNING")

Parallel batch with session

Create multiple sandboxes and wait for results:
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults(tags=("batch-job",))) as session:
    sandboxes = [session.sandbox() for _ in range(5)]

    processes = [
        sb.exec(["python", "-c", f"print({i} ** 2)"])
        for i, sb in enumerate(sandboxes)
    ]

    done, pending = cwsandbox.wait(processes)
    for p in done:
        print(p.result().stdout)
Last modified on May 29, 2026