Sandbox lifecycle - CoreWeave Docs

This guide covers sandbox states, creation patterns, waiting, and shutdown behavior.

Lifecycle states

Every sandbox passes through a series of states. The SDK represents these as SandboxStatus values.

State	Meaning	Terminal?	How `wait()` handles it
`PENDING`	Start accepted, waiting for backend scheduling	No	Polls through
`CREATING`	Backend is provisioning the container	No	Polls through
`RUNNING`	Container is up and accepting operations	No	Returns normally
`TERMINATING`	Sandbox is draining through its grace period before exit	No	Returns
`COMPLETED`	Command exited (check `returncode`)	Yes	Returns normally
`FAILED`	Startup or runtime error	Yes	Raises `SandboxFailedError`
`UNSPECIFIED`	Backend no longer tracking this sandbox	Yes	Treated as completed, returns normally

PENDING and CREATING are transient - the SDK polls through them automatically. TERMINATING is a transient backend state that wait_until_complete() and stop().result() drive through to a terminal state. RUNNING is the stable operational state. COMPLETED and FAILED are terminal. UNSPECIFIED is mapped to COMPLETED by the SDK at poll time.

Creation patterns

There are two ways to create a sandbox, differing in when the start RPC fires.

Sandbox.run() - immediate start

Sandbox.run() creates a sandbox and calls start().result() internally, blocking until the backend accepts the request:

from cwsandbox import Sandbox

# Returns after backend accepts the start request
sandbox = Sandbox.run("echo", "hello")

# sandbox_id is set immediately
print(sandbox.sandbox_id)  # "sandbox-abc123"

# But status may still be PENDING - not necessarily RUNNING yet

The first positional argument is the command, the rest are arguments:

# These are equivalent
Sandbox.run("echo", "hello", "world")
Sandbox.run(command="echo", args=["hello", "world"])

session.sandbox() - deferred start

session.sandbox() creates a sandbox object without making any network call. The start RPC fires on first use:

import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults()) as session:
    sandbox = session.sandbox()

    # No network call yet - sandbox_id is None
    print(sandbox.sandbox_id)  # None

    # First operation triggers start automatically
    result = sandbox.exec(["echo", "hello"]).result()

Main command lifetime

The command passed to run() or session.sandbox() is the sandbox’s main process. When it exits, the sandbox transitions to COMPLETED:

# This sandbox completes almost immediately - echo exits right away
sandbox = Sandbox.run("echo", "hi")
sandbox.wait()  # May already be COMPLETED

# For exec-based workflows, use a long-running main command
sandbox = Sandbox.run()
# Or rely on the default
sandbox = Sandbox.run()

If you need to run a short command and capture its output, use exec() on a long-running sandbox rather than making the short command the main process.

Starting

Explicit start

start() sends the start RPC and returns an OperationRef[None]:

sandbox = session.sandbox()

# Explicit start for error control
sandbox.start().result()  # Blocks until backend accepts
print(sandbox.sandbox_id)  # Now set

This is useful when you want to control timing or handle start errors separately from operation errors.

Auto-start

Most operations auto-start the sandbox if it hasn’t been started yet. This makes the common case simple: create a sandbox, start using it.

Method	Auto-starts?	Notes
`exec()`	Yes	Also waits for RUNNING before executing
`read_file()`	Yes	Also waits for RUNNING before reading
`write_file()`	Yes	Also waits for RUNNING before writing
`wait()`	Yes	Starts, then polls until RUNNING, TERMINATING, or terminal
`wait_until_complete()`	Yes	Starts, then polls until terminal
`await sandbox`	Yes	Starts and waits for RUNNING
`get_status()`	No	Requires sandbox_id - call `start()` first
`stop()`	No	Nothing to stop if not started

Context managers and start

Context managers (with/async with) call start() on entry but do not call wait(). If you need the sandbox to be RUNNING before your first operation, call wait() explicitly:

with Sandbox.run() as sandbox:
    # start() has been called, but sandbox may still be PENDING/CREATING
    sandbox.wait()  # Now guaranteed RUNNING (or raises on failure)
    result = sandbox.exec(["echo", "ready"]).result()

In practice, this rarely matters because exec() waits for RUNNING internally. Explicit wait() is useful when you want to separate startup failures from operation failures.

Waiting

wait() - block until RUNNING

wait() polls until the sandbox reaches RUNNING, TERMINATING, or a terminal state:

sandbox = Sandbox.run()
sandbox.wait()  # Blocks until RUNNING

# Check status - could be RUNNING or a terminal state like COMPLETED
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "hello"]).result()

The polling uses exponential backoff: starting at 0.2s intervals, scaling by 1.5x, capping at 2.0s. wait() returns self for method chaining:

result = Sandbox.run().wait().exec(["echo", "hello"]).result()

If the sandbox reaches a terminal state during startup, wait() handles it:

Terminal state	wait() behavior
`COMPLETED`	Returns normally (sandbox finished before RUNNING)
`UNSPECIFIED`	Returns normally (treated as completed)
`FAILED`	Raises `SandboxFailedError`

wait_until_complete() - block until terminal

wait_until_complete() blocks until the sandbox reaches a terminal state, polling through TERMINATING automatically. Use this for sandboxes where the main command does the work:

sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=3600.0).result()
print(sandbox.returncode)  # 0 if training succeeded

The raise_on_termination parameter controls whether wait_until_complete() raises SandboxTerminatedError after this client called stop(). With the default raise_on_termination=True, the SDK raises. External kills and lifetime-exceeded events surface as COMPLETED without a distinct error, because the backend does not yet provide termination reason metadata.

from cwsandbox import SandboxTerminatedError

# Default: raises after this client called stop()
try:
    sandbox.wait_until_complete().result()
except SandboxTerminatedError:
    print("This client stopped the sandbox")

# Suppress for graceful handling
sandbox.wait_until_complete(raise_on_termination=False).result()
print(f"Exit code: {sandbox.returncode}")

Timeout phases

Startup wait time and operation timeouts are separate phases:

start() --> [startup wait] --> RUNNING --> exec() --> [operation timeout]

Startup wait: Time spent in PENDING/CREATING before reaching RUNNING. Controlled by the timeout parameter on wait() or wait_until_complete(). Typically 30-60 seconds depending on backend scheduling.
Operation timeout: Time for an individual exec/read/write. Controlled by timeout_seconds on exec(), or request_timeout_seconds in SandboxDefaults. Does not include startup wait.

Operations and lifecycle

Operations like exec(), read_file(), and write_file() auto-start the sandbox if needed, then wait for RUNNING before proceeding:

# Session sandbox - not started yet
sandbox = session.sandbox()

# exec() handles everything: start -> wait for RUNNING -> execute command
result = sandbox.exec(["echo", "hello"]).result()

The operation timeout (timeout_seconds) applies only after the sandbox is RUNNING. Startup time is not counted against it.

Stopping and end of life

stop()

stop() sends a stop request and returns OperationRef[None]. The sandbox transitions through TERMINATING (grace period draining) before reaching a terminal state. The returned OperationRef resolves when the backend confirms the terminal state, not just when the stop RPC succeeds:

sandbox.stop().result()  # Blocks until terminal

Parameters:

Parameter	Default	Purpose
`graceful_shutdown_seconds`	10.0	Grace period for processes to exit before force-kill
`snapshot_on_stop`	`False`	Capture sandbox filesystem state before shutdown
`missing_ok`	`False`	Return normally if sandbox already gone (instead of raising)

# Capture state for debugging
sandbox.stop(snapshot_on_stop=True).result()

# Idempotent cleanup - safe to call even if sandbox is already gone
sandbox.stop(missing_ok=True).result()

# Give processes more time to shut down
sandbox.stop(graceful_shutdown_seconds=30.0).result()

stop() handles in-flight starts: if a start is still being processed, it waits for start to complete before stopping. Concurrent or repeated calls to stop() share one stop operation and do not issue duplicate stop RPCs. This makes repeated stop() calls safe and cheap.

Post-stop behavior

After stop() is called, the sandbox transitions through TERMINATING (the grace period draining state) and then reaches a terminal state (COMPLETED or FAILED). Once stop() has been called, the sandbox is unusable. Further operations raise SandboxNotRunningError:

sandbox.stop().result()

# These will raise SandboxNotRunningError
sandbox.exec(["echo", "hello"])  # Raises
sandbox.read_file("/path")       # Raises

The status property is cached from the last API call. For fresh data before stopping, use get_status():

fresh_status = sandbox.get_status()
if fresh_status == SandboxStatus.RUNNING:
    sandbox.stop().result()

Context manager exit

Context managers call stop() automatically on exit:

with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
# stop() called here, even if an exception occurred inside the block

If an exception is in-flight, the context manager suppresses stop errors to avoid masking the original exception.

stop() vs. delete()

	`stop()`	`delete()`
Target	Live sandbox instance	Sandbox by ID (class method)
Purpose	Graceful shutdown	Permanent removal / orphan cleanup
Requires instance?	Yes	No - `Sandbox.delete(sandbox_id)`
`missing_ok`	Yes	Yes

Use stop() for sandboxes you’re actively using. Use delete() for cleanup of sandboxes discovered via Sandbox.list() or Sandbox.from_id():

# Stop a sandbox you created
sandbox.stop().result()

# Delete an orphan by ID
Sandbox.delete("sandbox-abc123", missing_ok=True).result()

See the Cleanup patterns guide for orphan management and batch cleanup strategies.

Under the hood

The SDK runs all gRPC operations on a background daemon thread with its own asyncio event loop. This design means:

The sync API (.result()) blocks the calling thread while the background loop handles the network call.
The async API (await) bridges to the same background loop, so both patterns use the same underlying implementation.
The background loop starts lazily on first use. gRPC channels are also created lazily.
Auto-start works by checking if sandbox_id is None before each operation and triggering start() if so.
On process exit, cleanup handlers (atexit + signal handlers) stop all sandboxes in registered sessions. A second Ctrl+C during cleanup forces immediate exit.

This architecture avoids cross-event-loop issues and works in Jupyter notebooks without nest_asyncio. See the Sync vs. async patterns guide for usage patterns.

Common patterns

Quick one-off

Run a command and get the result. Context manager handles cleanup:

from cwsandbox import Sandbox

with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
    print(result.stdout)

Controlled startup

Separate start errors from operation errors:

from cwsandbox import Sandbox, SandboxFailedError

sandbox = Sandbox.run()
try:
    sandbox.wait()
except SandboxFailedError:
    print("Failed to start - check container image and resources")
    raise

result = sandbox.exec(["echo", "ready"]).result()
sandbox.stop().result()

Long-running sandbox

Wait for the main command to complete:

from cwsandbox import Sandbox

sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=7200.0).result()

if sandbox.returncode == 0:
    data = sandbox.read_file("/output/model.pt").result()

Reconnection

Reattach to a sandbox from a previous session or process:

from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.from_id("sandbox-abc123").result()

# from_id() fetches current status but does not start or verify the sandbox
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "reconnected"]).result()
else:
    print(f"Sandbox is {sandbox.status}, not RUNNING")

Parallel batch with session

Create multiple sandboxes and wait for results:

import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults(tags=("batch-job",))) as session:
    sandboxes = [session.sandbox() for _ in range(5)]

    processes = [
        sb.exec(["python", "-c", f"print({i} ** 2)"])
        for i, sb in enumerate(sandboxes)
    ]

    done, pending = cwsandbox.wait(processes)
    for p in done:
        print(p.result().stdout)

Documentation Index

​Lifecycle states

​Creation patterns

​Sandbox.run() - immediate start

​session.sandbox() - deferred start

​Main command lifetime

​Starting

​Explicit start

​Auto-start

​Context managers and start

​Waiting

​wait() - block until RUNNING

​wait_until_complete() - block until terminal

​Timeout phases

​Operations and lifecycle

​Stopping and end of life

​stop()

​Post-stop behavior

​Context manager exit

​stop() vs. delete()

​Under the hood

​Common patterns

​Quick one-off

​Controlled startup

​Long-running sandbox

​Reconnection

​Parallel batch with session