This guide covers sandbox states, creation patterns, waiting, and shutdown behavior.

Lifecycle states

Every sandbox passes through a series of states. The SDK represents these as SandboxStatus values.
| State | Meaning | Terminal? | How wait() handles it |
| --- | --- | --- | --- |
| PENDING | Start accepted, waiting for backend scheduling | No | Polls through |
| CREATING | Backend is provisioning the container | No | Polls through |
| RUNNING | Container is up and accepting operations | No | Returns normally |
| COMPLETED | Command exited with code 0 | Yes | Returns normally |
| TERMINATED | Externally killed or max_lifetime_seconds exceeded | Yes | Raises SandboxTerminatedError |
| FAILED | Startup or runtime error | Yes | Raises SandboxFailedError |
| UNSPECIFIED | Backend no longer tracking this sandbox | Yes | Treated as completed, returns normally |

PENDING and CREATING are transient - the SDK polls through them automatically. RUNNING is the stable operational state. COMPLETED, TERMINATED, FAILED, and UNSPECIFIED are terminal.
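
As a rough illustration of the classification above, the sketch below models the states with a local enum. `State`, `TERMINAL`, `TRANSIENT`, and `is_terminal` are hypothetical names made up for this example; they are not the SDK's `SandboxStatus` implementation.

```python
from enum import Enum


class State(Enum):
    """Local stand-in for the states listed above (not the SDK enum)."""

    PENDING = "PENDING"
    CREATING = "CREATING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    TERMINATED = "TERMINATED"
    FAILED = "FAILED"
    UNSPECIFIED = "UNSPECIFIED"


# States from which no further transition is expected.
TERMINAL = frozenset(
    {State.COMPLETED, State.TERMINATED, State.FAILED, State.UNSPECIFIED}
)
# States that a poller simply waits out.
TRANSIENT = frozenset({State.PENDING, State.CREATING})


def is_terminal(state: State) -> bool:
    """Return True if the state is one of the four terminal states."""
    return state in TERMINAL
```

RUNNING falls in neither set: it is stable but not terminal.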

Creation patterns

There are two ways to create a sandbox, differing in when the start RPC fires.

Sandbox.run() - immediate start

Sandbox.run() creates a sandbox and calls start().result() internally, blocking until the backend accepts the request:
from cwsandbox import Sandbox

# Returns after backend accepts the start request
sandbox = Sandbox.run("echo", "hello")

# sandbox_id is set immediately
print(sandbox.sandbox_id)  # "sandbox-abc123"

# But status may still be PENDING - not necessarily RUNNING yet
The first positional argument is the command; the remaining positional arguments are passed to it:
# These are equivalent
Sandbox.run("echo", "hello", "world")
Sandbox.run(command="echo", args=["hello", "world"])

session.sandbox() - deferred start

session.sandbox() creates a sandbox object without making any network call. The start RPC fires on first use:
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults()) as session:
    sandbox = session.sandbox()

    # No network call yet - sandbox_id is None
    print(sandbox.sandbox_id)  # None

    # First operation triggers start automatically
    result = sandbox.exec(["echo", "hello"]).result()

Main command lifetime

The command passed to run() or session.sandbox() is the sandbox’s main process. When it exits, the sandbox transitions to COMPLETED:
# This sandbox completes almost immediately - echo exits right away
sandbox = Sandbox.run("echo", "hi")
sandbox.wait()  # May already be COMPLETED

# For exec-based workflows, use a long-running main command
sandbox = Sandbox.run("sleep", "infinity")
# Or rely on the default (tail -f /dev/null)
sandbox = Sandbox.run()
If you need to run a short command and capture its output, use exec() on a long-running sandbox rather than making the short command the main process.

Starting

Explicit start

start() sends the start RPC and returns an OperationRef[None]:
sandbox = session.sandbox()

# Explicit start for error control
sandbox.start().result()  # Blocks until backend accepts
print(sandbox.sandbox_id)  # Now set
This is useful when you want to control timing or handle start errors separately from operation errors.

Auto-start

Most operations auto-start the sandbox if it hasn’t been started yet. This makes the common case simple: create a sandbox, start using it.
| Method | Auto-starts? | Notes |
| --- | --- | --- |
| exec() | Yes | Also waits for RUNNING before executing |
| read_file() | Yes | Also waits for RUNNING before reading |
| write_file() | Yes | Also waits for RUNNING before writing |
| wait() | Yes | Starts, then polls until RUNNING or terminal |
| wait_until_complete() | Yes | Starts, then polls until terminal |
| await sandbox | Yes | Starts and waits for RUNNING |
| get_status() | No | Requires sandbox_id - call start() first |
| stop() | No | Nothing to stop if not started |
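
The auto-start rule can be pictured with a toy class. This is only a sketch of the "start if `sandbox_id` is None" check; `ToySandbox`, `_ensure_started`, and `start_calls` are invented for illustration and do not reflect the SDK's internals or make any network call.

```python
import itertools
from typing import List, Optional


class ToySandbox:
    """Illustrative stand-in; not the cwsandbox Sandbox class."""

    _ids = itertools.count(1)

    def __init__(self) -> None:
        self.sandbox_id: Optional[str] = None  # None means "not started yet"
        self.start_calls = 0

    def start(self) -> None:
        # Hypothetical start: assigns an ID once; later calls are no-ops.
        if self.sandbox_id is None:
            self.start_calls += 1
            self.sandbox_id = f"toy-{next(self._ids)}"

    def _ensure_started(self) -> None:
        # The auto-start check that runs before each operation.
        if self.sandbox_id is None:
            self.start()

    def exec(self, argv: List[str]) -> str:
        self._ensure_started()
        return " ".join(argv)  # pretend output

    def get_status(self) -> str:
        # No auto-start here, so an ID is required first.
        if self.sandbox_id is None:
            raise RuntimeError("call start() first")
        return "RUNNING"
```

Calling `exec()` twice on a fresh `ToySandbox` triggers exactly one start, while `get_status()` on a fresh instance raises.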

Context managers and start

Context managers (with/async with) call start() on entry but do not call wait(). If you need the sandbox to be RUNNING before your first operation, call wait() explicitly:
with Sandbox.run() as sandbox:
    # start() has been called, but sandbox may still be PENDING/CREATING
    sandbox.wait()  # Now guaranteed RUNNING (or raises on failure)
    result = sandbox.exec(["echo", "ready"]).result()
In practice, this rarely matters because exec() waits for RUNNING internally. Explicit wait() is useful when you want to separate startup failures from operation failures.

Waiting

wait() - block until RUNNING

wait() polls until the sandbox reaches RUNNING or a terminal state:
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.run()
sandbox.wait()  # Blocks until RUNNING or a terminal state

# Check status - could be RUNNING or a terminal state like COMPLETED
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "hello"]).result()
Polling uses exponential backoff: intervals start at 0.2s, scale by 1.5x, and cap at 2.0s.

wait() returns self for method chaining:
result = Sandbox.run().wait().exec(["echo", "hello"]).result()
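
The backoff schedule described above (0.2s start, 1.5x growth, 2.0s cap) can be sketched as a small generator. `backoff_intervals` is a hypothetical helper written for this example, not an SDK function, and the SDK's actual loop may differ in detail.

```python
from itertools import islice
from typing import Iterator


def backoff_intervals(
    initial: float = 0.2, factor: float = 1.5, cap: float = 2.0
) -> Iterator[float]:
    """Yield poll delays: start at `initial`, multiply by `factor`, clamp at `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# Delays a poller would sleep for between its first eight status checks.
first_eight = list(islice(backoff_intervals(), 8))
```

With these parameters the delays are roughly 0.2, 0.3, 0.45, 0.675, 1.01, and 1.52 seconds, then 2.0 seconds for every later poll.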
If the sandbox reaches a terminal state during startup, wait() handles it:
| Terminal state | wait() behavior |
| --- | --- |
| COMPLETED | Returns normally (sandbox finished before RUNNING) |
| UNSPECIFIED | Returns normally (treated as completed) |
| FAILED | Raises SandboxFailedError |
| TERMINATED | Raises SandboxTerminatedError |
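
The dispatch above can be sketched as a loop over polled statuses. The exception classes here are local stand-ins named to avoid confusion with the real `SandboxFailedError` and `SandboxTerminatedError`, and `wait_for_running` is a hypothetical function, not the SDK's `wait()`.

```python
from typing import Iterable


class ToyFailedError(Exception):
    """Stand-in for SandboxFailedError in this sketch."""


class ToyTerminatedError(Exception):
    """Stand-in for SandboxTerminatedError in this sketch."""


def wait_for_running(statuses: Iterable[str]) -> str:
    """Consume polled statuses until RUNNING or a terminal state is seen."""
    for status in statuses:
        if status in ("PENDING", "CREATING"):
            continue  # transient: keep polling
        if status == "FAILED":
            raise ToyFailedError(status)
        if status == "TERMINATED":
            raise ToyTerminatedError(status)
        # RUNNING, COMPLETED, and UNSPECIFIED all return normally.
        return status
    raise TimeoutError("status source exhausted before RUNNING")
```

Because COMPLETED returns normally, a caller that needs a live sandbox still has to check the returned status before issuing operations.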

wait_until_complete() - block until terminal

wait_until_complete() blocks until the sandbox reaches any terminal state. Use this for sandboxes where the main command does the work:
sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=3600.0)
print(sandbox.returncode)  # 0 if training succeeded
The raise_on_termination parameter controls behavior when a sandbox is externally terminated:
from cwsandbox import SandboxStatus, SandboxTerminatedError

# Default: raises on termination
try:
    sandbox.wait_until_complete()
except SandboxTerminatedError:
    print("Sandbox was killed externally")

# Suppress termination errors for graceful handling
sandbox.wait_until_complete(raise_on_termination=False)
if sandbox.status == SandboxStatus.TERMINATED:
    print("Terminated, but we handle it ourselves")

Timeout phases

Startup wait time and operation timeouts are separate phases:
start() --> [startup wait] --> RUNNING --> exec() --> [operation timeout]
  • Startup wait: Time spent in PENDING/CREATING before reaching RUNNING. Controlled by the timeout parameter on wait() or wait_until_complete(). Typically 30-60 seconds depending on backend scheduling.
  • Operation timeout: Time for an individual exec/read/write. Controlled by timeout_seconds on exec(), or request_timeout_seconds in SandboxDefaults. Does not include startup wait.
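
To make the separation concrete, the sketch below checks each phase against its own budget. `phase_result` and its parameter names are hypothetical and model only the accounting, not real SDK behavior or error types.

```python
def phase_result(
    startup_s: float,
    operation_s: float,
    startup_timeout: float,
    operation_timeout: float,
) -> str:
    """Check each phase against its own budget; the budgets are never summed."""
    if startup_s > startup_timeout:
        return "startup timed out"
    if operation_s > operation_timeout:
        return "operation timed out"
    return "ok"


# 45s of scheduling plus an 8s command passes with a 10s operation timeout,
# because the 45s is charged to the startup budget only.
slow_start = phase_result(45.0, 8.0, startup_timeout=60.0, operation_timeout=10.0)

# A fast start does not buy extra time for a slow command.
slow_command = phase_result(5.0, 12.0, startup_timeout=60.0, operation_timeout=10.0)
```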

Operations and lifecycle

Operations like exec(), read_file(), and write_file() auto-start the sandbox if needed, then wait for RUNNING before proceeding:
# Session sandbox - not started yet
sandbox = session.sandbox()

# exec() handles everything: start -> wait for RUNNING -> execute command
result = sandbox.exec(["echo", "hello"]).result()
The operation timeout (timeout_seconds) applies only after the sandbox is RUNNING. Startup time is not counted against it.

Stopping and end of life

stop()

stop() sends a stop request and returns OperationRef[None]:
sandbox.stop().result()  # Blocks until stopped
Parameters:
| Parameter | Default | Purpose |
| --- | --- | --- |
| graceful_shutdown_seconds | 10.0 | Grace period for processes to exit before force-kill |
| snapshot_on_stop | False | Capture sandbox filesystem state before shutdown |
| missing_ok | False | Return normally if sandbox already gone (instead of raising) |

# Capture state for debugging
sandbox.stop(snapshot_on_stop=True).result()

# Idempotent cleanup - safe to call even if sandbox is already gone
sandbox.stop(missing_ok=True).result()

# Give processes more time to shut down
sandbox.stop(graceful_shutdown_seconds=30.0).result()
stop() handles in-flight starts: if a start is still being processed, it waits for start to complete before stopping. Multiple calls to stop() are safe - the second call returns immediately.
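
One way to picture "wait for the in-flight start, then stop once" is the toy below. It assumes a start has already been requested, and `ToyLifecycle`, `finish_start`, and `stop_requests` are invented for this sketch; the SDK's own synchronization is not documented here and may work differently.

```python
import threading


class ToyLifecycle:
    """Illustrative only; not how cwsandbox implements stop()."""

    def __init__(self) -> None:
        self._started = threading.Event()
        self._lock = threading.Lock()
        self._stopped = False
        self.stop_requests = 0  # stop requests that reached the "backend"

    def finish_start(self) -> None:
        # Called when the simulated start request completes.
        self._started.set()

    def stop(self, timeout: float = 5.0) -> bool:
        """Return True if this call sent the stop request, False if it was a no-op."""
        # Wait for an in-flight start to finish before stopping.
        if not self._started.wait(timeout):
            raise TimeoutError("start did not complete")
        with self._lock:
            if self._stopped:
                return False  # repeat calls return without a second request
            self._stopped = True
            self.stop_requests += 1
            return True


lifecycle = ToyLifecycle()
# Simulate a start that completes 50 ms after stop() is called.
timer = threading.Timer(0.05, lifecycle.finish_start)
timer.start()
first = lifecycle.stop()
second = lifecycle.stop()
timer.join()
```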

Post-stop behavior

After stopping, the sandbox is unusable. Further operations raise SandboxNotRunningError:
sandbox.stop().result()

# These will raise SandboxNotRunningError
sandbox.exec(["echo", "hello"])  # Raises
sandbox.read_file("/path")       # Raises
The status property is cached from the last API call. For fresh data before stopping, use get_status():
fresh_status = sandbox.get_status()
if fresh_status == SandboxStatus.RUNNING:
    sandbox.stop().result()

Context manager exit

Context managers call stop() automatically on exit:
with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
# stop() called here, even if an exception occurred inside the block
If an exception is in-flight, the context manager suppresses stop errors to avoid masking the original exception.
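
The suppression rule follows a common `__exit__` pattern, sketched below with a toy class. `ToyManaged` and `stop_fails` are hypothetical; surfacing the stop error when no other exception is in flight is an assumption of this sketch, not something this guide confirms.

```python
class ToyManaged:
    """Illustrative context manager; not the cwsandbox implementation."""

    def __init__(self, stop_fails: bool = False) -> None:
        self.stop_fails = stop_fails
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True
        if self.stop_fails:
            raise RuntimeError("stop failed")

    def __enter__(self) -> "ToyManaged":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            self.stop()
        except Exception:
            if exc_type is None:
                raise  # no original exception: surface the stop error
            # An exception is already propagating: swallow the stop error.
        return False  # never suppress the original exception
```

With `stop_fails=True`, a `ValueError` raised inside the `with` block still reaches the caller as a `ValueError`, not as the `RuntimeError` from `stop()`.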

stop() vs. delete()

| | stop() | delete() |
| --- | --- | --- |
| Target | Live sandbox instance | Sandbox by ID (class method) |
| Purpose | Graceful shutdown | Permanent removal / orphan cleanup |
| Requires instance? | Yes | No - Sandbox.delete(sandbox_id) |
| missing_ok | Yes | Yes |

Use stop() for sandboxes you’re actively using. Use delete() for cleanup of sandboxes discovered via Sandbox.list() or Sandbox.from_id():
# Stop a sandbox you created
sandbox.stop().result()

# Delete an orphan by ID
Sandbox.delete("sandbox-abc123", missing_ok=True).result()
See the Cleanup patterns guide for orphan management and batch cleanup strategies.

Under the hood

The SDK runs all gRPC operations on a background daemon thread with its own asyncio event loop. This design means:
  • The sync API (.result()) blocks the calling thread while the background loop handles the network call.
  • The async API (await) bridges to the same background loop, so both patterns use the same underlying implementation.
  • The background loop starts lazily on first use. gRPC channels are also created lazily.
  • Auto-start works by checking if sandbox_id is None before each operation and triggering start() if so.
  • On process exit, cleanup handlers (atexit + signal handlers) stop all sandboxes in registered sessions. A second Ctrl+C during cleanup forces immediate exit.
This architecture avoids cross-event-loop issues and works in Jupyter notebooks without nest_asyncio. See the Sync vs. async patterns guide for usage patterns.
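
The general pattern described above, a lazily started asyncio loop on a daemon thread with sync callers blocking on a thread-safe future, can be sketched with the standard library alone. `BackgroundLoop` and `fake_rpc` are hypothetical names; this is not the SDK's code and it involves no gRPC.

```python
import asyncio
import threading
from concurrent.futures import Future


class BackgroundLoop:
    """Lazily started asyncio loop on a daemon thread (illustrative sketch)."""

    def __init__(self) -> None:
        self._loop = None
        self._lock = threading.Lock()

    def _ensure_loop(self) -> asyncio.AbstractEventLoop:
        # Create the loop and its thread on first use only.
        with self._lock:
            if self._loop is None:
                loop = asyncio.new_event_loop()
                thread = threading.Thread(target=loop.run_forever, daemon=True)
                thread.start()
                self._loop = loop
            return self._loop

    def submit(self, coro) -> Future:
        """Schedule a coroutine on the background loop and return a thread-safe future."""
        return asyncio.run_coroutine_threadsafe(coro, self._ensure_loop())


async def fake_rpc(value: int) -> int:
    await asyncio.sleep(0.01)  # stands in for a network call
    return value * 2


bg = BackgroundLoop()
future = bg.submit(fake_rpc(21))
answer = future.result(timeout=5)  # the sync caller blocks here, like .result()
```

An async caller could await the same future from its own loop through `asyncio.wrap_future(future)`, which is one way both calling styles can share a single background loop.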

Common patterns

Quick one-off

Run a command and get the result. Context manager handles cleanup:
from cwsandbox import Sandbox

with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
    print(result.stdout)

Controlled startup

Separate start errors from operation errors:
from cwsandbox import Sandbox, SandboxFailedError

sandbox = Sandbox.run()
try:
    sandbox.wait()
except SandboxFailedError:
    print("Failed to start - check container image and resources")
    raise

result = sandbox.exec(["echo", "ready"]).result()
sandbox.stop().result()

Long-running sandbox

Wait for the main command to complete:
from cwsandbox import Sandbox

sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=7200.0)

if sandbox.returncode == 0:
    data = sandbox.read_file("/output/model.pt").result()

Reconnection

Reattach to a sandbox from a previous session or process:
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.from_id("sandbox-abc123").result()

# from_id() fetches current status but does not start or verify the sandbox
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "reconnected"]).result()
else:
    print(f"Sandbox is {sandbox.status}, not RUNNING")

Parallel batch with session

Create multiple sandboxes and wait for results:
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults(tags=("batch-job",))) as session:
    sandboxes = [session.sandbox() for _ in range(5)]

    processes = [
        sb.exec(["python", "-c", f"print({i} ** 2)"])
        for i, sb in enumerate(sandboxes)
    ]

    done, pending = cwsandbox.wait(processes)
    for p in done:
        print(p.result().stdout)
Last modified on April 21, 2026