# Sandbox lifecycle

> Understand sandbox states, creation patterns, waiting, and shutdown behavior.

This guide explains how a sandbox progresses from creation to shutdown, including the states it passes through, how to start and wait for it, and how to stop it cleanly. Use this guide when you need to choose between creation patterns, control startup timing, or handle shutdown reliably in your SDK code.

## Lifecycle states

Every sandbox passes through a series of states as it starts, runs, and shuts down. The SDK represents these as `SandboxStatus` values.

```mermaid theme={"system"}
stateDiagram-v2
    [*] --> PENDING: start()
    PENDING --> CREATING: backend scheduling
    CREATING --> RUNNING: container ready
    CREATING --> FAILED: startup error
    RUNNING --> COMPLETED: command exits
    RUNNING --> FAILED: runtime error
    RUNNING --> TERMINATING: stop() / lifetime exceeded
    TERMINATING --> COMPLETED: grace period ends
    TERMINATING --> FAILED: error during drain
    COMPLETED --> [*]
    FAILED --> [*]
```

| State         | Meaning                                                  | Terminal? | How `wait()` handles it                |
| ------------- | -------------------------------------------------------- | --------- | -------------------------------------- |
| `PENDING`     | Start accepted, waiting for backend scheduling           | No        | Polls through                          |
| `CREATING`    | Backend is provisioning the container                    | No        | Polls through                          |
| `RUNNING`     | Container is up and accepting operations                 | No        | Returns normally                       |
| `TERMINATING` | Sandbox is draining through its grace period before exit | No        | Returns normally                       |
| `COMPLETED`   | Command exited (check `returncode`)                      | Yes       | Returns normally                       |
| `FAILED`      | Startup or runtime error                                 | Yes       | Raises `SandboxFailedError`            |
| `UNSPECIFIED` | Backend no longer tracking this sandbox                  | Yes       | Treated as completed, returns normally |

`PENDING` and `CREATING` are transient. The SDK polls through them automatically.
`TERMINATING` is a transient backend state that `wait_until_complete()` and `stop().result()`
drive through to a terminal state. `RUNNING` is the stable operational state.
`COMPLETED` and `FAILED` are terminal. `UNSPECIFIED` is mapped to `COMPLETED` by the SDK at
poll time.
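
The diagram and table above can be read as a transition table. The following is a minimal sketch of that reading using only the standard library. The `Status` enum and the helper names are local stand-ins invented for illustration; they are not the SDK's `SandboxStatus` or any SDK function. `UNSPECIFIED` is left out because the SDK maps it to `COMPLETED` at poll time.

```python
from enum import Enum


class Status(Enum):
    """Local stand-in for the documented states (not the SDK's SandboxStatus)."""

    PENDING = "PENDING"
    CREATING = "CREATING"
    RUNNING = "RUNNING"
    TERMINATING = "TERMINATING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Allowed transitions, copied edge by edge from the state diagram.
TRANSITIONS = {
    Status.PENDING: {Status.CREATING},
    Status.CREATING: {Status.RUNNING, Status.FAILED},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED, Status.TERMINATING},
    Status.TERMINATING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def is_terminal(status: Status) -> bool:
    """A state is terminal when the diagram shows no outgoing edge."""
    return not TRANSITIONS[status]


def can_transition(src: Status, dst: Status) -> bool:
    """Return True if the diagram has an edge from src to dst."""
    return dst in TRANSITIONS[src]


print([s.name for s in Status if is_terminal(s)])
```

A table like this is handy in tests that replay recorded status sequences and check that no impossible jump (for example, `PENDING` straight to `RUNNING`) was observed.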

## Creation patterns

You can create a sandbox in two ways, differing in when the start RPC fires. The following sections describe each pattern and when to choose it.

### Sandbox.run(): immediate start

`Sandbox.run()` creates a sandbox and calls `start().result()` internally, blocking until the
backend accepts the request:

```python theme={"system"}
from cwsandbox import Sandbox

# Returns after backend accepts the start request
sandbox = Sandbox.run("echo", "hello")

# sandbox_id is set immediately
print(sandbox.sandbox_id)  # "sandbox-abc123"

# But status may still be PENDING, not necessarily RUNNING yet
```

The first positional argument is the command; the rest are its arguments:

```python theme={"system"}
# These are equivalent
Sandbox.run("echo", "hello", "world")
Sandbox.run(command="echo", args=["hello", "world"])
```

### session.sandbox(): deferred start

`session.sandbox()` creates a sandbox object without making any network call. The start RPC
fires on first use:

```python theme={"system"}
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults()) as session:
    sandbox = session.sandbox()

    # No network call yet, sandbox_id is None
    print(sandbox.sandbox_id)  # None

    # First operation triggers start automatically
    result = sandbox.exec(["echo", "hello"]).result()
```

### Main command lifetime

The command passed to `run()` or `session.sandbox()` is the sandbox's main process. When it
exits, the sandbox transitions to `COMPLETED`:

```python theme={"system"}
# This sandbox completes almost immediately because echo exits right away
sandbox = Sandbox.run("echo", "hi")
sandbox.wait()  # May already be COMPLETED

# For exec-based workflows, use a long-running main command
sandbox = Sandbox.run("sleep", "infinity")
# Or rely on the default main command
sandbox = Sandbox.run()
```

If you need to run a short command and capture its output, use `exec()` on a long-running
sandbox rather than making the short command the main process.

## Start a sandbox

After a sandbox object exists, you can start it explicitly or rely on the SDK to start it on first use. The following sections describe both approaches and how context managers interact with them.

### Explicit start

`start()` sends the start RPC and returns an `OperationRef[None]`:

```python theme={"system"}
sandbox = session.sandbox()

# Explicit start for error control
sandbox.start().result()  # Blocks until backend accepts
print(sandbox.sandbox_id)  # Now set
```

This is useful when you want to control timing or handle start errors separately from
operation errors.

### Auto-start

Most operations auto-start the sandbox if it hasn't been started yet. In the common
case, you create a sandbox and start using it.

| Method                  | Auto-starts? | Notes                                                      |
| ----------------------- | ------------ | ---------------------------------------------------------- |
| `exec()`                | Yes          | Also waits for RUNNING before executing                    |
| `read_file()`           | Yes          | Also waits for RUNNING before reading                      |
| `write_file()`          | Yes          | Also waits for RUNNING before writing                      |
| `wait()`                | Yes          | Starts, then polls until RUNNING, TERMINATING, or terminal |
| `wait_until_complete()` | Yes          | Starts, then polls until terminal                          |
| `await sandbox`         | Yes          | Starts and waits for RUNNING                               |
| `get_status()`          | No           | Requires `sandbox_id`; call `start()` first                |
| `stop()`                | No           | Nothing to stop if not started                             |

### Context managers and start

Context managers (`with`/`async with`) call `start()` on entry but do **not** call `wait()`.
If you need the sandbox to be RUNNING before your first operation, call `wait()` explicitly:

```python theme={"system"}
with Sandbox.run() as sandbox:
    # start() has been called, but sandbox may still be PENDING/CREATING
    sandbox.wait()  # Now guaranteed RUNNING (or raises on failure)
    result = sandbox.exec(["echo", "ready"]).result()
```

In practice, this rarely matters because `exec()` waits for RUNNING internally. Explicit
`wait()` is useful when you want to separate startup failures from operation failures.

## Wait for a sandbox

The SDK provides two waiting methods depending on whether you need the sandbox to be ready for operations or to finish its main command. The following sections describe each method and how startup and operation timeouts relate.

### wait(): block until RUNNING

`wait()` polls until the sandbox reaches RUNNING, TERMINATING, or a terminal state:

```python theme={"system"}
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.run()
sandbox.wait()  # Blocks until RUNNING, TERMINATING, or a terminal state

# Check status: it could be RUNNING or a terminal state like COMPLETED
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "hello"]).result()
```

The polling uses exponential backoff: starting at 0.2s intervals, scaling by 1.5x, capping
at 2.0s.
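
To make the schedule concrete, here is a small sketch that reproduces those three numbers (0.2s start, 1.5x growth, 2.0s cap). The generator name and its parameters are hypothetical; this shows the arithmetic only and is not the SDK's polling code.

```python
from itertools import islice
from typing import Iterator


def backoff_intervals(
    initial: float = 0.2, factor: float = 1.5, cap: float = 2.0
) -> Iterator[float]:
    """Yield sleep intervals that grow geometrically until they reach the cap."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# Round only for display; the raw values carry floating-point noise.
first_eight = [round(d, 5) for d in islice(backoff_intervals(), 8)]
print(first_eight)
```

Under these parameters the intervals begin at 0.2, 0.3, 0.45, and 0.675 seconds, and the seventh poll is the first one to wait the full 2.0 seconds.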

`wait()` returns self for method chaining:

```python theme={"system"}
result = Sandbox.run().wait().exec(["echo", "hello"]).result()
```

If the sandbox reaches a terminal state during startup, `wait()` handles it:

| Terminal state | wait() behavior                                    |
| -------------- | -------------------------------------------------- |
| `COMPLETED`    | Returns normally (sandbox finished before RUNNING) |
| `UNSPECIFIED`  | Returns normally (treated as completed)            |
| `FAILED`       | Raises `SandboxFailedError`                        |
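
The rules in this table can be sketched as a plain polling loop. The version below runs against a fake status source so it is self-contained; `wait_model` and `SandboxFailed` are hypothetical names, statuses are plain strings, and the real `wait()` may be structured quite differently.

```python
from typing import Callable, Iterable


class SandboxFailed(Exception):
    """Hypothetical stand-in for the SDK's SandboxFailedError."""


def wait_model(
    get_status: Callable[[], str],
    intervals: Iterable[float],
    sleep: Callable[[float], None],
) -> str:
    """Poll until the status is RUNNING, TERMINATING, or terminal."""
    for delay in intervals:
        status = get_status()
        if status == "UNSPECIFIED":
            status = "COMPLETED"  # backend no longer tracks the sandbox
        if status == "FAILED":
            raise SandboxFailed("sandbox failed before it became usable")
        if status not in ("PENDING", "CREATING"):
            return status  # RUNNING, TERMINATING, or COMPLETED
        sleep(delay)  # still starting up: back off and poll again
    raise TimeoutError("ran out of polling intervals")


# Fake backend that needs two polls before the sandbox is up.
statuses = iter(["PENDING", "CREATING", "RUNNING"])
slept = []
print(wait_model(lambda: next(statuses), [0.2, 0.3, 0.45], slept.append))
```

Passing `sleep` as an argument keeps the sketch testable without real delays; here the recorded sleeps show that only the two transient states caused a wait.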

### wait\_until\_complete(): block until terminal

`wait_until_complete()` blocks until the sandbox reaches a terminal state, polling through
`TERMINATING` automatically. Use this for sandboxes where the main command does the work:

```python theme={"system"}
sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=3600.0).result()
print(sandbox.returncode)  # 0 if training succeeded
```

The `raise_on_termination` parameter controls whether `wait_until_complete()` raises
`SandboxTerminatedError` when this client has called `stop()`. It defaults to `True`, so the
SDK raises unless you pass `raise_on_termination=False`. External stops and lifetime-exceeded
events surface as `COMPLETED` without a distinct error, because the backend doesn't yet provide
termination reason metadata.

```python theme={"system"}
from cwsandbox import SandboxTerminatedError

# Default: raises after this client called stop()
try:
    sandbox.wait_until_complete().result()
except SandboxTerminatedError:
    print("This client stopped the sandbox")

# Suppress for graceful handling
sandbox.wait_until_complete(raise_on_termination=False).result()
print(f"Exit code: {sandbox.returncode}")
```

### Timeout phases

Startup wait time and operation timeouts are separate phases:

```text theme={"system"}
start() --> [startup wait] --> RUNNING --> exec() --> [operation timeout]
```

* **Startup wait**: Time spent in PENDING/CREATING before reaching RUNNING. Controlled by the
  `timeout` parameter on `wait()` or `wait_until_complete()`. Typically 30 to 60 seconds depending
  on backend scheduling.
* **Operation timeout**: Time for an individual exec, read, or write. Controlled by `timeout_seconds`
  on `exec()`, or `request_timeout_seconds` in `SandboxDefaults`. Doesn't include startup wait.
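
Because the two phases are timed separately, code that must finish within one overall budget has to carry the remaining time from one phase to the next. The helper below is one possible way to do that, assuming you want a single end-to-end limit. `Deadline` is a hypothetical class written for this example, not part of the SDK.

```python
import time
from typing import Callable


class Deadline:
    """Tracks one end-to-end time budget shared by several phases."""

    def __init__(
        self, total_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._clock = clock
        self._expires_at = clock() + total_seconds

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())


# Intended use with the parameters described above (not executed here):
#   deadline = Deadline(120.0)
#   sandbox.wait(timeout=deadline.remaining())               # startup phase
#   sandbox.exec(cmd, timeout_seconds=deadline.remaining())  # operation phase
deadline = Deadline(120.0)
print(deadline.remaining() <= 120.0)
```

The injectable `clock` exists only so the arithmetic can be checked without sleeping.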

## Operations and lifecycle

Operations like `exec()`, `read_file()`, and `write_file()` auto-start the sandbox if needed,
then wait for RUNNING before proceeding:

```python theme={"system"}
# Session sandbox, not started yet
sandbox = session.sandbox()

# exec() handles everything: start, wait for RUNNING, then execute command
result = sandbox.exec(["echo", "hello"]).result()
```

The operation timeout (`timeout_seconds`) applies only after the sandbox is RUNNING. Startup
time is not counted against it.

## Stop a sandbox and end of life

This section covers how to shut down a sandbox, what happens after a stop, and when to use `stop()` versus `delete()`.

### stop()

`stop()` sends a stop request and returns `OperationRef[None]`. The sandbox transitions through
`TERMINATING` (grace period draining) before reaching a terminal state. The returned
`OperationRef` resolves when the backend confirms the terminal state, not only when the stop
RPC succeeds:

```python theme={"system"}
sandbox.stop().result()  # Blocks until terminal
```

Parameters:

| Parameter                   | Default | Purpose                                                      |
| --------------------------- | ------- | ------------------------------------------------------------ |
| `graceful_shutdown_seconds` | 10.0    | Grace period for processes to exit before force-stop         |
| `snapshot_on_stop`          | `False` | Capture sandbox filesystem state before shutdown             |
| `missing_ok`                | `False` | Return normally if sandbox already gone (instead of raising) |

```python theme={"system"}
# Capture state for debugging
sandbox.stop(snapshot_on_stop=True).result()

# Idempotent cleanup: safe to call even if sandbox is already gone
sandbox.stop(missing_ok=True).result()

# Give processes more time to shut down
sandbox.stop(graceful_shutdown_seconds=30.0).result()
```

`stop()` handles in-flight starts: if a start is still being processed, it waits for start to
complete before stopping. Concurrent or repeated calls to `stop()` share one stop operation and
don't issue duplicate stop RPCs. This makes repeated `stop()` calls safe and cheap.
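
The sharing behavior can be pictured as a cached future guarded by a lock. The toy class below illustrates that idea with the standard library only. `StopOnce`, `_send_stop_rpc`, and `rpc_calls` are invented for this sketch, and the SDK's internals are not shown in this guide.

```python
import threading
from concurrent.futures import Future
from typing import Optional


class StopOnce:
    """Toy model: every stop() call returns the same Future."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stop_future: Optional[Future] = None
        self.rpc_calls = 0  # how many times the fake RPC was sent

    def _send_stop_rpc(self) -> None:
        # Placeholder for the real network call.
        self.rpc_calls += 1

    def stop(self) -> Future:
        with self._lock:
            if self._stop_future is None:
                # First caller creates the operation and sends the RPC.
                self._stop_future = Future()
                self._send_stop_rpc()
                self._stop_future.set_result(None)
            # Later callers reuse the operation already in progress or done.
            return self._stop_future


model = StopOnce()
first, second = model.stop(), model.stop()
print(first is second, model.rpc_calls)
```

In this model the lock makes concurrent callers agree on a single future, which is why calling `stop()` from both a cleanup handler and regular code does no extra work.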

### Post-stop behavior

Once `stop()` has been called, the sandbox is unusable. It passes through `TERMINATING` (the
grace period draining state) and then reaches a terminal state (`COMPLETED` or `FAILED`).
Further operations raise `SandboxNotRunningError`:

```python theme={"system"}
sandbox.stop().result()

# These will raise SandboxNotRunningError
sandbox.exec(["echo", "hello"])  # Raises
sandbox.read_file("/path")       # Raises
```

The `status` property is cached from the last API call. For fresh data before stopping, use
`get_status()`:

```python theme={"system"}
fresh_status = sandbox.get_status()
if fresh_status == SandboxStatus.RUNNING:
    sandbox.stop().result()
```

### Context manager exit

Context managers call `stop()` automatically on exit:

```python theme={"system"}
with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
# stop() called here, even if an exception occurred inside the block
```

If an exception is already in flight, the context manager suppresses errors from `stop()` so
they don't mask the original exception.
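
This is a common context-manager idiom, and it can be sketched without the SDK. `ManagedResource` below is a hypothetical class showing one way to write an `__exit__` with that behavior; it is not the SDK's implementation.

```python
from typing import Optional


class ManagedResource:
    """Toy resource whose stop() can be made to fail on demand."""

    def __init__(self, stop_error: Optional[Exception] = None) -> None:
        self._stop_error = stop_error
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True
        if self._stop_error is not None:
            raise self._stop_error

    def __enter__(self) -> "ManagedResource":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            self.stop()
        except Exception:
            if exc_type is None:
                raise  # nothing to mask, so surface the stop error
            # The block already raised: drop the stop error instead.
        return False  # never swallow the exception from the block


try:
    with ManagedResource(stop_error=RuntimeError("stop failed")):
        raise ValueError("original")
except ValueError as err:
    print(f"caught: {err}")
```

When the block exits cleanly, the same class lets the stop error propagate, so cleanup failures are hidden only when reporting them would replace a more useful exception.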

### stop() compared to delete()

|                    | `stop()`              | `delete()`                           |
| ------------------ | --------------------- | ------------------------------------ |
| Target             | Live sandbox instance | Sandbox by ID (class method)         |
| Purpose            | Graceful shutdown     | Permanent removal or orphan cleanup  |
| Requires instance? | Yes                   | No. Use `Sandbox.delete(sandbox_id)` |
| `missing_ok`       | Yes                   | Yes                                  |

Use `stop()` for sandboxes you're actively using. Use `delete()` for cleanup of sandboxes
discovered through `Sandbox.list()` or `Sandbox.from_id()`:

```python theme={"system"}
# Stop a sandbox you created
sandbox.stop().result()

# Delete an orphan by ID
Sandbox.delete("sandbox-abc123", missing_ok=True).result()
```

See the [Cleanup patterns guide](/products/sandboxes/client/guides/cleanup-patterns) for orphan management and batch cleanup
strategies.

## Under the hood

The SDK runs all gRPC operations on a background daemon thread with its own asyncio event loop.
This design means:

* The sync API (`.result()`) blocks the calling thread while the background loop handles the
  network call.
* The async API (`await`) bridges to the same background loop, so both patterns use the same
  underlying implementation.
* The background loop starts lazily on first use. gRPC channels are also created lazily.
* Auto-start works by checking if `sandbox_id` is None before each operation and triggering
  `start()` if so.
* On process exit, cleanup handlers (atexit + signal handlers) stop all sandboxes in registered
  sessions. A second Ctrl+C during cleanup forces immediate exit.

This architecture avoids cross-event-loop issues and works in Jupyter notebooks without
`nest_asyncio`. See the [Sync compared to async patterns guide](/products/sandboxes/client/guides/sync-vs-async) for usage patterns.
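
The general technique (a lazily started event loop on a daemon thread, with coroutines submitted from other threads) can be sketched with the standard library alone. `BackgroundLoop` and `fake_rpc` are hypothetical names used for illustration, and the SDK's actual classes are not shown here.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Any, Coroutine, Optional


class BackgroundLoop:
    """Runs an asyncio event loop on a daemon thread, started on first use."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def _ensure_started(self) -> asyncio.AbstractEventLoop:
        with self._lock:
            if self._loop is None:
                loop = asyncio.new_event_loop()
                # Daemon thread: it won't keep the process alive at exit.
                threading.Thread(target=loop.run_forever, daemon=True).start()
                self._loop = loop
            return self._loop

    def submit(self, coro: Coroutine[Any, Any, Any]) -> Future:
        """Schedule a coroutine on the background loop from any thread."""
        return asyncio.run_coroutine_threadsafe(coro, self._ensure_started())


async def fake_rpc(value: str) -> str:
    """Stand-in for a network call."""
    await asyncio.sleep(0.01)
    return value.upper()


background = BackgroundLoop()
future = background.submit(fake_rpc("hello"))
print(future.result(timeout=5))  # sync style: block the calling thread
```

In this sketch, synchronous callers block on the returned `concurrent.futures.Future`, while async callers could await the same work by wrapping it with `asyncio.wrap_future()`, so both styles share one loop.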

## Common patterns

The following examples show end-to-end patterns that combine the lifecycle steps described above.

### Quick one-off

Run a command and get the result. The context manager handles cleanup:

```python theme={"system"}
from cwsandbox import Sandbox

with Sandbox.run() as sandbox:
    result = sandbox.exec(["echo", "hello"]).result()
    print(result.stdout)
```

### Controlled startup

Separate start errors from operation errors:

```python theme={"system"}
from cwsandbox import Sandbox, SandboxFailedError

sandbox = Sandbox.run()
try:
    sandbox.wait()
except SandboxFailedError:
    print("Failed to start. Check container image and resources")
    raise

result = sandbox.exec(["echo", "ready"]).result()
sandbox.stop().result()
```

### Long-running sandbox

Wait for the main command to complete:

```python theme={"system"}
from cwsandbox import Sandbox

sandbox = Sandbox.run("python", "train.py")
sandbox.wait_until_complete(timeout=7200.0).result()

if sandbox.returncode == 0:
    data = sandbox.read_file("/output/model.pt").result()
```

### Reconnection

Reattach to a sandbox from a previous session or process:

```python theme={"system"}
from cwsandbox import Sandbox, SandboxStatus

sandbox = Sandbox.from_id("sandbox-abc123").result()

# from_id() fetches current status but does not start or verify the sandbox
if sandbox.status == SandboxStatus.RUNNING:
    result = sandbox.exec(["echo", "reconnected"]).result()
else:
    print(f"Sandbox is {sandbox.status}, not RUNNING")
```

### Parallel batch with session

Create multiple sandboxes and wait for results:

```python theme={"system"}
import cwsandbox
from cwsandbox import SandboxDefaults

with cwsandbox.Session(SandboxDefaults(tags=("batch-job",))) as session:
    sandboxes = [session.sandbox() for _ in range(5)]

    processes = [
        sb.exec(["python", "-c", f"print({i} ** 2)"])
        for i, sb in enumerate(sandboxes)
    ]

    done, pending = cwsandbox.wait(processes)
    for p in done:
        print(p.result().stdout)
```
