# Workloads

The Mithril CLI manages clusters and jobs through [SkyPilot](https://skypilot.co/), an open-source framework for running ML workloads. The commands below (`launch`, `exec`, `status`, `logs`, `queue`, `start`, `stop`, and `down`) cover the core workflow: provision GPUs, run tasks, monitor progress, and clean up when you're done.\
For options not covered here, run **`ml --help`** for the full option set, or use **`ml sky`** for direct access to SkyPilot's CLI.

### Quick reference

| Command       | Purpose                                                         |
| ------------- | --------------------------------------------------------------- |
| **ml launch** | Create a cluster and run a task (or start interactive setup).   |
| **ml exec**   | Run a task or command on an existing cluster (or open a shell). |
| **ml status** | List clusters and jobs; optionally show IP or endpoints.        |
| **ml logs**   | Stream or download job logs; stream provision/autostop logs.    |
| **ml queue**  | Show job queue for cluster(s).                                  |
| **ml start**  | Start stopped (or failed) cluster(s).                           |
| **ml stop**   | Stop cluster(s); keep disks for later **ml start**.             |
| **ml down**   | Tear down cluster(s) and delete resources.                      |

For full option lists, run **`ml <command> --help`**.

***

### ml launch

Provision a cluster and run a task. Given a task YAML or an inline command, **`ml launch`** launches the cluster and streams job logs unless `--detach-run` is set.

With no arguments, **`ml launch`** starts an interactive onboarding flow:

1. Creates a starter task YAML (`task.yaml`) with annotated fields for resources, setup, and run.
2. Optionally adds an `AGENTS.md` to your project so coding agents (Claude, Cursor, etc.) can discover the Mithril CLI docs bundled with the package.
3. Prints a ready-to-use launch command and, if `AGENTS.md` was written, a prompt for an agent-guided walkthrough.

**Synopsis**

```
ml launch [ENTRYPOINT] [OPTIONS]
```

**Arguments**

| Argument     | Description                                                                                                         |
| ------------ | ------------------------------------------------------------------------------------------------------------------- |
| `ENTRYPOINT` | Optional. Path to a task YAML (`.yaml`/`.yml`) or a single bash command in quotes. Omit to start interactive setup. |

**Options**

| Option                               | Description                                                           |
| ------------------------------------ | --------------------------------------------------------------------- |
| `-c`, `--cluster NAME`               | Cluster name. If the cluster exists, reuses it; otherwise creates it. |
| `--gpus SPEC`                        | GPU type and count (e.g. `A100:4`, `H100:8`).                         |
| `--cpus SPEC`                        | vCPU requirement (e.g. `4`, `4+`).                                    |
| `--memory SPEC`                      | Memory in GB.                                                         |
| `--cloud CLOUD`                      | Cloud provider.                                                       |
| `--region REGION`                    | Region.                                                               |
| `--num-nodes N`                      | Number of nodes.                                                      |
| `-i`, `--idle-minutes-to-autostop N` | Auto-stop cluster after N minutes of idleness.                        |
| `--down`                             | Tear down the cluster after the job finishes.                         |
| `-d`, `--detach-run`                 | Do not stream job logs; return after the job is submitted.            |
| `-r`, `--retry-until-up`             | Retry provisioning until the cluster is up.                           |
| `-y`, `--yes`                        | Skip confirmation prompts.                                            |
| `--dryrun`                           | Print cluster name, task, and resources only; do not launch.          |
| `-n`, `--name NAME`                  | Task name.                                                            |
| `--workdir DIR`                      | Local directory to sync as the task workdir.                          |
| `-e`, `--env KEY=VALUE`              | Set environment variables (repeatable).                               |

**Examples**

```bash
ml launch task.yaml -c mycluster
ml launch --gpus B200:8 -c dev
```
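To make the "resources, setup, and run" fields concrete, here is a minimal sketch that writes a small task file from the shell. The helper name `write_task_yaml` is hypothetical, and the field names are an assumption based on SkyPilot's task YAML schema; the starter file that `ml launch` generates may contain different or additional fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a minimal task file to the given path
# (defaults to ./task.yaml). Field names follow SkyPilot's task YAML
# schema, which is an assumption about what `ml launch` accepts.
write_task_yaml() {
  local path="${1:-task.yaml}"
  cat > "$path" <<'EOF'
name: hello-gpu

resources:
  accelerators: H100:8   # assumed to use the same SPEC format as --gpus

setup: |
  echo "install dependencies here"

run: |
  nvidia-smi
EOF
}

# Usage (not executed here):
#   write_task_yaml task.yaml
#   ml launch task.yaml -c mycluster --dryrun   # preview only
#   ml launch task.yaml -c mycluster -y
```

Running with `--dryrun` first prints the cluster name, task, and resources without provisioning anything, which is a cheap way to check the file before spending GPU time.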

***

### ml exec

Run a task or command on an **existing** cluster without re-provisioning. Use a task YAML or a bash command. For interactive use, open a shell with **`ml exec CLUSTER`** or use **`ml ssh CLUSTER`**.

**Synopsis**

```
ml exec CLUSTER [ENTRYPOINT ...] [OPTIONS]
```

**Arguments**

| Argument     | Description                                                                                   |
| ------------ | --------------------------------------------------------------------------------------------- |
| `CLUSTER`    | Cluster name.                                                                                 |
| `ENTRYPOINT` | Optional. Task YAML path or bash command. Omit to open an interactive shell on the head node. |

**Options**

| Option               | Description                                    |
| -------------------- | ---------------------------------------------- |
| `-d`, `--detach-run` | Submit the job and return; do not stream logs. |

Additional task and resource options (e.g. `--workdir`, `--gpus`, `--env`) are supported; run **`ml exec --help`** for the full list.

**Examples**

```bash
ml exec my-cluster task.yaml
ml exec my-cluster python train.py
```
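Because `ml exec` skips provisioning, it suits submitting several jobs to one cluster in a row. The sketch below queues one detached job per learning rate. The function name `submit_sweep`, the `LR` variable, and the `--lr` flag of `train.py` are all hypothetical; `-d` and `--env` are used as described in this page, on the assumption that `ml exec` accepts them before the quoted command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit one detached job per learning rate.
# Usage: submit_sweep CLUSTER LR [LR ...]
submit_sweep() {
  local cluster="$1"
  shift
  local lr
  for lr in "$@"; do
    # -d returns once the job is submitted; --env passes LR to the job.
    # Single quotes keep $LR unexpanded locally so it resolves on the cluster.
    ml exec "$cluster" -d --env "LR=$lr" 'python train.py --lr "$LR"'
  done
}

# Usage (not executed here):
#   submit_sweep my-cluster 0.1 0.01 0.001
#   ml queue my-cluster
```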

***

### ml status

List clusters and job information. Also updates the local SSH config so that **`ssh CLUSTER`** and **`ml exec`** work. When exactly one cluster is named, **`--ip`**, **`--endpoints`**, or **`--endpoint PORT`** prints its connection details.

**Synopsis**

```
ml status [CLUSTER ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                                 |
| --------- | ----------------------------------------------------------- |
| `CLUSTER` | Optional. One or more cluster names. Default: all clusters. |

**Options**

| Option                                           | Description                                                                                     |
| ------------------------------------------------ | ----------------------------------------------------------------------------------------------- |
| `-v`, `--verbose`                                | Show all fields.                                                                                |
| `-r`, `--refresh`                                | Query latest status from the cloud (use when clusters change outside Mithril or with autostop). |
| `--ip`                                           | Show head node IP (only with exactly one cluster).                                              |
| `--endpoints`                                    | Show all exposed endpoints (only with exactly one cluster).                                     |
| `--endpoint PORT`                                | Show URL for the given port (only with exactly one cluster).                                    |
| `--show-managed-jobs` / `--no-show-managed-jobs` | Include in-progress managed jobs (default: show).                                               |
| `--show-services` / `--no-show-services`         | Include Sky Serve services (default: show).                                                     |
| `--show-pools` / `--no-show-pools`               | Include pools (default: show).                                                                  |
| `--all-users`                                    | Include clusters for all users.                                                                 |

**Cluster states**

| State       | Description                                                             |
| ----------- | ----------------------------------------------------------------------- |
| **UP**      | Ready; provisioning and setup completed.                                |
| **STOPPED** | Stopped; use **`ml start`** to restart.                                 |
| **INIT**    | Provisioning or setup in progress, or cluster in an inconsistent state. |

**Examples**

```bash
ml status
ml status my-cluster
ml status --refresh
ml status my-cluster --ip
```
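In scripts, the `--ip` form is handy for feeding the head node address to other tools. The following is a small sketch under the assumption that `ml status CLUSTER --ip` prints only the IP address on standard output; the helper name `head_ip` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the head node IP of exactly one cluster.
# Assumes `ml status CLUSTER --ip` writes just the address to stdout.
head_ip() {
  if [ "$#" -ne 1 ]; then
    echo "usage: head_ip CLUSTER" >&2
    return 2
  fi
  local ip
  ip="$(ml status "$1" --ip)" || return 1
  # Treat empty output as a failure so callers do not use a blank address.
  [ -n "$ip" ] || return 1
  printf '%s\n' "$ip"
}

# Usage (not executed here):
#   ping -c 1 "$(head_ip my-cluster)"
```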

***

### ml logs

Stream or download job logs, or stream provisioning/autostop logs.

**Synopsis**

```
ml logs CLUSTER [JOB_ID ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                                                                                                |
| --------- | -------------------------------------------------------------------------------------------------------------------------- |
| `CLUSTER` | Cluster name.                                                                                                              |
| `JOB_ID`  | Optional. Job ID(s). If omitted, uses the latest job. For streaming, at most one job; for `--sync-down`, multiple allowed. |

**Options**

| Option                     | Description                                                                                                                 |
| -------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| `--provision`              | Stream cluster provisioning logs (provision.log).                                                                           |
| `--autostop`               | Stream autostop hook logs.                                                                                                  |
| `-w`, `--worker ID`        | Worker ID for logs (only with `--provision`).                                                                               |
| `-s`, `--sync-down`        | Download job logs to `~/sky_logs` (multiple job IDs allowed).                                                               |
| `--status`                 | Do not show logs; exit with status code: 0 = succeeded, 100 = failed, 101 = not finished, 102 = not found, 103 = cancelled. |
| `--follow` / `--no-follow` | Stream logs continuously (default: follow).                                                                                 |
| `--tail N`                 | Show only the last N lines (0 = all).                                                                                       |

**Examples**

```bash
ml logs my-cluster 1
ml logs my-cluster --provision
ml logs my-cluster --status
ml logs my-cluster -s 1 2 3
```
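The exit codes of `--status` make it possible to wait for a job from a script. Here is a minimal polling sketch; the function names and the 30-second default interval are illustrative choices, and only the exit-code meanings come from the options table above.

```bash
#!/usr/bin/env bash
# Translate an `ml logs --status` exit code into a readable state.
describe_job_status() {
  case "$1" in
    0)   echo "succeeded" ;;
    100) echo "failed" ;;
    101) echo "not finished" ;;
    102) echo "not found" ;;
    103) echo "cancelled" ;;
    *)   echo "unknown ($1)" ;;
  esac
}

# Hypothetical helper: poll until the job leaves the "not finished" state.
# Usage: wait_for_job CLUSTER JOB_ID [INTERVAL_SECONDS]
wait_for_job() {
  local cluster="$1" job_id="$2" interval="${3:-30}" code
  while true; do
    code=0
    ml logs "$cluster" "$job_id" --status || code=$?
    if [ "$code" -ne 101 ]; then
      describe_job_status "$code"
      return "$code"   # propagate the job's status code to the caller
    fi
    sleep "$interval"
  done
}

# Usage (not executed here):
#   wait_for_job my-cluster 1 && ml logs my-cluster -s 1
```

Returning the original code lets callers chain on success with `&&` while still distinguishing failed, missing, and cancelled jobs.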

***

### ml queue

Show the job queue for one or more clusters (pending and running jobs; optionally finished).

**Synopsis**

```
ml queue [CLUSTER ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                       |
| --------- | ------------------------------------------------- |
| `CLUSTER` | Optional. Cluster name(s). Default: all clusters. |

**Options**

| Option                  | Description                         |
| ----------------------- | ----------------------------------- |
| `-s`, `--skip-finished` | Show only pending and running jobs. |
| `--all-users`           | Show queue for all users.           |

**Examples**

```bash
ml queue
ml queue my-cluster
ml queue my-cluster --skip-finished
```

***

### ml start

Start one or more stopped clusters (or retry provisioning/setup for clusters in INIT). Has no effect on a cluster that is already UP unless `--force` is given.

**Synopsis**

```
ml start [CLUSTER ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                                                                  |
| --------- | -------------------------------------------------------------------------------------------- |
| `CLUSTER` | Optional. Cluster name(s). Default: all clusters (or the single cluster if only one exists). |

**Options**

| Option                               | Description                                                                     |
| ------------------------------------ | ------------------------------------------------------------------------------- |
| `-a`, `--all`                        | Start all clusters.                                                             |
| `-y`, `--yes`                        | Skip confirmation.                                                              |
| `-i`, `--idle-minutes-to-autostop N` | Set autostop after N minutes of idleness.                                       |
| `--down`                             | Use autodown (tear down after idleness); requires `--idle-minutes-to-autostop`. |
| `-r`, `--retry-until-up`             | Retry until the cluster is up on availability failures.                         |
| `-f`, `--force`                      | Start even if already UP (e.g. to upgrade SkyPilot runtime).                    |

**Examples**

```bash
ml start my-cluster
ml start cluster1 cluster2
ml start -a
```

***

### ml stop

Stop one or more clusters. Billing for instances stops; attached disks are kept and reattached when you **`ml start`**. Spot clusters cannot be stopped.

**Synopsis**

```
ml stop [CLUSTER ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                          |
| --------- | ---------------------------------------------------- |
| `CLUSTER` | Optional. Cluster name(s) or glob (e.g. `cluster*`). |

**Options**

| Option                 | Description                                                              |
| ---------------------- | ------------------------------------------------------------------------ |
| `-a`, `--all`          | Stop all clusters.                                                       |
| `--all-users`          | Stop all clusters for all users.                                         |
| `-y`, `--yes`          | Skip confirmation.                                                       |
| `--graceful`           | Wait for `MOUNT_CACHED` uploads to complete (cancels current jobs first). |
| `--graceful-timeout N` | Timeout in seconds for `--graceful`.                                     |

**Examples**

```bash
ml stop my-cluster
ml stop cluster1 cluster2
ml stop "cluster*"
ml stop -a
```

***

### ml down

Tear down one or more clusters. All associated resources are deleted and billing stops; data on attached disks is lost.

**Synopsis**

```
ml down [CLUSTER ...] [OPTIONS]
```

**Arguments**

| Argument  | Description                                          |
| --------- | ---------------------------------------------------- |
| `CLUSTER` | Optional. Cluster name(s) or glob (e.g. `cluster*`). |

**Options**

| Option                 | Description                                                                                                                                                            |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `-a`, `--all`          | Tear down all clusters.                                                                                                                                                |
| `--all-users`          | Tear down all clusters for all users.                                                                                                                                  |
| `-y`, `--yes`          | Skip confirmation.                                                                                                                                                     |
| `-p`, `--purge`        | (Advanced) Remove cluster(s) from SkyPilot’s table even if cloud teardown failed. Use only when troubleshooting; you are responsible for cleaning up leaked resources. |
| `--graceful`           | Wait for `MOUNT_CACHED` uploads before terminating (cancels current jobs first).                                                                                       |
| `--graceful-timeout N` | Timeout in seconds for `--graceful`.                                                                                                                                   |

**Examples**

```bash
ml down my-cluster
ml down cluster1 cluster2
ml down "cluster*"
ml down -a
```
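Since `ml stop` keeps disks and `ml down` deletes them, a cleanup script benefits from making that choice explicit. The sketch below wraps both commands behind one hypothetical helper, `release_cluster`, which defaults to the non-destructive option; only `-y` and the two subcommands come from this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: release a cluster, defaulting to the safer `stop`.
# Usage: release_cluster CLUSTER [stop|down]
release_cluster() {
  local cluster="$1" mode="${2:-stop}"
  if [ -z "$cluster" ]; then
    echo "usage: release_cluster CLUSTER [stop|down]" >&2
    return 2
  fi
  case "$mode" in
    stop) ml stop "$cluster" -y ;;  # disks are kept; resume with `ml start`
    down) ml down "$cluster" -y ;;  # resources and disk data are deleted
    *)
      echo "unknown mode: $mode (expected stop or down)" >&2
      return 2
      ;;
  esac
}

# Usage (not executed here):
#   release_cluster my-cluster        # stop, keep disks
#   release_cluster my-cluster down   # tear down everything
```

Note that `-y` skips the confirmation prompt, so the `down` branch is irreversible once called.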
