# Overview

The Mithril CLI removes the manual setup typically required to run ML workloads on remote infrastructure. Workloads are defined **declaratively in YAML** — specifying code, compute requirements, and runtime configuration in a single spec. Instead of SSHing into nodes or stitching together launch scripts, researchers submit runs directly from these definitions. The CLI handles packaging, scheduling, and execution **across Mithril and other clouds**, supporting common patterns like model training, offline batch inference, and large-scale evaluation.

#### Defining a basic workload

```yaml
# task.yaml
resources:
  infra: mithril
  accelerators: B200:8

num_nodes: 2

setup: |
  pip install -r requirements.txt

run: |
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  torchrun \
    --nnodes=$SKYPILOT_NUM_NODES \
    --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
    --master_addr=$MASTER_ADDR \
    --node_rank=$SKYPILOT_NODE_RANK \
    train.py --distributed
```
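The `SKYPILOT_*` variables referenced in the `run` section are injected by the SkyPilot runtime on every node at execution time. A minimal sketch of how the head-node address is derived (the IP values below are illustrative, not real output):

```bash
# SkyPilot exports these on each node at run time:
#   SKYPILOT_NODE_IPS          - newline-separated IPs of all nodes (head node first)
#   SKYPILOT_NUM_NODES         - total number of nodes in the cluster
#   SKYPILOT_NUM_GPUS_PER_NODE - GPUs available on each node
#   SKYPILOT_NODE_RANK         - 0 on the head node, 1+ on workers
SKYPILOT_NODE_IPS=$'10.0.0.4\n10.0.0.5'   # illustrative values for a 2-node cluster
MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
echo "$MASTER_ADDR"   # prints the head node's IP: 10.0.0.4
```

Every node runs the same `run` script, so each worker computes the same `MASTER_ADDR` and uses its own `SKYPILOT_NODE_RANK` to join the `torchrun` rendezvous.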

#### Running the workload

```bash
ml launch task.yaml
```

#### Provisioning and scheduling

Before execution, the CLI evaluates available capacity and proposes a cluster configuration:

```
> ml launch task.yaml
mithril-client 0.1.0
SkyPilot API server 0.1.0a4
Considered resources (2 nodes):
-------------------------------------------------------------------------------------
 INFRA                     INSTANCE   vCPUs   Mem(GB)   GPUS     COST ($)   CHOSEN
-------------------------------------------------------------------------------------
 Mithril (us-central5-a)   b200.8x    232      192       B200:8   2.25          ✔
-------------------------------------------------------------------------------------
Launching a new cluster 'sky-692d-olivier'. Proceed? [Y/n]:
• Launching... View logs: ml logs --provision sky-692d-olivier
```

#### Features

* **Attach storage** – Mount persistent volumes or cloud buckets for datasets, checkpoints, and run outputs.
* **Scale from single-node to distributed training** – Provision multi-node GPU clusters with InfiniBand networking automatically configured.
* **Cost protection** – Set maximum price limits to control spend.
* **Idle auto-shutdown** – Instances pause automatically when GPUs are no longer in use.
* **AI-native** – Built with coding agents in mind.
* **Multi-cloud ready** – Launch workloads across Mithril, Nebius, Oracle, GCP, AWS, and 15 other providers.
* **No lock-in** – Workload specs and CLI workflows build on open-source SkyPilot — not proprietary tooling.
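As an illustration of the storage feature, SkyPilot-style specs can attach cloud buckets through a `file_mounts` section in the same YAML. The bucket names and mount paths below are placeholders, not real resources:

```yaml
# task.yaml (excerpt) – mount cloud buckets for datasets and checkpoints.
# Bucket names and paths are hypothetical.
file_mounts:
  /datasets:
    source: s3://my-training-data   # placeholder bucket
    mode: MOUNT                     # objects are streamed on access
  /checkpoints:
    source: s3://my-checkpoints     # placeholder bucket
    mode: MOUNT                     # files written here persist to the bucket
```

The mounted paths are then available to the `setup` and `run` commands like any local directory.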

#### Built on SkyPilot

The Mithril CLI is built on the open-source [SkyPilot](https://skypilot.co/) framework — adopting its workload definition model, provisioning engine, and multi-cloud integrations.

This means existing SkyPilot workflows run unchanged, and workloads remain portable across all SkyPilot-supported clouds.
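Because the spec format is shared, the `task.yaml` shown earlier can also be launched with stock SkyPilot tooling (this assumes SkyPilot is installed and a supported cloud is configured):

```bash
sky launch -c my-cluster task.yaml   # provision and run via open-source SkyPilot
sky status                           # list running clusters
sky down my-cluster                  # tear the cluster down when finished
```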

Mithril extends this foundation at the capacity layer, integrating auction-based GPU allocation, flexible reservation models, and cost controls directly into the same declarative workflow.
