Overview

The Mithril CLI removes the manual setup typically required to run ML workloads on remote infrastructure. Workloads are defined declaratively in YAML — specifying code, compute requirements, and runtime configuration in a single spec. Instead of SSHing into nodes or stitching together launch scripts, researchers submit runs directly from these definitions. The CLI handles packaging, scheduling, and execution across Mithril and other clouds, supporting common patterns like model training, offline batch inference, and large-scale evaluation.

Defining a basic workload

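# task.yaml
resources:
  infra: mithril          # provider to run on
  accelerators: B200:8    # 8x B200 GPUs per node

num_nodes: 2              # 2 nodes, 16 GPUs in total

setup: |
  # Runs on every node before the job starts
  pip install -r requirements.txt

run: |
  # SKYPILOT_NODE_IPS lists one IP per line; the first entry is the head node,
  # which serves as the rendezvous address for torchrun
  MASTER_ADDR=$(echo "$SKYPILOT_NODE_IPS" | head -n1)
  torchrun \
    --nnodes=$SKYPILOT_NUM_NODES \
    --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
    --master_addr=$MASTER_ADDR \
    --node_rank=$SKYPILOT_NODE_RANK \
    train.py --distributed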
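
For comparison, a single-node variant of the same spec might look like the sketch below. It is not taken from the Mithril documentation: it assumes the same train.py and requirements.txt, omits num_nodes (SkyPilot defaults to one node), and relies on torchrun's --standalone mode, which removes the need for a rendezvous address.

```yaml
# task-single-node.yaml (illustrative file name)
resources:
  infra: mithril
  accelerators: B200:8

setup: |
  pip install -r requirements.txt

run: |
  # --standalone runs a single-node job with a local rendezvous,
  # so MASTER_ADDR and --node_rank are not needed
  torchrun \
    --standalone \
    --nproc_per_node=$SKYPILOT_NUM_GPUS_PER_NODE \
    train.py --distributed
```

Keeping the two specs this close means that moving from one node to several is mostly a matter of adding num_nodes and the rendezvous flags.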

Running the workload

Provisioning and scheduling

Before execution, the CLI evaluates available capacity and proposes a cluster configuration.

Features

  • Attach storage – Mount persistent volumes or cloud buckets for datasets, checkpoints, and run outputs.

  • Scale from single-node to distributed training – Provision multi-node GPU clusters with InfiniBand networking automatically configured.

  • Cost protection – Set maximum price limits to control spend.

  • Idle auto-shutdown – Instances pause automatically when GPUs are no longer in use.

  • AI-native – Built with coding agents in mind.

  • Multi-cloud ready – Launch workloads across Mithril, Nebius, Oracle, GCP, AWS, and 15 other providers.

  • No lock-in – Workload specs and CLI workflows build on open-source SkyPilot — not proprietary tooling.
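
As an illustration of the storage feature above, the fragment below uses SkyPilot's file_mounts section to expose buckets as directories on every node. This is a sketch under assumptions: the bucket names and mount paths are placeholders, and which storage backends are available through the Mithril CLI is not stated on this page.

```yaml
# task.yaml (fragment)
file_mounts:
  # Mount an existing bucket of training data at /data
  # (bucket name is a placeholder)
  /data:
    source: s3://my-dataset-bucket
    mode: MOUNT

  # Mount a named bucket for checkpoints and run outputs at /checkpoints
  # (bucket name is a placeholder)
  /checkpoints:
    name: my-checkpoint-bucket
    store: s3
    mode: MOUNT
```

With mounts like these, the run commands can read from /data and write to /checkpoints as if they were local directories.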

Built on SkyPilot

The Mithril CLI is built on the open-source SkyPilot framework and adopts its workload definition model, provisioning engine, and multi-cloud integrations.

This means existing SkyPilot workflows run unchanged, and workloads remain portable across all SkyPilot-supported clouds.
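
To make the portability claim concrete, retargeting a workload could be as small as changing the resources block. The fragment below is a hypothetical sketch: the provider and accelerator are chosen only for illustration, and actual availability and pricing vary by provider.

```yaml
# task.yaml (fragment): same workload, different provider
resources:
  infra: gcp              # was: mithril
  accelerators: H100:8    # illustrative; pick a type the target provider offers
```

The setup and run sections stay as they are, since they depend only on the SKYPILOT_* environment variables and not on the provider.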

Mithril extends this foundation at the capacity layer, integrating auction-based GPU allocation, flexible reservation models, and cost controls directly into the same declarative workflow.
