Workloads

The Mithril CLI manages clusters and jobs through SkyPilot, an open-source framework for running ML workloads. The commands below (launch, exec, status, logs, queue, start, stop, and down) provide the core workflow: provision GPUs, run tasks, monitor progress, and clean up when you're done. For options not covered here, run ml --help, or use ml sky for direct access to SkyPilot's CLI.
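The workflow above could be chained in a small script. The following is only a sketch under stated assumptions: the function name run_and_clean_up is hypothetical, and it uses only the commands and flags documented on this page (launch with -c, -d, and -y; logs; down with -y).

```shell
# Hypothetical wrapper around the provision / run / monitor / clean-up workflow.
# Usage: run_and_clean_up CLUSTER TASK_YAML
run_and_clean_up() {
  cluster=$1
  task=$2
  # Provision (or reuse) the cluster and submit the task without streaming logs.
  ml launch "$task" -c "$cluster" -d -y || return 1
  # Stream the latest job's logs; remember the exit status but keep going.
  rc=0
  ml logs "$cluster" || rc=$?
  # Always release the cluster, even if the log stream failed.
  ml down "$cluster" -y || return 1
  return "$rc"
}
```

For example, run_and_clean_up my-cluster task.yaml. Because ml down deletes the cluster's disks, a script like this only suits jobs whose outputs are stored elsewhere.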

Quick reference

| Command | Purpose |
| --- | --- |
| ml launch | Create cluster and run a task (or start interactive setup). |
| ml exec | Run a task or command on an existing cluster (or open a shell). |
| ml status | List clusters and jobs; optionally show IP or endpoints. |
| ml logs | Stream or download job logs; stream provision/autostop logs. |
| ml queue | Show job queue for cluster(s). |
| ml start | Start stopped (or failed) cluster(s). |
| ml stop | Stop cluster(s); keep disks for later ml start. |
| ml down | Tear down cluster(s) and delete resources. |

For full option lists, run ml <command> --help.

ml launch

Provision a cluster and run a task. With a task YAML or an inline command, ml launch brings up the cluster, submits the job, and streams the job logs unless --detach-run is set. With no arguments, it runs an interactive setup.

When invoked with no arguments, ml launch starts an interactive onboarding flow:

Creates a starter task YAML (task.yaml) with annotated fields for resources, setup, and run.
Optionally adds an AGENTS.md to your project so coding agents (Claude, Cursor, etc.) can discover the Mithril CLI docs bundled with the package.
Prints a ready-to-use launch command and, if AGENTS.md was written, a prompt for an agent-guided walkthrough.
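As a rough idea of what a task YAML with resources, setup, and run sections can look like, the snippet below writes a minimal one. It is a sketch, not the file that the interactive setup generates: the field layout assumes SkyPilot's task YAML schema, and the file name example-task.yaml, the GPU spec, and the commands are placeholders.

```shell
# Write a minimal task file. The contents are illustrative assumptions,
# not the exact starter file produced by the interactive setup.
cat > example-task.yaml <<'EOF'
# Hardware to request for the cluster.
resources:
  accelerators: H100:8

# Commands that prepare the environment (dependencies, data).
setup: |
  pip install -r requirements.txt

# The job's main command.
run: |
  python train.py
EOF
```

A file like this could then be passed as the ENTRYPOINT, for example ml launch example-task.yaml -c my-cluster.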

Synopsis

ml launch [ENTRYPOINT] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| ENTRYPOINT | Optional. Path to a task YAML (.yaml/.yml) or a single bash command in quotes. Omit to start interactive setup. |

Options

| Option | Description |
| --- | --- |
| -c, --cluster NAME | Cluster name. If the cluster exists, reuses it; otherwise creates it. |
| --gpus SPEC | GPU type and count (e.g. A100:4, H100:8). |
| --cpus SPEC | vCPU requirement (e.g. 4, 4+). |
| --memory SPEC | Memory in GB. |
| --cloud CLOUD | Cloud provider. |
| --region REGION | Region. |
| --num-nodes N | Number of nodes. |
| -i, --idle-minutes-to-autostop N | Auto-stop cluster after N minutes of idleness. |
| --down | Tear down the cluster after the job finishes. |
| -d, --detach-run | Do not stream job logs; return after the job is submitted. |
| -r, --retry-until-up | Retry provisioning until the cluster is up. |
| -y, --yes | Skip confirmation prompts. |
| --dryrun | Print cluster name, task, and resources only; do not launch. |
| -n, --name NAME | Task name. |
| --workdir DIR | Local directory to sync as the task workdir. |
| -e, --env KEY=VALUE | Set environment variables (repeatable). |

Examples

ml launch task.yaml -c mycluster
ml launch --gpus B200:8 -c dev
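Several of the options above are often combined. The sketch below wraps a detached launch that forwards any number of environment variables through the repeatable -e option. The function name launch_detached and the variable names in the usage line are hypothetical; only flags listed in the options table are used.

```shell
# Hypothetical helper: submit a task and return immediately.
# Usage: launch_detached CLUSTER TASK_YAML [KEY=VALUE ...]
launch_detached() {
  cluster=$1
  task=$2
  shift 2
  # Rewrite the remaining arguments as repeated "-e KEY=VALUE" pairs.
  for kv in "$@"; do
    set -- "$@" -e "$kv"
    shift
  done
  # -d returns after submission; -y skips the confirmation prompt.
  ml launch "$task" -c "$cluster" -d -y "$@"
}
```

For example, launch_detached dev task.yaml LR=0.1 SEED=1 runs ml launch task.yaml -c dev -d -y -e LR=0.1 -e SEED=1.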

ml exec

Run a task or command on an existing cluster without re-provisioning. Use a task YAML or a bash command. For interactive use, open a shell with ml exec CLUSTER or use ml ssh CLUSTER.

Synopsis

ml exec CLUSTER [ENTRYPOINT ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Cluster name. |
| ENTRYPOINT | Optional. Task YAML path or bash command. Omit to open an interactive shell on the head node. |

Options

| Option | Description |
| --- | --- |
| -d, --detach-run | Submit the job and return; do not stream logs. |

Additional task and resource options (e.g. --workdir, --gpus, --env) are supported; run ml exec --help for the full list.

Examples

ml exec my-cluster task.yaml
ml exec my-cluster python train.py
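When the command contains shell operators such as &&, it helps to pass it as one quoted string so the local shell does not interpret the operators. The sketch below assumes that ml exec accepts a single quoted bash command, as ml launch does; the function name exec_chain is hypothetical.

```shell
# Hypothetical helper: run several commands on the cluster as one job.
# Usage: exec_chain CLUSTER CMD [CMD ...]
exec_chain() {
  if [ "$#" -lt 2 ]; then
    echo "usage: exec_chain CLUSTER CMD [CMD ...]" >&2
    return 2
  fi
  cluster=$1
  shift
  chain=$1
  shift
  # Join the commands with "&&" so a failure stops the chain.
  for cmd in "$@"; do
    chain="$chain && $cmd"
  done
  # One quoted argument: the "&&" is evaluated on the cluster, not locally.
  ml exec "$cluster" "$chain"
}
```

For example, exec_chain my-cluster nvidia-smi "python train.py" submits the single command string nvidia-smi && python train.py.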

ml status

List clusters and their jobs. The command also updates your local SSH config so that ssh CLUSTER and ml exec work. When exactly one cluster is given, --ip, --endpoints, or --endpoint PORT shows its connection details.

Synopsis

ml status [CLUSTER ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. One or more cluster names. Default: all clusters. |

Options

| Option | Description |
| --- | --- |
| -v, --verbose | Show all fields. |
| -r, --refresh | Query latest status from the cloud (use when clusters change outside Mithril or with autostop). |
| --ip | Show head node IP (only with exactly one cluster). |
| --endpoints | Show all exposed endpoints (only with exactly one cluster). |
| --endpoint PORT | Show URL for the given port (only with exactly one cluster). |
| --show-managed-jobs / --no-show-managed-jobs | Include in-progress managed jobs (default: show). |
| --show-services / --no-show-services | Include Sky Serve services (default: show). |
| --show-pools / --no-show-pools | Include pools (default: show). |
| --all-users | Include clusters for all users. |

Cluster states

| State | Description |
| --- | --- |
| UP | Ready; provisioning and setup completed. |
| STOPPED | Stopped; use ml start to restart. |
| INIT | Provisioning or setup in progress, or cluster in an inconsistent state. |

Examples

ml status
ml status my-cluster
ml status --refresh
ml status my-cluster --ip
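The --ip option is convenient in scripts. The sketch below captures the head node IP in a variable and fails if nothing comes back. It assumes that ml status CLUSTER --ip prints only the IP address on standard output; the function name head_ip is hypothetical.

```shell
# Hypothetical helper: print the head node IP of exactly one cluster.
# Usage: head_ip CLUSTER
head_ip() {
  # Assumption: the command prints just the IP address on stdout.
  ip=$(ml status "$1" --ip) || return 1
  # Treat empty output as a failure so callers can rely on the result.
  [ -n "$ip" ] || return 1
  printf '%s\n' "$ip"
}
```

A caller might use it as ip=$(head_ip my-cluster) and then pass "$ip" to other tools.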

ml logs

Stream or download job logs, or stream provisioning/autostop logs.

Synopsis

ml logs CLUSTER [JOB_ID ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Cluster name. |
| JOB_ID | Optional. Job ID(s). If omitted, uses the latest job. For streaming, at most one job; for --sync-down, multiple allowed. |

Options

| Option | Description |
| --- | --- |
| --provision | Stream cluster provisioning logs (provision.log). |
| --autostop | Stream autostop hook logs. |
| -w, --worker ID | Worker ID for logs (only with --provision). |
| -s, --sync-down | Download job logs to ~/sky_logs (multiple job IDs allowed). |
| --status | Do not show logs; exit with status code: 0 = succeeded, 100 = failed, 101 = not finished, 102 = not found, 103 = cancelled. |
| --follow / --no-follow | Stream logs continuously (default: follow). |
| --tail N | Show only the last N lines (0 = all). |

Examples

ml logs my-cluster 1
ml logs my-cluster --provision
ml logs my-cluster --status
ml logs my-cluster -s 1 2 3
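The exit codes of --status make it easy to branch in scripts. The sketch below maps the documented codes to labels and wraps the call; the function names job_status_label and job_status are hypothetical, while the code values come from the options table above.

```shell
# Map an exit code of "ml logs CLUSTER [JOB_ID] --status" to a label.
job_status_label() {
  case "$1" in
    0)   echo succeeded ;;
    100) echo failed ;;
    101) echo not-finished ;;
    102) echo not-found ;;
    103) echo cancelled ;;
    *)   echo unknown ;;
  esac
}

# Hypothetical helper: report a job's status without printing its logs.
# Usage: job_status CLUSTER [JOB_ID]
job_status() {
  rc=0
  # Discard any output; only the exit code matters here.
  ml logs "$@" --status >/dev/null 2>&1 || rc=$?
  job_status_label "$rc"
}
```

For example, job_status my-cluster 1 prints one of the labels above, and omitting the job ID checks the latest job.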

ml queue

Show the job queue for one or more clusters (pending and running jobs; optionally finished).

Synopsis

ml queue [CLUSTER ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s). Default: all clusters. |

Options

| Option | Description |
| --- | --- |
| -s, --skip-finished | Show only pending and running jobs. |
| --all-users | Show queue for all users. |

Examples

ml queue
ml queue my-cluster
ml queue my-cluster --skip-finished

ml start

Start one or more stopped clusters (or retry provisioning/setup for clusters in INIT). No effect if a cluster is already UP.

Synopsis

ml start [CLUSTER ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s). Default: all clusters (or the single cluster if only one exists). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Start all clusters. |
| -y, --yes | Skip confirmation. |
| -i, --idle-minutes-to-autostop N | Set autostop after N minutes of idleness. |
| --down | Use autodown (tear down after idleness); requires --idle-minutes-to-autostop. |
| -r, --retry-until-up | Retry until the cluster is up on availability failures. |
| -f, --force | Start even if already UP (e.g. to upgrade SkyPilot runtime). |

Examples

ml start my-cluster
ml start cluster1 cluster2
ml start -a
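Since --down requires --idle-minutes-to-autostop, a wrapper can make sure the two are always passed together. The following is a sketch with a hypothetical function name, start_with_autodown, using only the options listed above.

```shell
# Hypothetical helper: restart a cluster and tear it down after idleness.
# Usage: start_with_autodown CLUSTER IDLE_MINUTES
start_with_autodown() {
  # Reject anything that is not a non-negative whole number of minutes.
  case "$2" in
    ''|*[!0-9]*)
      echo "IDLE_MINUTES must be a non-negative integer" >&2
      return 2
      ;;
  esac
  # --down requires -i, so both are always passed together.
  ml start "$1" -i "$2" --down -y
}
```

For example, start_with_autodown my-cluster 30 runs ml start my-cluster -i 30 --down -y.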

ml stop

Stop one or more clusters. Billing for instances stops; attached disks are kept and reattached when you ml start. Spot clusters cannot be stopped.

Synopsis

ml stop [CLUSTER ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s) or glob (e.g. cluster*). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Stop all clusters. |
| --all-users | Stop all clusters for all users. |
| -y, --yes | Skip confirmation. |
| --graceful | Wait for MOUNT_CACHED uploads to complete (cancels current jobs first). |
| --graceful-timeout N | Timeout in seconds for --graceful. |

Examples

ml stop my-cluster
ml stop cluster1 cluster2
ml stop "cluster*"
ml stop -a

ml down

Tear down one or more clusters. All associated resources are deleted and billing stops; data on attached disks is lost.

Synopsis

ml down [CLUSTER ...] [OPTIONS]

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s) or glob (e.g. cluster*). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Tear down all clusters. |
| --all-users | Tear down all clusters for all users. |
| -y, --yes | Skip confirmation. |
| -p, --purge | (Advanced) Remove cluster(s) from SkyPilot's table even if cloud teardown failed. Use only when troubleshooting; you are responsible for cleaning up leaked resources. |
| --graceful | Wait for MOUNT_CACHED uploads before terminating (cancels current jobs first). |
| --graceful-timeout N | Timeout in seconds for --graceful. |

Examples

ml down my-cluster
ml down cluster1 cluster2
ml down "cluster*"
ml down -a
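The glob examples quote the pattern so that the local shell passes it to the CLI instead of expanding it against files in the current directory. The sketch below applies the same quoting inside a small wrapper that works for both ml stop and ml down; the function name release_clusters is hypothetical.

```shell
# Hypothetical helper: stop or tear down clusters matching a name or glob.
# Usage: release_clusters stop|down PATTERN
release_clusters() {
  case "$1" in
    stop|down) ;;
    *)
      echo "usage: release_clusters stop|down PATTERN" >&2
      return 2
      ;;
  esac
  # Quoting "$2" hands the pattern to ml unexpanded.
  ml "$1" "$2" -y
}
```

For example, release_clusters stop "cluster*" keeps disks for a later ml start, while release_clusters down "cluster*" deletes them.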


