Workloads

The Mithril CLI manages clusters and jobs through SkyPilot, an open-source framework for running ML workloads. The commands below (launch, exec, status, logs, queue, start, stop, and down) provide the core workflow: provision GPUs, run tasks, monitor progress, and clean up when you're done. For options not covered here, run ml --help, or use ml sky for direct access to SkyPilot's CLI.

Quick reference

| Command | Purpose |
| --- | --- |
| ml launch | Create cluster and run a task (or start interactive setup). |
| ml exec | Run a task or command on an existing cluster (or open a shell). |
| ml status | List clusters and jobs; optionally show IP or endpoints. |
| ml logs | Stream or download job logs; stream provision/autostop logs. |
| ml queue | Show job queue for cluster(s). |
| ml start | Start stopped (or failed) cluster(s). |
| ml stop | Stop cluster(s); keep disks for later ml start. |
| ml down | Tear down cluster(s) and delete resources. |

For full option lists, run ml <command> --help.


ml launch

Provision a cluster and run a task. With no arguments, ml launch runs an interactive setup. With a task YAML or an inline command, it launches the cluster and streams job logs unless --detach-run is set.

When invoked with no arguments, ml launch starts an interactive onboarding flow:

  1. Creates a starter task YAML (task.yaml) with annotated fields for resources, setup, and run.

  2. Optionally adds an AGENTS.md to your project so coding agents (Claude, Cursor, etc.) can discover the Mithril CLI docs bundled with the package.

  3. Prints a ready-to-use launch command and, if AGENTS.md was written, a prompt for an agent-guided walkthrough.
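As a rough illustration of the kind of file step 1 describes, the sketch below writes a starter task file by hand. It assumes the file follows the SkyPilot task YAML schema (resources, setup, run); the GPU spec, requirements.txt, and train.py are placeholders, and the file the CLI actually generates may look different.

```shell
# Write a hand-made starter task file (a sketch; the generated file may differ).
cat > task.yaml <<'EOF'
# Hardware to request (assumed SkyPilot task schema).
resources:
  accelerators: H100:8

# Commands that prepare the environment before the task runs.
setup: |
  pip install -r requirements.txt

# The task's main command.
run: |
  python train.py
EOF

# Show the file so it can be reviewed before launching.
cat task.yaml
```

The resulting file can then be passed to ml launch as the ENTRYPOINT argument.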

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| ENTRYPOINT | Optional. Path to a task YAML (.yaml/.yml) or a single bash command in quotes. Omit to start interactive setup. |

Options

| Option | Description |
| --- | --- |
| -c, --cluster NAME | Cluster name. If the cluster exists, reuses it; otherwise creates it. |
| --gpus SPEC | GPU type and count (e.g. A100:4, H100:8). |
| --cpus SPEC | vCPU requirement (e.g. 4, 4+). |
| --memory SPEC | Memory in GB. |
| --cloud CLOUD | Cloud provider. |
| --region REGION | Region. |
| --num-nodes N | Number of nodes. |
| -i, --idle-minutes-to-autostop N | Auto-stop cluster after N minutes of idleness. |
| --down | Tear down the cluster after the job finishes. |
| -d, --detach-run | Do not stream job logs; return after the job is submitted. |
| -r, --retry-until-up | Retry provisioning until the cluster is up. |
| -y, --yes | Skip confirmation prompts. |
| --dryrun | Print cluster name, task, and resources only; do not launch. |
| -n, --name NAME | Task name. |
| --workdir DIR | Local directory to sync as the task workdir. |
| -e, --env KEY=VALUE | Set environment variables (repeatable). |

Examples
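The helpers below are a minimal sketch of three common launch patterns, built only from the options listed above. The function names are hypothetical, the GPU specs are placeholders, and placing options before the positional argument is an assumption about how the CLI parses its command line.

```shell
# Hypothetical helpers; cluster and file names are passed in by the caller.

# Print the cluster name, task, and resources without launching anything.
preview_launch() {            # usage: preview_launch TASK_YAML CLUSTER
  ml launch --dryrun -c "$2" "$1"
}

# Launch on 8 H100s, return once the job is submitted, and
# auto-stop the cluster after 30 idle minutes.
launch_detached() {           # usage: launch_detached TASK_YAML CLUSTER
  ml launch -y -d -c "$2" --gpus H100:8 -i 30 "$1"
}

# Run a single quoted bash command, then tear the cluster down
# when the job finishes.
launch_once() {               # usage: launch_once 'COMMAND'
  ml launch -y --gpus A100:4 --down "$1"
}
```

For example, preview_launch task.yaml dev shows what would be provisioned, and launch_once 'nvidia-smi' runs one command on a short-lived cluster.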


ml exec

Run a task or command on an existing cluster without re-provisioning. Use a task YAML or a bash command. For interactive use, open a shell with ml exec CLUSTER or use ml ssh CLUSTER.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Cluster name. |
| ENTRYPOINT | Optional. Task YAML path or bash command. Omit to open an interactive shell on the head node. |

Options

| Option | Description |
| --- | --- |
| -d, --detach-run | Submit the job and return; do not stream logs. |

Additional task and resource options (e.g. --workdir, --gpus, --env) are supported; run ml exec --help for the full list.

Examples
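A minimal sketch of the three ways to use ml exec described above: run and stream, submit and return, and open a shell. The helper names are hypothetical, and the option placement before the positional arguments is an assumption.

```shell
# Hypothetical helpers for a cluster that already exists.

# Run a task YAML or a quoted bash command and stream its logs.
run_on() {                    # usage: run_on CLUSTER ENTRYPOINT
  ml exec "$1" "$2"
}

# Submit the job and return immediately without streaming logs.
submit_on() {                 # usage: submit_on CLUSTER ENTRYPOINT
  ml exec -d "$1" "$2"
}

# Open an interactive shell on the head node (no ENTRYPOINT given).
shell_on() {                  # usage: shell_on CLUSTER
  ml exec "$1"
}
```

For example, submit_on dev 'python train.py' queues a command on the cluster named dev and returns at once.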


ml status

List clusters and job information. The command also updates your local SSH config so that you can run ssh CLUSTER or use ml exec. When exactly one cluster is given, --ip, --endpoints, or --endpoint PORT shows its connection details.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. One or more cluster names. Default: all clusters. |

Options

| Option | Description |
| --- | --- |
| -v, --verbose | Show all fields. |
| -r, --refresh | Query latest status from the cloud (use when clusters change outside Mithril or with autostop). |
| --ip | Show head node IP (only with exactly one cluster). |
| --endpoints | Show all exposed endpoints (only with exactly one cluster). |
| --endpoint PORT | Show URL for the given port (only with exactly one cluster). |
| --show-managed-jobs / --no-show-managed-jobs | Include in-progress managed jobs (default: show). |
| --show-services / --no-show-services | Include Sky Serve services (default: show). |
| --show-pools / --no-show-pools | Include pools (default: show). |
| --all-users | Include clusters for all users. |

Cluster states

| State | Description |
| --- | --- |
| UP | Ready; provisioning and setup completed. |
| STOPPED | Stopped; use ml start to restart. |
| INIT | Provisioning or setup in progress, or cluster in an inconsistent state. |

Examples
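The following sketch wraps the status options that are most useful in scripts. The helper names are hypothetical, and the exact text each command prints is not specified here, so the helpers pass the output through unchanged.

```shell
# Refresh state from the cloud, e.g. after autostop or out-of-band changes.
refresh_status() {            # usage: refresh_status [CLUSTER...]
  ml status -r "$@"
}

# Show the head node IP; valid for exactly one cluster.
head_ip() {                   # usage: head_ip CLUSTER
  ml status --ip "$1"
}

# Show the URL exposed for one port on one cluster.
endpoint_url() {              # usage: endpoint_url CLUSTER PORT
  ml status --endpoint "$2" "$1"
}
```

Calling refresh_status with no arguments covers all clusters, matching the default for CLUSTER.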


ml logs

Stream or download job logs, or stream provisioning/autostop logs.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Cluster name. |
| JOB_ID | Optional. Job ID(s). If omitted, uses the latest job. For streaming, at most one job; for --sync-down, multiple allowed. |

Options

| Option | Description |
| --- | --- |
| --provision | Stream cluster provisioning logs (provision.log). |
| --autostop | Stream autostop hook logs. |
| -w, --worker ID | Worker ID for logs (only with --provision). |
| -s, --sync-down | Download job logs to ~/sky_logs (multiple job IDs allowed). |
| --status | Do not show logs; exit with status code: 0 = succeeded, 100 = failed, 101 = not finished, 102 = not found, 103 = cancelled. |
| --follow / --no-follow | Stream logs continuously (default: follow). |
| --tail N | Show only the last N lines (0 = all). |

Examples
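Because --status reports the job state through the exit code, it lends itself to scripting. The sketch below maps the documented codes to labels and adds two small log helpers. All function names and label strings are hypothetical; only the exit codes and options come from the table above.

```shell
# Report a job's state from the exit code of `ml logs --status`.
job_state() {                 # usage: job_state CLUSTER JOB_ID
  rc=0
  ml logs --status "$1" "$2" || rc=$?
  case "$rc" in
    0)   echo succeeded ;;
    100) echo failed ;;
    101) echo not-finished ;;
    102) echo not-found ;;
    103) echo cancelled ;;
    *)   echo "unexpected-exit-$rc" ;;   # any code not listed in the table
  esac
}

# Download logs for one or more jobs (stored under ~/sky_logs).
download_logs() {             # usage: download_logs CLUSTER JOB_ID...
  ml logs -s "$@"
}

# Print the last N lines of the latest job without following.
last_lines() {                # usage: last_lines CLUSTER N
  ml logs --no-follow --tail "$2" "$1"
}
```

A script could, for instance, poll job_state dev 3 until it stops printing not-finished, then call download_logs dev 3.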


ml queue

Show the job queue for one or more clusters (pending and running jobs; optionally finished).

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s). Default: all clusters. |

Options

| Option | Description |
| --- | --- |
| -s, --skip-finished | Show only pending and running jobs. |
| --all-users | Show queue for all users. |

Examples
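A short sketch of the two queue options, wrapped in hypothetical helper functions. With no cluster names, both fall back to the default of all clusters.

```shell
# Show only pending and running jobs.
active_jobs() {               # usage: active_jobs [CLUSTER...]
  ml queue -s "$@"
}

# Show the queue for all users rather than only your own jobs.
team_queue() {                # usage: team_queue [CLUSTER...]
  ml queue --all-users "$@"
}
```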


ml start

Start one or more stopped clusters, or retry provisioning and setup for clusters in INIT. The command has no effect on a cluster that is already UP unless --force is given.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s). Default: all clusters (or the single cluster if only one exists). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Start all clusters. |
| -y, --yes | Skip confirmation. |
| -i, --idle-minutes-to-autostop N | Set autostop after N minutes of idleness. |
| --down | Use autodown (tear down after idleness); requires --idle-minutes-to-autostop. |
| -r, --retry-until-up | Retry until the cluster is up on availability failures. |
| -f, --force | Start even if already UP (e.g. to upgrade SkyPilot runtime). |

Examples
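The sketch below shows how the autostop and autodown options combine when restarting a cluster. The helper names are hypothetical; the second helper illustrates that --down is only passed together with -i, as the options table requires.

```shell
# Restart a stopped cluster and stop it again after MINUTES of idleness.
restart_autostop() {          # usage: restart_autostop CLUSTER MINUTES
  ml start -y -i "$2" "$1"
}

# Restart with autodown: tear down (not just stop) after MINUTES of idleness.
# --down is only valid together with -i.
restart_autodown() {          # usage: restart_autodown CLUSTER MINUTES
  ml start -y -i "$2" --down "$1"
}

# Keep retrying when the start fails for availability reasons.
restart_until_up() {          # usage: restart_until_up CLUSTER
  ml start -y -r "$1"
}
```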


ml stop

Stop one or more clusters. Billing for instances stops; attached disks are kept and reattached when you run ml start. Spot clusters cannot be stopped.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s) or glob (e.g. cluster*). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Stop all clusters. |
| --all-users | Stop all clusters for all users. |
| -y, --yes | Skip confirmation. |
| --graceful | Wait for MOUNT_CACHED uploads to complete (cancels current jobs first). |
| --graceful-timeout N | Timeout in seconds for --graceful. |

Examples
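A minimal sketch of stopping clusters by name or glob, and of a graceful stop with a timeout. The helper names and the exp-* pattern are hypothetical.

```shell
# Stop one or more clusters (names or a quoted glob) without a prompt.
stop_clusters() {             # usage: stop_clusters CLUSTER...
  ml stop -y "$@"
}

# Stop after MOUNT_CACHED uploads finish, waiting at most SECONDS.
# Note that --graceful cancels current jobs first.
stop_gracefully() {           # usage: stop_gracefully CLUSTER SECONDS
  ml stop -y --graceful --graceful-timeout "$2" "$1"
}
```

When passing a glob, quote it, as in stop_clusters 'exp-*', so that the local shell does not expand the pattern against file names before the CLI sees it.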


ml down

Tear down one or more clusters. All associated resources are deleted and billing stops; data on attached disks is lost.

Synopsis

Arguments

| Argument | Description |
| --- | --- |
| CLUSTER | Optional. Cluster name(s) or glob (e.g. cluster*). |

Options

| Option | Description |
| --- | --- |
| -a, --all | Tear down all clusters. |
| --all-users | Tear down all clusters for all users. |
| -y, --yes | Skip confirmation. |
| -p, --purge | (Advanced) Remove cluster(s) from SkyPilot's table even if cloud teardown failed. Use only when troubleshooting; you are responsible for cleaning up leaked resources. |
| --graceful | Wait for MOUNT_CACHED uploads before terminating (cancels current jobs first). |
| --graceful-timeout N | Timeout in seconds for --graceful. |

Examples
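The sketch below wraps the teardown variants in hypothetical helper functions. The purge helper deliberately omits -y so that the confirmation prompt stays in place for the riskier operation; that is a design choice of this sketch, not a requirement of the CLI.

```shell
# Tear down one or more clusters (names or a quoted glob) without a prompt.
# Data on attached disks is lost.
teardown() {                  # usage: teardown CLUSTER...
  ml down -y "$@"
}

# Tear down after MOUNT_CACHED uploads finish, waiting at most SECONDS.
teardown_gracefully() {       # usage: teardown_gracefully CLUSTER SECONDS
  ml down -y --graceful --graceful-timeout "$2" "$1"
}

# Troubleshooting only: drop the cluster record even if cloud teardown failed.
# Leaked cloud resources must then be cleaned up by hand.
purge_record() {              # usage: purge_record CLUSTER
  ml down -p "$1"
}
```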
