Data & Storage

Three ways to get data onto your cluster and persist results.

| Mechanism | YAML field | What it does | Lifecycle |
| --- | --- | --- | --- |
| Cloud buckets | file_mounts | Sync data to a path on the cluster | Data lives in your bucket |
| Persistent volumes | volumes | Mount Mithril network storage | Survives instance termination |
| Ephemeral volumes | volumes | Fast local scratch space | Deleted with cluster |

Your local code is handled separately by workdir — see Syncing your code.

Cloud buckets

Use file_mounts to make data available at a path on the cluster.

file_mounts:
  /data: s3://my-bucket/training-data
  /models: gs://my-bucket/pretrained

Local files

You can also mount local files and directories. They are uploaded to a temporary cloud bucket behind the scenes and synced to the cluster:

file_mounts:
  /remote/path: /local/path/to/data
  /remote/config.yaml: ./config.yaml

Supported sources

| Source | Example |
| --- | --- |
| AWS S3 | s3://my-bucket/path |
| Google Cloud Storage (GCS) | gs://my-bucket/path |
| Cloudflare R2 | r2://my-bucket |
| CoreWeave Object Storage | cw://my-bucket |
| OCI Object Storage | oci://my-bucket@region |
| Local directory | /absolute/path or ./relative/path |
| Local file | ./config.yaml |

For the latest list of supported providers, see Cloud Buckets (SkyPilot docs).

Storage modes

Cloud buckets support three access modes:

| Mode | Reads | Writes | Best for |
| --- | --- | --- | --- |
| MOUNT (default) | Streamed from bucket | Replicated to bucket and visible to other VMs | Shared datasets, multi-node access |
| COPY | Pre-fetched to local disk | Local only, not synced back | Fast I/O on data that fits on disk |
| MOUNT_CACHED | Cached locally on access | Cached locally, uploaded in background before task completes | Checkpoints and large writes |
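A mode is chosen per mount by expanding the entry into a mapping. The sketch below follows SkyPilot's file_mounts syntax (source and mode keys). The bucket names and mount paths are placeholders, not values from your account.

```yaml
file_mounts:
  # Default mode: reads stream from the bucket, writes are replicated back.
  /data:
    source: s3://my-bucket/training-data
    mode: MOUNT

  # Pre-fetched to local disk; local writes are not synced back.
  /models:
    source: gs://my-bucket/pretrained
    mode: COPY

  # Cached locally, uploaded in the background before the task completes.
  /checkpoints:
    source: s3://my-checkpoint-bucket
    mode: MOUNT_CACHED
```

The short form (/data: s3://my-bucket/training-data) is enough when the default MOUNT behavior is what you want.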

Cloud Buckets (SkyPilot docs): advanced storage options, bucket creation, CLI management, and YAML reference

Persistent volumes

Use volumes to mount Mithril network storage that survives instance termination, preemption, and restarts. Ideal for training checkpoints and datasets you reuse across runs.

Create a volume

Use in task YAML
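A minimal sketch, assuming the task YAML follows SkyPilot's volumes convention of mapping a mount path to the name of a volume that already exists. The volume name my-checkpoints, the mount path, and the training script are hypothetical.

```yaml
volumes:
  # <mount path on the cluster>: <name of an existing volume>
  /mnt/checkpoints: my-checkpoints

run: |
  # Hypothetical script and flag; anything written under the mount path
  # lands on the persistent volume.
  python train.py --checkpoint-dir /mnt/checkpoints
```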

Volume interfaces

| Interface | --type | Use case |
| --- | --- | --- |
| File (NFS) | mithril-file-share | Shared access across multiple instances |
| Block | mithril-block | Single instance, high throughput |

Not all regions support both interfaces. Check the Mithril console for availability.

Manage volumes

Region matching

The volume and the cluster must be in the same region.
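As an illustration only: if the cluster's region is pinned with SkyPilot's resources.infra field, the task would request the region where the volume was created. The cloud name mithril in that field, the placeholder <volume-region>, and the volume name are assumptions, so check the Mithril console for the actual region identifiers.

```yaml
resources:
  # Assumption: infra takes SkyPilot's <cloud>/<region> form.
  infra: mithril/<volume-region>

volumes:
  # Hypothetical volume that was created in <volume-region>.
  /mnt/checkpoints: my-checkpoints
```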

Ephemeral storage

Every Mithril instance comes with NVMe SSD ephemeral storage at no extra cost, automatically mounted at /mnt/local. No YAML configuration needed — it's available on every instance by default.

| Event | Ephemeral storage |
| --- | --- |
| VM restart | Retained |
| Preemption | Wiped (re-mounted on reallocation) |
| Termination | Wiped |
| Host maintenance | Wiped |

Use /mnt/local for scratch work, caches, and shuffle buffers. Don't store anything you need to keep — use persistent volumes or object storage (via cloud buckets) for that.
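Since /mnt/local needs no YAML field, using it is a matter of pointing your commands at it. A sketch of a run section that keeps temporary files and caches on the local NVMe (the script name and its flag are hypothetical):

```yaml
run: |
  # Keep temporary files and caches on the node-local NVMe.
  mkdir -p /mnt/local/tmp /mnt/local/cache
  export TMPDIR=/mnt/local/tmp

  # Hypothetical training script and flag.
  python train.py --cache-dir /mnt/local/cache
```

Because the disk is wiped on preemption and termination, the job should be able to rebuild anything it keeps there.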

Ephemeral Storage: instance storage specs and detailed behavior

Syncing your code

workdir syncs a local directory to ~/sky_workdir/ on the cluster.
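A minimal sketch using SkyPilot's workdir field. The script and config file names are hypothetical.

```yaml
# Sync the directory containing this YAML file to ~/sky_workdir/.
workdir: .

run: |
  # Commands start in ~/sky_workdir/, so this path resolves to the synced copy.
  python train.py --config ./config.yaml
```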

Your run commands execute from ~/sky_workdir/, so relative paths work as expected. The workdir is re-synced on every ml launch and ml exec.

Choosing the right mechanism

| Scenario | Use |
| --- | --- |
| Training data | file_mounts with bucket URL |
| Checkpoints you need across runs | volumes (persistent) |
| Scratch space for shuffling/caching | /mnt/local (node-local NVMe, ephemeral) |
| Your code and configs | workdir |
| Workload output (final weights, LoRA adapters, logs, eval artifacts) | Object storage |

Complete example
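The following is a sketch, not a reference configuration, of how the mechanisms on this page could be combined in one task. It assumes SkyPilot's task YAML fields (name, workdir, file_mounts, volumes, run). The task name, bucket names, volume name, script, and flags are all hypothetical.

```yaml
name: train-example  # hypothetical task name

# Code and configs: synced to ~/sky_workdir/ on every launch.
workdir: .

file_mounts:
  # Training data streamed from a bucket (default MOUNT mode).
  /data: s3://my-bucket/training-data
  # Final outputs written back to object storage.
  /outputs:
    source: s3://my-output-bucket
    mode: MOUNT_CACHED

volumes:
  # Checkpoints on a persistent volume that survives termination.
  /mnt/checkpoints: my-checkpoints

run: |
  # Scratch space on the node-local NVMe.
  mkdir -p /mnt/local/tmp
  export TMPDIR=/mnt/local/tmp

  python train.py \
    --data /data \
    --checkpoint-dir /mnt/checkpoints \
    --output-dir /outputs
```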
