flow.sdk.models
TaskSpec
Complete task specification - the core IR model.
Fields:
api_version (Literal): IR schema version (default: flow.ir/v1)
name (str): Task name
command (list): Command to execute
resources (ResourceSpec): Resource requirements
mounts (list): Volume mounts
params (RunParams): Runtime parameters
ResourceSpec
Hardware resource requirements.
Fields:
gpus (int): Number of GPUs required (default: 0)
gpu_type (str | None): GPU type (e.g., 'H100-80GB')
cpus (int): Number of CPUs required (default: 4)
memory_gb (int): Memory in GB (default: 16)
accelerator_hints (dict): Hints for accelerator configuration (MIG, NVLink, SXM/PCIe, compute capability)
MountSpec
Volume mount specification.
Fields:
kind (Literal): Type of mount
source (str): Source path or URI
target (str): Target mount path in container
read_only (bool): Whether mount is read-only (default: True)
cache (dict[str, str] | None): Cache configuration for remote mounts
RunParams
Runtime parameters for task execution.
Fields:
env (dict): Environment variables
working_dir (str | None): Working directory
retry (int): Number of retries on failure (default: 0)
preemptible_ok (bool): Allow preemptible instances (default: False)
time_limit_s (int | None): Time limit in seconds
image (str | None): Container image to use
TaskStatus
Task lifecycle states.
Values:
PENDING = "pending"
RUNNING = "running"
PAUSED = "paused"
PREEMPTING = "preempting"
COMPLETED = "completed"
FAILED = "failed"
CANCELLED = "cancelled"
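A minimal sketch of how the values above might be mirrored as a Python string enum, for code that needs to branch on task state. This is a local stand-in, not the SDK class itself. The `is_terminal` helper and the choice of which states count as terminal are assumptions made for illustration; the reference does not say which states are final.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Local mirror of the documented values (not the SDK class)."""

    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    PREEMPTING = "preempting"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: these three states are treated as final.
TERMINAL_STATES = frozenset(
    {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}
)


def is_terminal(status: TaskStatus) -> bool:
    """Hypothetical helper: True once no further transitions are expected."""
    return status in TERMINAL_STATES
```

Because the enum subclasses `str`, a raw status string from an API payload can be converted with `TaskStatus("running")`.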
InstanceStatus
Status of a compute instance.
Values:
PENDING = "pending"
RUNNING = "running"
STOPPED = "stopped"
TERMINATED = "terminated"
ReservationStatus
Reservation lifecycle states.
Values:
SCHEDULED = "scheduled"
ACTIVE = "active"
EXPIRED = "expired"
FAILED = "failed"
StorageInterface
Storage interface type.
Values:
BLOCK = "block"
FILE = "file"
Retries
Retry policy with fixed or exponential backoff.
Fields:
max_retries (int): Maximum retry attempts (0-10) (default: 3)
backoff_coefficient (float): Delay multiplier between retries (default: 2.0)
initial_delay (float): Initial delay in seconds before first retry (default: 1.0)
max_delay (float | None): Maximum delay between retries (seconds)
Retries.get_delay
get_delay(self, attempt: int) -> float
Calculate delay for a given retry attempt.
Parameters:
attempt: Retry attempt number (1-based)
Returns:
Delay in seconds before this retry attempt
Retries.validate_delays
validate_delays(self) -> Retries
Ensure max_delay is greater than initial_delay if set.
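The reference does not spell out the formula behind `get_delay`. The sketch below shows one plausible reading, assuming the delay is `initial_delay * backoff_coefficient ** (attempt - 1)` capped at `max_delay`. The class name `RetryPolicySketch` is hypothetical, and the formula is an assumption rather than the SDK's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicySketch:
    """Illustrative stand-in for Retries; fields and defaults follow the list above."""

    max_retries: int = 3
    backoff_coefficient: float = 2.0
    initial_delay: float = 1.0
    max_delay: Optional[float] = None

    def __post_init__(self) -> None:
        if not 0 <= self.max_retries <= 10:
            raise ValueError("max_retries must be between 0 and 10")
        # Mirrors the documented validate_delays rule.
        if self.max_delay is not None and self.max_delay <= self.initial_delay:
            raise ValueError("max_delay must be greater than initial_delay")

    def get_delay(self, attempt: int) -> float:
        """Delay in seconds before the given 1-based attempt (assumed formula)."""
        if attempt < 1:
            raise ValueError("attempt is 1-based")
        delay = self.initial_delay * self.backoff_coefficient ** (attempt - 1)
        return delay if self.max_delay is None else min(delay, self.max_delay)
```

Under this reading, a `backoff_coefficient` of 1.0 yields a fixed delay, and any larger value yields exponential growth.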
Task
Task handle with lifecycle control (status, logs, wait, cancel, ssh).
Fields:
task_id (str): Task UUID
name (str): Human-readable name
status (TaskStatus): Execution state
config (flow.sdk.models.task_config.TaskConfig | None): Original configuration
created_at (datetime)
started_at (datetime.datetime | None)
completed_at (datetime.datetime | None)
instance_created_at (datetime.datetime | None): Creation time of current instance (for preempted/restarted tasks)
instance_type (str)
num_instances (int)
region (str)
cost_per_hour (str): Hourly cost
total_cost (str | None): Accumulated cost
created_by (str | None): Creator user ID
ssh_host (str | None): SSH endpoint
ssh_port (int | None): SSH port (default: 22)
ssh_user (str): SSH user (default: ubuntu)
shell_command (str | None): Complete shell command
endpoints (dict): Exposed service URLs
instances (list): Instance identifiers
message (str | None): Human-readable status
provider_metadata (dict): Provider-specific state and metadata (e.g., Mithril bid status, preemption reasons)
Task.cancel
cancel(self) -> None
Task.get_instances
get_instances(self) -> list[Instance]
Task.get_user
get_user(self) -> Any | None
Task.logs
logs(
self,
follow: bool = False,
tail: int = 100,
stderr: bool = False,
source: str | None = None,
stream: str | None = None
) -> str | Iterator[str]
Task.refresh
refresh(self) -> None
Task.result
result(self) -> Any
Task.shell
shell(
self,
command: str | None = None,
node: int | None = None,
progress_context = None,
record: bool = False
) -> None
Task.stop
stop(self) -> None
Task.wait
wait(self, timeout: int | None = None) -> None
TaskConfig
Complete task specification used by Flow.run().
One obvious way to express requirements; fails fast with clear validation.
Fields:
name (str): Task identifier (default: flow-task)
unique_name (bool): Append unique suffix to name to ensure uniqueness (default: True)
instance_type (str | None): Explicit instance type
min_gpu_memory_gb (int | None): Minimum GPU memory requirement
command (str | list[str] | None): Command to execute when the task starts. Supports three formats: list format (recommended for precise control), single-line string (shell execution), or multi-line script (for complex workflows). Multi-line commands or scripts starting with shebang (#!) are automatically detected and executed as shell scripts. If not specified, defaults to 'sleep infinity' for interactive sessions.
Examples:
    ['python', 'train.py', '--epochs', '10']
    'python train.py --epochs 10'
    Multi-line script:
        #!/bin/bash
        pip install -r requirements.txt
        python train.py
        python evaluate.py
    'nvidia-smi'
image (str): Container image (default: nvidia/cuda:12.1.0-runtime-ubuntu22.04)
env (dict): Environment
working_dir (str): Container working directory (default: /workspace)
volumes (list)
data_mounts (list): Data to mount
ports (list): Container/instance ports to expose. High ports only (>=1024).
allow_docker_cache (bool): Allow mounting a volume at /var/lib/docker to persist Docker image layers. Single-node tasks only; use with caution. (default: False)
retries (flow.sdk.models.retry.Retries | None): Advanced retry configuration for task submission/execution
max_price_per_hour (float | None): Maximum hourly price (USD)
max_run_time_hours (float | None): Maximum runtime hours; 0 or None disables runtime monitoring
min_run_time_hours (float | None): Minimum guaranteed runtime hours
deadline_hours (float | None): Hours from submission until deadline
ssh_keys (list): Authorized SSH key IDs
allocation_mode (Literal): Allocation strategy: 'spot' (default, preemptible), 'reserved' (scheduled capacity), or 'auto'. (default: spot)
reservation_id (str | None): Target an existing reservation (advanced).
scheduled_start_time (str | None): When allocation_mode='reserved', schedule start (UTC).
reserved_duration_hours (int | None): When allocation_mode='reserved', reservation duration in hours (3-336).
region (str | None): Target region
num_instances (int): Instance count (default: 1)
priority (Literal): Task priority tier affecting limit price (default: med)
distributed_mode (Optional): Distributed rendezvous mode when num_instances > 1: 'auto' lets Flow assign rank and leader IP; 'manual' expects user-set FLOW_* envs.
internode_interconnect (str | None): Preferred inter-node network (e.g., InfiniBand, IB_3200, Ethernet)
intranode_interconnect (str | None): Preferred intra-node interconnect (e.g., SXM5, PCIe)
upload_code (bool): Upload current directory code to job (default: True)
dev_vm (bool | None): Hint: this task is a developer VM. When True, provider background code uploads are disabled and Docker startup adapts accordingly. If None, falls back to FLOW_DEV_VM env.
upload_strategy (Literal): Strategy for uploading code to instances (default: auto):
    auto: Use SCP for large (>8KB), embedded for small
    embedded: Include in startup script (10KB limit)
    scp: Transfer after instance starts (no size limit)
    none: No code upload
terminate_on_exit (bool): When true, a watcher cancels the task as soon as the main container exits. (default: False)
upload_timeout (int): Maximum seconds to wait for code upload (60-3600) (default: 600)
code_root (str | pathlib.Path | None): Local project directory to upload when upload_code=True. Defaults to the current working directory when not set.
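The three command formats described for the `command` field can be told apart with a few simple checks. The sketch below is one way such detection could work, assuming a string counts as a script when it starts with `#!` or spans more than one line. The function name `classify_command` and its return labels are hypothetical and not part of the SDK.

```python
from typing import List, Optional, Union

Command = Optional[Union[str, List[str]]]


def classify_command(command: Command) -> str:
    """Return 'default', 'argv', 'script', or 'shell' for a command value."""
    if command is None:
        # Documented fallback: the task runs 'sleep infinity'.
        return "default"
    if isinstance(command, list):
        # List format: arguments are passed through as given.
        return "argv"
    # Assumption: a shebang or an embedded newline marks a script.
    if command.startswith("#!") or "\n" in command.strip():
        return "script"
    return "shell"


script = "#!/bin/bash\npip install -r requirements.txt\npython train.py\n"
print(classify_command(["python", "train.py", "--epochs", "10"]))
print(classify_command("python train.py --epochs 10"))
print(classify_command(script))
```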
TaskConfig.to_spec
to_spec(self)
Convert TaskConfig into canonical IR TaskSpec.
Keep mapping minimal and user-facing config simple. Code is modeled as a first-class mount in IR when upload_code=True, without extra env flags or strategy knobs. Providers decide delivery details.
TaskConfig.to_yaml
to_yaml(self, path: str | Path) -> None
TaskConfig.validate_config
validate_config(self) -> TaskConfig
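The `upload_strategy` options listed for TaskConfig imply a size-based decision when the value is `auto`. A minimal sketch of that decision follows, assuming "8KB" and "10KB" mean 8 * 1024 and 10 * 1024 bytes. The function `resolve_upload_strategy` and both constants are hypothetical; the SDK's actual thresholds and error handling may differ.

```python
EMBED_THRESHOLD_BYTES = 8 * 1024  # assumption: "8KB" read as 8 * 1024 bytes
EMBED_LIMIT_BYTES = 10 * 1024     # assumption: "10KB limit" read as 10 * 1024 bytes

VALID_STRATEGIES = {"auto", "embedded", "scp", "none"}


def resolve_upload_strategy(strategy: str, payload_bytes: int) -> str:
    """Map a requested strategy and payload size to a concrete strategy."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown upload strategy: {strategy!r}")
    if strategy == "auto":
        # Large payloads go over SCP; small ones ride in the startup script.
        return "scp" if payload_bytes > EMBED_THRESHOLD_BYTES else "embedded"
    if strategy == "embedded" and payload_bytes > EMBED_LIMIT_BYTES:
        raise ValueError("payload exceeds the embedded size limit")
    return strategy
```

Raising on an oversized `embedded` payload is a design choice in this sketch: it fails early and never silently switches transport.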
VolumeSpec
Persistent volume specification (create or attach).
Fields:
name (str | None): Human-readable name (3-64 chars, lowercase alphanumeric with hyphens)
size_gb (int): Size in GB (default: 1)
mount_path (str | None): Mount path in container (default: /volumes/)
volume_id (str | None): ID of existing volume to attach
interface (StorageInterface): Storage interface type (default: StorageInterface.BLOCK)
iops (int | None): Provisioned IOPS
throughput_mb_s (int | None): Provisioned throughput
VolumeSpec.validate_volume_spec
validate_volume_spec(self) -> VolumeSpec
Validate volume specification.
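The reference does not list the checks that `validate_volume_spec` performs. As an illustration of the naming rule stated for `name` (3-64 characters, lowercase alphanumeric with hyphens), the sketch below uses a regular expression. The helper `is_valid_volume_name` is hypothetical. It assumes hyphens may appear in any position and that an omitted name is acceptable because the field is optional.

```python
import re
from typing import Optional

# Assumption: hyphens are allowed anywhere, including first and last position.
_NAME_PATTERN = re.compile(r"[a-z0-9-]{3,64}")


def is_valid_volume_name(name: Optional[str]) -> bool:
    """Check a volume name against the documented length and character rule."""
    if name is None:
        return True  # the field is optional
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_volume_name("training-data"))
print(is_valid_volume_name("Training_Data"))
```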
MountSpec
Mount specification for volumes, S3, or bind mounts.
Fields:
source (str): Source URL or path
target (str): Mount path in container
mount_type (Literal): Type of mount (default: bind)
options (dict): Provider-specific options
cache_key (str | None): Key for caching mount metadata
size_estimate_gb (float | None): Estimated size for planning
GPUSpec
Immutable GPU hardware specification used for matching.
Fields:
vendor (str): GPU vendor (default: NVIDIA)
model (str): GPU model (e.g., A100, H100)
memory_gb (int): GPU memory in GB
memory_type (str): Memory type (HBM2e, HBM3, GDDR6) (default: "")
architecture (str): GPU architecture (Ampere, Hopper) (default: "")
compute_capability (tuple): CUDA compute capability (default: (0, 0))
tflops_fp32 (float): FP32 performance in TFLOPS (default: 0.0)
tflops_fp16 (float): FP16 performance in TFLOPS (default: 0.0)
memory_bandwidth_gb_s (float): Memory bandwidth in GB/s (default: 0.0)
CPUSpec
CPU specification.
Fields:
vendor (str): CPU vendor (default: Intel)
model (str): CPU model (default: Xeon)
cores (int): Number of CPU cores
threads (int): Number of threads (0 = same as cores) (default: 0)
base_clock_ghz (float): Base clock speed in GHz (default: 0.0)
CPUSpec.set_threads_default
set_threads_default(self) -> CPUSpec
Default threads to cores when not specified.
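The defaulting rule for `threads` can be illustrated with a plain dataclass. This is a sketch only: `CPUSpecSketch` is a hypothetical stand-in, and the real model's base class and validator mechanics are not described in this reference.

```python
from dataclasses import dataclass


@dataclass
class CPUSpecSketch:
    """Illustrative stand-in for CPUSpec with the documented defaults."""

    cores: int
    vendor: str = "Intel"
    model: str = "Xeon"
    threads: int = 0  # 0 means "same as cores"
    base_clock_ghz: float = 0.0

    def __post_init__(self) -> None:
        # Mirror set_threads_default: fill in threads when left at 0.
        if self.threads == 0:
            self.threads = self.cores


print(CPUSpecSketch(cores=32).threads)
print(CPUSpecSketch(cores=32, threads=64).threads)
```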
MemorySpec
System memory specification.
Fields:
size_gb (int): Memory size in GB
type (str): Memory type (default: DDR4)
speed_mhz (int): Memory speed in MHz (default: 3200)
ecc (bool): ECC memory support (default: True)
StorageSpec
Storage specification.
Fields:
size_gb (int): Storage size in GB
type (str): Storage type (NVMe, SSD, HDD) (default: NVMe)
iops (int | None): IOPS rating
bandwidth_mb_s (int | None): Bandwidth in MB/s
NetworkSpec
Network specification.
Fields:
intranode (str): Intra-node interconnect (SXM4, SXM5, PCIe) (default: "")
internode (str | None): Inter-node network (InfiniBand, Ethernet)
bandwidth_gbps (float | None): Network bandwidth in Gbps
InstanceType
Canonical instance type specification (immutable).
Fields:
gpu (GPUSpec)
gpu_count (int): Number of GPUs
cpu (CPUSpec)
memory (MemorySpec)
storage (StorageSpec)
network (NetworkSpec)
id (uuid.UUID | None): Unique instance type ID
aliases (set): Alternative names
created_at (datetime)
version (int) (default: 1)
InstanceType.compute_id_and_aliases
compute_id_and_aliases(self) -> InstanceType
Compute a stable ID and default aliases.
InstanceMatch
Matched instance with price and availability.
Fields:
instance (InstanceType)
region (str)
availability (int): Number of available instances
price_per_hour (float): Price in USD per hour
match_score (float): Match quality score (default: 1.0)
Instance
Compute instance entity.
Fields:
instance_id (str): Instance UUID
task_id (str): Parent task ID
status (InstanceStatus): Instance state
ssh_host (str | None): Public hostname/IP
private_ip (str | None): VPC-internal IP
created_at (datetime)
terminated_at (datetime.datetime | None)
AvailableInstance
Available compute resource.
Fields:
allocation_id (str): Resource allocation ID
instance_type (str): Instance type identifier
region (str): Availability region
price_per_hour (float): Hourly price (USD)
gpu_type (str | None): GPU type
gpu_count (int | None): Number of GPUs
cpu_count (int | None): Number of CPUs
memory_gb (int | None): Memory in GB
available_quantity (int | None): Number available
status (str | None): Allocation status
expires_at (datetime.datetime | None): Expiration time
internode_interconnect (str | None): Inter-node network (e.g., InfiniBand, IB_3200, Ethernet)
intranode_interconnect (str | None): Intra-node interconnect (e.g., SXM5, PCIe)
Reservation
Reservation details returned by providers.
Fields:
reservation_id (str): Reservation identifier
name (str | None): Display name
status (ReservationStatus): Lifecycle state
instance_type (str): Instance type identifier
region (str): Region/zone
quantity (int): Number of instances
start_time_utc (datetime): Scheduled start time (UTC)
end_time_utc (datetime.datetime | None): Scheduled end time (UTC)
price_total_usd (float | None): Quoted/actual total price
provider_metadata (dict)
ReservationSpec
Provider-agnostic spec for creating a reservation.
Fields:
name (str | None): Optional reservation name for display
project_id (str | None): Provider project/workspace ID
instance_type (str): Explicit instance type (e.g., 'a100', '8xh100')
region (str): Target region/zone for the reservation
quantity (int): Number of instances to reserve (default: 1)
start_time_utc (datetime): Reservation start time (UTC)
duration_hours (int): Reservation duration in hours (3-336)
ssh_keys (list): Authorized SSH key IDs
volumes (list): Volume IDs to attach (provider-specific)
startup_script (str | None): Optional startup script executed when instances boot
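The `start_time_utc` and `duration_hours` fields together determine when a reservation ends. The sketch below computes that end time and enforces the documented 3-336 hour range. The helper `reservation_end` is hypothetical, and rejecting naive datetimes is an assumption made here to keep the UTC requirement explicit.

```python
from datetime import datetime, timedelta, timezone

MIN_HOURS, MAX_HOURS = 3, 336  # documented range for duration_hours


def reservation_end(start_time_utc: datetime, duration_hours: int) -> datetime:
    """Return the UTC end time for a reservation window."""
    if not MIN_HOURS <= duration_hours <= MAX_HOURS:
        raise ValueError("duration_hours must be between 3 and 336")
    if start_time_utc.tzinfo is None:
        # Assumption: naive datetimes are rejected rather than guessed at.
        raise ValueError("start_time_utc must be timezone-aware")
    start = start_time_utc.astimezone(timezone.utc)
    return start + timedelta(hours=duration_hours)


start = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(reservation_end(start, 24).isoformat())
```

The upper bound of 336 hours corresponds to 14 days.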
FlowConfig
Flow SDK configuration settings.
Immutable configuration for API authentication and default behaviors. Typically loaded from environment variables or config files.
Fields:
api_key (str): Authentication key
project (str): Project identifier
region (str): Default deployment region (default: us-central1-b)
api_url (str): API base URL (default: https://api.mithril.ai)
Project
Project metadata.
Fields:
name (str): Project identifier
region (str): Primary region
ValidationResult
Configuration validation result.
Fields:
is_valid (bool): Validation status
projects (list): Accessible projects
error_message (str | None): Validation error
SubmitTaskRequest
Task submission request.
Fields:
config (TaskConfig): Task specification
wait (bool): Block until complete (default: False)
dry_run (bool): Validation only (default: False)
SubmitTaskResponse
Task submission result.
Fields:
task_id (str): Assigned task ID
status (TaskStatus): Initial state
message (str | None): Status details
ListTasksRequest
Task listing request.
Fields:
status (flow.sdk.models.enums.TaskStatus | None): Status filter
limit (int): Page size (default: 100)
offset (int): Skip count (default: 0)
ListTasksResponse
Task listing result.
Fields:
tasks (list): Task collection
total (int): Total available
has_more (bool): Pagination indicator
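The `limit`, `offset`, and `has_more` fields suggest offset-based pagination. The sketch below walks all pages using a caller-supplied fetch function. Both `iter_tasks` and `make_fake_fetcher` are hypothetical: the reference does not show which client call accepts a ListTasksRequest, so pages are modeled as plain dictionaries with the documented response fields.

```python
from typing import Callable, Iterator, List


def iter_tasks(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator:
    """Yield every task by advancing offset until has_more is False."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        tasks = page["tasks"]
        yield from tasks
        # Stop on the last page; the empty-page check guards against looping forever.
        if not page["has_more"] or not tasks:
            break
        offset += len(tasks)


def make_fake_fetcher(items: List) -> Callable[[int, int], dict]:
    """In-memory stand-in for a real listing call (illustration only)."""

    def fetch(limit: int, offset: int) -> dict:
        chunk = items[offset:offset + limit]
        return {
            "tasks": chunk,
            "total": len(items),
            "has_more": offset + len(chunk) < len(items),
        }

    return fetch


all_tasks = list(iter_tasks(make_fake_fetcher(list(range(250))), limit=100))
print(len(all_tasks))
```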
User
User identity information.
Fields:
user_id (str): Unique user identifier (e.g., 'user_kfV4CCaapLiqCNlv')
username (str): Username for display
email (str): User email address
Volume
Backwards-compatible alias to the canonical Volume model.
Kept for import stability of the legacy Volume class while delegating to the real implementation in the volume module, which supports both persistent volumes and bind mounts (local/remote/read_only).
Fields:
local (str | None): Source path on host
remote (str | None): Target path in container
read_only (bool | None): Mount read-only
volume_id (str | None): Volume ID
name (str | None): Volume name
size_gb (int | None): Capacity (GB)
region (str | None): Storage region
interface (flow.sdk.models.enums.StorageInterface | None): Storage interface type
attached_to (list): Attached instance IDs
created_at (Any | None): Creation timestamp