Inference API overview
Overview
The Mithril Inference API lets you execute a large number of requests asynchronously. It is ideal for tasks that do not require immediate responses, such as:
Running evaluations
Classifying large datasets
Bulk inference tasks
Using the Batch API provides:
Reduced Costs: Significantly lower pricing compared to synchronous requests.
Higher Throughput: Greater rate limits for large-scale operations.
Asynchronous Convenience: Submit requests and retrieve results later, within a clear completion window.
The API largely follows the structure of OpenAI's Batch API. It supports uploading batch jobs, checking job status, and retrieving results.
Input file format
The input for batch inference uses the OpenAI batch .jsonl file format, which consists of a series of JSON objects, each on its own line. Each line represents a separate request.
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
Implementation notes:
Currently, Mithril only supports the /v1/chat/completions endpoint.
Every request in the file should be a complete and valid request to the specified endpoint.
The custom_id field is used to track individual requests within the batch.
All requests in a file must be made to the same model.
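Because the API follows the structure of OpenAI's Batch API, a job can plausibly be submitted with an OpenAI-compatible client. The following is a sketch under that assumption; the base URL is a placeholder, not a documented endpoint:

from openai import OpenAI

# Placeholder base URL and key; substitute your actual Mithril endpoint and credentials.
client = OpenAI(base_url="https://api.mithril.example/v1", api_key="YOUR_API_KEY")

# Upload the .jsonl input file, then create a batch job that references it.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",  # the only endpoint currently supported
    completion_window="24h",
)
print(batch.id, batch.status)  # a new job typically starts in "validating"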
Retrieving results
Upon completion, results and any errors are provided as downloadable .jsonl files accessible via the Batch API endpoints.
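Continuing the sketch above (same assumed client), completed results can be downloaded and matched back to each request by its custom_id:

import json

batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    # Download the output file and index each response by its custom_id.
    output_text = client.files.content(batch.output_file_id).text
    results = {}
    for line in output_text.splitlines():
        record = json.loads(line)
        results[record["custom_id"]] = record["response"]
    # Any per-request failures arrive in a separate error file, if present.
    if batch.error_file_id:
        error_text = client.files.content(batch.error_file_id).text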
Important notes:
Output and input files are stored for a maximum of 30 days from the job completion date. After 30 days, files are permanently purged and cannot be retrieved.
Jobs that exceed computational limits or runtime thresholds will be automatically paused, checkpointed, and returned to you partially complete.
Batch lifecycle
validating: Your input file is being validated before the batch begins.
failed: Your batch failed validation or encountered an error.
in_progress: Your batch is actively being processed.
completed: Your batch completed successfully; results are ready.
expired: Your batch did not complete within the 24-hour completion window.
cancelling: Your batch cancellation request is currently being processed.
cancelled: Your batch was cancelled successfully.
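A simple polling sketch built on the lifecycle above (the client setup from the earlier example is assumed; the poll interval is arbitrary):

import time

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll a batch until it leaves validating, in_progress, or cancelling."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_seconds)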
Limits and constraints
We impose certain system limits by default. If you need higher limits, please reach out to our support team.
Maximum File Size: 150 MB. Input files larger than this limit will be rejected.
Maximum Requests per Job: 25,000. Each batch job can include up to 25,000 individual requests.
Completion Window: 24 hours. We maintain a target SLA of 24 hours and expect most jobs to complete with high confidence within 24 hours of submission.
Maximum Jobs per Day: 300 total jobs per day.
Maximum Requests per Day: 200,000 total requests per day.
Maximum Tokens per Day: 500,000,000 total input and output tokens per day.
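As an illustrative pre-flight check, the per-job limits above can be validated client-side before upload. This is a sketch; the exact byte definition of 150 MB is an assumption:

import os

MAX_FILE_BYTES = 150 * 1024 * 1024  # 150 MB file-size limit (byte definition assumed)
MAX_REQUESTS_PER_JOB = 25_000       # maximum requests per batch job

def validate_batch_input(path):
    """Check a batch input file against the documented limits before upload."""
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{path} is {size} bytes; the maximum file size is 150 MB")
    with open(path) as f:
        count = sum(1 for line in f if line.strip())
    if count > MAX_REQUESTS_PER_JOB:
        raise ValueError(f"{path} has {count} requests; the maximum is 25,000 per job")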
Billing
Mithril Inference operates on a pay-as-you-go model: usage is charged per completed batch job at the input/output token prices specified. Learn more about how billing works at Billing.
Model support
The batch API currently supports a variety of popular language models, and we continuously expand this selection based on user needs and feedback.
To check which models are available for batch processing, visit our Active Models page.
If you're interested in a model not currently listed or have specific model-related requests, please reach out at [email protected].
Data retention and privacy
Batch input and output files are securely stored for 30 days, after which they're permanently deleted.
Ensure retrieval of your outputs within this timeframe to avoid data loss.