
Run any AI model as an API within seconds

Low latency serverless API to run and deploy ML models

  • Vellum

  • Charisma AI

  • Hypotenuse AI

  • SensusFuturis

  • Seelab

  • Renovate AI

Our product is crafted through millions of ML runs


Developers using our API


AI models deployed

The easiest way to get an API endpoint from any ML model

All the infrastructure required to run AI models with a simple API call

curl -X POST '' \
  -H 'Authorization: Bearer YOUR_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"pipeline": "meta/llama2-70B-chat:latest", "inputs": [{"type":"string","value":"Hello World!"}]}'
  • Only pay for inference time

    Pay per second with serverless pricing on our shared cluster. Pay only for the inference you use.

  • Inference within 0.035s

    Within a few milliseconds our scheduler decides the optimal strategy of queuing, routing and scaling.

  • API-first, built for Python lovers

    RESTful API to call your model from anywhere. Python SDK to upload your own models.
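
As a rough illustration of calling the REST API from Python, the sketch below builds the same request as the curl command using only the standard library. The endpoint URL is not shown on this page, so `API_URL` is a hypothetical placeholder, and `build_run_request` is an illustrative helper, not part of any SDK.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not shown in the source.
API_URL = "https://example.invalid/run"


def build_run_request(url: str, token: str, pipeline: str, prompt: str) -> urllib.request.Request:
    """Build a POST request with the same headers and JSON body as the curl example."""
    payload = {
        "pipeline": pipeline,
        "inputs": [{"type": "string", "value": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    API_URL, "YOUR_TOKEN", "meta/llama2-70B-chat:latest", "Hello World!"
)
# To send it, pass the request to urllib.request.urlopen(request).
```

The request is only constructed here, not sent, so the snippet runs without network access or a valid token.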

How to get started

Run any model built by the community, dive into one of our tutorials, or start uploading your own models.

Beginner friendly

Explore AI models built by the community

Our community uploads AI models and makes them available for everyone to use. They are ready to try and use as an API.


Upload your own AI pipeline

A pipeline contains all the code required to run your AI model as an endpoint. You can define the inputs to the endpoint, any pre-processing code, the inference pass, any post-processing code, and the outputs returned from the endpoint.
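
To make the stages concrete, here is a minimal plain-Python sketch of an endpoint as a chain of pre-processing, inference, and post-processing. It does not use the SDK; every function name is hypothetical, and the "model" is a trivial stand-in that counts words.

```python
from typing import Dict


def preprocess(raw: str) -> str:
    """Pre-processing: normalize the raw endpoint input."""
    return raw.strip().lower()


def infer(text: str) -> Dict[str, int]:
    """Inference pass: a stand-in model that just measures the text."""
    return {"num_words": len(text.split()), "num_chars": len(text)}


def postprocess(prediction: Dict[str, int]) -> str:
    """Post-processing: turn the raw prediction into the endpoint output."""
    return f"{prediction['num_words']} words, {prediction['num_chars']} characters"


def run_endpoint(raw_input: str) -> str:
    """Chain the stages in the order an endpoint call would execute them."""
    return postprocess(infer(preprocess(raw_input)))


print(run_endpoint("  Hello World!  "))  # prints "2 words, 12 characters"
```

Keeping each stage as its own function makes it easy to swap the stand-in `infer` for a real model call while leaving input and output handling unchanged.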

Learn how to leverage our cold-start optimizations, create custom environments, enable debugging mode and logging, load a model from a file, and much more.


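# Assumes Pipeline and Variable are provided by the Python SDK's `pipeline` package.
from pipeline import Pipeline, Variable

def foo(bar: str) -> str:
    return f"Input string: {bar}"

with Pipeline() as builder:
    bar = Variable(str)
    output_1 = foo(bar)

my_pipeline = builder.get_pipeline()
# "test" was passed to a follow-up call that is not shown here.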


Pay per second

Start from as little as $0.10/h

$20 free credits

Run your models on our shared cluster and pay only for the inference time.
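
As a back-of-the-envelope sketch of per-second billing, the snippet below converts the hourly starting rate into a cost for a given number of inference seconds. It assumes billing is strictly linear with no minimum charge, which this page does not confirm, and the function names are hypothetical.

```python
from decimal import Decimal

HOURLY_RATE_USD = Decimal("0.10")  # lowest advertised rate on this page
FREE_CREDITS_USD = Decimal("20")   # advertised free credits
SECONDS_PER_HOUR = 3600


def inference_cost(seconds: int, hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Cost in USD of `seconds` of inference, assuming linear per-second billing."""
    return hourly_rate * Decimal(seconds) / SECONDS_PER_HOUR


def seconds_covered(credits: Decimal = FREE_CREDITS_USD,
                    hourly_rate: Decimal = HOURLY_RATE_USD) -> int:
    """Whole seconds of inference that `credits` pays for at `hourly_rate`."""
    return int(credits * SECONDS_PER_HOUR / hourly_rate)


print(inference_cost(90))   # 90 seconds at $0.10/h costs $0.0025
print(seconds_covered())    # 720000 seconds, i.e. 200 hours at the lowest rate
```

Higher-priced hardware would shorten the time covered proportionally; only the lowest advertised rate is used here.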

View pricing


Looking to run AI on your own infrastructure?

Our enterprise solution offers maximum privacy and scale. Run AI models as an API within your own cloud or infrastructure of choice.

Learn about our Enterprise solution
Enterprise diagram showing Mystic on top of your cloud providers; logos shown are AWS, Azure, and GCP