Serverless GPU inference for ML models

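import requests

# Replace the placeholders with your endpoint URL and API token.
response = requests.post(
    "YOUR_API_ENDPOINT",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "pipeline_id": "pipeline_67d9d8ec36d54c148c70df1f404b0369",
        "data": [
            # Prompt passed to the model.
            ["Mountain winds, and babbling springs, and moonlight seas"],
            # Generation settings.
            {
                "seed": 1,
                "num_inference_steps": 50,
                "guidance_scale": 7.5,
                "width": 512,
                "height": 512,
                "eta": 0.0,
                "num_samples": 3,
            },
        ],
    },
)
response.raise_for_status()  # Fail loudly on HTTP errors.
print(response.text)
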
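The request body above can also be assembled by a small helper, which makes the shape of the payload explicit. What follows is only a sketch under stated assumptions: the helper name `build_run_request`, the `api_url` argument, the placeholder URL, and the nesting of `data` (a list of prompts followed by a settings dictionary) are inferred from the example request, not taken from official API documentation. It uses only the Python standard library and builds the request without sending it.

```python
import json
import urllib.request


def build_run_request(api_url, api_token, pipeline_id, prompts, **settings):
    """Build (but do not send) a POST request for one pipeline run.

    Hypothetical helper: the payload layout is inferred from the example
    request, not from official API documentation.
    """
    payload = {
        "pipeline_id": pipeline_id,
        # Assumed layout: a list of prompts followed by a settings dict.
        "data": [list(prompts), dict(settings)],
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "https://example.invalid/runs",  # placeholder URL, not a real endpoint
    "YOUR_API_TOKEN",
    "pipeline_67d9d8ec36d54c148c70df1f404b0369",
    ["Mountain winds, and babbling springs, and moonlight seas"],
    seed=1,
    num_inference_steps=50,
    guidance_scale=7.5,
    width=512,
    height=512,
    eta=0.0,
    num_samples=3,
)

# Decode the body again to inspect what would be sent.
body = json.loads(request.data)
print(body["data"][1]["num_samples"])
```

Once a real endpoint URL and token are supplied, passing the result to `urllib.request.urlopen(request)` would send it.
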
  • Cheaper than AWS or GCP

    Serverless reduces idle GPU time, so you use less compute.

  • Up-to-date enterprise hardware

    NVIDIA Ampere and Volta GPUs.

  • Save engineering time

    We handle the cloud infrastructure as you scale.

  • Unlimited requests

    No changes required as your product grows.

  • Reduced cold start

    Low latency and reliable response times.

  • Rapid support

    Personal specialist help.

Custom models

Deploy your own ML models on Pipeline Catalyst

Upload your model and instantly get an inference API endpoint.

With our open-source library, you can convert your model to a pipeline and get access to our API within minutes.

Access pre-trained models

State-of-the-art AI models, one API call away.

Explore our list of pre-trained AI models available as an API.

Choose one of our top models, or view them all in our dashboard.


Pipeline Catalyst gives you access to fast hardware in our own data center for running ML models, so you can quickly test, validate, and deploy them.

1 month free
/s of compute
$20 free compute
