
5 minute read

September 2, 2022

Run Machine Learning ONNX models in one line of Python

TUTORIAL: How to deploy serverless GPU machine learning on Pipeline AI

I’ve been working on a new serverless GPU inference feature on Pipeline AI that I’m proud to be releasing this week.

You’ll be able to deploy ONNX formatted machine learning models with a single line of Python and serve API inference calls on GPUs.

The simplicity and cost-effectiveness of this service is really *chef’s kiss*. A year ago I was an ML app developer who struggled with deployment and ended up settling for slow serverless CPU infra with long spin-up times; this is my gift to my past self.

I’ll go through a simple example of deploying an ML model for production via ONNX with the pipeline-ai Python package in 5 steps:

  1. Convert your model to ONNX format
  2. Create an account on Pipeline AI
  3. Install pipeline-ai python package
  4. Upload your model
  5. Make inference API calls


1. Convert your model to ONNX format (ONNX: Open Neural Network Exchange)

In this example, we will be using an open-source image background removal ML model called MODNet; you can download the pretrained ONNX model linked from the official MODNet repo here. If you’re already ONNX-pilled and just want to follow the tutorial, feel free to skip to step 2.

In its own words, ONNX is:

“an open format built to represent machine learning models”

Developed by Microsoft, it aims to be a cross-platform standard for ML models with conversion support for all major ML framework formats. If you’re looking to deploy ML models in production, it’s a really good idea to use ONNX. Not only can you easily get a 3x reduction in model size, but ONNX Runtime (ONNX’s inference engine) has built-in inference optimisations that can speed up inference by as much as 17x.
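You can see ONNX Runtime in action for yourself by running the downloaded model locally before deploying it. Here’s a minimal sketch; the 512×512 dummy input is just an assumption for MODNet, so check `sess.get_inputs()` for your own model:

```python
import numpy as np
import onnxruntime as ort

# load the downloaded MODNet model into an ONNX Runtime session
sess = ort.InferenceSession("modnet.onnx")

# inspect the expected input name and shape
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)

# run a dummy 512x512 image through the model (assumed input size)
dummy = np.random.rand(1, 3, 512, 512).astype("float32")
outputs = sess.run(None, {inp.name: dummy})
print(outputs[0].shape)
```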

ONNX is actively maintained (it recently extended its support for transformer models) and has great documentation. Here are examples for converting your own ML models into ONNX format. If you want to read more about ONNX Runtime, you can find more info here.
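To give a flavour of what conversion looks like, here’s a minimal PyTorch-to-ONNX sketch using torchvision’s ResNet-18 as a stand-in for your own model (the names and shapes are illustrative, not specific to MODNet):

```python
import torch
import torchvision

# load a pretrained model as a stand-in and switch to eval mode
model = torchvision.models.resnet18(pretrained=True).eval()

# a dummy input fixes the example shape used while tracing the graph
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    # allow a variable batch size at inference time
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```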

2. Create an account on Pipeline AI

Pipeline AI is a cloud service that provides serverless GPU inference for ML models. We’ll upload our ONNX models there, where we can then make API calls to run them on GPUs. Make an account here.

We have a great trial offer for new users to get started: you get $20 in compute credit (equivalent to 10 hours of GPU compute) and one month of the developer subscription free of charge (usually $12.99).

You only pay while your model is running inference (there’s not even billing for server spin-up times!), meaning our service is truly serverless, and the savings compared to renting GPU servers on an hourly basis are tremendous. Our pricing page is here. TL;DR: billing per millisecond of inference time, at a rate of $0.00055/second ($2/hour), plus a fixed monthly charge of $12.99.
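To sanity-check what a single call costs at that rate, here’s a quick back-of-the-envelope calculation (the 300 ms inference time is just an illustrative guess, not a MODNet benchmark):

```python
# $2/hour expressed per millisecond of inference time
rate_per_ms = 2 / (60 * 60 * 1000)  # ~$0.00000056 per ms

# hypothetical 300 ms inference call
inference_ms = 300
print(f"cost per run: ${inference_ms * rate_per_ms:.6f}")  # ~$0.000167
```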

3. Install pipeline-ai python package

To install the latest version of pipeline-ai, make sure you have pip installed and are running Python 3.9. Then run the following command in your terminal:

```bash
pip install pipeline-ai
```
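To confirm the install worked, you can query the installed version from Python (this uses only the standard library, nothing package-specific):

```python
from importlib.metadata import version

# prints the installed pipeline-ai version, e.g. after a fresh pip install
print(version("pipeline-ai"))
```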

4. Upload your model

Finally, we’re ready to do some programming. We’ll need the file path of our downloaded modnet.onnx file from earlier, and our API token, which we can get from the Pipeline dashboard under Settings → API Tokens.
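One hedge before we start: rather than pasting the token directly into your scripts, you may prefer to read it from an environment variable (the variable name `PIPELINE_API_TOKEN` below is just a convention I’m assuming, not something the package requires):

```python
import os

from pipeline import PipelineCloud

# read the API token from the environment instead of hard-coding it
api = PipelineCloud(token=os.environ["PIPELINE_API_TOKEN"])
```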
Now, let’s run the ONNX model upload script.
```python
from pipeline import PipelineCloud, onnx_to_pipeline

# This line creates a pipeline from our onnx file
onnx_pipeline = onnx_to_pipeline("MODNET_FILEPATH")

# Authenticate with PipelineCloud
api = PipelineCloud(token="YOUR_API_TOKEN")

# Upload pipeline to PipelineCloud
uploaded_pipeline = api.upload_pipeline(onnx_pipeline)

# Keep track of the returned pipeline id for making API calls
print(f"Uploaded pipeline: {uploaded_pipeline.id}")
```
If the upload is successful, a pipeline id string will be returned (pipelines are our abstraction of a sequence of runnable functions/models within the PipelineCloud architecture). You will need this pipeline id string to make inference API calls to that pipeline; a quick way to keep it around is sketched below. If you’ve followed this far, congratulations! 🥳 You’ve now uploaded your first ONNX model to PipelineCloud!
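Since you’ll need that id again in step 5, it’s worth stashing it somewhere. A trivial sketch, continuing from the upload script above (the filename is arbitrary):

```python
# save the returned pipeline id for later inference calls
with open("pipeline_id.txt", "w") as f:
    f.write(uploaded_pipeline.id)
```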

5. Make inference API calls

The MODNet model removes the background from portrait photos. Here is the result from the example we’re about to run (you can download the starting image of Dr. Mike Levin here):

[Image: portrait photo | predicted alpha matte | alpha composite]

The ONNX model takes a preprocessed image as input and returns an alpha matte. Compositing the alpha matte with the original image gives us the foreground result. Pretty neat. Pre- and post-processing take up most of our Python script; there are actually only 2 lines that interact with PipelineCloud.
```python
import cv2
import numpy as np
from PIL import Image
from pipeline import PipelineCloud

# read image
img = cv2.imread('IMAGE_FILEPATH')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
im_h, im_w, im_c = img.shape

def preprocessing(im):
    # determines input resolution to MODNet's model
    ref_size = 512

    # Get resized dim for MODNet input
    def get_resize(im_h, im_w, ref_size):
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        elif im_w < im_h:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)

        # MODNet expects dimensions divisible by 32
        im_rw = im_rw - im_rw % 32
        im_rh = im_rh - im_rh % 32

        return im_rw, im_rh

    # unify image channels to 3
    if len(im.shape) == 2:
        im = im[:, :, None]
    if im.shape[2] == 1:
        im = np.repeat(im, 3, axis=2)
    elif im.shape[2] == 4:
        im = im[:, :, 0:3]

    # normalize values to scale them between -1 and 1
    im = (im - 127.5) / 127.5

    # get resize dimensions for MODNet inference
    x, y = get_resize(im_h, im_w, ref_size)

    # resize image
    im = cv2.resize(im, (x, y), interpolation=cv2.INTER_AREA)

    # prepare input shape: (h, w, c) -> (1, c, h, w)
    im = np.transpose(im)
    im = np.swapaxes(im, 1, 2)
    im = np.expand_dims(im, axis=0).astype('float32')

    return im

def post_processing(result, im):
    matte = (np.squeeze(result) * 255).astype('uint8')
    # resize matte to original image dim
    matte = cv2.resize(matte, (im_w, im_h), interpolation=cv2.INTER_AREA)

    def combined_display(image, matte):
        # calculate display resolution
        w, h = image.width, image.height
        rw, rh = 800, int(h * 800 / (3 * w))

        # obtain predicted foreground
        image = np.asarray(image)
        if len(image.shape) == 2:
            image = image[:, :, None]
        if image.shape[2] == 1:
            image = np.repeat(image, 3, axis=2)
        elif image.shape[2] == 4:
            image = image[:, :, 0:3]
        matte = np.repeat(np.asarray(matte)[:, :, None], 3, axis=2) / 255
        foreground = image * matte + np.full(image.shape, 255) * (1 - matte)

        # concatenate image, matte, and foreground side by side
        combined = np.concatenate((image, matte * 255, foreground), axis=1)
        combined = Image.fromarray(np.uint8(combined)).resize((rw, rh))
        return combined

    # show composite
    combined_display(Image.fromarray(im), matte).show()

im = preprocessing(img)

# authenticate with api
api = PipelineCloud(token="YOUR_API_TOKEN")

# inference call
result_detailed = api.run_pipeline("PIPELINE_ID", [["output"], {"input": im}])

# get MODNet result without metadata
result = result_detailed['result_preview']

# create and show composite
post_processing(result, img)
```
As you can see, the args for running inference through the PipelineCloud API are just the onnxruntime args in a list. The only caveat is that a None “output_names” arg to onnxruntime should be passed to PipelineCloud either as an empty list or as a list containing the explicit output name strings. You won’t have to worry about this in our example, but here’s some explanation if you are using your own ONNX model. For more information, you can check out the docs here.
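To make that caveat concrete, here’s how I’d map a local onnxruntime call onto the equivalent run_pipeline args, reusing the `api` and `im` variables from the script above (the names are illustrative):

```python
# Locally with onnxruntime, output_names=None returns every output:
#   sess.run(None, {"input": im})
#
# On PipelineCloud, replace that None with an empty list...
result_all = api.run_pipeline("PIPELINE_ID", [[], {"input": im}])

# ...or list the output names explicitly:
result_named = api.run_pipeline("PIPELINE_ID", [["output"], {"input": im}])
```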


It’s dangerous to go alone! Take this.

MLOps is a dark and treacherous landscape, and finding your way through can be a gruelling endeavour. Pipeline AI builds the tools that make that journey easier. Don’t hesitate to contact us with questions/suggestions. We’re very friendly and always happy to help. GLHF!


Pipeline AI makes it easy to work with ML models and to deploy AI at scale. The self-serve platform provides a fast pay-as-you-go API to run pretrained or proprietary models in production. If you are looking to deploy a large product and would like to sign up as an Enterprise customer, please get in touch.

Follow us on Twitter and LinkedIn.