Serverless Deployment of fastai on Azure
Flying Nobita


How can you deploy a machine learning model without provisioning ANY servers?

UPDATE: This guide is also referenced in the course v3 (Part 1). The only difference is that the course version doesn’t mention or use the Recommendations (pipenv & pyenv), which makes it a bit less opinionated. Otherwise the two guides are identical.

Table of Contents

  1. FaaS - Function as a Service (aka serverless)
    1. Supported Languages
    2. Storage And Memory Limitations
    3. Time Limitation
    4. fastai Doesn’t Compile In Windows WSL Ubuntu Using pip (unexpected)
    5. Storage Limit From Amazon Lambda and Google Cloud Functions Is Too Small (expected)
  2. Microsoft Azure Functions
    1. Pricing
  3. Requirements
    1. Software
    2. Accounts
  4. Recommendations
  5. 1 - Local Setup
    1. Setup Project Directory
    2. Create Azure Functions project
    3. Create Azure Function
    4. Install fastai & Dependencies
    5. Update Function
      1. /<FUNCTION_NAME>/
      2. /<FUNCTION_NAME>/function.json
      3. export.pkl
    6. Test Function
    7. Check Test Outputs
  6. 2 - Docker Setup
    1. Build Docker image
    2. Test Docker image
    3. Push Docker image to Docker Hub
  7. 3 - Azure Setup
    1. Setup Azure Resources
      1. Create Resource Group
      2. Create Storage Account
      3. Create a Linux App Service Plan
      4. Create the App & Deploy the Docker image from Docker Hub
      5. Configure the function app
    2. Run your Azure Function
    3. Delete Resource Group
  8. Conclusion
  9. References:

In the previous article, we looked at different ways to deploy a trained machine learning model for a mobile app. This included implementing the inference process on the mobile device and on different cloud-based architectures (IaaS, VPS, PaaS, ML PaaS).

In this article, I will explore the serverless architecture, the newest kid on the block. I will look at its characteristics and at who the major service providers are, and then implement a simple image classifier in fastai/PyTorch using one of the providers.

FaaS - Function as a Service (aka serverless)

This category of server implementation brings PaaS to a whole new level.

You write your code, which is called a function. It can access resources you set up in the cloud provider, such as online storage for the photos. Then you set up events that trigger the function to run.

There are four main advantages of going serverless:

  1. no need to provision or manage any hardware
  2. no need to pay for any idle resource time
  3. infrastructure can scale automatically depending on load
  4. availability and fault tolerance of the servers are built in

This is an attractive list of qualities. Sounds like everyone should be going serverless.

But should you?

In reality, there are hidden costs in both dollars and time that you should be aware of, though these costs only become problematic if your app is heavily used (I mean millions of calls a month). First world problems.

What is relevant for your app are the limitations imposed by the cloud provider, which can make your deployment problematic. Some of the main ones are:

  • Supported Languages

    • As the responsibility of setting up and maintaining the software framework for running the code falls to the cloud provider, they have to make the most of their resources and support only the most popular languages and frameworks. Your choices will be limited, down to the version number.
  • Storage And Memory Limitations

    • You are usually limited in the amount of disk space and memory that your code has access to. This is especially a problem for ML applications because:
      • the application usually has a long list of dependencies, which have their own dependencies (besides the ML framework itself, such as scikit-learn, PyTorch, or TensorFlow, there are also its dependencies such as numpy, pandas, etc.)
      • the model file that contains the pre-trained weights can be big
  • Time Limitation

    • Each function is allowed a certain amount of time to run (usually 5-10 mins) before it is forced to terminate.

Serverless is still a new approach to the cloud, and both companies and developers are only beginning to embrace it. However, there are already many service providers to choose from. We can define 2 categories of serverless service providers:

  1. those that own hardware and provide an API for access
  2. those that do not own any hardware but provide their own API to access the previous category’s hardware

Here we have a list of the major providers from the first category:

Provider | Python Runtime Version | Deployment Package Size | Memory | Timeout
AWS Lambda | 2.7, 3.6, 3.7 | 50 MB (compressed), 250 MB (uncompressed) | 3 GB | 900 sec
Google Cloud Functions | 3.7.1 (beta) | Source: 100 MB (compressed); Source + Modules: 500 MB (uncompressed) | 2 GB | 540 sec
IBM OpenWhisk | 2.7.15, 3.6.8, 3.7.2 | 48 MB | 2 GB | 600 sec
Microsoft Azure Functions | 3.6 (preview) | ? | 1.5 GB | 600 sec (Consumption Plan); Unlimited (App Service Plan)

For a simple image classification app, the function shouldn’t have any problem staying within the memory limits and the timeout limits. What might be a problem is the size of the deployment package. Basically, the upload of the deployment package directly to the serverless architecture will probably fail as it is likely to be bigger than the limits.
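Before attempting an upload, you can get a rough idea of whether a package would fit. The following is a minimal sketch that sums the file sizes under a directory and compares the total with the uncompressed limits from the table above. The helper names are my own, and I am assuming 1 MB = 1024 × 1024 bytes, which may differ slightly from how each provider counts.

```python
import os

MB = 1024 * 1024  # assumption: binary megabytes

# Uncompressed deployment package limits, taken from the table above.
UNCOMPRESSED_LIMITS_MB = {
    "AWS Lambda": 250,
    "Google Cloud Functions": 500,
}


def directory_size_bytes(root: str) -> int:
    """Total size of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def providers_that_fit(size_bytes: int) -> list:
    """Names of the providers whose uncompressed limit covers `size_bytes`."""
    return [
        provider
        for provider, limit_mb in UNCOMPRESSED_LIMITS_MB.items()
        if size_bytes <= limit_mb * MB
    ]
```

For example, `providers_that_fit(directory_size_bytes("."))` run from the folder holding your code, model file, and installed packages lists the providers that could accept it as-is.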

A workaround for the disk space limitation is to strip down the ML libraries so that you are left with only what’s absolutely needed. Additionally, you can separate the libraries into submodules so that each module fits into its own cloud function. Making an inference function call would then trigger a chain of cloud functions, with the last function returning the prediction result to you.

While these methods work, it seems to me that they introduce another problem. Because slimming down the ML libraries isn’t officially supported, some work will be needed every time you want to upgrade a library to the latest version. Given the fast-paced development of all the ML frameworks today, this might not be a very sustainable solution.

There is an interesting method (and probably the proper method) of using AWS Lambda Layers with AWS Lambda to bypass the storage limits. AWS Lambda Layers allows you to organize and store dependency libraries in the form of ZIP archives in AWS. These archives can be used by a Lambda function as needed, which keeps the Lambda function deployment package to a minimum and avoids the 250 MB (uncompressed) size limit. Layers can be made public and shared, and the aforementioned method uses a public layer that contains PyTorch v1 running on Python 3.6.

Note from the table above that Microsoft Azure Functions doesn’t state a deployment package size limit.

Let’s continue and look at the second category of serverless providers:

Provider | Remarks
Zappa | Python-only API wrapper for AWS Lambda
Zeit | API wrapper around AWS Lambda
Kubeless | Kubernetes-centric API wrapper that supports a number of serverless providers
Serverless Framework | API wrapper that supports most major serverless providers
These providers offer an API wrapper that attempts to make the serverless experience friendlier and to add other value for the user where they see fit. Providers like Serverless Framework and Kubeless support multiple serverless infrastructure providers (our first category providers). This makes them especially useful because you can use one API to deploy to any of their supported providers, which helps mitigate the problem of provider lock-in.

Out of these providers, Serverless Framework seems the most interesting because its free API wrapper supports the largest number of serverless infrastructure providers, and does so in a number of languages. It has a large community that has written many plugins which add extra functionality to the core API wrapper.

Let’s start by using the Serverless Framework to deploy to AWS Lambda (without using Layers) and Google Cloud Functions and see what problems we might encounter:

fastai Doesn’t Compile In Windows WSL Ubuntu Using pip (unexpected)

All the serverless architectures require the use of pip and requirements.txt to install dependencies, so I couldn’t use conda to install fastai. This led to a lot of compilation issues that never came up when I used conda. I found this somewhat surprising, as I had never encountered differences between Ubuntu on WSL and plain Ubuntu. It makes sense in hindsight: I had only used WSL Ubuntu for Ruby or Node development, whereas for Python development, which is when I use libraries with more compilation complications, I had always used conda under Windows, which helps solve them.

Storage Limit From Amazon Lambda and Google Cloud Functions Is Too Small (expected)

Once the compilation problems went away after I started deploying on a real Ubuntu machine, I started hitting the storage limits.

Unfortunately, the trained model file for MNIST is already 80 MB. Once you add fastai, PyTorch, and their dependencies, there’s no way everything can fit in GCF or AWS Lambda, even if you compress the libraries and remove unnecessary files by enabling the slim package option.

Microsoft Azure Functions

Let’s turn our attention to Azure. It wasn’t my first choice because the Serverless Framework documentation lacked Python implementation examples for Azure and its Azure plugin hadn’t been updated for a while. This is in stark contrast to the official Azure documentation, which is detailed and has good support for Python. Perhaps Azure has been moving along quickly and there hasn’t been enough time for the wrapper APIs to catch up yet.

In order to try Azure, we will need to forego the Serverless Framework (and all the benefits that a wrapper API provides) and use Azure directly. It’s worth a try.

Pricing

Microsoft Azure Functions offers two kinds of pricing: the Consumption plan and the App Service plan. The main difference is that the Consumption plan lets you pay only when your function runs. It will scale the architecture for you if needed, but you don’t have any control over how it scales. See the official Azure pricing page for the Consumption plan pricing.

With the App Service plan, you can pick the level of computing resources that you want your function to run on. You are then charged for as long as those resources exist, regardless of whether your function is running or not. See the official Azure pricing page for the App Service plan pricing.
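To make the difference concrete, here is a small sketch of how the two bills are shaped. I am assuming the Consumption plan charges per execution plus per GB-second of memory used, and that the App Service plan charges a flat hourly rate; free grants and rounding rules are ignored. Every rate in the usage example is a made-up placeholder, not a real Azure price.

```python
def consumption_cost(
    executions: int,
    seconds_per_execution: float,
    memory_gb: float,
    price_per_gb_second: float,
    price_per_million_executions: float,
) -> float:
    """Pay-per-use bill: zero when the function is never called."""
    gb_seconds = executions * seconds_per_execution * memory_gb
    execution_charge = executions / 1_000_000 * price_per_million_executions
    return gb_seconds * price_per_gb_second + execution_charge


def app_service_cost(hours_provisioned: float, price_per_hour: float) -> float:
    """Flat bill: depends only on how long the plan exists."""
    return hours_provisioned * price_per_hour


# Hypothetical rates, for illustration only.
idle_month = consumption_cost(0, 1.0, 0.5, 0.00002, 0.2)
always_on_month = app_service_cost(24 * 30, 0.02)
print(idle_month, always_on_month)
```

The point of the sketch is the shape of the two functions: only the first one goes to zero when nobody calls your function.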

Currently, Python is still in preview in Azure Functions, and fastai only works when you provide your own custom Docker image on the App Service plan.



Requirements

Software

  • real Linux (Windows WSL Ubuntu isn’t sufficient; the steps below use Ubuntu 18.04)
  • Docker (to compile fastai dependencies that don’t provide manylinux-compatible wheels on PyPI, e.g. Bottleneck)
  • Python 3.6 (the only Python runtime currently supported by Azure Functions)
  • Azure Functions Core Tools version 2.x
  • Azure CLI

Accounts

  • Microsoft Azure account
  • Docker Hub account



Recommendations

  • pipenv (Azure Functions requires a virtual environment, so you might as well use pipenv, which uses virtualenv underneath)
  • pyenv (in case your system Python is a version other than 3.6; pyenv is also natively supported by pipenv)

1 - Local Setup

Setup Project Directory

Replace <PROJECT_DIR> with your own project directory name.

mkdir <PROJECT_DIR>
cd <PROJECT_DIR>
pipenv --python 3.6
pipenv shell

Create Azure Functions project

Create an Azure Function Project that uses the Python runtime. This will generate several files in the <PROJECT_DIR>.

func init --docker

When prompted, select python:

  • Select a worker runtime: python

Create Azure Function

Create a function with name <FUNCTION_NAME> using the template HttpTrigger. Replace <FUNCTION_NAME> with your own function name.

func new --name <FUNCTION_NAME> --template "HttpTrigger"

Install fastai & Dependencies

Add Azure’s dependencies to Pipfile.

pipenv install -r requirements.txt 

Install fastai and any other dependencies your app needs in the virtual environment.

pipenv install fastai

Then output all the dependencies to requirements.txt which will be used when you build the Docker image.

pipenv lock -r > requirements.txt

Update Function

Modify the following files in the <PROJECT_DIR> directory:

/<FUNCTION_NAME>/

This is where your inference function lives. The following is an example of using a trained image classification model.

import logging
import os

import azure.functions as func
from fastai.vision import *
import requests

def main(req: func.HttpRequest) -> func.HttpResponse:

    # Load the exported learner (export.pkl) from the project directory.
    path = Path.cwd()
    learn = load_learner(path)

    # The request body is JSON with a "url" key pointing at the image.
    request_json = req.get_json()
    r = requests.get(request_json['url'])

    if r.status_code == 200:
        # Save the downloaded image to a temporary file.
        temp_image_name = "temp.jpg"
        with open(temp_image_name, 'wb') as f:
            f.write(r.content)
    else:
        return func.HttpResponse(f"Image download failed, url: {request_json['url']}")

    img = open_image(temp_image_name)
    pred_class, pred_idx, outputs = learn.predict(img)

    return func.HttpResponse(f"request_json['url']: {request_json['url']}, pred_class: {pred_class}")
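The example above trusts that the request body is valid JSON with a usable "url" key. If you want friendlier failures, a small validation helper is one option. This is only a sketch: `extract_image_url` is a name I made up, and it works on the raw body bytes (which I believe you can get from `req.get_body()`) so that it can be tried without the Azure runtime.

```python
import json
from urllib.parse import urlparse


def extract_image_url(body: bytes) -> str:
    """Return the 'url' field of a JSON request body, or raise ValueError."""
    try:
        payload = json.loads(body)
    except ValueError as exc:  # covers JSON and encoding errors
        raise ValueError("request body is not valid JSON") from exc

    if not isinstance(payload, dict) or not isinstance(payload.get("url"), str):
        raise ValueError("request body must be a JSON object with a string 'url'")

    url = payload["url"]
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("'url' must be an absolute http(s) URL")
    return url
```

Inside the function you could catch the `ValueError` and return its message in the HTTP response instead of letting the function crash.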


/<FUNCTION_NAME>/function.json

Update the function authorization so that it can be called without any additional security key. Replace the corresponding line in the file with the following:

      "authLevel": "anonymous",

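If you prefer to script the change to function.json rather than edit it by hand, something like the following could work. The `SAMPLE` dictionary is my assumption of what a generated HTTP trigger configuration looks like, so check it against your own file; `set_auth_level` is a hypothetical helper name.

```python
import json

# Assumed shape of a generated function.json for an HTTP trigger (illustrative).
SAMPLE = {
    "scriptFile": "__init__.py",
    "bindings": [
        {
            "authLevel": "function",
            "type": "httpTrigger",
            "direction": "in",
            "name": "req",
            "methods": ["get", "post"],
        },
        {"type": "http", "direction": "out", "name": "$return"},
    ],
}


def set_auth_level(config: dict, level: str = "anonymous") -> dict:
    """Return a copy of `config` with authLevel set on every httpTrigger binding."""
    updated = json.loads(json.dumps(config))  # simple deep copy of JSON data
    for binding in updated.get("bindings", []):
        if binding.get("type") == "httpTrigger":
            binding["authLevel"] = level
    return updated


print(json.dumps(set_auth_level(SAMPLE), indent=2))
```

To apply it for real, you would load your function.json with `json.load`, pass the result through `set_auth_level`, and write it back with `json.dump`.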

export.pkl

Copy your trained model file export.pkl to <PROJECT_DIR>.
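The example function loads export.pkl on every request. One possible refinement, assuming the worker process stays alive between calls (which I have not verified for every hosting setup), is to load the model once and reuse it. In this sketch `get_learner` is a hypothetical helper, and the loader is passed in so the snippet runs without fastai installed.

```python
from pathlib import Path
from typing import Any, Callable, Optional

# Module-level cache: survives between calls while the process stays alive.
_cached_learner: Optional[Any] = None


def get_learner(load: Callable[[Path], Any], model_dir: Path) -> Any:
    """Load the model on first use and return the cached copy afterwards."""
    global _cached_learner
    if _cached_learner is None:
        _cached_learner = load(model_dir)
    return _cached_learner
```

In the inference function this would replace the direct call, for example `learn = get_learner(load_learner, Path.cwd())`.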

Test Function

Run the following command to start the function on your local machine:

func host start

This will give you an output with the URL for testing:

Now listening on:
Application started. Press Ctrl+C to shut down.

Http Functions:

    inference_function: [GET,POST] http://localhost:7071/api/<FUNCTION_NAME>

Check Test Outputs

To check that your function is running properly, visit http://localhost:7071 and you should see the following:

[Screenshot: Azure Docker running successfully]

You can send an HTTP POST request to http://localhost:7071/api/<FUNCTION_NAME> to check that your inference function is working. Replace <URL_TO_IMAGE> with a URL that points to an image for inference.

POST http://localhost:7071/api/<FUNCTION_NAME> HTTP/1.1
content-type: application/json

{
    "url": "<URL_TO_IMAGE>"
}

You should then see an HTTP response:

HTTP/1.1 200 OK
Connection: close
Date: Sun, 17 Mar 2019 06:30:29 GMT
Content-Type: text/plain; charset=utf-8
Server: Kestrel
Content-Length: 216

request_json['url']: <URL_TO_IMAGE>, pred_class: <PREDICTED_CLASS>

You should see the class that your inference function predicts in <PREDICTED_CLASS>.
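If you don’t have an HTTP client handy, the same test can be sketched with Python’s standard library. The helper names below are my own, and `parse_prediction` assumes the response text has exactly the format shown above.

```python
import json
import urllib.request


def build_request(endpoint: str, image_url: str) -> urllib.request.Request:
    """Build the JSON POST request that the inference function expects."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(response_text: str) -> str:
    """Extract the predicted class from the function's plain-text response."""
    marker = "pred_class: "
    if marker not in response_text:
        raise ValueError("response does not contain a prediction")
    return response_text.rsplit(marker, 1)[1].strip()


def classify(endpoint: str, image_url: str) -> str:
    """Send the request and return the predicted class (needs a running function)."""
    with urllib.request.urlopen(build_request(endpoint, image_url)) as response:
        return parse_prediction(response.read().decode("utf-8"))
```

With the function running locally, `classify("http://localhost:7071/api/<FUNCTION_NAME>", "<URL_TO_IMAGE>")` (with both placeholders filled in) should return the predicted class.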

You can press Ctrl+C to stop the testing when you’re ready.

2 - Docker Setup

Build Docker image

You can now build the Docker image that will contain your app and all the Python libraries that it needs to run.

docker build --tag <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG> .

Test Docker image

Start the Docker image on your local machine for testing.

docker run -p 8080:80 -it <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>

Your app in the Docker image is now running at localhost:8080. You can run the same tests in Check Test Outputs with the new URL and you should see the same test output as before.

You can press Ctrl+C to stop the testing when you’re ready.

Push Docker image to Docker Hub

Log in to Docker from the command prompt. Enter your Docker Hub password when prompted.

docker login --username <DOCKER_HUB_ID>

You can now push the Docker image created earlier to Docker Hub.

docker push <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>

3 - Azure Setup

Setup Azure Resources

Log in to Microsoft Azure with Azure CLI if you haven’t already.

az login

Execute the following commands to create Azure resources and run the inference app on Azure Functions.

The following example uses the lowest pricing tier, B1.

Replace the following placeholders with your own names:

  • <RESOURCE_GROUP>
    • name of the Resource Group that all other Azure Resources created for this app will fall under
    • e.g. ResourceGroup
  • <LOCATION_ID>
    • run the following command to see the list of available locations:
      • az appservice list-locations --sku B1 --linux-workers-enabled
    • e.g. centralus
  • <STORAGE_ACCOUNT>
    • name of the Azure Storage Account, which is a general-purpose account to maintain information about your function
    • must be between 3 and 24 characters in length and may contain numbers and lowercase letters only
    • e.g. inferencestorage
  • <FUNCTION_APP>
    • name of the Azure Function App that you will be creating
    • will be the default DNS domain and must be unique across all apps in Azure
    • e.g. inferenceapp123
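The storage account naming rule above is easy to get wrong, so here is a small check you can run before calling the Azure CLI. It only covers the length and character rules stated above, not whether the name is already taken; the function name is my own.

```python
import re

# 3 to 24 characters, lowercase letters and digits only (rule stated above).
_STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Check a candidate storage account name against the length and character rules."""
    return _STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


print(is_valid_storage_account_name("inferencestorage"))
```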

Create Resource Group

az group create \
--name <RESOURCE_GROUP> \
--location <LOCATION_ID>

Create Storage Account

az storage account create \
--name <STORAGE_ACCOUNT> \
--location <LOCATION_ID> \
--resource-group <RESOURCE_GROUP> \
--sku Standard_LRS

Create a Linux App Service Plan

az appservice plan create \
--name <LOCATION_ID> \
--resource-group <RESOURCE_GROUP> \
--sku B1 \
--is-linux

Create the App & Deploy the Docker image from Docker Hub

az functionapp create \
--resource-group <RESOURCE_GROUP> \
--name <FUNCTION_APP> \
--storage-account  <STORAGE_ACCOUNT> \
--plan <LOCATION_ID> \
--deployment-container-image-name <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>

Configure the function app

The following assumes the Docker image uploaded earlier to your Docker Hub profile is public. If you have set it to private, see the Azure documentation on adding your Docker credentials so that Azure can access the image.

storageConnectionString=$(az storage account show-connection-string \
--resource-group <RESOURCE_GROUP> \
--name <STORAGE_ACCOUNT> \
--query connectionString --output tsv)

az functionapp config appsettings set --name <FUNCTION_APP> \
--resource-group <RESOURCE_GROUP> \
--settings AzureWebJobsDashboard=$storageConnectionString \
AzureWebJobsStorage=$storageConnectionString

Run your Azure Function

After the previous command, it will generally take 15-20 minutes for the app to deploy on Azure. You can also see your app in the Microsoft Azure Portal under Function Apps.

The URL for your app will be:

https://<FUNCTION_APP>.azurewebsites.net/api/<FUNCTION_NAME>

You can run the same tests in Check Test Outputs with the new URL and you should see the same output as before.
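Since the deployment can take a while, you may prefer to poll instead of refreshing by hand. The following is a sketch using only the standard library. The defaults (40 attempts, 30 seconds apart, so roughly 20 minutes) are my own choice, and "up" here simply means the URL answers with HTTP 200.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_up(
    url: str,
    check: Callable[[str], bool] = is_up,
    attempts: int = 40,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `check` succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Call `wait_until_up` with the root URL of your function app; once it returns True, you can run the tests against the function URL.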

Delete Resource Group

When you are done, delete the Resource Group.

az group delete \
--name <RESOURCE_GROUP>

Remember that with the App Service plan, you are being charged for as long as you have resources running, even if you are not calling the function. So it is best to delete the resource group when you are not calling the function to avoid unexpected charges.


Conclusion

Microsoft Azure Functions seems to be the simplest to deploy to, without the need for any manual tinkering with our inference code or the ML library. Unfortunately, its App Service pricing plan works like other non-serverless pricing and foregoes one of the major advantages of a serverless architecture: paying for resources only when the function runs. But as serverless solutions mature, it won’t be long before we can run PyTorch functions the way we were promised.

Feel free to let me know in the comments if you know of other ways to deploy fastai/PyTorch on a serverless architecture.


References:

Create your first Python function in Azure (preview)
Create a function on Linux using a custom image
Azure Functions Python developer guide
