AI Model deployment using REST API | by Ali Arslan | Oct, 2023

Imagine building a cutting-edge AI model capable of identifying diseases from medical images or predicting customer behaviours in an e-commerce platform. You’ve spent months fine-tuning your model, achieving impressive accuracy on your dataset, and validating it through rigorous testing. It’s a moment of triumph.

However, the true value of AI lies not in its ability to perform well in a controlled environment, but in its capacity to deliver results when integrated into real-world systems. This is where AI model deployment comes into play.

In this blog post, a pre-trained model trained on the ImageNet dataset is deployed behind a REST API.

The inference service is written in Python. Flask is used to build the REST API, and Keras handles the model computation.

Importing all the necessary libraries:

import io
import os
import tempfile

import numpy as np
from flask import Flask, request, jsonify, abort
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
from waitress import serve

The io, tempfile and numpy libraries are used to load the uploaded image into memory and preprocess it. Flask is used to build the REST API, and Keras handles all computation related to the AI model. Flask's built-in development server is not intended for production traffic, so the application is served with Waitress, a production WSGI server that handles concurrent requests with a pool of worker threads.

app = Flask(__name__)

# Load the pre-trained VGG16 model
model = VGG16(weights='imagenet')

# Define a function to preprocess an image for model input
def preprocess_image(image_bytes):
    # image_bytes is a file-like object (e.g. io.BytesIO) holding the upload.
    # Create a temporary file to save the uploaded image
    with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as temp_file:
        temp_file.write(image_bytes.read())
        temp_file_path = temp_file.name

    # Load the image from the temporary file, resized to the VGG16 input size
    img = image.load_img(temp_file_path, target_size=(224, 224))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)  # add the batch dimension
    img = preprocess_input(img)

    # Remove the temporary file
    os.remove(temp_file_path)
    return img

VGG16 is a pre-trained image classification model trained on the ImageNet dataset. It can classify images into 1,000 different object categories. Keras is used to perform all the computations on that model. The architecture of VGG-16 can be seen below:

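The preprocess_image(image_bytes) function prepares an uploaded image for the model. It loads the image at 224×224 pixels, converts it to an array, and adds an extra leading dimension that corresponds to the batch size expected by the model. The call to preprocess_input then applies the input normalization that VGG16 expects.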
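To make the shape change concrete, here is a minimal NumPy-only sketch. It uses a random array as a stand-in for a decoded 224×224 RGB image, and caffe_style_preprocess is a hypothetical helper that mimics what the Keras preprocess_input for VGG16 is documented to do (reorder RGB to BGR, then subtract the ImageNet channel means). It is an illustration under those assumptions, not the library's own code.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by VGG16-style preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(batch):
    """Hypothetical stand-in for keras.applications.vgg16.preprocess_input."""
    bgr = batch[..., ::-1]            # reorder channels RGB -> BGR
    return bgr - IMAGENET_MEAN_BGR    # zero-center each channel, no scaling

# Stand-in for image.img_to_array(img): height x width x channels, float32.
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3)).astype(np.float32)

batch = np.expand_dims(img, axis=0)   # add the batch dimension in front
processed = caffe_style_preprocess(batch)

print(img.shape)        # (224, 224, 3)
print(batch.shape)      # (1, 224, 224, 3)
print(processed.shape)  # (1, 224, 224, 3)
```

The model always expects a batch, so even a single image is passed as a batch of size one.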

@app.route('/predict', methods=['POST'])
def predict():
    # Check if the 'Authorization' header is set correctly
    expected_token = '/PR?nHjHo&=L0O$<O[bR<I5}6t=*B#'  # Replace with your secret token
    auth_header = request.headers.get('Authorization')
    if auth_header is None or auth_header != f'Bearer {expected_token}':
        abort(401, 'Unauthorized')

    # Check if an image file is in the request
    if 'image' not in request.files:
        abort(400, 'No image file provided')

    # Get the image file from the request
    image_file = request.files['image']

    # Ensure the file has an allowed extension (e.g., JPEG, PNG)
    allowed_extensions = {'jpg', 'jpeg', 'png'}
    if not image_file.filename.lower().endswith(tuple(allowed_extensions)):
        abort(400, 'Unsupported file format')

    # Preprocess the image
    img = preprocess_image(io.BytesIO(image_file.read()))

    # Make predictions
    predictions = model.predict(img)
    decoded_predictions = decode_predictions(predictions, top=1)[0]

    # Get the top prediction result
    _, class_label, confidence = decoded_predictions[0]

    # Return the prediction result as JSON
    result = {
        'class_label': class_label,
        'confidence': str(confidence)
    }

    return jsonify(result)

if __name__ == '__main__':
    # Serve the app with Waitress on all network interfaces, port 80
    serve(app, host='0.0.0.0', port=80)
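
The two request checks in predict() can also be written as small helper functions. The sketch below is one possible variant, not the author's code: is_authorized and has_allowed_extension are hypothetical names, the token is a placeholder, and the comparison uses hmac.compare_digest from the standard library so that it takes the same time wherever the strings differ. The extension check matches on the real file extension, whereas a plain endswith('jpg') would also accept a name such as catjpg.

```python
import hmac
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_authorized(auth_header, expected_token):
    """Return True if the header is exactly 'Bearer <expected_token>'."""
    if auth_header is None:
        return False
    expected = f"Bearer {expected_token}"
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(auth_header.encode(), expected.encode())

def has_allowed_extension(filename):
    """Return True if the file name ends in one of the allowed extensions."""
    _, ext = os.path.splitext(filename.lower())
    return ext in ALLOWED_EXTENSIONS

# Placeholder token for the demo; in practice it could be read from an
# environment variable instead of being hard-coded (an assumption, not
# something the original script does).
token = "example-token"

print(is_authorized(f"Bearer {token}", token))  # True
print(is_authorized("Bearer wrong", token))     # False
print(is_authorized(None, token))               # False
print(has_allowed_extension("cat.JPG"))         # True
print(has_allowed_extension("catjpg"))          # False
```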

The decorator @app.route('/predict', methods=['POST']) placed above the predict() function registers it as the handler for incoming POST requests on the /predict route.
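
The idea behind a route decorator can be shown with a toy registry. This is only an illustration of the pattern, not how Flask is implemented internally; route, routes and dispatch are hypothetical names, and the returned payload is a placeholder.

```python
# Toy illustration of decorator-based routing (not Flask's real implementation).
routes = {}

def route(path, methods=("GET",)):
    """Register the decorated function as the handler for (method, path)."""
    def decorator(func):
        for method in methods:
            routes[(method, path)] = func
        return func  # the function itself is returned unchanged
    return decorator

@route("/predict", methods=["POST"])
def predict():
    return {"class_label": "example", "confidence": "0.0"}  # placeholder payload

def dispatch(method, path):
    """Call the handler registered for a request, or return an error status."""
    handler = routes.get((method, path))
    if handler is not None:
        return 200, handler()
    # Known path with the wrong method -> 405, unknown path -> 404.
    known_path = any(p == path for (_, p) in routes)
    return (405 if known_path else 404), None

print(dispatch("POST", "/predict"))  # (200, {'class_label': 'example', 'confidence': '0.0'})
print(dispatch("GET", "/predict"))   # (405, None)
print(dispatch("POST", "/other"))    # (404, None)
```

Flask behaves similarly from the client's point of view: a GET request to a route that only allows POST is answered with 405 Method Not Allowed.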

To set up and run the API:

  1. Save the code above as a file or clone the repository. You can clone the whole project by running the following command:

git clone

  2. Install Python and then the following packages: NumPy, Keras, Pillow, Waitress and Flask, by running the command:

pip install -r requirements.txt

  3. Run the API script:


Now your application is live and running on port 80.

  1. Take a test image. I have used an image of a cat.
  2. Create a Python file and paste the code below into it. Replace the image path in the code with the path on your machine.
import requests

# Define the API endpoint
api_url = 'http://localhost:80/predict'  # Update the URL if your API is hosted elsewhere

# The token must match expected_token in the API script
headers = {'Authorization': 'Bearer /PR?nHjHo&=L0O$<O[bR<I5}6t=*B#'}

# Specify the image file you want to test with
image_path = './test_images/cat_image.jpg'  # Replace with the actual path to your image

# Open the image and send it in a POST request to the API
with open(image_path, 'rb') as image_file:
    files = {'image': image_file}
    response = requests.post(api_url, headers=headers, files=files)

# Check the response
if response.status_code == 200:
    result = response.json()
    print('Predicted Class:', result['class_label'])
    print('Confidence:', result['confidence'])
else:
    print('Error:', response.status_code, response.text)

  3. Run the Python script:


  4. You should receive a response containing the predicted class of the image and its confidence.

  5. The API script can be deployed to any cloud service by opening port 80 on the compute instance, which makes the REST API reachable from anywhere.
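
Because the server converts the confidence with str() before returning it, the client receives it as a string rather than a number. The short sketch below parses such a response; the body shown is a made-up example with hypothetical values, not real model output.

```python
import json

# Hypothetical response body; the label and value are made up for illustration.
body = '{"class_label": "tabby", "confidence": "0.87"}'

result = json.loads(body)
confidence = float(result["confidence"])  # the server sends confidence as a string

print(f"{result['class_label']}: {confidence:.1%}")  # tabby: 87.0%
```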

It is recommended to have a GPU installed and configured for the ML library so that requests are answered quickly. Heavy computation in the AI model can otherwise lead to timeout errors.

All the source code can be accessed on Github:
