Model Management with MLflow, Azure, and Docker | by Sabrine Bendimerad

You can clone this folder to find all the necessary scripts for this tutorial.

To host the MLflow server, we start by creating a Docker container using a Dockerfile. Here’s an example configuration:

# Use Miniconda as the base image
FROM continuumio/miniconda3
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
# Install necessary packages
RUN apt-get update -y && \
apt-get install -y --no-install-recommends curl apt-transport-https gnupg2 unixodbc-dev
# Add Microsoft SQL Server ODBC Driver 18 repository and install
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
apt-get update && \
ACCEPT_EULA=Y apt-get install -y msodbcsql18 mssql-tools18
# Add mssql-tools to PATH
RUN echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bash_profile && \
echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc
# Define default server environment variables
ENV MLFLOW_SERVER_HOST=0.0.0.0
ENV MLFLOW_SERVER_PORT=5000
ENV MLFLOW_SERVER_WORKERS=1
# Set the working directory
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install Python dependencies specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make sure the launch.sh script is executable
RUN chmod +x /app/launch.sh
# Expose port 5000 for MLflow
EXPOSE 5000
# Set the entrypoint to run the launch.sh script
ENTRYPOINT ["/app/launch.sh"]

This Dockerfile builds an image that runs an MLflow server. It installs the required system packages, including the Microsoft SQL Server ODBC driver, sets default environment variables, copies the contents of the current directory into /app, and installs the Python dependencies listed in requirements.txt. Finally, it exposes port 5000 (MLflow's default port, also set here through MLFLOW_SERVER_PORT) and runs the launch.sh script to start the MLflow server.

The launch.sh contains only the command that launches the mlflow server.
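The article does not reproduce launch.sh, so the following is only a sketch, written in Python for readability, of the kind of command such a script might assemble. It assumes the script reads the MLFLOW_SERVER_* variables from the Dockerfile plus BACKEND_STORE_URI and MLFLOW_SERVER_DEFAULT_ARTIFACT_ROOT (set later by deploy.sh). The function name build_server_command is hypothetical; the flags are standard `mlflow server` options.

```python
import os
import shlex


def build_server_command(env=None):
    """Assemble an `mlflow server` command line from environment variables.

    Hypothetical helper: the real launch.sh may differ.
    """
    env = os.environ if env is None else env
    cmd = [
        "mlflow", "server",
        "--host", env.get("MLFLOW_SERVER_HOST", "0.0.0.0"),
        "--port", env.get("MLFLOW_SERVER_PORT", "5000"),
        "--workers", env.get("MLFLOW_SERVER_WORKERS", "1"),
    ]
    # Optional stores: when unset, MLflow falls back to its local defaults.
    backend = env.get("BACKEND_STORE_URI")
    if backend:
        cmd += ["--backend-store-uri", backend]
    artifact_root = env.get("MLFLOW_SERVER_DEFAULT_ARTIFACT_ROOT")
    if artifact_root:
        cmd += ["--default-artifact-root", artifact_root]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_server_command()))
```

With neither store variable set, the command contains only the host, port, and worker options, which matches the local behaviour described next: the server runs, but nothing reaches a SQL database or a remote artifact store.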

Build the Docker image from the directory that contains your Dockerfile:

docker build . -t mlflowserver
# If you are on a Mac (Apple silicon), use:
# docker build --platform=linux/amd64 -t mlflowserver:latest .

Run the Docker container:

docker run -it -p 5000:5000 mlflowserver

After running these commands, the MLflow server starts locally, and you can access the MLflow UI by navigating to http://localhost:5000. This confirms the server is successfully deployed on your local machine. However, at this stage, while you can log experiments to MLflow, none of the results, artifacts, or metadata will be saved in the SQL database or artifact store, as those have not been configured yet. Additionally, the URL is only accessible locally, meaning your data science team cannot access it remotely.

Start by creating an Azure account and grabbing your Subscription ID from the Azure Portal.

To deploy your MLflow server and make it accessible to your team, follow these simplified steps:

Clone the Repository: Clone this folder to your local machine.
Run the Deployment Script: Execute the deploy.sh script as a shell script. Make sure to update the Subscription ID variable in the script before running it.

While Azure offers a graphical interface for setting up resources, this guide simplifies the process by using the deploy.sh script to automate everything with a single command.
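Because deploy.sh relies on many variables, a small pre-flight check can catch a missing or badly named value before any Azure resource is created. The sketch below is not part of the repository. It assumes you export the same variable names in your shell, and check_deploy_env is a hypothetical helper. The two naming patterns reflect Azure's rules for storage accounts (3 to 24 lowercase letters and digits) and container registries (5 to 50 alphanumeric characters).

```python
import os
import re

# Variable names used by the deploy.sh steps described in this article.
REQUIRED_VARS = [
    "SUBSCRIPTION_ID", "RG_NAME", "RG_LOCATION",
    "SQL_SERVER_NAME", "SQL_ADMIN_USER", "SQL_ADMIN_PASSWORD", "SQL_DATABASE_NAME",
    "STORAGE_ACCOUNT_NAME", "STORAGE_CONTAINER_NAME",
    "ACR_NAME", "DOCKER_IMAGE_NAME", "DOCKER_IMAGE_TAG",
    "ASP_NAME", "WEB_APP_NAME", "BACKEND_STORE_URI",
    "MLFLOW_PORT", "MLFLOW_HOST", "MLFLOW_FILESTORE", "MLFLOW_WORKERS",
]

# Azure naming constraints for the two resources with the strictest rules.
NAME_RULES = {
    "STORAGE_ACCOUNT_NAME": re.compile(r"[a-z0-9]{3,24}"),
    "ACR_NAME": re.compile(r"[A-Za-z0-9]{5,50}"),
}


def check_deploy_env(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    for name, pattern in NAME_RULES.items():
        value = env.get(name)
        if value and not pattern.fullmatch(value):
            problems.append(f"{name}={value!r} violates Azure naming rules")
    return problems


if __name__ == "__main__":
    for problem in check_deploy_env():
        print("ERROR:", problem)
```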

Here’s a step-by-step breakdown of what the deploy.sh script does:

1. Login and Set Subscription: First, log in to your Azure account and set the subscription where all your resources will be deployed (retrieve the subscription ID from the Azure Portal).

az login
az account set --subscription $SUBSCRIPTION_ID

2. Create a Resource Group: Create a Resource Group to organize all the resources you’ll deploy for MLflow.

az group create --name $RG_NAME --location $RG_LOCATION

3. Set Up Azure SQL Database: Create an Azure SQL Server and a SQL Database where MLflow will store all experiment metadata.

az sql server create \
--name $SQL_SERVER_NAME \
--resource-group $RG_NAME \
--location $RG_LOCATION \
--admin-user $SQL_ADMIN_USER \
--admin-password $SQL_ADMIN_PASSWORD

az sql db create \
--resource-group $RG_NAME \
--server $SQL_SERVER_NAME \
--name $SQL_DATABASE_NAME \
--service-objective S0
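MLflow reaches this database through a SQLAlchemy connection string, which deploy.sh later passes as BACKEND_STORE_URI. The article does not show how that value is built, so the following is a minimal sketch under stated assumptions: the pyodbc dialect is used (pyodbc would need to be in requirements.txt), the driver name matches the ODBC Driver 18 installed in the Dockerfile, and build_backend_store_uri is a hypothetical helper.

```python
from urllib.parse import quote_plus


def build_backend_store_uri(server, database, user, password,
                            driver="ODBC Driver 18 for SQL Server"):
    """Build a SQLAlchemy URL for an Azure SQL Database (pyodbc dialect).

    The user and password are URL-encoded so that characters such as
    '@' or '/' in a password do not break the URL.
    """
    host = f"{server}.database.windows.net"
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:1433/{database}?driver={quote_plus(driver)}"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_backend_store_uri("my-sql-server", "mlflowdb", "mlflowadmin", "p@ss/word"))
```

Encoding the password matters in practice: Azure requires complex admin passwords, and an unescaped special character is a common reason for a connection string that fails to parse.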

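4. Configure SQL Server Firewall: Allow access to the SQL Server from other Azure services by creating a firewall rule. In Azure SQL, a rule whose start and end addresses are both 0.0.0.0 is a special case that admits connections from Azure services and resources; it does not open the server to the whole internet.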

az sql server firewall-rule create \
--resource-group $RG_NAME \
--server $SQL_SERVER_NAME \
--name AllowAllAzureIPs \
--start-ip-address 0.0.0.0 \
--end-ip-address 0.0.0.0

5. Create Azure Storage Account: Set up an Azure Storage Account and a Blob Container to store artifacts (e.g., models, experiment results).

az storage account create \
--resource-group $RG_NAME \
--location $RG_LOCATION \
--name $STORAGE_ACCOUNT_NAME \
--sku Standard_LRS

az storage container create \
--name $STORAGE_CONTAINER_NAME \
--account-name $STORAGE_ACCOUNT_NAME

6. Create Azure Container Registry (ACR): Create an Azure Container Registry (ACR) to store the Docker image of your MLflow server.

az acr create \
--name $ACR_NAME \
--resource-group $RG_NAME \
--sku Basic \
--admin-enabled true

7. Build and Push Docker Image to ACR: Build your Docker image for the MLflow server and push it to the Azure Container Registry. To do that, first retrieve the ACR username and password, then log in to ACR.

export ACR_USERNAME=$(az acr credential show --name $ACR_NAME --query "username" --output tsv)
export ACR_PASSWORD=$(az acr credential show --name $ACR_NAME --query "passwords[0].value" --output tsv)

docker login $ACR_NAME.azurecr.io \
--username "$ACR_USERNAME" \
--password "$ACR_PASSWORD"

# Push the images
docker tag $DOCKER_IMAGE_NAME $ACR_NAME.azurecr.io/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG
docker push $ACR_NAME.azurecr.io/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG

8. Create App Service Plan: Set up an App Service Plan to host your MLflow server on Azure.

az appservice plan create \
--name $ASP_NAME \
--resource-group $RG_NAME \
--sku B1 \
--is-linux \
--location $RG_LOCATION

9. Deploy Web App with MLflow Container: Create a Web App that uses your Docker image from ACR to deploy the MLflow server.

az webapp create \
--resource-group $RG_NAME \
--plan $ASP_NAME \
--name $WEB_APP_NAME \
--deployment-container-image-name $ACR_NAME.azurecr.io/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG

10. Configure Web App to Use Container Registry: Set up your Web App to pull the MLflow Docker image from ACR, and configure environment variables.

az webapp config container set \
--name $WEB_APP_NAME \
--resource-group $RG_NAME \
--docker-custom-image-name $ACR_NAME.azurecr.io/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG \
--docker-registry-server-url https://$ACR_NAME.azurecr.io \
--docker-registry-server-user $ACR_USERNAME \
--docker-registry-server-password $ACR_PASSWORD \
--enable-app-service-storage true

az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings WEBSITES_PORT=$MLFLOW_PORT
az webapp log config \
--name $WEB_APP_NAME \
--resource-group $RG_NAME \
--docker-container-logging filesystem

11. Set Web App Environment Variables: Set the necessary environment variables for MLflow, such as storage access, SQL backend, and port settings.

echo "Retrieve artifact, access key, connection string"
export STORAGE_ACCESS_KEY=$(az storage account keys list --resource-group $RG_NAME --account-name $STORAGE_ACCOUNT_NAME --query "[0].value" --output tsv)
export STORAGE_CONNECTION_STRING=`az storage account show-connection-string --resource-group $RG_NAME --name $STORAGE_ACCOUNT_NAME --output tsv`
export STORAGE_ARTIFACT_ROOT="https://$STORAGE_ACCOUNT_NAME.blob.core.windows.net/$STORAGE_CONTAINER_NAME"
# Setting environment variables for artifacts and database
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings AZURE_STORAGE_CONNECTION_STRING=$STORAGE_CONNECTION_STRING
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings BACKEND_STORE_URI=$BACKEND_STORE_URI
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings MLFLOW_SERVER_DEFAULT_ARTIFACT_ROOT=$STORAGE_ARTIFACT_ROOT
#Setting environment variables for the general context
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings MLFLOW_SERVER_PORT=$MLFLOW_PORT
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings MLFLOW_SERVER_HOST=$MLFLOW_HOST
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings MLFLOW_SERVER_FILE_STORE=$MLFLOW_FILESTORE
az webapp config appsettings set \
--resource-group $RG_NAME \
--name $WEB_APP_NAME \
--settings MLFLOW_SERVER_WORKERS=$MLFLOW_WORKERS
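The script above sets the artifact root to an https:// URL. MLflow's documentation describes Azure Blob Storage artifact locations in the form wasbs://<container>@<storage-account>.blob.core.windows.net/<path>, so if artifacts do not show up in the container, that form may be worth trying. The helper below is a minimal sketch of it; build_wasbs_artifact_root is a hypothetical name, and the account and container values are placeholders.

```python
def build_wasbs_artifact_root(storage_account, container, path=""):
    """Return an artifact root in the wasbs:// form documented by MLflow.

    `path` is an optional sub-folder inside the container.
    """
    root = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    path = path.strip("/")
    return f"{root}/{path}" if path else root


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(build_wasbs_artifact_root("mlflowstore01", "mlflow-artifacts"))
```

With this scheme, the machine that logs artifacts generally also needs the azure-storage-blob package and Azure credentials (for example AZURE_STORAGE_CONNECTION_STRING), unless the server is configured to proxy artifact uploads.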

Once the deploy.sh script has completed, you can verify that all your Azure services have been created by checking the Azure portal.

Go to the App Services section to retrieve the URL of your MLflow web application.

Your MLflow Tracking URL should now be live and ready to receive experiments from your data science team.
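Before sharing the URL with your team, a quick reachability check can confirm that the container actually started. The sketch below uses only the standard library and assumes the server exposes MLflow's /health endpoint. The function names are hypothetical, and the example URL is a placeholder to replace with your own Web App address.

```python
import urllib.error
import urllib.request


def health_url(tracking_uri):
    """Return the health-check URL for a given MLflow tracking URI."""
    return tracking_uri.rstrip("/") + "/health"


def is_server_healthy(tracking_uri, timeout=10.0):
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False


if __name__ == "__main__":
    # Placeholder: replace with the URL shown in the App Services section.
    print(is_server_healthy("http://localhost:5000"))
```

On App Service, the first request after a deployment can take a while because the image has to be pulled, so a False result right after deploy.sh finishes may only mean the container is still starting.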

Here’s a Python script demonstrating how to log an experiment using MLflow with a simple scikit-learn model, such as logistic regression. Ensure that you update the script with your MLflow tracking URI:

import mlflow
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
# Split the dataset into features X and target y
X = pd.DataFrame(data=iris["data"], columns=iris["feature_names"])
y = pd.Series(data=iris["target"], name="target")
# Split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y)
# Set the variables for your environment
EXPERIMENT_NAME = "experiment1"
# Set the tracking URI to your MLflow server (the Azure Web App URL)
mlflow.set_tracking_uri("set your mlflow tracking URI")
# mlflow.set_tracking_uri("http://localhost:5000")
# Create the experiment if it does not exist and make it active
mlflow.set_experiment(EXPERIMENT_NAME)
# Get our experiment info
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
# Enable scikit-learn autologging
mlflow.sklearn.autolog()
# Create a small file to log as an example artifact
with open("test.txt", "w") as f:
    f.write("hello world!")
with mlflow.start_run(experiment_id=experiment.experiment_id):
    # Specified parameters
    c = 0.1
    # Instantiate and fit the model
    lr = LogisticRegression(C=c)
    lr.fit(X_train.values, y_train.values)
    # Compute predictions and accuracy on the test set
    predictions = lr.predict(X_test.values)
    accuracy = lr.score(X_test.values, y_test.values)
    # Print results
    print("LogisticRegression model")
    print("Accuracy: {}".format(accuracy))
    # Log metric
    mlflow.log_metric("Accuracy", accuracy)
    # Log parameter
    mlflow.log_param("C", c)
    # Log the example file as an artifact
    mlflow.log_artifact("test.txt")

By running this script, you should be able to log your models, metrics, and artifacts to MLflow. Artifacts will be stored in Azure Blob Storage, while metadata will be saved in the Azure SQL Database.

Check MLflow Tracking: Visit your MLflow tracking URL to find your experiment, run names, and all associated metrics and model parameters.

Check MLflow Artifacts: Access the artifacts in the MLflow UI and verify their presence in Azure Blob Storage.
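The same checks can be scripted. As a rough sketch, the snippet below asks the MLflow REST API for the experiment by name and returns its artifact location, which should point at your Blob container if the server was configured as intended. It uses only the standard library and assumes the tracking server is reachable without authentication. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def experiment_lookup_url(tracking_uri, experiment_name):
    """Build the REST URL that looks up an experiment by name."""
    query = urllib.parse.urlencode({"experiment_name": experiment_name})
    base = tracking_uri.rstrip("/")
    return f"{base}/api/2.0/mlflow/experiments/get-by-name?{query}"


def get_artifact_location(tracking_uri, experiment_name, timeout=10.0):
    """Return the artifact location recorded for the experiment."""
    url = experiment_lookup_url(tracking_uri, experiment_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["experiment"]["artifact_location"]


if __name__ == "__main__":
    # Placeholder URI: replace with your own tracking URL.
    print(get_artifact_location("http://localhost:5000", "experiment1"))
```

An artifact location that starts with a local path such as /app or mlruns suggests the artifact root setting did not reach the server.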

You and your team can now submit experiments to MLflow, track them via the tracking URI, and retrieve model information or files from Azure Storage.

You’ve successfully set up MLflow with Azure for tracking and managing your machine learning experiments. Keep in mind that depending on your computer and operating system, you might encounter some issues with Docker, MLflow, or Azure services. If you run into trouble, don’t hesitate to reach out for help.

Next, we’ll explore how to use MLflow models stored in Azure Blob Storage to create an API, completing the automation workflow.

Thank you for reading!

Note: Some parts of this article were initially written in French and translated into English with the assistance of ChatGPT.



Model Management with MLflow, Azure, and Docker | by Sabrine Bendimerad | Sep, 2024
