Building Deep Learning Pipelines with TensorFlow Extended | by Piero Esposito


You will see how easy it is to build Deep Learning pipelines like the big guys do

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

In this tutorial, I aim to:

  • Explain the function of Machine Learning pipelines for production
  • Show how to get started with Tensorflow Extended locally
  • Show how to move a Tensorflow Extended pipeline from local environment to Vertex AI
  • Give you some code samples to adapt and get started with TFx.

You can check the code for this tutorial here.

Once you finish your model experimentation, it is time to roll things to production. Rolling Machine Learning to production is not just a question of wrapping the model binaries with a REST API and starting to serve it, but also of making it possible to re-create (or update) and re-deploy your model.

That means the steps from preprocessing the data to training the model to rolling it to production (we call this a Machine Learning Pipeline) should be deployed and able to run as easily as possible, while making it possible to track and parameterize them (to use different data, for example).

In this post, we will see how to build a Machine Learning pipeline for a Deep Learning model using TensorFlow Extended (TFx), how to run and deploy it to Google Vertex AI, and why we should use it.

We will start with an overview of TFx and its components, implement a minimal working pipeline, and then show how to run it in a Jupyter notebook (on Google Colab) and on Vertex AI Pipelines. The code used here is adapted from “Building Machine Learning Pipelines” by Hannes Hapke.

TFx is the production framework for TensorFlow. Rather than only letting the user serve a model in a highly performant way, TFx wraps all the parts of the TensorFlow ecosystem (from Keras to Data Validation to Serving) to let the user build and run scalable, performant Machine Learning pipelines.

It is organized as a sequence of components that require you to write less code while remaining flexible.

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

The pipelines are compiled into a series of blueprints that are adaptable to different infrastructure. You can run them locally or on Vertex AI. You can run some expensive data processing steps on Apache Beam or even on Google Dataflow. TFx won’t vendor-lock you and will adapt to your resources and infrastructure.

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

In this post, we will use the following components in our pipeline:

ExampleGen

The entry point of a TFx pipeline is the ExampleGen component. Given a local (or Cloud Storage-like) path, it ingests the input files and materializes them in .tfrecord format according to the input specification.

We will use a smaller dataset, for which a .csv file is a better fit, so we will use a CsvExampleGen component pointing to a local .csv file. When we move to Vertex AI, this file will be uploaded to a Google Cloud Storage bucket.

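A minimal sketch of building the component, assuming the .csv files live in a local data_local directory:

```python
# Point CsvExampleGen at the directory holding the .csv data.
from tfx import v1 as tfx

example_gen = tfx.components.CsvExampleGen(input_base="data_local")
```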

Transform

The Transform component is not always needed, but it is useful when expensive preprocessing needs to be done. To use it, we create a pure TensorFlow function called preprocessing_fn in a module file. TFx will apply this transformation to all datapoints fed by the ExampleGen component.

This function needs to be pure (meaning no side effects) and use only TensorFlow operations, because (i) this preprocessing step can be serialized to a TF Graph and baked into the final model, and (ii) this step can be run on Apache Beam, which means it will be massively parallelized (good for very large datasets).

preprocessing_fn receives the features of a tf.Example (think of it as a fancy dictionary) and should return a dictionary that will be serialized into a tf.Example to be fed into the model. In this step, as long as you use only TF functions, you can do whatever you feel like.

We will see an example of preprocessing_fn as well as its args when building the pipeline.

We point to this module file to build the component.

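A minimal sketch of the wiring, continuing from the ExampleGen snippet above and assuming a schema_gen component (we will build one in the interactive run below):

```python
# Transform applies preprocessing_fn from the module file to every example.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="module.py",  # the module file containing preprocessing_fn
)
```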

Trainer

Here is where the model.fit step happens. Same as in the Transform step, we point to a module file with a run_fn in it. The run_fn function receives a series of fn_args that we will explore when building the pipeline.

run_fn receives the preprocessed examples from Transform, feeds them to the model during the training step, and returns the trained model. The returned model will be further validated and then pushed to a deployment context.

The Trainer component can be run on different infrastructures. You can run it locally, with local resources, or set infrastructure requirements (such as GPUs and CPUs) and run it as a managed container on Vertex AI.

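A minimal sketch, continuing from the snippets above; the step counts are placeholders:

```python
# Trainer consumes the transformed examples and the transform graph, and
# hands them to run_fn from the module file.
trainer = tfx.components.Trainer(
    module_file="module.py",  # the module file containing run_fn
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
)
```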

Pusher

Pusher is the last component of a TFx pipeline. Its purpose is, given the resulting model from Trainer, to push it to a Cloud Storage bucket from where it will be served (either by a TF Serving instance listening to the bucket or by a web/mobile app that uses TF Lite).

Before pushing, several validations can be made on top of the model, for instance evaluating its metrics on the test set or even checking whether it actually runs in a TF Serving container. We will keep this simple and post a more complex pipeline if people like this one.

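A minimal sketch, continuing from the Trainer snippet above; the base directory can be a local path or a gs:// bucket path:

```python
# Pusher copies the trained model to the directory it will be served from.
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="serving_model/my_model"  # placeholder path
        )
    ),
)
```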

With those components in place, we can start implementing the pipeline to see how it actually works.

We will start by implementing it in a Jupyter notebook, which you can open on Colab here. We will run it interactively and check how things run. After that, we will build the same pipeline, but deploy it on Vertex AI Pipelines.

After running the setup cell, we have to import tfx and its InteractiveContext.
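A sketch of that setup (the exact import path for InteractiveContext may vary slightly across TFx versions):

```python
from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext,
)

# The interactive context stores pipeline metadata and component outputs in a
# temporary local directory unless configured otherwise.
context = InteractiveContext()
```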

Our data will be in the relative data_local directory, so we just have to create a CsvExampleGen from it. Notice that we run our components with the interactive context and are then able to handle their outputs.
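For example, continuing from the setup above:

```python
# Build the component, then run it through the interactive context; outputs
# become available on example_gen.outputs.
example_gen = tfx.components.CsvExampleGen(input_base="data_local")
context.run(example_gen)
```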

It is a good practice to get the statistics and the schema of a dataset before starting to work on it. In the future, these can be used to validate the data not just against the schema, but also for drift. We use StatisticsGen and SchemaGen for that.
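A sketch of both components, run with the interactive context:

```python
# StatisticsGen computes summary statistics over the ingested examples.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"]
)
context.run(statistics_gen)

# SchemaGen infers a schema (feature types, domains) from those statistics.
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs["statistics"]
)
context.run(schema_gen)
```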

We will now preprocess our data using the preprocessing_fn in the module.py file we have in our current directory.
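Continuing from the components above:

```python
# Transform runs preprocessing_fn from module.py over all examples.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="module.py",
)
context.run(transform)
```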

We are only imputing missing values and one-hot encoding categorical features.
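A sketch of what preprocessing_fn in module.py can look like (the feature names product, amount, and label are hypothetical placeholders for your own columns):

```python
import tensorflow as tf
import tensorflow_transform as tft


def fill_in_missing(x):
    """Impute missing values with a default and densify to a 1-D tensor."""
    default_value = "" if x.dtype == tf.string else 0
    if isinstance(x, tf.sparse.SparseTensor):
        x = tf.sparse.to_dense(x, default_value)
    return tf.squeeze(x, axis=1)


def preprocessing_fn(inputs):
    outputs = {}
    # One-hot encode a categorical feature using a vocabulary computed
    # over the whole dataset.
    index = tft.compute_and_apply_vocabulary(
        fill_in_missing(inputs["product"]), top_k=10, num_oov_buckets=1
    )
    outputs["product_xf"] = tf.one_hot(index, depth=11)  # 10 vocab + 1 OOV
    # Keep a numeric feature, imputing missing values with zero.
    outputs["amount_xf"] = tf.cast(fill_in_missing(inputs["amount"]), tf.float32)
    outputs["label"] = fill_in_missing(inputs["label"])
    return outputs
```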

We then proceed to train our model. Notice that we use the outputs of the Transform component and set the training and eval args. As we are using these components to illustrate TFx pipeline creation, we won't train for very long.
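Continuing from the snippets above, with deliberately small step counts:

```python
trainer = tfx.components.Trainer(
    module_file="module.py",
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
)
context.run(trainer)
```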

In our module file, we get our data from the Transform component. We have access to fn_args.train_files and fn_args.eval_files, which were generated by the Transform component and made available because we passed examples as a keyword argument to the Trainer component.
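A sketch of run_fn in module.py; get_model and input_fn are hypothetical helpers that build the Keras model and turn the TFRecord files into tf.data datasets:

```python
import tensorflow_transform as tft


def run_fn(fn_args):
    # The Transform graph, so preprocessing can be attached to the served model.
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    # fn_args.train_files / fn_args.eval_files point to the transformed examples.
    train_dataset = input_fn(fn_args.train_files, tf_transform_output)
    eval_dataset = input_fn(fn_args.eval_files, tf_transform_output)

    model = get_model()
    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
    )
    # Save where TFx expects the serving model to land.
    model.save(fn_args.serving_model_dir, save_format="tf")
```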

After training, we also get access to the TensorFlow graph of the preprocessing operations, which is useful for rolling the model into production without having to handle preprocessing by hand. It also ensures that the same preprocessing applied to the data when training and evaluating the model will be used in production.

After training, we only have to push our model to the place it is going to be served from. In this tutorial we are saving it locally, but it should be made clear that we can push the model to a Cloud Storage bucket, where it will be accessed by the production environment (a TF Serving container listening on the bucket, for example).
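Continuing from the snippets above, with a local destination (a gs:// path would work the same way):

```python
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="serving_model_dir"  # placeholder local path
        )
    ),
)
context.run(pusher)
```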

TFx is awesome because you can use the same components and run them on the cloud. In some cases (such as training and transforming data) it is possible to run a step on specific infrastructure that handles large data (such as Dataflow, where you can massively parallelize the transform step).

You can also specify the resources needed for the train step, and Vertex AI will train the container on a VM with those resources. We will now see how to do that in the simplest way possible.

To do that, let’s open the notebook for the Vertex AI part here.

We have the same module.py with our training code, and a base_pipeline.py that connects the components similarly to what we did in the local run, but without the interactive context.
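A sketch of what base_pipeline.py can look like; the function name and arguments here are illustrative, not the exact ones in the repository:

```python
from tfx import v1 as tfx


def create_pipeline(pipeline_name, pipeline_root, data_root,
                    module_file, serving_dir):
    """Wire the same components as before into a deployable Pipeline object."""
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    transform = tfx.components.Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        module_file=module_file)
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=100),
        eval_args=tfx.proto.EvalArgs(num_steps=10))
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_dir)))
    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    transform, trainer, pusher])
```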

What changes here is that, rather than running the pipeline using InteractiveContext, we will compile it into a .json blueprint and then run it on Vertex AI.

The magic comes now: with the same pipeline, we will build the JSON blueprint to run it on Kubeflow, and then use Vertex AI Pipelines (a fancy managed Kubeflow Pipelines engine) to actually run it.
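A sketch of the compile-and-submit step, assuming the create_pipeline function sketched above; the import path, project, region, and bucket names are placeholders:

```python
from google.cloud import aiplatform
from tfx import v1 as tfx

from base_pipeline import create_pipeline  # hypothetical import path

# Compile the pipeline into a JSON blueprint with the Kubeflow V2 runner.
runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename="pipeline.json",
)
runner.run(
    create_pipeline(
        pipeline_name="tfx-pipeline",
        pipeline_root="gs://YOUR_BUCKET/pipeline_root",
        data_root="gs://YOUR_BUCKET/data",
        module_file="gs://YOUR_BUCKET/components/module.py",
        serving_dir="gs://YOUR_BUCKET/serving_model",
    )
)

# Submit the blueprint to Vertex AI Pipelines.
aiplatform.init(project="YOUR_PROJECT", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="tfx-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://YOUR_BUCKET/pipeline_root",
)
job.submit()
```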

Before running it, go to pipeline_vertex.py and change line 29 to your project name and lines 19 and 20 to a bucket you created. Then upload the local data to YOUR_BUCKET/data and module.py to YOUR_BUCKET/components.

You should also have a service account set up for that.

Then just run all the cells in that notebook and open Vertex AI Pipelines. You should see something like the screenshot below (and the model binaries in the bucket you created).

Screenshot of the running pipeline graph on Vertex AI Pipelines. Generated by me.

With TFx, we can use the same code base to run training jobs locally (to debug and work on small stuff) and, when we want or need to, move them to Vertex AI with few changes. That way, we can handle large datasets and models, and train in a feasible way.

It is also nice to see how TFx brings together all the power of the TensorFlow ecosystem in a production-suited framework.

This tutorial is intended to be simple and to show how easy it is to get started with TFx, while giving you some code examples to get started. Most of the code used here is adapted from, or at least inspired by, Building Machine Learning Pipelines by Hannes Hapke, whose GitHub repository is released under an MIT License.

If you find any bugs, have doubts, want to discuss the topic further or just want to make a friend, feel free to reach me at piero.skywalker@gmail.com.

Thanks!


