Building Deep Learning Pipelines with TensorFlow Extended | by Piero Esposito


You will see how easy it is to build Deep Learning pipelines like the big guys do

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

In this tutorial, I aim to:

  • Explain the function of Machine Learning pipelines for production
  • Show how to get started with Tensorflow Extended locally
  • Show how to move a Tensorflow Extended pipeline from local environment to Vertex AI
  • Give you some code samples to adapt and get started with TFx.

You can check the code for this tutorial here.

Once you finish your model experimentation, it is time to roll things to production. Rolling Machine Learning to production is not just a question of wrapping the model binaries with a REST API and starting to serve it, but also of making it possible to re-create (or update) and re-deploy your model.

That means the steps from preprocessing the data to training the model to rolling it to production (we call this a Machine Learning Pipeline) should be deployed and able to run as easily as possible, while making it possible to track and parameterize them (to use different data, for example).

In this post, we will see how to build a Machine Learning pipeline for a Deep Learning model using TensorFlow Extended (TFx), how to run and deploy it to Google Vertex AI, and why we should use it.

We will start with an overview of TFx and its components, implement a minimal working pipeline, and then show how to run it in a Jupyter notebook (on Google Colab) and on Vertex AI Pipelines. The code used here is adapted from “Building Machine Learning Pipelines” by Hannes Hapke.

TFx is the production framework for TensorFlow. Rather than only letting the user serve a model in a highly performant way, TFx wraps all the parts of the TensorFlow ecosystem (from Keras to Data Validation to Serving) to let the user build and run scalable, performant Machine Learning pipelines.

It is organized as a sequence of components that require you to write less code while remaining flexible.

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

The pipelines are compiled into a series of blueprints that are adaptable to different infrastructure. You can run them locally or on Vertex AI. You can run some expensive data processing steps on Apache Beam or even on Google Dataflow. TFx won’t vendor-lock you and will adapt to your resources and infrastructure.

Source. Accessed 2022-06-27. Licensed under Apache 2.0.

In this post, we will use the following components in our pipeline:

ExampleGen

The entry point of a TFx pipeline is the ExampleGen component. Given a local (or Cloud Storage-like) path, it ingests the input files and materializes them in .tfrecord format according to the input specification.

We will use a smaller dataset, for which a .csv file is a better fit, so we will use a CsvExampleGen component pointing to a local .csv file. When we move to Vertex AI, this file will be uploaded to a Google Cloud Storage bucket.

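A minimal sketch of building the component, assuming the .csv files live in a local data_local directory:

```python
# Point CsvExampleGen at the directory holding the .csv data.
from tfx import v1 as tfx

example_gen = tfx.components.CsvExampleGen(input_base="data_local")
```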

Transform

The Transform component is not always needed, but it is useful when expensive preprocessing needs to be done. To use it, we create a pure TensorFlow function called preprocessing_fn in a module file. TFx will apply this transformation to all datapoints fed by the ExampleGen component.

This function needs to be pure (meaning no side effects) and use only TensorFlow operations, because (i) this preprocessing step can be serialized to a TF Graph and baked into the final model, and (ii) this step can be run on Apache Beam, which means it will be massively parallelized (good for very large datasets).

preprocessing_fn receives the features of a tf.Example (think of it as a fancy dictionary) and should return a dictionary that will be serialized into a tf.Example to be fed into the model. In this step, as long as you use only TF functions, you can do whatever you feel like.

We will see an example of preprocessing_fn as well as its args when building the pipeline.

We point to this module file to build the component.

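A minimal sketch of the wiring, continuing from the ExampleGen snippet above and assuming a schema_gen component (we will build one in the interactive run below):

```python
# Transform applies preprocessing_fn from the module file to every example.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="module.py",  # the module file containing preprocessing_fn
)
```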

Trainer

Here is where the model.fit step happens. Same as in the Transform step, we point to a module file with a run_fn in it. The run_fn function receives a series of fn_args that we will explore when building the pipeline.

run_fn receives the preprocessed examples from Transform, feeds them to the model during the training step, and returns the trained model. The returned model will be further validated and then pushed to a deployment context.

The Trainer component can be run on different infrastructures. You can run it locally, with local resources, or set infrastructure requirements (such as GPUs and CPUs) and run it as a managed container on Vertex AI.

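A minimal sketch, continuing from the snippets above; the step counts are placeholders:

```python
# Trainer consumes the transformed examples and the transform graph, and
# hands them to run_fn from the module file.
trainer = tfx.components.Trainer(
    module_file="module.py",  # the module file containing run_fn
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
)
```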

Pusher

Pusher is the last component of a TFx pipeline. Its purpose is, given the resulting model from Trainer, to push it to a Cloud Storage bucket from where it will be served (either by a TF Serving instance listening to the bucket or by a web/mobile app that uses TF Lite).

Before pushing, several validations can be made on top of the model, for instance evaluating its metrics on the test set or even checking whether it actually runs in a TF Serving container. We will keep this simple and post a more complex pipeline if people like this one.

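A minimal sketch, continuing from the Trainer snippet above; the base directory can be a local path or a gs:// bucket path:

```python
# Pusher copies the trained model to the directory it will be served from.
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="serving_model/my_model"  # placeholder path
        )
    ),
)
```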

With those components in place, we can start implementing the pipeline to see how it actually works.

We will start by implementing it in a Jupyter notebook, which you can open on Colab here. We will run it interactively and check how things run. After that, we will build the same pipeline, but deploy it on Vertex AI Pipelines.

After running the setup cell, we have to import tfx and its InteractiveContext.
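A sketch of that setup (the exact import path for InteractiveContext may vary slightly across TFx versions):

```python
from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext,
)

# The interactive context stores pipeline metadata and component outputs in a
# temporary local directory unless configured otherwise.
context = InteractiveContext()
```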

Our data will be in the relative data_local directory, so we just have to create a CsvExampleGen from it. Notice that we run our components with the interactive context and are then able to handle their outputs.
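For example, continuing from the setup above:

```python
# Build the component, then run it through the interactive context; outputs
# become available on example_gen.outputs.
example_gen = tfx.components.CsvExampleGen(input_base="data_local")
context.run(example_gen)
```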

It is a good practice to get the statistics and the schema of a dataset before starting to work on it. In the future, these can be used to validate the data not just against the schema, but also for drift. We use StatisticsGen and SchemaGen for that.
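A sketch of both components, run with the interactive context:

```python
# StatisticsGen computes summary statistics over the ingested examples.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"]
)
context.run(statistics_gen)

# SchemaGen infers a schema (feature types, domains) from those statistics.
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs["statistics"]
)
context.run(schema_gen)
```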

We will now preprocess our data using the preprocessing_fn in the module.py file we have in our current directory.
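Continuing from the components above:

```python
# Transform runs preprocessing_fn from module.py over all examples.
transform = tfx.components.Transform(
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    module_file="module.py",
)
context.run(transform)
```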

We are only imputing missing values and one-hot encoding categorical features.
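A sketch of what preprocessing_fn in module.py can look like (the feature names product, amount, and label are hypothetical placeholders for your own columns):

```python
import tensorflow as tf
import tensorflow_transform as tft


def fill_in_missing(x):
    """Impute missing values with a default and densify to a 1-D tensor."""
    default_value = "" if x.dtype == tf.string else 0
    if isinstance(x, tf.sparse.SparseTensor):
        x = tf.sparse.to_dense(x, default_value)
    return tf.squeeze(x, axis=1)


def preprocessing_fn(inputs):
    outputs = {}
    # One-hot encode a categorical feature using a vocabulary computed
    # over the whole dataset.
    index = tft.compute_and_apply_vocabulary(
        fill_in_missing(inputs["product"]), top_k=10, num_oov_buckets=1
    )
    outputs["product_xf"] = tf.one_hot(index, depth=11)  # 10 vocab + 1 OOV
    # Keep a numeric feature, imputing missing values with zero.
    outputs["amount_xf"] = tf.cast(fill_in_missing(inputs["amount"]), tf.float32)
    outputs["label"] = fill_in_missing(inputs["label"])
    return outputs
```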

We then proceed to train our model. Notice that we use the outputs of the Transform component and set the training and eval args. As we are using these components to illustrate TFx pipeline creation, we won't train for very long.
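Continuing from the snippets above, with deliberately small step counts:

```python
trainer = tfx.components.Trainer(
    module_file="module.py",
    examples=transform.outputs["transformed_examples"],
    transform_graph=transform.outputs["transform_graph"],
    schema=schema_gen.outputs["schema"],
    train_args=tfx.proto.TrainArgs(num_steps=100),
    eval_args=tfx.proto.EvalArgs(num_steps=10),
)
context.run(trainer)
```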

In our module file, we get our data from the Transform component. We have access to fn_args.train_files and fn_args.eval_files, which were generated by the Transform component and made available because we passed examples as a keyword argument to the Trainer component.
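A sketch of run_fn in module.py; get_model and input_fn are hypothetical helpers that build the Keras model and turn the TFRecord files into tf.data datasets:

```python
import tensorflow_transform as tft


def run_fn(fn_args):
    # The Transform graph, so preprocessing can be attached to the served model.
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    # fn_args.train_files / fn_args.eval_files point to the transformed examples.
    train_dataset = input_fn(fn_args.train_files, tf_transform_output)
    eval_dataset = input_fn(fn_args.eval_files, tf_transform_output)

    model = get_model()
    model.fit(
        train_dataset,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
    )
    # Save where TFx expects the serving model to land.
    model.save(fn_args.serving_model_dir, save_format="tf")
```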

After training, we also get access to the TensorFlow graph of the preprocessing operations, which is useful for rolling the model into production without having to handle preprocessing by hand. It also ensures that the same preprocessing applied to the data when training and evaluating the model will be used in production.

After training, we only have to push our model to the place it is going to be served from. In this tutorial we are saving it locally, but it should be made clear that we can push the model to a Cloud Storage bucket, where it will be accessed by the production environment (a TF Serving container listening on the bucket, for example).
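Continuing from the snippets above, with a local destination (a gs:// path would work the same way):

```python
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="serving_model_dir"  # placeholder local path
        )
    ),
)
context.run(pusher)
```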

TFx is awesome because you can use the same components and run them on the cloud. In some cases (such as training and transforming data) it is possible to run a step on specific infrastructure that handles large data (such as Dataflow, where you can massively parallelize the transform step).

You can also specify the resources needed for the train step, and Vertex AI will train the container on a VM with those resources. We will now see how to do that in the simplest way possible.

To do that, let’s open the notebook for the Vertex AI part here.

We have the same module.py with our training code, and a base_pipeline.py that connects the components similarly to what we did in the local run, but without the interactive context.
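A sketch of what base_pipeline.py can look like; the function name and arguments here are illustrative, not the exact ones in the repository:

```python
from tfx import v1 as tfx


def create_pipeline(pipeline_name, pipeline_root, data_root,
                    module_file, serving_dir):
    """Wire the same components as before into a deployable Pipeline object."""
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    transform = tfx.components.Transform(
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        module_file=module_file)
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=transform.outputs["transformed_examples"],
        transform_graph=transform.outputs["transform_graph"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=100),
        eval_args=tfx.proto.EvalArgs(num_steps=10))
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_dir)))
    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen,
                    transform, trainer, pusher])
```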

What changes here is that, rather than running the pipeline using InteractiveContext, we will compile it into a .json blueprint and then run it on Vertex AI.

The magic comes now: with the same pipeline, we will build the JSON blueprint to run it on Kubeflow, and then use Vertex AI Pipelines (a fancy managed Kubeflow Pipelines engine) to actually run it.
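A sketch of the compile-and-submit step, assuming the create_pipeline function sketched above; the import path, project, region, and bucket names are placeholders:

```python
from google.cloud import aiplatform
from tfx import v1 as tfx

from base_pipeline import create_pipeline  # hypothetical import path

# Compile the pipeline into a JSON blueprint with the Kubeflow V2 runner.
runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename="pipeline.json",
)
runner.run(
    create_pipeline(
        pipeline_name="tfx-pipeline",
        pipeline_root="gs://YOUR_BUCKET/pipeline_root",
        data_root="gs://YOUR_BUCKET/data",
        module_file="gs://YOUR_BUCKET/components/module.py",
        serving_dir="gs://YOUR_BUCKET/serving_model",
    )
)

# Submit the blueprint to Vertex AI Pipelines.
aiplatform.init(project="YOUR_PROJECT", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="tfx-pipeline",
    template_path="pipeline.json",
    pipeline_root="gs://YOUR_BUCKET/pipeline_root",
)
job.submit()
```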

Before running it, go to pipeline_vertex.py and change line 29 to your project name and lines 19 and 20 to a bucket you created. Then upload the local data to YOUR_BUCKET/data and module.py to YOUR_BUCKET/components.

You should also have a service account set up for that.

Then just run all the cells in that notebook and open Vertex AI Pipelines. You should see something like the screenshot below (and the model binaries in the bucket you created).

Screenshot of the running pipeline graph on Vertex AI Pipelines. Generated by me.

With TFx, we can use the same code base to run training jobs locally (to debug and work on small stuff) and, when we want or need to, move them to Vertex AI with few changes. That way, we can handle large datasets and models, and train in a feasible way.

It is also nice to see how TFx brings together all the power of the TensorFlow ecosystem in a production-suited framework.

This tutorial is intended to be simple and to show how easy it is to get started with TFx, while giving you some code examples to get started. Most of the code used here is adapted from, or at least inspired by, Building Machine Learning Pipelines by Hannes Hapke, whose GitHub repository is released under an MIT License.

If you find any bugs, have doubts, want to discuss the topic further or just want to make a friend, feel free to reach me at piero.skywalker@gmail.com.

Thanks!


