TFDV201: An Intermediate Guide to TensorFlow Data Validation | by Josh Kim | Oct, 2023


Image by Robert Armstrong from Pixabay

See the previous article in this series: TFDV101: A Guide to TensorFlow Data Validation

Introduction

Continuing the TFDV Guide series, in this article we’ll further explore TensorFlow Data Validation (TFDV), an open-source library developed by Google that provides tools and capabilities for analyzing and validating machine learning datasets. My primary use case for TFDV is machine learning model monitoring: detecting unexpected behaviours in data to prevent model score degradation. The monitoring component that I’ve developed relies heavily on TFDV functions. It generates training and serving data statistics, detects skews and drifts in the serving data and prediction values (anomalies), and finally visualizes the statistics and anomalies as an .html report so they can be easily analyzed and assessed for model re-training.
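To give a feel for the skew and drift checks mentioned above: for categorical features, TFDV compares the training and serving value distributions using the L-infinity distance, i.e. the largest absolute difference in any category's relative frequency. Here is a minimal pure-Python sketch of that metric; the feature name and values are made up for illustration and this is not TFDV's own implementation.

```python
from collections import Counter


def l_infinity_distance(train_values, serving_values):
    """L-infinity distance between two empirical categorical distributions:
    the largest absolute difference in any category's relative frequency."""
    train_freq = Counter(train_values)
    serving_freq = Counter(serving_values)
    n_train, n_serving = len(train_values), len(serving_values)
    categories = set(train_freq) | set(serving_freq)
    return max(
        abs(train_freq[c] / n_train - serving_freq[c] / n_serving)
        for c in categories
    )


# Hypothetical "payment_type" columns from training vs. serving data
train = ["cash"] * 60 + ["card"] * 40
serving = ["cash"] * 30 + ["card"] * 70
print(round(l_infinity_distance(train, serving), 3))  # 0.3, well above a 0.01 threshold
```

When this distance exceeds the threshold configured on a feature's skew or drift comparator, TFDV reports the feature as anomalous.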

I believe there is still more for me to learn from TFDV, so I’m continuing to explore the library’s capabilities in order to make further improvements to my ML framework. For now, I’m going to share what I already know and find useful, because there aren’t many online resources on TFDV compared to other libraries. I hope this article serves as a useful resource for data professionals who are new to TFDV. After today’s session, I hope you’ll be familiar with the library and confident enough to apply your learnings to your own work, whether it be an ML framework, a standalone UDF, or analysis in a Jupyter Notebook. Without further ado, let’s dive and jive with TFDV.

An Intermediate Guide to TFDV

In this section, I’m going to cover intermediate-level TFDV usage by introducing some patterns that I’ve found useful. We’ll use the same dataset as the last article (TFDV101): the Taxi Trips dataset released by the City of Chicago.

  1. Serving Data Validation

When developing an ML model, we often extract data from multiple data sources. For example, you may extract…


