TensorFlow Decision Forests: A Comprehensive Introduction | by Antons Tocilins-Ruberts


Train, tune, evaluate, interpret and serve tree-based models using TensorFlow

Apr 14, 2023

Photo by Javier Allegue Barros on Unsplash

Two years ago, the TensorFlow (TF) team open-sourced TensorFlow Decision Forests (TFDF), a library for training tree-based models. Just last month they finally announced that the package is production ready, so I decided it was time to take a closer look. The aim of this post is to give you a better idea of what the package offers and to show you how to use it effectively. The structure of the post is below; feel free to skip to whichever part interests you most.

  1. What is TFDF and why use it?
  2. Train Random Forest (RF) and Gradient Boosted Trees (GBT) models using TFDF
  3. Hyper-parameter tuning with TFDF and Optuna
  4. Model inspection
  5. Serving GBT model using TF Serving

You can find all the code in my repo, so make sure to star it if you haven’t already. In this post we’ll train a few models for loan default prediction using the U.S. Small Business Administration dataset (CC BY-SA 4.0 license). The models are trained on already pre-processed data, but the repo includes a notebook that describes the processing and feature engineering steps. Follow them if you want to replicate my code directly. Alternatively, use this code as a starting point and adapt it to your own dataset (my recommended approach).

Installing TensorFlow Decision Forests is quite straightforward: run pip install tensorflow_decision_forests, and most of the time this should just work. There are some reported issues with M1 and M2 Macs, but it worked fine for me with the latest version of TFDF.
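If you want to confirm that the install succeeded before moving on, a quick check like the one below can help. This is only a minimal sketch: the helper name installed_tfdf_version is my own (it is not part of TFDF), and it relies solely on the Python standard library, so it runs even when the package is missing.

```python
import importlib.util
from importlib import metadata
from typing import Optional

# Import name used by the pip package mentioned above.
PACKAGE = "tensorflow_decision_forests"


def installed_tfdf_version() -> Optional[str]:
    """Return the installed TFDF version string, or None if it is not installed.

    Hypothetical helper for illustration; it does not import TensorFlow,
    so it is cheap to run and safe on machines without the package.
    """
    # find_spec only locates the top-level package; it does not import it.
    if importlib.util.find_spec(PACKAGE) is None:
        return None
    try:
        return metadata.version(PACKAGE)
    except metadata.PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a source checkout).
        return None


if __name__ == "__main__":
    version = installed_tfdf_version()
    if version is None:
        print("TFDF not found; try: pip install tensorflow_decision_forests")
    else:
        print(f"TFDF {version} is installed")
```

If the check reports that the package is missing on an M1 or M2 Mac, that is the point at which to look into the reported installation issues.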

What is TFDF?

TensorFlow Decision Forests is built on top of a C++ library called Yggdrasil Decision Forests, which was also developed by Google. The original C++ algorithms are designed to build scalable decision tree models that can handle large datasets and high-dimensional feature spaces. By…


