Accelerated TensorFlow model training on Intel Mac GPUs | by Bryan M. Li


TensorFlow introduced PluggableDevice in mid-2021, which enables hardware manufacturers to seamlessly integrate their accelerators (e.g. GPUs, TPUs, NPUs) into the TensorFlow ecosystem. This allows users to enjoy accelerated training on non-CUDA devices with minimal modifications to their code. More importantly, hardware manufacturers no longer have to fork and maintain their own version of TensorFlow (e.g. the AMD ROCm port) and can focus purely on the communication layer between TensorFlow and device-level operations. With the recent public release of macOS Monterey, Apple has added Metal support for the PluggableDevice architecture, so it is now possible to train TensorFlow models with the dedicated GPU (dGPU) on MacBook Pros and iMacs with ease (sort of).

In this mini-guide, I will walk through how to install tensorflow-metal to enable dGPU training on Intel MacBook Pros and iMacs. In addition, I train a simple CNN image classifier on my MacBook Pro, equipped with an AMD Radeon Pro 560X, to demonstrate the accelerated performance.

Create development environment

I personally prefer miniconda, but other environment managers such as anaconda and virtualenv should also work in a similar fashion.

We first create a new conda environment named tf-metal with Python 3.8:

conda create -n tf-metal python=3.8

We then activate the environment:

conda activate tf-metal

Install Metal enabled TensorFlow

We have to install the following pip packages: tensorflow-macos and tensorflow-metal. Normally, you can simply do pip install tensorflow-macos tensorflow-metal and Bob’s your uncle. However, you might receive the following error, since both packages are built against the post-macOS 11 SDK:

ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos

To bypass the version compatibility issue, we need to set the environment variable SYSTEM_VERSION_COMPAT=0 when running pip install:

SYSTEM_VERSION_COMPAT=0 pip install tensorflow-macos tensorflow-metal

Both packages should now be installed:

(tf-metal) ➜  ~ pip list
Package Version
----------------------- ---------
absl-py 0.15.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
clang 5.0
flatbuffers 1.12
gast 0.4.0
google-auth 2.3.1
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.41.1
h5py 3.1.0
idna 3.3
keras 2.6.0
Keras-Preprocessing 1.1.2
Markdown 3.3.4
numpy 1.19.5
oauthlib 3.1.1
opt-einsum 3.3.0
pip 21.2.4
protobuf 3.19.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
setuptools 58.0.4
six 1.15.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow-estimator 2.6.0
tensorflow-macos 2.6.0
tensorflow-metal 0.2.0
termcolor 1.1.0
typing-extensions 3.7.4.3
urllib3 1.26.7
Werkzeug 2.0.2
wheel 0.37.0
wrapt 1.12.1
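
As a quick sanity check (not part of the original walkthrough), importing the package and printing its version should work without errors and match the tensorflow-macos release listed above:

(tf-metal) ➜  ~ python -c "import tensorflow as tf; print(tf.__version__)"
2.6.0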

Check physical devices in TensorFlow

We can use tf.config.list_physical_devices() to check all available physical devices:

>>> import tensorflow as tf
>>>
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

We can see that, in the case of my 2018 MacBook Pro with the AMD Radeon Pro 560X dGPU, there are two physical devices: a CPU and a GPU.
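
If you only care about the GPU, you can also pass a device type to the same call; on a working tensorflow-metal install this should return the Metal-backed device (the exact name may vary by machine):

>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]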

Similar to using a native device or CUDA device in TensorFlow, we can declare a variable or define operations to run on a specific device using the with tf.device() syntax:

>>> with tf.device('/GPU'):
...     a = tf.random.normal(shape=(2,), dtype=tf.float32)
...     b = tf.nn.relu(a)
...
2021-10-26 12:51:24.844280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 560X
systemMemory: 16.00 GB
maxCacheSize: 2.00 GB
2021-10-26 12:51:24.845013: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-26 12:51:24.845519: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
>>>
>>> a
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-1.6457689, -0.2130392], dtype=float32)>
>>> b
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 0.], dtype=float32)>

You can see from the print-out during initialization that the Metal device AMD Radeon Pro 560X has been set.
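
Conversely, if you ever want to force a run back onto the CPU (for example, to compare against a GPU run within the same environment), one option is to hide the GPU from TensorFlow before any model is built. This uses a general TensorFlow API, not something specific to the Metal plugin:

import tensorflow as tf

# Hide all GPUs from TensorFlow; this must be called before the GPU is initialized.
tf.config.set_visible_devices([], 'GPU')

print(tf.config.get_visible_devices())  # should now list only the CPU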

Training a CNN classifier

To demonstrate the training performance of tensorflow-metal against vanilla tensorflow (i.e. on CPU), I have written a script that trains a simple CNN model on MNIST using RMSprop for 50 epochs. Note that I am using TensorFlow Datasets to download MNIST, so please do pip install tensorflow_datasets if you want to run the exact same code.
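
For reference, a minimal sketch of such a script is shown below; the exact architecture, batch size, and other hyperparameters here are my own placeholders, not necessarily what the original script uses:

import tensorflow as tf
import tensorflow_datasets as tfds

# Load MNIST via TensorFlow Datasets and build a simple input pipeline.
(ds_train, ds_test), ds_info = tfds.load(
    'mnist', split=['train', 'test'], as_supervised=True, with_info=True
)

def preprocess(image, label):
    # Scale pixel values from [0, 255] to [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

ds_train = ds_train.map(preprocess).shuffle(10_000).batch(128).prefetch(tf.data.AUTOTUNE)
ds_test = ds_test.map(preprocess).batch(128).prefetch(tf.data.AUTOTUNE)

# A small CNN classifier for 28x28 grayscale images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# With tensorflow-metal installed, Keras places operations on the Metal GPU by default.
model.fit(ds_train, epochs=50, validation_data=ds_test)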

The following are the training results and Activity Monitor screenshots of the CNN model trained with tensorflow (CPU) and tensorflow-metal (GPU).


