
TensorFlow introduced PluggableDevice in mid-2021, which enables hardware manufacturers to seamlessly integrate their accelerators (e.g. GPUs, TPUs, NPUs) into the TensorFlow ecosystem. This allows users to enjoy accelerated training on non-CUDA devices with minimal modification to their code. More importantly, hardware manufacturers no longer have to fork and maintain their own version of TensorFlow (e.g. the AMD ROCm port) and can focus purely on the communication layer between TensorFlow and device-level operations. With the recent public release of macOS Monterey, Apple has added Metal support for the PluggableDevice architecture, so it is now possible to train TensorFlow models with the dedicated GPU (dGPU) on MacBook Pros and iMacs with ease (sort of).
In this mini-guide, I will walk through how to install tensorflow-metal
to enable dGPU training on Intel MacBook Pros and iMacs. In addition, I train a simple CNN image classifier on my MacBook Pro, equipped with an AMD Radeon Pro 560X, to demonstrate the accelerated performance.
Create development environment
I personally prefer miniconda, but other environment managers such as anaconda and virtualenv should also work in a similar fashion.
We first create a new conda
environment named tf-metal
with Python 3.8:
conda create -n tf-metal python=3.8
We then activate the environment:
conda activate tf-metal
Install Metal enabled TensorFlow
We have to install the following pip packages: tensorflow-macos
and tensorflow-metal
. Normally, you can simply run pip install tensorflow-macos tensorflow-metal
and Bob’s your uncle. However, you might receive the following error, since both packages are built against the post-macOS 11 SDK:
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
To bypass the version compatibility issue, we need to set the environment variable SYSTEM_VERSION_COMPAT=0
when running pip install
:
SYSTEM_VERSION_COMPAT=0 pip install tensorflow-macos tensorflow-metal
Both packages should now be installed:
(tf-metal) ➜ ~ pip list
Package Version
----------------------- ---------
absl-py 0.15.0
astunparse 1.6.3
cachetools 4.2.4
certifi 2021.10.8
charset-normalizer 2.0.7
clang 5.0
flatbuffers 1.12
gast 0.4.0
google-auth 2.3.1
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.41.1
h5py 3.1.0
idna 3.3
keras 2.6.0
Keras-Preprocessing 1.1.2
Markdown 3.3.4
numpy 1.19.5
oauthlib 3.1.1
opt-einsum 3.3.0
pip 21.2.4
protobuf 3.19.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.7.2
setuptools 58.0.4
six 1.15.0
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow-estimator 2.6.0
tensorflow-macos 2.6.0
tensorflow-metal 0.2.0
termcolor 1.1.0
typing-extensions 3.7.4.3
urllib3 1.26.7
Werkzeug 2.0.2
wheel 0.37.0
wrapt 1.12.1
Check physical devices in TensorFlow
We can use tf.config.list_physical_devices()
to check all available physical devices:
>>> import tensorflow as tf
>>>
>>> tf.config.list_physical_devices()
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
We can see that, in the case of my 2018 MacBook Pro with the AMD Radeon Pro 560X dGPU, there are two physical devices: a CPU
and a GPU
.
Similar to using a native device or CUDA device in TensorFlow, we can declare a variable or define operations to run on a specific device using the with tf.device()
syntax:
>>> with tf.device('/GPU'):
... a = tf.random.normal(shape=(2,), dtype=tf.float32)
... b = tf.nn.relu(a)
...
2021-10-26 12:51:24.844280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 560X
systemMemory: 16.00 GB
maxCacheSize: 2.00 GB
2021-10-26 12:51:24.845013: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-26 12:51:24.845519: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
>>>
>>> a
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-1.6457689, -0.2130392], dtype=float32)>
>>> b
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 0.], dtype=float32)>
You can see from the print-out during initialization that the Metal device AMD Radeon Pro 560X
is being set.
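To double-check that operations are actually placed on the Metal device rather than silently falling back to the CPU, we can turn on TensorFlow's device placement logging before running any ops. This is standard TensorFlow functionality, not something Metal-specific:

```python
import tensorflow as tf

# Log the device each operation is assigned to (call before any ops run)
tf.debugging.set_log_device_placement(True)

with tf.device('/GPU:0'):
    x = tf.random.normal(shape=(4, 4))
    y = tf.linalg.matmul(x, x)
```

With the Metal plugin active, the log lines should show ops such as MatMul executing on /job:localhost/replica:0/task:0/device:GPU:0; since soft device placement is on by default, the same code falls back to the CPU on machines without a GPU.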
Training a CNN classifier
To demonstrate the training performance with tensorflow-metal
against vanilla tensorflow
(i.e. on CPU
), I have written a script that trains a simple CNN model on MNIST
using RMSProp
for 50 epochs. Note that I am using TensorFlow Datasets to download MNIST
, so please run pip install tensorflow_datasets
if you want to run the exact same code.
The following are the training results and Activity Monitor screenshots of the CNN model trained with tensorflow
(CPU) and tensorflow-metal
(GPU).