Apple M2 Max GPU vs Nvidia V100 (Part 2): Big Models and Energy Efficiency | by Fabrice Daniel | Feb, 2024

Compare Apple Silicon M2 Max GPU performance and energy efficiency to Nvidia V100 for training big CNN models with TensorFlow

In my previous article, I compared the M2 Max GPU with the Nvidia V100, P100, and T4 on MLP, CNN, and LSTM training. The results show that the M2 Max can perform very well, outperforming the Nvidia GPUs when training small models. But as stated in the article:

[…] these metrics can only be considered for similar neural network types and depths as used in this test.

So this second part tests bigger models, focusing on CNN only and comparing M2 Max with the most powerful GPU previously tested: the Nvidia V100.

Another point considered in this test is memory management. The Nvidia GPU loses a lot of time in memory transfers, while the M2 Max GPU has direct access to the unified memory, so it can start training without that delay. As the results in the previous article showed, this makes a big difference for small models trained for a small number of epochs, so for the bigger models we remove this effect and compare pure training time only.

For this purpose, we train each model for ten epochs, but instead of using the total training time, we capture the per-step training durations from the second epoch to the last one and average them. This excludes the initialization and memory transfer overhead, part of which shows up in the first epoch.
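As an illustration of this measurement, here is a minimal sketch of a step timer that groups durations by epoch and averages only the epochs after the first. The `StepTimer` class and its `clock` parameter are hypothetical names, not the code used for the article. The hook names mirror those of a Keras callback, so the same logic could presumably be placed in a `tf.keras.callbacks.Callback` subclass, but that wiring is an assumption.

```python
import time
from statistics import mean


class StepTimer:
    """Records per-step wall-clock durations, grouped by epoch.

    Hypothetical helper: the method names follow the Keras callback hooks,
    but this class does not depend on TensorFlow.
    """

    def __init__(self, clock=time.perf_counter):
        self.clock = clock          # injectable so the timer can be tested
        self.durations_ms = []      # one list of step durations per epoch
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        # Start a fresh list of step durations for this epoch.
        self.durations_ms.append([])

    def on_train_batch_begin(self, batch, logs=None):
        self._start = self.clock()

    def on_train_batch_end(self, batch, logs=None):
        elapsed_ms = (self.clock() - self._start) * 1000.0
        self.durations_ms[-1].append(elapsed_ms)

    def mean_step_ms(self, skip_epochs=1):
        """Average step duration, ignoring the first `skip_epochs` epochs."""
        kept = [d for epoch in self.durations_ms[skip_epochs:] for d in epoch]
        if not kept:
            raise ValueError("no steps recorded after the skipped epochs")
        return mean(kept)
```

With ten epochs and `skip_epochs=1`, the average covers epochs two through ten, which matches the procedure described above.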

The last point, and nowadays the most crucial one, is the energy consumed by the GPUs to train a big model. As we will show here, this is where the M2 Max is a real game changer.

In this article, you will find the following tests:

  • Training four custom CNNs ranging from 122,570 to 1,649,482 parameters on CIFAR-10¹ with batch sizes ranging from 32 to 1024
  • Training a ResNet50 model on CIFAR-10 with batch sizes ranging from 32 to 1024

Then, in both cases, I will compare:

  • the raw training performance (epoch duration in milliseconds)
  • the energy consumption per epoch
  • the energy efficiency ratio between the two GPUs
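
To make the last two metrics concrete, here is a small sketch of how they could be derived from an average power draw and an epoch duration. It assumes energy is computed as average power multiplied by time. This section does not say how power is measured, so the function names and the input values below are hypothetical placeholders, not results from the tests.

```python
def energy_per_epoch_wh(avg_power_w, epoch_ms):
    """Energy in watt-hours: average power (W) times duration (h)."""
    return avg_power_w * (epoch_ms / 1000.0) / 3600.0


def efficiency_ratio(energy_a_wh, energy_b_wh):
    """How many times more energy device A uses per epoch than device B."""
    return energy_a_wh / energy_b_wh


# Placeholder inputs for illustration only, not measurements.
device_a_wh = energy_per_epoch_wh(avg_power_w=200.0, epoch_ms=18_000)
device_b_wh = energy_per_epoch_wh(avg_power_w=50.0, epoch_ms=36_000)
ratio = efficiency_ratio(device_a_wh, device_b_wh)
```

Under this definition, a device can be slower per epoch and still come out ahead on energy if its power draw is low enough.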
