Setting up the GPU-Based LLM Training Machine | by Leo Tisljaric, PhD | Feb, 2024


A guide to setting up your local Ubuntu-based machine for training PyTorch and TensorFlow AI models with GPU acceleration

Supercomputer (Image by: Author; Source: OpenAI DALL-E)

Installing all the necessary tools and drivers on your local machine can be very frustrating, especially when you need to track the compatibilities and dependencies of different tools. In this article, you will find a guide to setting up your local Ubuntu-based machine for training AI models with GPU acceleration.


You can use this article as a knowledge base or as learning material to help you grasp the rather complicated setup of a GPU-based machine.

Contents:

  1. Assumptions & Limitations
  2. Preparations
  3. Installation — CUDA & CUDNN
  4. Test Installation

This article will show you how to set up the machine under several assumptions and limitations:

  • You are using Ubuntu (this guide uses Ubuntu 22.04).
  • You have one or more NVIDIA GPUs installed in your machine.
  • You have sudo privileges on your machine.

If these assumptions match your setup, you will find this article useful. Let’s start!

Check GPU compatibility and compute capability:

CUDA compute capability (Source: https://developer.nvidia.com/cuda-gpus#compute)
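If you are not sure which GPU model is installed in your machine, you can list the NVIDIA devices on the PCI bus and then look up their compute capability in the table linked above. Below is a minimal sketch; it assumes `lspci` (from the `pciutils` package, preinstalled on most Ubuntu systems) is available, and it works even before any NVIDIA driver is installed:

```python
import subprocess

# List all PCI devices and keep only the NVIDIA entries. This reads PCI
# metadata only, so it works even before the NVIDIA driver is installed.
pci_devices = subprocess.run(
    ["lspci"], capture_output=True, text=True, check=True
).stdout

for device in pci_devices.splitlines():
    if "NVIDIA" in device:
        print(device)
```

The printed lines usually contain the GPU's marketing name in brackets, which you can match against NVIDIA's compute capability table.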

My GPU is a GeForce RTX 2080 SUPER with a compute capability of 7.5. Let’s check driver support for this GPU:
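One quick way to check is to ask the driver itself. The sketch below assumes an NVIDIA driver is already installed so that `nvidia-smi` is on your PATH; if the command is missing, running `ubuntu-drivers devices` in a terminal lists the recommended driver packages for your GPU. Treat this as a minimal sketch, not the only way to verify driver support:

```python
import subprocess

# Ask the installed NVIDIA driver which GPU it sees and which driver
# version is loaded. Assumes "nvidia-smi" is available on the PATH,
# i.e. an NVIDIA driver is already installed.
query = [
    "nvidia-smi",
    "--query-gpu=name,driver_version",
    "--format=csv,noheader",
]
print(subprocess.run(query, capture_output=True, text=True, check=True).stdout.strip())
```

The driver version reported here is what you will later compare against the minimum driver version required by the CUDA toolkit you plan to install.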


