Building a Computer Vision Portfolio: Image Classification with TensorFlow and Keras

by Luis IH | Mar 2024


Photo by Clark Street Mercantile on Unsplash

If you're a data scientist just starting to venture into the exciting realm of computer vision, building a portfolio is a crucial step toward showcasing your skills. In this article, we'll tackle the hands-on task of creating an image classification model using TensorFlow and Keras. Don't worry if you're new to the field: we'll break down the process step by step.

Picture this: you’re about to embark on a journey where your code can teach a computer to recognize fashion items in images. This isn’t just a theoretical exercise — it’s a tangible project that you can proudly add to your portfolio, demonstrating your growing expertise in computer vision.

Before we delve into the code, let’s briefly demystify Convolutional Neural Networks (CNNs). As a data scientist, understanding the basics is key. Think of CNNs as specialized networks designed for image-related tasks. They automatically learn patterns and features within images, mimicking the way humans recognize objects visually.

In simpler terms, CNNs break down images into smaller, manageable pieces, allowing the model to understand intricate details. It’s like teaching a computer to see and understand the visual world — a powerful skill in the realm of computer vision.
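To make this concrete, here's a minimal NumPy sketch of what a single convolutional filter computes: a small kernel slides across the image and responds strongly wherever the pattern it encodes appears. Here the kernel is a hand-picked vertical-edge detector; in a real CNN, the kernel values are learned during training.

import numpy as np

# A tiny 5x5 "image": dark on the left, bright on the right
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A hand-picked 3x3 vertical-edge filter
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

# Slide the filter over the image (stride 1, no padding)
output = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        output[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(output)  # large magnitudes where the vertical edge sits, zeros elsewhere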

If you need to dive deeper into CNNs, it's worth reading a dedicated introduction before continuing.

Now, let’s get our hands dirty with some code! For this project, we’ll use the beginner-friendly Fashion MNIST dataset, available in TensorFlow. This dataset contains grayscale images of fashion items, making it perfect for honing your image classification skills.

Let’s prepare the dataset:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

# Load the Fashion MNIST dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

# Shuffle the dataset
indices = np.arange(train_images.shape[0])
np.random.shuffle(indices)
train_images, train_labels = train_images[indices], train_labels[indices]

# Split the original training set into training and validation sets
# (Fashion MNIST already ships with a separate test set)
split_ratio = 0.8  # 80% for training, 20% for validation
total_samples = len(train_images)

train_split = int(total_samples * split_ratio)

# Slice the validation set before reassigning train_labels; otherwise the
# second slice would read from the already-truncated array
validation_data, validation_labels = train_images[train_split:], train_labels[train_split:]
train_data, train_labels = train_images[:train_split], train_labels[:train_split]

# Preprocess the data to have float values between 0 and 1
train_data = train_data / 255.0
validation_data = validation_data / 255.0
test_data = test_images / 255.0

# Add a channel dimension so the shapes match the Conv2D input (28, 28, 1)
train_data = train_data[..., np.newaxis]
validation_data = validation_data[..., np.newaxis]
test_data = test_data[..., np.newaxis]
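Before moving on, let's sanity-check the shapes. Fashion MNIST has 60,000 training and 10,000 test images, so an 80/20 split gives 48,000/12,000:

print(train_data.shape)       # (48000, 28, 28, 1)
print(validation_data.shape)  # (12000, 28, 28, 1)
print(test_data.shape)        # (10000, 28, 28, 1)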

Let's visualize one image from each of the 10 classes to better understand our dataset:

# Display one image from each class.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

plt.figure(figsize=(10, 5))

for i in range(10):
    class_images = train_data[train_labels == i]
    plt.subplot(2, 5, i + 1)
    plt.imshow(class_images[0].reshape(28, 28), cmap='gray')
    plt.title(class_names[i])
    plt.axis('off')

plt.show()

A quick comment about one-hot encoding of the labels: because our classes (names of clothes) have no natural order (unlike, say, clothing sizes S, M, and L), representing each one with a single number (0, 1, …, 9) is not ideal for a neural network: the model might learn, for instance, that a shoe is "greater than" a dress. One-hot encoding avoids this by mapping the number i to an array of zeros with a single one at index i (i = 3 maps to [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]). Let's add this encoding step:

# One-hot encode the labels
train_labels = to_categorical(train_labels)
validation_labels = to_categorical(validation_labels)
test_labels = to_categorical(test_labels)
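A quick check that the encoding did what we expect:

print(train_labels.shape)  # (48000, 10): one column per class
print(train_labels[0])     # e.g. [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] if the first label is 3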

Perfect! Now we need to create the model.

def create_cnn_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

Let's explain some aspects:
1. MaxPooling: it is best practice to add a max-pooling layer after a convolutional layer. It reduces the number of parameters and computations in the network: by selecting the maximum value from a group of neighboring pixels, it retains the most important features while discarding less significant information. This reduction in dimensionality speeds up training and reduces the risk of overfitting (see the numeric sketch after this list).
2. The last layer: in a classification task you need a dense layer at the end with one unit per class; the softmax activation turns these outputs into probabilities, and the class with the highest probability is the prediction.
3. The compile method: this is where we specify the optimizer, the loss function, and the metrics to track. These are all standard choices for multiclass (more than binary) classification.
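Here is the promised numeric sketch of max pooling: a 2x2 max-pooling layer slides over the feature map and keeps only the strongest response in each window, halving the spatial size.

import numpy as np

# A 4x4 feature map; 2x2 max pooling keeps the strongest value in each window
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 1, 3],
])

# Group the map into 2x2 blocks and take the max of each
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[4 2]
#  [2 5]]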

Still with me? Great! Let's create the model and print its summary:

model = create_cnn_model()
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 26, 26, 32) 320

max_pooling2d_2 (MaxPooling (None, 13, 13, 32) 0
2D)

conv2d_4 (Conv2D) (None, 11, 11, 64) 18496

max_pooling2d_3 (MaxPooling (None, 5, 5, 64) 0
2D)

conv2d_5 (Conv2D) (None, 3, 3, 64) 36928

flatten_1 (Flatten) (None, 576) 0

dense_2 (Dense) (None, 64) 36928

dense_3 (Dense) (None, 10) 650

=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0

Alright, let's get to the fun part and train the model!

# This might take a few minutes
history = model.fit(train_data, train_labels, epochs=10, validation_data=(validation_data, validation_labels))
# Final results: loss: 0.1428 - accuracy: 0.9452 - val_loss: 0.2674 - val_accuracy: 0.9109
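The history object returned by fit records the loss and accuracy for every epoch, so you can plot the learning curves and spot overfitting at a glance:

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()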

Finally, our model is trained! Now we need to test it on images that were not used during training and validation.

test_loss, test_accuracy = model.evaluate(test_data, test_labels)
# loss: 0.2968 - accuracy: 0.9059
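Beyond the aggregate accuracy, it's worth inspecting individual predictions. predict returns the softmax probabilities for each class, and argmax picks the winner:

predictions = model.predict(test_data)

# Predicted vs. true class for the first test image
predicted_class = np.argmax(predictions[0])
true_class = np.argmax(test_labels[0])  # labels are one-hot encoded
print(f"Predicted: {class_names[predicted_class]}, true: {class_names[true_class]}")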

You might be wondering how to improve your model even further. There are many options, such as collecting more data, applying random transformations to augment the data you already have (data augmentation), or building on a pretrained model (transfer learning).
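As a taste of data augmentation, here is a minimal sketch using the Keras preprocessing layers (available in TF 2.6+); the specific transformations and factors below are illustrative choices, not tuned values:

data_augmentation = models.Sequential([
    layers.RandomRotation(0.05),         # small random rotations
    layers.RandomZoom(0.1),              # small random zooms
    layers.RandomTranslation(0.1, 0.1),  # small random shifts
])

# Preview the effect on a few training images
# (the layers are only active when training=True)
augmented = data_augmentation(train_data[:9], training=True)

plt.figure(figsize=(6, 6))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(augmented[i].numpy().reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()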

Fantastic job! You’ve just taken a significant step toward establishing your presence in the world of computer vision. By tackling image classification with TensorFlow and Keras, you’ve not only learned the basics but also created a portfolio-worthy project.

As you continue to explore and expand your skills, remember that every line of code contributes to your growth. You’re now armed with a valuable project showcasing your capabilities in computer vision. Keep coding, keep learning, and watch your portfolio flourish!


