Basic Image Categorization:
Embarking on our machine learning journey for image categorization, it’s helpful to begin with the basics. Similar to starting with simple recipes when learning to cook, our initial venture is into basic image processing with simple, grayscale images. The Fashion MNIST dataset, an educational staple in machine learning, offers us a collection of basic grayscale images of fashion items. This dataset provides an ideal starting point, devoid of the complexities of color and detailed backgrounds, allowing both us and our model to focus on learning the fundamental distinctions between image classes. Below is a glimpse of what the images look like:
Let’s dive into the code!
import tensorflow as tf
from tensorflow.keras import layers, models

# (1) Loading a built-in dataset in Keras - Fashion MNIST
fashion_dataset = tf.keras.datasets.fashion_mnist
(train_imgs, train_labels), (eval_imgs, eval_labels) = fashion_dataset.load_data()
# (2) Normalizing the pixel values
# (this just converts this to a value between 0 and 1)
train_imgs_normalized = train_imgs / 255.0
eval_imgs_normalized = eval_imgs / 255.0
# (3) Building the neural network by declaring the layers.
fashion_model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(150, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# (4) Compiling the model by specifying the optimizer and loss algorithms to use.
fashion_model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# (5) Training the model by running 10 epochs (full passes over the training data)
fashion_model.fit(train_imgs_normalized, train_labels, epochs=10)
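As a quick sanity check on the network defined above, the number of trainable parameters can be worked out by hand. This is a sketch in plain Python arithmetic rather than a Keras call, and the helper name `dense_params` is made up for illustration.

```python
# Hypothetical helper (not part of Keras): parameter count of a fully connected layer.
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

flattened_size = 28 * 28                            # Flatten: 28x28 image -> 784 values, nothing to learn
hidden_params = dense_params(flattened_size, 150)   # Dense(150, relu)
output_params = dense_params(150, 10)               # Dense(10, softmax)
total_params = hidden_params + output_params

print(flattened_size, hidden_params, output_params, total_params)
# prints: 784 117750 1510 119260
```

Calling `fashion_model.summary()` after building the model should report matching per-layer counts.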
So let’s unpack what is happening here.
#1: We’re loading the Fashion MNIST dataset, which contains 60,000 training images and 10,000 test images of fashion items (like shirts, bags, shoes, etc.) in grayscale. Each image measures 28×28 pixels. A sample of these images is displayed above.
#2: Normalizing the image pixel values to a range between 0 and 1 generally speeds up training and improves model performance. The raw pixel intensity ranges from 0 (black) to 255 (white).
#3: Here, we construct our neural network by defining the input, middle, and output layers. Flatten() specifies the input shape of the data entering our neural network. Each layer passes its output to the next, so the dimensions need to match.
–Sequential: This model structure indicates a linear stack of layers, a common architecture in neural networks.
–Flatten: The first layer, which flattens each 28×28 image into a 1D array of 784 pixels. This layer only reformats the data dimensions, without any ‘learning’.
–Dense: These are fully connected neural layers. The first Dense layer has 150 nodes (or neurons), a modifiable hyperparameter. Its relu (Rectified Linear Unit) activation function passes positive values through unchanged and turns negative values into 0, which lets the network learn non-linear patterns. The second Dense layer is the output layer with 10 nodes, each corresponding to a clothing class; it uses the softmax activation function to produce a probability distribution across the classes.
#4: Compiling the model involves specifying the optimizer, loss function, and evaluation metrics.
–sgd (stochastic gradient descent) optimizer is an efficient gradient descent algorithm. An optimizer’s role is to take the current values, the previous guess, and the errors (loss), and use them to improve the next guess.
–sparse_categorical_crossentropy is used as the loss function for multi-class classification problems like ours, where each label is an integer class index. A loss function’s job is to compare the guess to the correct answer (label) and determine how well the model did. It then tells the optimizer what the “loss” or errors were.
–accuracy tells the model we want to track and output the accuracy as we go through the training epochs.
#5: The model is trained with the training images and labels. Setting epochs=10 means the entire dataset passes through the neural network ten times, aiding the model’s learning process. However, this is a simplification, as we’ll explore further in overfitting discussions.
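To make the pieces from #3 and #4 more concrete, here is a minimal sketch of relu, softmax, the sparse categorical cross-entropy loss, and a single gradient descent update, written in plain Python. These are hand-written illustrations, not Keras’s own implementations, and the toy scores, weights, and gradients are made-up values for a 3-class problem (the real model has 10 outputs).

```python
import math

def relu(values):
    # ReLU: keep positive values, replace negative values with 0
    return [max(0.0, v) for v in values]

def softmax(scores):
    # Subtracting the largest score first avoids overflow without changing the result
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    # label is an integer class index; the loss is -log(probability given to the true class)
    return -math.log(probs[label])

def sgd_step(weights, grads, learning_rate=0.01):
    # Move each weight a small step against its gradient
    return [w - learning_rate * g for w, g in zip(weights, grads)]

# Toy values, chosen only for illustration
hidden = relu([-1.5, 0.0, 2.0])                  # -> [0.0, 0.0, 2.0]
probs = softmax([2.0, 1.0, 0.1])                 # probabilities that sum to 1
loss_if_label_0 = sparse_categorical_crossentropy(probs, 0)  # model favoured the right class
loss_if_label_2 = sparse_categorical_crossentropy(probs, 2)  # model favoured the wrong class
new_weights = sgd_step([1.0, -2.0], [0.5, -1.0], learning_rate=0.1)

print(hidden, probs, loss_if_label_0, loss_if_label_2, new_weights)
```

The loss is small when the highest probability lands on the correct label and larger when it does not, and that is the signal the optimizer uses to adjust the weights.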
Now let’s run this code and we will see an output similar to this:
From our training, we achieved an accuracy of 87% — a commendable result for such a basic model. However, this accuracy reflects the model’s performance on familiar training data, not its effectiveness on new, unseen data. This distinction is crucial for understanding overfitting, a topic we’ll be exploring further. To illustrate the difference in accuracy between training data and actual tests (with unfamiliar images), let’s evaluate the model using test data:
# Same training call as before (this replaces the earlier fit line rather than adding a second one)
# (keeping the epoch count visible will help demo overfitting)
fashion_model.fit(train_imgs_normalized, train_labels, epochs=10)

# Evaluate the model on the test dataset
test_loss, test_accuracy = fashion_model.evaluate(eval_imgs_normalized, eval_labels)
print('Test set accuracy:', test_accuracy)
Here, we use .evaluate() to gauge the model’s performance on the test images. When we run the code, the output will look like this:
We see Test set accuracy: 0.841..., which means that after working through all 313 batches of test images (10,000 images at Keras’s default batch size of 32), the model reached 84% accuracy on the test data. That’s 3 points lower than the training accuracy. This is actually pretty good! So let’s try what might appear to be the obvious way to improve accuracy: train more. Let’s do 100 epochs!
# Again, let's change this line of our code to have Keras train our model for 100 epochs
fashion_model.fit(train_imgs_normalized, train_labels, epochs=100)
The last five epochs of model training show this accuracy:
And here’s the evaluation performance:
After substantially more training, we only see a 4% increase in test accuracy, despite the training accuracy jumping from 87% to 94%. This is a classic example of overfitting, where the model becomes overly specialized to the training data. A more in-depth exploration of overfitting will follow in parts 2 and 3 of this series!
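The gap between training and test accuracy can be made explicit with a little arithmetic. This sketch uses only the figures quoted in this post and assumes the “4%” improvement means 4 percentage points (84% to roughly 88%); the function name is made up for illustration.

```python
# Hypothetical helper: gap between training and test accuracy, in percentage points.
def accuracy_gap(train_acc_pct, test_acc_pct):
    # A gap that grows as training continues is a common sign of overfitting
    return train_acc_pct - test_acc_pct

gap_10_epochs = accuracy_gap(87, 84)
# ASSUMPTION: the 4% gain is read as 4 percentage points, i.e. 84 -> 88
gap_100_epochs = accuracy_gap(94, 84 + 4)

print(gap_10_epochs, gap_100_epochs)  # prints: 3 6
```

Under that assumption the gap doubles from 3 to 6 points. In Keras, passing `validation_data` to `fit()` records a `val_accuracy` value per epoch in the returned History object, which allows the same comparison to be made epoch by epoch.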