Deep Learning: Neural Network for Classification with TensorFlow | by Naveed Ul Mustafa | Sep, 2023


Model Building:

A similar process goes into model building as it did in regression; the main changes are in the hyperparameters.

import tensorflow as tf

# set random seed
tf.random.set_seed(42)

# build a model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(2,))
])

# compile the model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.SGD(),
                metrics=["accuracy"])

# fit the model
history = model_1.fit(X, y, epochs=100, verbose=0)

# model evaluation
model_1.evaluate(X, y)

32/32 [==============================] - 0s 2ms/step - loss: 0.7042 - accuracy: 0.5000
[0.70424884557724, 0.5]

The model is showing an accuracy of 50%, which on a balanced binary dataset means it is not learning any pattern, and a loss of about 0.70 indicates how far the predictions are from the actual labels. These numbers call for more investigation, and one way to do that is to visualize the model's predictions.

import numpy as np
import matplotlib.pyplot as plt

# plot predicted values against actual data
def plot_decision_boundary(model, X, y):
    """
    Plot the decision boundary created by a model predicting on X.
    """
    # coordinates
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1

    # meshgrid
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),  # 100 values evenly spaced b/w x_min & x_max
                         np.linspace(y_min, y_max, 100))

    # create X values
    x_in = np.c_[xx.ravel(), yy.ravel()]  # stack the 2D arrays together

    # make predictions
    y_pred = model.predict(x_in)

    # check for multi-class
    if len(y_pred[0]) > 1:
        print("doing multiclass classification")
        # reshape the predictions
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("doing binary classification")
        y_pred = np.round(y_pred).reshape(xx.shape)

    # plot the decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu)
    # plot the original data
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())

# check what predictions our model is making
plot_decision_boundary(model_1, X=X, y=y)

The function `plot_decision_boundary()` takes the model, the features (X), and the labels (y). It creates a meshgrid spanning the range of X, uses the model to make predictions on those meshgrid values, and plots the prediction regions along with the boundary line between them.

It looks like our model is drawing a linear boundary through a non-linear dataset. This calls for adjusting the hyperparameters. What I learned from tweaking, in order:

  1. Add a ‘relu’ activation function to the input layer and use the Adam optimizer instead of SGD().
  2. Add one hidden layer with a ‘relu’ activation function plus an output layer, and increase the epochs to 600 to see how long it takes to reach 99% accuracy and a 0.3% loss. However, this is not sustainable, even though I achieved the required figures (a rough sketch of this intermediate model appears after the list).
  3. Adjust the number of neurons in the input and hidden layers, add a ‘sigmoid’ activation function to the output layer, and return the epochs to 100.
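For reference, a rough sketch of what the step-2 model might look like. The name model_6, the layer sizes, and the training details here are illustrative assumptions, not the author's exact code; the final model appears below.

# Rough sketch of step 2: relu layers plus a plain output layer, Adam optimizer, 600 epochs.
# The name model_6 and the layer sizes are illustrative assumptions.
tf.random.set_seed(42)

model_6 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),   # input layer with relu (step 1)
    tf.keras.layers.Dense(4, activation="relu"),   # added hidden layer (step 2)
    tf.keras.layers.Dense(1)                       # output layer; sigmoid only added in step 3
])

model_6.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

history_6 = model_6.fit(X, y, epochs=600, verbose=0)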
# Building NN with non-linear activation functions + adding extra layers
tf.random.set_seed(42)

# create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(4, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
])

# compile
model_7.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                metrics=["accuracy"])

# fit
history = model_7.fit(X, y, epochs=100)

Well, this time the model appears to have learned a clear decision boundary. This is the power of re-evaluating the steps taken to improve model accuracy.
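The boundary can be checked by reusing the plotting helper defined earlier on the new model:

# visualize the decision boundary learned by the non-linear model
plot_decision_boundary(model_7, X=X, y=y)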

Distinct from binary classification where labels depict a simple ‘yes’ or ‘no’ dichotomy, multi-class classification grapples with data where each sample could belong to one of more than two classes. Consider, for instance, identifying handwritten digits where each image could represent any number from 0 to 9. This broader categorization presents its own set of challenges and nuances compared to both binary and regression model building.

Here the focus is more on visualization and model improvement.

Data Preparation:

The data is imported from `tensorflow.keras.datasets`. The dataset is fashion_mnist, where each greyscale image belongs to one of 10 classes. The dataset comes already split into training and test sets, so there is not much preparation to do. What is needed is to identify the train and test sets and compare the unique labels with the class names given in the data description.

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# The data is already split into train and test sets
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

# Create a small list so we can index into our training labels (to make them human readable)
class_name = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# plot an example with its label
index_of_choice = 17
plt.imshow(train_data[index_of_choice], cmap = plt.cm.binary)
plt.title(class_name[train_labels[index_of_choice]])

Output: a sample training image (index 17) with its class label as the title.
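A quick check of the shapes and unique labels, a minimal sketch assuming the standard Fashion MNIST split of 60,000 training and 10,000 test images of 28x28 pixels:

import numpy as np

# inspect the shapes of the train and test splits
print(train_data.shape, train_labels.shape)   # (60000, 28, 28) (60000,)
print(test_data.shape, test_labels.shape)     # (10000, 28, 28) (10000,)

# the unique labels should line up with the 10 class names above
print(np.unique(train_labels))                # [0 1 2 3 4 5 6 7 8 9]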

Model Building:

It is important to understand the core merits of flattening the data. It has advantages in terms of reduced computational complexity, memory efficiency, and so on. However, it's essential to be aware that while flattening is useful in specific scenarios (like transitioning from CNNs to fully connected layers), it's not always the best approach. For example, at the beginning of a CNN, you'd want to maintain an image's 2D or 3D structure so that convolutional filters can capture spatial features effectively. Flattening too early in such architectures would lead to a loss of valuable spatial information. I applied flattening here for the same reasons.
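As a quick illustration (a minimal sketch), `Flatten` turns each 28x28 greyscale image into a 784-value vector before it reaches the Dense layers:

import tensorflow as tf

flatten = tf.keras.layers.Flatten()
dummy_batch = tf.zeros((1, 28, 28))      # one dummy greyscale image
print(flatten(dummy_batch).shape)        # (1, 784)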

Steps taken before model building:

  1. Since our labels are not one-hot encoded, I will use `tf.keras.losses.SparseCategoricalCrossentropy()` instead of `tf.keras.losses.CategoricalCrossentropy()` (a short sketch of the difference follows this list).
  2. Normalize the data. The features are images with greyscale values from 0 to 255, which makes it much harder for the model to find appropriate patterns, so normalizing is the way to go.
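A minimal sketch of that loss-function difference, using a small batch of integer labels and made-up prediction probabilities:

import tensorflow as tf

y_int = tf.constant([0, 3, 9])                       # integer labels, as in fashion_mnist
y_onehot = tf.one_hot(y_int, depth=10)               # one-hot encoded version of the same labels

# made-up "softmax outputs" just to compare the two loss functions
y_prob = tf.random.uniform((3, 10))
y_prob = y_prob / tf.reduce_sum(y_prob, axis=1, keepdims=True)

sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(y_int, y_prob)
dense_loss = tf.keras.losses.CategoricalCrossentropy()(y_onehot, y_prob)
print(sparse_loss.numpy(), dense_loss.numpy())       # the two values match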
# Normalization
train_data_norm = train_data/255.0
test_data_norm = test_data/255.0

# check the min & max of normalized data
train_data_norm.min(), train_data_norm.max()

# set random seed
tf.random.set_seed(42)

# create the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# compile the model
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

# fit the model
norm_history = model.fit(train_data_norm,
                         train_labels,
                         epochs=10,
                         validation_data=(test_data_norm, test_labels))

Normalizing pushes training accuracy up to 78.4% and test accuracy to almost 77%, a clear improvement in performance.

# visualize the loss
import pandas as pd

# plot the normalized-data training history
pd.DataFrame(norm_history.history).plot(title = "Normalized data")

Output: loss and accuracy curves for the model trained on normalized data over 10 epochs.

This clearly shows a rapid decrease in loss. The sharp decline indicates that the model is effectively learning and optimizing its weights to fit the data better with each passing epoch. In tandem with the decrease in loss, the model's accuracy on the training data follows a significant upward trajectory; approaching 80% by the end of the 10th epoch reinforces the idea that the model is improving its predictive performance with each epoch.

The normalization of the data (dividing by 255) likely played a key role in helping the neural network converge more efficiently. Normalized data generally leads to faster convergence and can help the optimizer navigate the loss surface more effectively. Even so, the fact that accuracy is around 80% and the loss is still around 0.6 by the end suggests there is room for further optimization. Depending on the complexity of the dataset and the problem at hand, further training, model tweaking, or additional data-augmentation strategies might boost performance.
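One hedged sketch of such a tweak: the wider layer sizes and extra epochs below are illustrative assumptions, not results from this experiment.

# set random seed for reproducibility
tf.random.set_seed(42)

# same architecture, but with wider hidden layers (illustrative sizes)
model_wide = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model_wide.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                   optimizer=tf.keras.optimizers.Adam(),
                   metrics=["accuracy"])

# train a little longer than before
wide_history = model_wide.fit(train_data_norm, train_labels,
                              epochs=20,
                              validation_data=(test_data_norm, test_labels))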

Evaluating a Multi-Class Classification Model

Building a model is just the first step. The subsequent processes of evaluation, optimization, and deployment are equally critical in the journey from data to actionable insights or products.

One way to evaluate a classification model is to produce a confusion matrix.

import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15):
    # create the confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]  # normalize the confusion matrix
    n_classes = cm.shape[0]

    # make it more attractive
    fig, ax = plt.subplots(figsize=figsize)
    # create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)

    # label the axes
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])

    ax.set(title="Confusion matrix",
           xlabel="Predicted label",
           ylabel="True label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)

    # set x-axis labels to the bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()

    # adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)

    # set threshold for switching text colours
    threshold = (cm.max() + cm.min()) / 2

    # plot the text on each cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j] * 100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i, j] > threshold else "black",
                 size=text_size)

    plt.show()

# predicted labels (class probabilities for each test image)
test_pred = model.predict(test_data_norm)

# convert the prediction probabilities to integer labels
test_pred = test_pred.argmax(axis=1)

# plot the confusion matrix
plot_confusion_matrix(y_true=test_labels, y_pred=test_pred,
                      classes=class_name,
                      figsize=(20, 10),
                      text_size=7.5)

Output: confusion matrix for the Fashion MNIST test set, with per-class counts and percentages.

I borrowed the confusion matrix code from Daniel. The confusion matrix shows where the model gets confused: it mixes up T-shirt/top with Shirt and Dress, Pullover with Coat and Shirt, and Sneaker with Ankle boot. One appropriate way to reduce this confusion would be to merge T-shirt/top with Shirt.
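A minimal sketch of what that merge could look like, assuming we simply remap label 0 (T-shirt/top) onto label 6 (Shirt) before retraining; this is an illustrative assumption, not something carried out in this experiment:

import numpy as np

# map label 0 (T-shirt/top) onto label 6 (Shirt), purely as an illustration
merged_train_labels = np.where(train_labels == 0, 6, train_labels)
merged_test_labels = np.where(test_labels == 0, 6, test_labels)

print(np.unique(merged_train_labels))   # label 0 no longer appears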

The realm of classification in deep learning offers a fascinating blend of challenges and opportunities. As we delved deep into the nuances of binary and multi-class classification using neural networks, it’s evident that while the foundational steps in constructing these models remain consistent, the intricacies lie in the careful calibration of hyperparameters, data handling, and architecture choices. Through visualization and methodical evaluation, we can continually refine our models, ensuring they not only understand the patterns in our data but also generalize well to unseen samples.

Follow me on GitHub & X.




