*Model Building:*

Model building follows a process similar to the one used for regression; the main changes are in the hyperparameters.

```python
import tensorflow as tf

# Set random seed
tf.random.set_seed(42)

# Build a model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(2,))
])

# Compile the model
model_1.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.SGD(),
                metrics=["accuracy"])

# Fit the model
history = model_1.fit(X, y, epochs=100, verbose=0)

# Evaluate the model
model_1.evaluate(X, y)
```

```
32/32 [==============================] - 0s 2ms/step - loss: 0.7042 - accuracy: 0.5000
[0.70424884557724, 0.5]
```

The model reaches an accuracy of only 50%, which suggests it is not learning any pattern. The loss of about 0.70 measures how far the model’s predictions are from the actual labels; it is a loss value, not a percentage. Both numbers are poor and need more investigation, which is easiest to do by visualizing the model’s predictions.
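To put the loss value in context, here is a minimal NumPy sketch (not part of the original notebook) of the binary cross-entropy an uninformed model would get. The labels are hypothetical and the helper `binary_crossentropy` is written here for illustration; it is not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))))

# Hypothetical labels, with a model that always answers "50/50"
y_true = np.array([0, 1] * 500)
uninformed = np.full(y_true.shape, 0.5)

chance_loss = binary_crossentropy(y_true, uninformed)
print(round(chance_loss, 4))  # 0.6931, i.e. ln(2)
```

A loss close to ln(2) ≈ 0.693 is therefore roughly what a model achieves without using the features at all, which is consistent with the 50% accuracy reported above.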

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot predicted values against actual data
def plot_decision_boundary(model, X, y):
    """
    Plot the decision boundary created by a model predicting on X.
    """
    # Axis boundaries of the plot, with a small margin
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1

    # Meshgrid
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),  # 100 evenly spaced values between x_min and x_max
                         np.linspace(y_min, y_max, 100))

    # Create X values
    x_in = np.c_[xx.ravel(), yy.ravel()]  # stack the flattened grids as two columns

    # Make predictions
    y_pred = model.predict(x_in)

    # Check for multi-class
    if len(y_pred[0]) > 1:
        print("doing Multiclass classification")
        # Reshape the predictions
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("Binary classification")
        y_pred = np.round(y_pred).reshape(xx.shape)

    # Plot the decision boundary
    plt.contourf(xx, yy, y_pred, cmap=plt.cm.RdYlBu)

    # Plot the original data
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())

# Check what predictions our model is making
plot_decision_boundary(model_1, X=X, y=y)
```

The function `plot_decision_boundary()` takes the model, the features (X) and the labels (y). It creates a meshgrid spanning the range of X, uses the model to make predictions on every grid point, and plots the predicted regions together with the boundary between them.
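The meshgrid step is the least obvious part of the function, so here is a small sketch of it on a 3 x 2 grid instead of 100 x 100. The thresholding rule standing in for `model.predict` is a made-up assumption, used only so the example runs without a trained model.

```python
import numpy as np

# A tiny 3 x 2 grid instead of the 100 x 100 grid used in the function
xs = np.linspace(0.0, 1.0, 3)   # 3 values along the first feature
ys = np.linspace(0.0, 1.0, 2)   # 2 values along the second feature
xx, yy = np.meshgrid(xs, ys)    # both arrays have shape (2, 3)

# Flatten both grids and pair them up: one (x, y) row per grid point
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(xx.shape)           # (2, 3)
print(grid_points.shape)  # (6, 2)

# Stand-in for model.predict: label a point 1 when x + y > 1 (hypothetical rule)
fake_pred = (grid_points.sum(axis=1) > 1.0).astype(int)

# Reshape back to the grid so contourf can colour each cell
print(fake_pred.reshape(xx.shape))
```

Reshaping the flat predictions back to `xx.shape` is what lets `plt.contourf` draw one coloured region per predicted class.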

It looks like our model is drawing a linear boundary through a non-linear dataset. This calls for adjusting the hyperparameters. What I learned from tweaking, in order:

- Add a ‘relu’ activation function to the first layer, and use the Adam optimizer instead of SGD().
- Add one hidden layer with a ‘relu’ activation function plus an output layer, and increase the epochs to 600 to see how long it takes to reach 99% accuracy and 0.3% loss. I reached those figures, but training for that long is not sustainable.
- Adjust the number of neurons in the input and hidden layers, add a ‘sigmoid’ activation function to the output layer, and return the epochs to 100.
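Why the non-linear activations in these steps matter can be illustrated with a short NumPy sketch. It assumes randomly generated weights and inputs (not the trained model) and shows that stacking dense layers without an activation is equivalent to a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(5, 2))          # 5 hypothetical samples with 2 features

# Two stacked dense layers WITHOUT activation: (x @ W1 + b1) @ W2 + b2
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
stacked = (x @ W1 + b1) @ W2 + b2

# The same mapping written as ONE linear layer
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b
print(np.allclose(stacked, single))  # True: extra linear layers add nothing

# With ReLU in between, the two-layer network is no longer a single linear map
def relu(z):
    return np.maximum(z, 0.0)

nonlinear = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(nonlinear, single))
```

Without an activation such as ReLU, adding layers or neurons cannot bend the decision boundary, which is why the first model could only draw a straight line.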

```python
# Building a NN with non-linear activation functions + adding extra layers
tf.random.set_seed(42)

# Create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(4, activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid)
])

# Compile (`learning_rate` replaces the deprecated `lr` argument)
model_7.compile(loss=tf.keras.losses.BinaryCrossentropy(),
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                metrics=["accuracy"])

# Fit
history = model_7.fit(X, y, epochs=100)
```

This time the model appears to have a clear decision boundary. This is the payoff of re-evaluating each step taken to improve the model’s accuracy.

Distinct from binary classification where labels depict a simple ‘yes’ or ‘no’ dichotomy, multi-class classification grapples with data where each sample could belong to one of more than two classes. Consider, for instance, identifying handwritten digits where each image could represent any number from 0 to 9. This broader categorization presents its own set of challenges and nuances compared to both binary and regression model building.

Here the focus is on visualization and model improvement.

## Data Preparation:

The data is imported from `tensorflow.keras.datasets`. The dataset is Fashion MNIST, in which each greyscale image belongs to one of 10 classes. It comes already split into training and test sets, so little preparation is needed. What remains is to identify the training and test sets and to match the unique labels against the class names given in the data description.
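As a rough sketch of the label check described above, the snippet below counts unique labels and maps them to class names. To stay runnable without downloading the dataset it uses a randomly generated stand-in array (`fake_labels` is hypothetical, not the real `train_labels`); the class list is the one used in this post.

```python
import numpy as np

class_name = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
              "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Stand-in for train_labels: random integers 0-9 (NOT the real dataset)
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=1000)

# Every label must be a valid index into class_name
unique, counts = np.unique(fake_labels, return_counts=True)
assert unique.min() >= 0 and unique.max() < len(class_name)

# Print each label next to its human-readable name and frequency
for label, count in zip(unique, counts):
    print(label, class_name[label], count)
```

With the real data, `fake_labels` would simply be replaced by `train_labels` or `test_labels`.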

```python
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import fashion_mnist

# The data is already split into training and test sets
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

# Create a small list so we can index onto our training labels (to make them human-readable)
class_name = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Plot an example with its label
index_of_choice = 17
plt.imshow(train_data[index_of_choice], cmap=plt.cm.binary)
plt.title(class_name[train_labels[index_of_choice]])
```

## Model building:

It is important to understand why the data is flattened. Flattening reduces computational complexity and improves memory efficiency, among other benefits. However, while it is useful in specific scenarios (such as the transition from convolutional layers to fully connected layers in a CNN), it is not always the best approach. For example, at the beginning of a CNN you would want to keep an image’s 2D or 3D structure so that convolutional filters can capture spatial features effectively; flattening too early in such architectures loses valuable spatial information. I flatten the images here for the advantages listed above, since this model uses only fully connected layers.
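What flattening does to the data can be sketched in NumPy. The batch of images below is a hypothetical stand-in filled with consecutive numbers, and the reshape mimics (rather than calls) the behaviour of a `Flatten` layer.

```python
import numpy as np

# Hypothetical batch of 3 greyscale images, 28 x 28 pixels each
images = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# What a Flatten layer does: keep the batch axis, unroll everything else
flat = images.reshape(len(images), -1)
print(flat.shape)   # (3, 784)

# Pixel (row=1, col=0) of the first image lands at position 1 * 28 + 0 = 28
assert flat[0, 28] == images[0, 1, 0]
```

Each image becomes a vector of 784 values, so pixels that were vertical neighbours end up 28 positions apart. That is the spatial information a dense layer no longer sees directly.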

Steps taken before model building:

- Since the labels are not one-hot encoded, I will use `tf.keras.losses.SparseCategoricalCrossentropy()` instead of `tf.keras.losses.CategoricalCrossentropy()`.
- Normalize the data. The features are images with greyscale values from 0 to 255, and that range makes it hard for the model to find appropriate patterns, so the values are scaled to lie between 0 and 1.
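The choice of loss in the first step can be illustrated with a small NumPy sketch. The probabilities and labels are made up for illustration, and the two loss expressions are hand-written formulas rather than calls to the Keras classes. The sparse and one-hot versions compute the same value and differ only in the label format they expect.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 4 classes
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
int_labels = np.array([0, 1, 3])          # integer labels, as in Fashion MNIST
one_hot = np.eye(4)[int_labels]           # the same labels, one-hot encoded

# Sparse version: pick the probability of the true class by index
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()

# Categorical version: sum over classes, weighted by the one-hot vector
categorical_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, categorical_loss))  # True
```

Using the sparse variant simply avoids building the one-hot matrix for the training labels.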

```python
# Normalization
train_data_norm = train_data / 255.0
test_data_norm = test_data / 255.0

# Check the min & max of the normalized data
train_data_norm.min(), train_data_norm.max()
```

```python
# Set random seed
tf.random.set_seed(42)

# Create the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Compile the model
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

# Fit the model
norm_history = model.fit(train_data_norm,
                         train_labels,
                         epochs=10,
                         validation_data=(test_data_norm, test_labels))
```

Normalizing pushes the training accuracy to 78.4% and the test accuracy to almost 77%, an observed improvement in performance.

```python
# Visualize the loss
import pandas as pd

# Plot the training history of the model trained on normalized data
pd.DataFrame(norm_history.history).plot(title="Normalized data")
```

The plot shows a rapid decrease in loss. This sharp decline indicates that the model is learning effectively, adjusting its weights to fit the data better with each passing epoch. In tandem, the model’s accuracy on the training data climbs significantly; approaching 80% by the end of the 10th epoch reinforces the idea that predictive performance improves with each epoch.

Normalizing the data (dividing by 255) likely played a key role in helping the neural network converge more efficiently. Normalized data generally leads to faster convergence and can help the optimizer navigate the loss surface more effectively.

Even with this improvement, an accuracy close to 80% and a loss of around 0.6 at the end of training suggest there is still room for further optimization. Depending on the complexity of the dataset and the problem at hand, further training, model tweaking, or data augmentation might boost performance.

## Evaluating a Multi-Class Classification Model

Building a model is just the first step. The subsequent processes of evaluation, optimization, and deployment are equally critical in the journey from data to actionable insights or products.

One way to evaluate a model is to produce a confusion matrix.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, classes=None, figsize=(10, 10), text_size=15):
    # Create the confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]  # normalize each row
    n_classes = cm.shape[0]

    # Make it more attractive
    fig, ax = plt.subplots(figsize=figsize)

    # Create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)

    # Label the axes
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])

    ax.set(title="Confusion matrix",
           xlabel="Predicted label",
           ylabel="True label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)

    # Set x-axis labels to the bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()

    # Adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)

    # Set the threshold for switching text color
    threshold = (cm.max() + cm.min()) / 2

    # Plot the text on each cell: raw count plus row-normalized percentage
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j] * 100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i, j] > threshold else "black",
                 size=text_size)

    plt.show()

# Predicted probabilities for each class
test_pred = model.predict(test_data_norm)

# Convert the prediction probabilities to integer class labels
test_pred = test_pred.argmax(axis=1)

# Plot the confusion matrix
plot_confusion_matrix(y_true=test_labels, y_pred=test_pred,
                      classes=class_name,
                      figsize=(20, 10),
                      text_size=7.5)
```

I borrowed the confusion matrix code from Daniel. The confusion matrix shows where the model makes its mistakes: it confuses T-shirt/top with Shirt and Dress, Pullover with Coat and Shirt, and Sneaker with Ankle boot. One possible way to reduce the confusion is to merge T-shirt/top with Shirt.
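Reading the confused pairs off a large plot can be tedious, so here is a minimal sketch of extracting them numerically. The 3-class matrix below is invented for illustration and is not the result obtained in this post.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true label, columns = predicted)
names = ["T-shirt/top", "Shirt", "Sneaker"]
cm = np.array([[80, 18,  2],
               [25, 70,  5],
               [ 1,  2, 97]])

# Per-class recall: correct predictions divided by the row total
recall = cm.diagonal() / cm.sum(axis=1)

# Zero the diagonal so only the mistakes remain, then find the largest cell
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_idx, pred_idx = np.unravel_index(errors.argmax(), errors.shape)

print(dict(zip(names, recall.round(2))))
print(f"Most common mistake: {names[true_idx]} predicted as {names[pred_idx]}")
```

With the real predictions, `cm` would come from `confusion_matrix(test_labels, test_pred)` and `names` from `class_name`.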

The realm of classification in deep learning offers a fascinating blend of challenges and opportunities. As we delved deep into the nuances of binary and multi-class classification using neural networks, it’s evident that while the foundational steps in constructing these models remain consistent, the intricacies lie in the careful calibration of hyperparameters, data handling, and architecture choices. Through visualization and methodical evaluation, we can continually refine our models, ensuring they not only understand the patterns in our data but also generalize well to unseen samples.
