We looked at MNIST data in the previous article. The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems.
# First look at neural networks using Keras
# Classifying 28*28 grayscale images of 10 digits into their categories (classes) 0-9
# Using the MNIST dataset (60K training and 10K test examples/samples)

# Loading the dataset
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
The training data is a pair of arrays: 60K images of shape 28*28, and the corresponding training labels, an array of 60K integers identifying the digit in each image (between 0 and 9).
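As a quick sanity check, we can inspect the shapes of the arrays returned by mnist.load_data() (the label values shown in the comments are illustrative):
print(train_images.shape)  # (60000, 28, 28)
print(train_labels.shape)  # (60000,)
print(test_images.shape)   # (10000, 28, 28)
print(train_labels[:5])    # first five labels, e.g. [5 0 4 1 9]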
from matplotlib import pyplot as plt
from matplotlib import image as mpimg

plt.title("Sample Image")
plt.xlabel("X pixel scaling")
plt.ylabel("Y pixels scaling")
sample_image = test_images[0]
plt.imshow(sample_image, cmap=plt.cm.binary)
plt.show()
Next, we import keras and layers from keras.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])
# model -> 2 dense (fully connected) layers
# layer 1: relu activation
# layer 2: softmax classification with 10 classes (for 10 digits)
This creates a network with two dense, or fully connected, layers. In a fully connected layer, each neuron or node is connected to every neuron in the previous and subsequent layers. The first layer is defined with "relu" activation. ReLU, or Rectified Linear Unit, introduces non-linearity and is one of the most commonly used activation functions in neural networks. For more information on ReLU, refer to A Gentle Introduction to the Rectified Linear Unit (ReLU).
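To make the idea concrete, here is a minimal NumPy sketch of ReLU, just the underlying function max(0, x) rather than the Keras implementation:
import numpy as np

def relu(x):
    # ReLU passes positive values through unchanged and zeroes out
    # negatives, which is what gives the layer its non-linearity
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # -> [0.  0.  3.5]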
The second is a softmax layer, which is used for multi-class classification. It is often used as the last activation function of a neural network, to normalize the output of the network to a probability distribution over the predicted output classes. [wikipedia]
(Also refer to Softmax Activation Function — How It Actually Works)
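For intuition, here is a minimal NumPy sketch of softmax (again an illustration, not the Keras implementation):
import numpy as np

def softmax(z):
    # subtracting the max before exponentiating improves numerical
    # stability; the exponentials are then normalized to sum to 1
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

print(softmax(np.array([1.0, 2.0, 5.0])))  # probabilities summing to 1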
Network Topology or Architecture
A deep learning network is a graph of layers. We used a Sequential model above; however, a wide variety of architectures is available. For instance, Generative Pre-trained Transformers, commonly known as GPT, are a family of neural network models that use the transformer architecture; they are a key advancement in artificial intelligence (AI), powering generative AI applications such as ChatGPT. [what is GPT?]
The compile(), fit() and predict() methods
After defining the architecture we compile the model, that is, we configure the training method. In this step we choose the following:
- Loss: the quantity to be minimised during training; it measures the inaccuracy of the model.
- Optimizer: determines how the network's weights will be updated. It is an implementation of the gradient descent algorithm (see the sketch after this list).
- Metrics: the measures that define the success of the model, such as classification accuracy.
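To illustrate the idea behind the optimizer, here is a minimal sketch of plain gradient descent on a toy one-parameter loss L(w) = (w - 3)^2; RMSprop builds on this basic update with an adaptive, per-parameter step size:
# gradient descent on L(w) = (w - 3)^2
w = 0.0
learning_rate = 0.1
for _ in range(50):
    grad = 2 * (w - 3)            # dL/dw
    w = w - learning_rate * grad  # step against the gradient
print(w)  # converges towards the minimum at w = 3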
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# RMSprop optimizer: similar to the gradient descent
#   algorithm with momentum
# sparse categorical crossentropy: used in multi-class classification
#   when classes are mutually exclusive
# accuracy: fraction of images correctly classified
After compile comes the fit() method. The fit method specifies the data to train on, the number of epochs (full passes over the training data) to train for, and the batch size for the mini-batch gradient descent algorithm.
# images shape: (60000, 28, 28) of type uint8 with values in [0, 255]
# reshape images to (60000, 28*28) of type float32 with values in [0, 1]
train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype("float32")/255
test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype("float32")/255

model.fit(train_images, train_labels, epochs=10, batch_size=128)
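For intuition about what fit() is doing: with 60,000 training samples and a batch size of 128, each epoch performs about 469 gradient updates, so 10 epochs means roughly 4,690 weight updates in total:
import math
steps_per_epoch = math.ceil(60000 / 128)
print(steps_per_epoch)       # 469 gradient updates per epoch
print(steps_per_epoch * 10)  # ~4690 updates over 10 epochs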
# Evaluate accuracy
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc is {test_acc}")
After the training is complete we evaluate the accuracy of the model on new data, i.e. data the model hasn't "seen" earlier. This is either the validation or the test data used to assess model accuracy.
Now the model is ready to be used and we can test it against new images of our own. This step is "inference", and we use the predict() method to test the model with an image of a handwritten digit.
image_test = mpimg.imread("test_image.png")
plt.imshow(image_test)
We will have to convert this image into a format suitable for the model.
# imageprepare() converts the image to the 28x28 grayscale MNIST format
from mnst_image_format_converter import imageprepare
image_test = imageprepare('test_image.png')
plt.imshow(image_test, cmap=plt.cm.binary)
import numpy as np

# Preprocess the input
ImgNumpyData = np.array(image_test)
print(type(ImgNumpyData))
print(ImgNumpyData.shape)
ImgNumpyData = ImgNumpyData.reshape((1, 784))
ImgNumpyData = ImgNumpyData.astype("float32")/255

# Predict the digit
prediction_test = model.predict(ImgNumpyData)
Now we can check the prediction of the model by simply printing the output. The output is an array of 10 probabilities (between 0 and 1) corresponding to the ten digits 0 to 9; the index with the highest probability is the digit the model considers the most likely match for the image.
Thus, if we print prediction_test, the result will be something like:
array([[0.0000000e+00, 0.0000000e+00, 1.0000000e+00, 0.0000000e+00,
0.0000000e+00, 9.8792420e-30, 4.7834834e-25, 9.7836847e-37,
0.0000000e+00, 0.0000000e+00]], dtype=float32)
This doesn’t seem very promising, but the predictions can be improved by processing the input image.
Comparing this with the predictions for the test images, we can see that the model did much better on the test images. For instance, if we test the prediction for the sample image test_images[0], which was the digit 7, we get the following output:
array([[4.2679893e-10, 7.8849511e-14, 4.1464507e-09, 9.0293481e-04,
1.2242080e-15, 1.0460559e-09, 4.0501939e-18, 9.9909711e-01,
1.4266670e-09, 5.2804108e-08]], dtype=float32)
Thus, counting from 0, the seventh index has a value of 9.9909711e-01, or about 0.999097, which is close to 1.
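Rather than reading the index off by eye, np.argmax extracts the most likely class directly; a minimal sketch, assuming the prediction array is stored in prediction_test as above:
import numpy as np

# index of the highest probability is the predicted digit
predicted_digit = np.argmax(prediction_test[0])
print(f"Predicted digit: {predicted_digit}")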
Enhancing model accuracy involves a variety of steps like optimizing data quality, feature engineering, selecting appropriate algorithms, tuning hyperparameters, and utilizing advanced techniques like ensembling or transfer learning.
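As one small, concrete illustration of hyperparameter tuning, Keras can hold out part of the training data for validation so that different settings can be compared on data the model was not fit on (a sketch, not a full tuning workflow):
model.fit(train_images, train_labels,
          epochs=10, batch_size=128,
          validation_split=0.2)  # hold out the last 20% of training data for validation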
Summary
The intention of this article is to serve as a starting point for understanding Keras as well as TensorFlow, particularly as the usage of Keras will potentially expand with recent developments and the introduction of Keras 3.0. I would recommend the Keras website and the book Deep Learning with Python by François Chollet for those interested in exploring Keras further.
References:
- Deep Learning with Python by François Chollet: https://www.manning.com/books/deep-learning-with-python-second-edition
- TensorFlow Keras guide: https://www.tensorflow.org/guide/keras
- Keras for Researchers: https://keras.io/getting_started/intro_to_keras_for_researchers/
- Difference between Keras and TensorFlow: https://www.geeksforgeeks.org/difference-between-tensorflow-and-keras/
- Wikipedia: https://en.wikipedia.org/wiki/TensorFlow and https://en.wikipedia.org/wiki/Keras