The Convolutional Classifier. Introduction | by DhanushKumar_idk | Nov, 2023

The term “Convolutional Classifier” typically refers to a type of neural network architecture used for image classification tasks. Convolutional Neural Networks (CNNs) are a class of deep neural networks designed to process and analyze visual data. They have proven to be highly effective in tasks such as image recognition, object detection, and image classification.

Convolution is the mathematical operation that gives the layers of a convnet (another name for a CNN) their unique structure.

Here’s a brief overview of the key components and concepts related to Convolutional Classifiers:

Convolutional Layers:

  • The core building blocks of CNNs are convolutional layers. These layers apply convolution operations to input data, allowing the network to learn hierarchical features from the input images. Convolutional operations involve sliding small filters (also called kernels) over the input data to extract patterns and features.

Pooling Layers:

  • Pooling layers are often used in conjunction with convolutional layers. Pooling reduces the spatial dimensions of the input volume, which helps in decreasing the computational complexity and controlling overfitting. Common pooling operations include max pooling and average pooling.

Activation Functions:

  • Non-linear activation functions, such as ReLU (Rectified Linear Unit), are applied to the outputs of layers, most commonly right after each convolutional and fully connected layer. These functions introduce non-linearity into the model, enabling it to learn complex relationships in the data.

Fully Connected Layers:

  • Following the convolutional and pooling layers, one or more fully connected layers are typically added to the network. These layers connect every neuron in one layer to every neuron in the next layer, allowing the model to make predictions based on the learned features.

Softmax Activation:

  • For classification tasks, a softmax activation function is often applied to the final layer. This function converts the network’s raw output into probability scores, indicating the likelihood of each class.
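A minimal plain-Python sketch of softmax (an illustration, not the Keras implementation) shows how raw scores become probabilities. Subtracting the largest score before exponentiating leaves the result unchanged but avoids overflow for large inputs.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # shift for numerical stability; the output is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The largest score gets the largest probability, and the probabilities always sum to 1.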

Loss Function:

  • The choice of a loss function depends on the specific task. For classification, cross-entropy loss is commonly used. It measures the difference between the predicted probabilities and the true labels.
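For a single example whose true class index is c, cross-entropy against a one-hot target reduces to −log p_c, the negative log of the probability assigned to the correct class. The sketch below is plain Python with illustrative function names; it shows that a confident correct prediction gives a small loss and a confident wrong one a large loss.

```python
import math

def sparse_cross_entropy(probs, label):
    """Cross-entropy for one example whose true class is the integer `label`.
    With a one-hot target, -sum(y_k * log p_k) reduces to -log p[label]."""
    return -math.log(probs[label])

def mean_cross_entropy(batch_probs, labels):
    """Average the per-example loss over a batch."""
    return sum(sparse_cross_entropy(p, y) for p, y in zip(batch_probs, labels)) / len(labels)

probs = [0.7, 0.2, 0.1]
print(round(sparse_cross_entropy(probs, 0), 4))  # 0.3567 (correct class got 0.7)
print(round(sparse_cross_entropy(probs, 2), 4))  # 2.3026 (correct class got only 0.1)
```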


Training:

  • CNNs are trained using backpropagation and optimization algorithms (e.g., stochastic gradient descent) to minimize the loss function. The training process involves feeding input data through the network, computing the loss, and updating the network’s parameters to improve performance.
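To make the "compute the loss, then update the parameters" loop concrete, here is a toy sketch in plain Python. It is a deliberate simplification: the only "parameters" are three logits fed into softmax, and their gradient is the known closed form p − y for softmax with cross-entropy, so no backpropagation through layers is needed. The learning rate and starting values are arbitrary.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grad(logits, label):
    """Cross-entropy loss of softmax(logits) and its gradient w.r.t. the logits.
    For softmax + cross-entropy the gradient is simply p - one_hot(label)."""
    p = softmax(logits)
    loss = -math.log(p[label])
    grad = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return loss, grad

logits = [0.5, 1.5, -0.5]  # toy "parameters"; the true class is 0
label = 0
learning_rate = 0.5
losses = []
for step in range(5):
    loss, grad = loss_and_grad(logits, label)
    losses.append(loss)
    # Gradient descent: move each parameter a small step against its gradient
    logits = [z - learning_rate * g for z, g in zip(logits, grad)]

print([round(value, 3) for value in losses])  # the loss shrinks at every step
```

In a real CNN, backpropagation computes the same kind of gradient for every weight in every layer, and the optimizer applies an update like the one in the loop.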

A convnet used for image classification consists of two parts: a convolutional base and a dense head.

The base is used to extract the features from an image. It is formed primarily of layers performing the convolution operation, but often includes other kinds of layers as well.

The head is used to determine the class of the image. It is formed primarily of dense layers, but might include other layers like dropout.

The whole process goes something like this:

The features actually extracted look a bit different, but it gives the idea.

The goal of the network during training is to learn two things:

  1. which features to extract from an image (base),
  2. which class goes with what features (head).

These days, convnets are rarely trained from scratch. More often, we reuse the base of a pretrained model and attach an untrained head to it. In other words, we reuse the part of a network that has already learned step 1 (extract features) and attach some fresh layers that learn step 2 (classify).

Because the base already knows how to extract useful features and the head usually consists of only a few dense layers, very accurate classifiers can be created from relatively little data.

Reusing a pretrained model is a technique known as transfer learning. It is so effective that most image classifiers built today make use of it.
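A minimal Keras sketch of this idea, under stated assumptions: MobileNetV2 is an arbitrary choice of pretrained base (other models in tf.keras.applications would work similarly), and the function name build_transfer_model, the 96×96 input size, and the 0.2 dropout rate are illustrative, not taken from this article. The base is loaded without its original classifier (include_top=False) and frozen, so only the new head is trained.

```python
import tensorflow as tf

def build_transfer_model(num_classes=10, input_shape=(96, 96, 3), weights="imagenet"):
    """Hypothetical helper: pretrained convolutional base + fresh dense head."""
    # Base: MobileNetV2 without its classification layers.
    # weights="imagenet" downloads the pretrained weights on first use.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # freeze the base: only the head will be trained

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps the base's batch-norm layers in inference mode
    x = base(inputs, training=False)
    # Head: pool each feature map to one number, then classify
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Pretrained bases usually expect their own input preprocessing (for MobileNetV2, tf.keras.applications.mobilenet_v2.preprocess_input, which scales pixels to [-1, 1]), and 32×32 CIFAR-10 images would need resizing to the chosen input size before being fed to this model.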

A typical CNN architecture runs from an input layer, through convolutional, activation, and pooling layers, to fully connected layers and an output layer:

Input Layer:

  • Represents the raw input data, such as an image or a sequence of words.

Convolutional Layers:

  • These layers perform convolution operations to extract features from the input data. Convolutional filters slide across the input to detect patterns like edges, textures, and more complex structures.
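  • The convolution operation involves sliding a filter (or kernel) over the input data and computing the dot product between the filter and each local region of the input. Mathematically, for a 2D convolution:

$$(I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$

Here I is the input image, K is the convolutional filter, and (I * K)(i, j) is the result of the convolution at position (i, j). Strictly speaking, this formula (which does not flip the kernel) is cross-correlation; deep learning libraries, including TensorFlow, implement this operation under the name "convolution".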
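The formula above translates almost line for line into code. This plain-Python sketch assumes stride 1 and no padding ("valid" convolution, the Keras default for Conv2D), so a kh × kw kernel shrinks an H × W input to (H − kh + 1) × (W − kw + 1); a 3 × 3 kernel, for example, turns a 32 × 32 image into 30 × 30. The kernel in the usage example is a hypothetical vertical-edge detector.

```python
def conv2d_valid(image, kernel):
    """2D convolution as in the formula above (no kernel flip, i.e.
    cross-correlation), stride 1, no padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the patch starting at (i, j)
            total = 0
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        out.append(row)
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1]]  # responds where brightness increases from left to right
print(conv2d_valid(image, kernel))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The output is non-zero only in the column where the dark region meets the bright one, which is the kind of pattern (an edge) that early convolutional filters tend to learn.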

Activation Function (ReLU) Layers:

  • Typically applied after convolutional layers to introduce non-linearity. The Rectified Linear Unit (ReLU) activation function is commonly used.
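ReLU is f(x) = max(0, x), applied element-wise to every value of a feature map. A one-line plain-Python sketch, applied to a small made-up feature map:

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0.0, x)

feature_map = [[-2.0, 0.5], [3.0, -0.1]]
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0.0, 0.5], [3.0, 0.0]]
```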

Pooling (Subsampling or Down-sampling) Layers:

  • Pooling layers reduce the spatial dimensions of the input volume, helping to decrease computation and control overfitting. Max pooling and average pooling are common operations.
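$$\text{MaxPool}(i, j) = \max \, \text{region}(i, j)$$

Here region(i, j) is the local window of the input that produces output position (i, j). For the common 2×2 pooling with stride 2, it is the 2×2 block whose top-left corner is at input position (2i, 2j).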
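A plain-Python sketch of pooling, assuming a square window and, as in Keras' MaxPooling2D by default, a stride equal to the window size with no padding (leftover rows or columns that cannot fill a window are dropped). Passing a different reduction, here a hypothetical average helper, turns the same loop into average pooling.

```python
def pool2d(x, size=2, stride=2, op=max):
    """Slide a size x size window with the given stride and reduce each
    window with `op` (max for max pooling)."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [x[i * stride + m][j * stride + n]
                      for m in range(size) for n in range(size)]
            row.append(op(window))
        out.append(row)
    return out

def average(values):
    return sum(values) / len(values)

x = [[1, 3, 2, 1],
     [4, 6, 5, 0],
     [7, 2, 9, 8],
     [1, 0, 3, 4]]
print(pool2d(x))              # max pooling:     [[6, 5], [7, 9]]
print(pool2d(x, op=average))  # average pooling: [[3.5, 2.0], [2.5, 6.0]]
```

Because partial windows are dropped, a 13 × 13 input pools down to 6 × 6, which matches the MaxPooling2D shapes in the model summary later in this article.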

Fully Connected (Dense) Layers:

  • Neurons in a fully connected layer are connected to all neurons in the previous layer. These layers are often used in the later stages of the network to make predictions based on the learned features.

$$y_i = f\Big(\sum_j w_{ij} x_j + b_i\Big)$$

Here, y_i is the output of neuron i, w_ij is the weight between neuron i and input j, x_j is the input from neuron j, b_i is the bias term for neuron i, and f(⋅) is the activation function.
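The neuron equation y_i = f(Σ_j w_ij x_j + b_i) maps directly onto a list comprehension. In this plain-Python sketch the weights, biases, and the choice of ReLU as f are made-up illustrative values:

```python
def relu(v):
    return max(0.0, v)

def dense(x, weights, biases, activation=relu):
    """Fully connected layer: y_i = f(sum_j w_ij * x_j + b_i).
    weights[i][j] is the weight from input j to output neuron i."""
    return [activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(weights, biases)]

x = [1.0, 2.0]
W = [[0.5, -1.0],   # weights into neuron 0
     [1.0, 1.0]]    # weights into neuron 1
b = [0.0, -1.0]
print(dense(x, W, b))  # [0.0, 2.0]
```

Neuron 0's weighted sum is -1.5, which ReLU clips to 0; neuron 1's is 1 + 2 - 1 = 2.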

Output Layer:

  • Produces the final output of the network. The number of neurons in this layer depends on the task. For example, in classification tasks, the output layer might have one neuron per class with a softmax activation for multi-class classification.

Flatten Layer:

  • Flattens the output from the previous layer into a 1D vector. This is often used to transition from convolutional/pooling layers to fully connected layers.
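Flattening just reads the values of a (height, width, channels) volume out in a fixed order. This plain-Python sketch uses row-major order, with the channel index changing fastest; a (4, 4, 64) volume becomes 4 · 4 · 64 = 1024 values.

```python
def flatten(volume):
    """Unroll a (height, width, channels) nested list into a 1D list,
    with the last (channel) index changing fastest."""
    return [value for row in volume for pixel in row for value in pixel]

# A tiny 2 x 2 x 3 volume (height, width, channels)
volume = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [10, 11, 12]]]
print(flatten(volume))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```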

In more complex architectures, you might see additional layers, such as:

Batch Normalization Layers:

  • Applied to normalize the input of each layer, reducing internal covariate shift and potentially speeding up training.
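The core computation of batch normalization, shown here for a single feature in plain Python: subtract the batch mean, divide by the batch standard deviation (with a small ε for stability), then apply a learned scale γ and shift β. This sketch covers only the training-time behavior; at inference time Keras' BatchNormalization uses moving averages of the mean and variance collected during training instead of the current batch statistics. The eps default of 1e-3 mirrors Keras' default epsilon.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a batch to zero mean and (roughly) unit
    variance, then scale by gamma and shift by beta (learned in a real network)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [1.0, 2.0, 3.0, 4.0]  # one feature, four examples in the batch
print([round(v, 3) for v in batch_norm(batch)])
```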

Dropout Layers:

  • Randomly “drops out” (ignores) a fraction of neurons during training to prevent overfitting.
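A plain-Python sketch of "inverted" dropout, the variant Keras' Dropout layer uses: during training each value is zeroed with probability rate and the survivors are scaled by 1 / (1 − rate), so the expected sum stays the same and nothing needs rescaling at inference time, when the layer passes its input through unchanged. The seeded random generator is only there to make the example reproducible.

```python
import random

def dropout(values, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate). At inference time,
    return the values untouched."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, rng=rng))  # some entries zeroed, the rest doubled
print(dropout(activations, training=False))     # [1.0, 2.0, 3.0, 4.0]
```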

Skip Connections (Residual Connections):

  • Connections that skip one or more layers to ease the training of very deep networks.
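A skip connection adds a block's input to its output, y = x + F(x), where F stands for the layers being skipped. This plain-Python sketch treats F as any function on a vector. It illustrates that if F outputs zeros, the block reduces to the identity, which is one common intuition for why very deep residual networks remain trainable. The transform in the usage example is a hypothetical stand-in for real layers.

```python
def residual_block(x, transform):
    """Skip connection: add the block's input back onto its output, y = x + F(x)."""
    return [xi + fi for xi, fi in zip(x, transform(x))]

def small_transform(x):
    # Hypothetical stand-in for the skipped layers F(x) (e.g. conv + ReLU)
    return [max(0.0, 0.1 * v - 0.2) for v in x]

x = [1.0, 2.0, 5.0]
print(residual_block(x, small_transform))
print(residual_block(x, lambda v: [0.0] * len(v)))  # F(x) = 0 gives back x
```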

The rest of this article demonstrates training a simple Convolutional Neural Network (CNN) to classify CIFAR-10 images.

Import TensorFlow

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

The CIFAR10 dataset contains 60,000 32×32 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image:

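class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(train_images[i])
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()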

The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers. As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size; for RGB images, color_channels is 3 (R, G, B). For CIFAR-10 this shape is (32, 32, 3), which is passed to the first layer through the input_shape argument.

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

Let’s display the architecture of your model so far:

model.summary()

Model: "sequential"
Layer (type)                    Output Shape          Param #
conv2d (Conv2D)                 (None, 30, 30, 32)    896
max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)    0
conv2d_1 (Conv2D)               (None, 13, 13, 64)    18496
max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)      0
conv2d_2 (Conv2D)               (None, 4, 4, 64)      36928

Total params: 56320 (220.00 KB)
Trainable params: 56320 (220.00 KB)
Non-trainable params: 0 (0.00 Byte)

Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
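The output shapes and parameter counts in the summary above follow from simple arithmetic, assuming the Keras defaults used here: stride 1 and padding='valid' for Conv2D, and strides equal to pool_size for MaxPooling2D. A plain-Python check (the helper names are illustrative):

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a Conv2D with padding='valid' and stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after MaxPooling2D with stride equal to the pool size."""
    return (size - pool) // pool + 1

def conv_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels filter per output channel, plus one bias each."""
    return kernel * kernel * in_channels * filters + filters

size = 32
size = conv_output_size(size)   # conv2d:          30
size = pool_output_size(size)   # max_pooling2d:   15
size = conv_output_size(size)   # conv2d_1:        13
size = pool_output_size(size)   # max_pooling2d_1: 6
size = conv_output_size(size)   # conv2d_2:        4
print(size)  # 4

params = [conv_params(3, 3, 32), conv_params(3, 32, 64), conv_params(3, 64, 64)]
print(params, sum(params))  # [896, 18496, 36928] 56320
```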

To complete the model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs.

model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

Here’s the complete architecture of your model:

model.summary()

Model: "sequential"
Layer (type)                    Output Shape          Param #
conv2d (Conv2D)                 (None, 30, 30, 32)    896
max_pooling2d (MaxPooling2D)    (None, 15, 15, 32)    0
conv2d_1 (Conv2D)               (None, 13, 13, 64)    18496
max_pooling2d_1 (MaxPooling2D)  (None, 6, 6, 64)      0

The network summary shows that the (4, 4, 64) outputs were flattened into vectors of shape (1024,) before going through two Dense layers.
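The summary above stops after the first few layers, but the rest of the parameter count can be worked out by hand: Flatten has no weights, and each Dense unit has one weight per input plus a bias. A plain-Python sketch (the helper name is illustrative), reusing the convolutional-layer counts from the earlier summary:

```python
def dense_params(inputs, units):
    """Each unit has one weight per input plus one bias."""
    return inputs * units + units

flat = 4 * 4 * 64                     # Flatten: (4, 4, 64) -> 1024 values
head = [dense_params(flat, 64),       # Dense(64): 1024 -> 64
        dense_params(64, 10)]         # Dense(10): 64 -> 10 class scores
base_total = 896 + 18496 + 36928      # conv layers from the earlier summary
print(flat, head, base_total + sum(head))  # 1024 [65600, 650] 122570
```

Under these assumptions, the first Dense layer alone (65,600 parameters) has more weights than the entire convolutional base (56,320).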


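# Assumed settings: Adam optimizer; sparse categorical cross-entropy because
# the labels are integer class indices (0-9) and the last layer uses softmax
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))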

Epoch 1/10
1563/1563 [==============================] - 10s 5ms/step - loss: 1.5211 - accuracy: 0.4429 - val_loss: 1.2497 - val_accuracy: 0.5531
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1408 - accuracy: 0.5974 - val_loss: 1.1474 - val_accuracy: 0.6023
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9862 - accuracy: 0.6538 - val_loss: 0.9759 - val_accuracy: 0.6582
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8929 - accuracy: 0.6879 - val_loss: 0.9412 - val_accuracy: 0.6702
Epoch 5/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8183 - accuracy: 0.7131 - val_loss: 0.8830 - val_accuracy: 0.6967
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7588 - accuracy: 0.7334 - val_loss: 0.8671 - val_accuracy: 0.7039
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7126 - accuracy: 0.7518 - val_loss: 0.8972 - val_accuracy: 0.6897
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6655 - accuracy: 0.7661 - val_loss: 0.8412 - val_accuracy: 0.7111
Epoch 9/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6205 - accuracy: 0.7851 - val_loss: 0.8581 - val_accuracy: 0.7109
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5872 - accuracy: 0.7937 - val_loss: 0.8817 - val_accuracy: 0.7113

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

313/313 - 1s - loss: 0.8817 - accuracy: 0.7113 - 655ms/epoch - 2ms/step

The simple CNN reaches a test accuracy of about 71% (0.7113). Follow-up articles will cover more topics related to convolution.
