Unlocking Autoencoders’ Potential: Transforming Dimensionality Reduction
by Mateusz Maj, June 2024


Autoencoders, with their remarkable capacity to reduce dimensions, are a staple of modern image processing. This article explores dimensionality reduction with autoencoders and highlights what makes them so effective.

Principal Component Analysis (PCA)

One of the most well-known dimensionality reduction methods is principal component analysis (PCA), widely applied in data exploration, preprocessing, and visualization. Consider an example involving 100×100-pixel MMA fight images, each comprising three layers, one per RGB color channel. Each image is therefore an array of 3×100×100 = 30,000 values on a 0–255 scale, which makes it challenging to process and analyze.

from sklearn.decomposition import PCA

# PCA expects 2D input (samples x features), so each 100x100x3 image
# must first be flattened into a single 30,000-element vector.
pca = PCA(n_components=2)  # reduce every image to two values
pca_result = pca.fit_transform(prepared_images)

The scikit-learn library provides a straightforward PCA implementation that transforms the images into two-dimensional data. Note that the data must first be “flattened,” meaning they can no longer remain three-layer matrices. The plot below shows over 730 MMA fight images in six categories (following the article Decoding the Dynamics of MMA Through Neural Networks: An Image Classification Journey: 0 = fighter, 1 = ground fight, 2 = kick, 3 = neutral position, 4 = punch, 5 = wrestling/clinch). The overlap of points from different categories indicates the complexity of the problem and the ambiguity of the images, which also translates into difficulties in classification.
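As a quick illustration, the two PCA components can be plotted and colored by category; pca_result comes from the snippet above, while labels (the 0–5 category ids for each image) is an assumed array not shown here:

import matplotlib.pyplot as plt

# Assumes `labels` holds the 0-5 category id of each image.
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(pca_result[:, 0], pca_result[:, 1], c=labels, cmap='tab10', s=12)
ax.legend(*scatter.legend_elements(), title='Category')
ax.set_xlabel('PC 1')
ax.set_ylabel('PC 2')
plt.show()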

Autoencoders

Autoencoders consist of two main components: an encoder that processes the data and a decoder that reconstructs it. The connecting element between the encoder and the decoder is the so-called latent dimension, functioning as a bottleneck in the network.

from tensorflow.keras import layers, models

def build_encoder():
    # 100x100 RGB input
    encoder_input = layers.Input(shape=(100, 100, 3), name='encoder_input')
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoder_input)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)  # 100x100 -> 50x50
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    # 50x50 -> 25x25; the latent representation is 25x25x16
    encoder_output = layers.MaxPooling2D((2, 2), padding='same', name='encoder_output')(x)
    encoder = models.Model(encoder_input, encoder_output, name='encoder')
    return encoder

def build_decoder():
    decoder_input = layers.Input(shape=(25, 25, 16), name='decoder_input')
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(decoder_input)
    x = layers.UpSampling2D((2, 2))(x)  # 25x25 -> 50x50
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)  # 50x50 -> 100x100
    # Sigmoid keeps reconstructed pixel values in the [0, 1] range
    decoder_output = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same', name='decoder_output')(x)
    decoder = models.Model(decoder_input, decoder_output, name='decoder')
    return decoder

encoder = build_encoder()
decoder = build_decoder()

# Chain the encoder and decoder into the full autoencoder
autoencoder_input = encoder.input
autoencoder_output = decoder(encoder(autoencoder_input))
autoencoder = models.Model(autoencoder_input, autoencoder_output, name='autoencoder')

First, the autoencoder model must be trained on training and validation sets. As with any neural network, training involves compiling the model with a loss function and defining a training protocol (batch size, number of epochs, etc.). Since autoencoders work with unlabeled data (unsupervised learning), X and Y are the same dataset.
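A minimal training sketch, assuming pixel values scaled to [0, 1] and hypothetical x_train / x_val splits; note that the images serve as both input and target:

# Pixel-wise reconstruction loss; binary cross-entropy is a common choice
# for inputs scaled to [0, 1] (mean squared error also works).
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Unsupervised learning: the inputs are also the targets.
autoencoder.fit(
    x_train, x_train,
    epochs=50,
    batch_size=32,
    validation_data=(x_val, x_val),
)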

The trained autoencoder can be used for further analysis. Above, I present examples of test images and their corresponding representations obtained with the encoder. The encoder (shown below) transforms the image data from 100×100×3 down to 25×25×16 (10,000 values). To obtain such a representation, run a prediction with the encoder model on a selected image. Although the representation may initially seem abstract, it enables further transformations, such as denoising and image colorization. Reconstruction from this representation does not always reproduce the original perfectly, but it retains the primary features.
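For example, a single test image (here a hypothetical x_test array) can be encoded and reconstructed like this:

# Encode one test image; the model expects a batch dimension.
latent = encoder.predict(x_test[:1])
print(latent.shape)  # (1, 25, 25, 16), i.e. 10,000 values per image

# The full autoencoder reconstructs the image from that representation.
reconstruction = autoencoder.predict(x_test[:1])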

Dimensionality reduction in autoencoders is accomplished with convolutional and pooling layers. The convolutional layer analyzes the elements of the image, while pooling reduces their spatial dimensions (MaxPooling selects the maximum value in the analyzed area; AveragePooling computes the average). For instance, in the described model (the schema below), an image with dimensions of 100×100 is first reduced to 50×50, and then to 25×25. Adding more such blocks to the architecture leads to further dimensionality reduction. If the goal is a two-dimensional representation, the representation above must be “flattened” and reduced to the desired number of dimensions, as in the sketch below.
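One possible way to push the encoder all the way down to two dimensions (a sketch under my own assumptions, not necessarily the exact architecture used in this article) is to flatten the 25×25×16 feature maps and pass them through a small Dense bottleneck:

from tensorflow.keras import layers, models

def build_encoder_2d():
    # Hypothetical variant of the encoder above with a 2D bottleneck.
    encoder_input = layers.Input(shape=(100, 100, 3), name='encoder_input')
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoder_input)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)   # 100x100 -> 50x50
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)   # 50x50 -> 25x25
    x = layers.Flatten()(x)                               # 25*25*16 = 10,000 values
    encoder_output = layers.Dense(2, name='latent_2d')(x) # two dimensions
    return models.Model(encoder_input, encoder_output, name='encoder_2d')

A matching decoder would then begin with a Dense layer mapping the two values back to a 25×25×16 tensor (via Reshape) before the upsampling blocks.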

Encoder

The encoder architecture resembles one used for image classification. The difference is that for y we use image data instead of category labels, and we specify the expected number of dimensions instead of the number of categories. The applied model yields results similar to PCA (even though PCA is a linear transformation and the autoencoder is not). As with PCA, the obtained representation does not allow for unambiguous separation of the individual categories, which follows from the complexity of the data. For cleaner datasets, such as the MNIST digits, more distinct results are obtained. It is worth noting that for image classification in Decoding the Dynamics of MMA Through Neural Networks: An Image Classification Journey, I used very complex deep neural network architectures (e.g., VGG19 consists of five blocks with several convolutional layers each), whereas the autoencoder architecture above is far simpler.
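For completeness, a sketch of how such two-dimensional latent codes could be plotted like the PCA result; images, labels, and a trained encoder_2d are all assumptions carried over from the snippets above:

import matplotlib.pyplot as plt

# Assumes `images` has shape (n, 100, 100, 3) and is scaled to [0, 1],
# and that `encoder_2d` was trained as part of a full autoencoder.
latent_2d = encoder_2d.predict(images)  # shape: (n, 2)

plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=labels, cmap='tab10', s=12)
plt.xlabel('Latent dimension 1')
plt.ylabel('Latent dimension 2')
plt.show()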


