Handwritten Image Augmentation | by Mahendran Narayanan

Image augmentation techniques such as adjusting brightness, contrast, and rotation primarily target general images. Region-based techniques such as Cutout and CutMix were also designed for general images and are unlikely to suit handwritten ones: masking or replacing part of a character can alter its meaning to a significant extent.

For handwritten images spanning various languages, there is currently no dedicated augmentation technique available. We introduce a handwritten image augmentation method that is language-agnostic: it can be applied to handwritten images in any language, and we believe it is the first of its kind. We expect this approach to improve accuracy when processing handwritten images. The proposed methods were tested on MNIST, English MNIST (EMNIST), and Japanese MNIST (Kuzushiji-MNIST) to validate their effectiveness.

We propose four methods for handwritten images:

ThickOCR
ThinOCR
ElongateOCR
LineEraseOCR

Handwritten augmentation is data specific. These methods operate at the pixel level, which makes them effective for character-level datasets. They are designed so that even the dot over a character is not lost completely when they are applied, and they are intended for all OCR-based text. Each method is explained below.

The ThickOCR method makes characters bolder pixel by pixel while ensuring that their meaning is not altered. As depicted in the figure, it serves as a useful augmentation technique that improves performance. There are two types of ThickOCR: “complete” and “random.” In the complete method, additional pixels are added to all strokes in the character, whereas in the random method, pixels are added randomly to the strokes.
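The two ThickOCR variants can be sketched with NumPy as follows. This is a minimal illustration and not the author's implementation: it assumes a 2-D grayscale array with bright ink on a zero background (as in MNIST), and the function name `thick_ocr` and the parameters `mode`, `prob`, and `rng` are hypothetical. Thickening is approximated here by taking, for each pixel, the maximum over itself and its four direct neighbours.

```python
import numpy as np

def thick_ocr(image, mode="complete", prob=0.5, rng=None):
    """Thicken strokes in a 2-D image (assumed: bright ink, zero background)."""
    img = np.asarray(image)
    h, w = img.shape
    # Zero padding so that neighbour lookups never wrap around the border.
    padded = np.pad(img, 1, mode="constant")
    candidates = [
        img,
        padded[0:h, 1:w + 1],      # pixel above
        padded[2:h + 2, 1:w + 1],  # pixel below
        padded[1:h + 1, 0:w],      # pixel to the left
        padded[1:h + 1, 2:w + 2],  # pixel to the right
    ]
    # Each pixel takes the brightest value among itself and its neighbours.
    thick = np.maximum.reduce(candidates)
    if mode == "complete":
        # Every stroke grows by one pixel in each direction.
        return thick
    if mode == "random":
        # Assumption: each pixel is thickened independently with probability prob.
        rng = np.random.default_rng() if rng is None else rng
        mask = rng.random(img.shape) < prob
        return np.where(mask, thick, img)
    raise ValueError("mode must be 'complete' or 'random'")
```

Because the output is never darker than the input, existing ink (including a dot over a character) is always preserved; only background pixels next to a stroke can change.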

ThinOCR is the opposite of ThickOCR. It removes certain character pixels, especially in the bolder portions of a handwritten image, so that the result has distinctive characteristics compared to the original. The figure illustrates the impact of ThinOCR, and the results are presented in the table for reference.
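One way to approximate ThinOCR is a morphological erosion that is applied only where the stroke is bold. The sketch below is an assumption about how this could be done, not the author's code: the names `thin_ocr` and `_window_stack` are hypothetical, the input is assumed to be a 2-D array with bright ink on a zero background, and "bold" is taken to mean at least three pixels wide.

```python
import numpy as np

def _window_stack(img):
    """Return the nine shifted views that make up each pixel's 3x3 window."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant")
    return [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]

def thin_ocr(image):
    """Thin bold strokes in a 2-D image (assumed: bright ink, zero background)."""
    img = np.asarray(image)
    # Erosion: a pixel keeps ink only if its whole 3x3 window contains ink.
    eroded = np.minimum.reduce(_window_stack(img))
    # Dilating the eroded image marks the regions that were bold enough
    # to survive erosion (a morphological opening).
    opened = np.maximum.reduce(_window_stack(eroded))
    # Use the eroded value inside bold regions; leave thin strokes and
    # isolated dots exactly as they were.
    return np.where(opened > 0, eroded, img)
```

With this choice, a one-pixel dot or a thin line is left unchanged, while a stroke three pixels wide loses one pixel on each side. A thin stroke that joins a bold region can end up separated from the thinned core by a one-pixel gap, so a more careful version would add a connectivity check.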

The ElongateOCR method duplicates a particular section of the image so that the character appears elongated. Expansion occurs exclusively within a single row or column, ensuring that the character remains contained within the pixel matrix. The ElongateOCR method comprises two types that operate along two distinct axes: horizontally (x-axis) and vertically (y-axis). In the x-axis expansion, the entire row is duplicated, while in the y-axis expansion, the entire column is duplicated. The figure illustrates the elongation along both the x-axis and y-axis.
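A minimal NumPy sketch of ElongateOCR is shown below. It is an illustration under stated assumptions rather than the author's implementation: the name `elongate_ocr` and its parameters are hypothetical, `axis` follows the NumPy convention (0 duplicates a row, 1 duplicates a column), and the original size is restored by trimming whichever border line carries less ink, on the assumption that the border is background.

```python
import numpy as np

def elongate_ocr(image, index, axis=0):
    """Duplicate one row (axis=0) or column (axis=1), keeping the image size."""
    img = np.asarray(image)
    n = img.shape[axis]
    # Repeat the selected line twice and every other line once.
    counts = np.ones(n, dtype=int)
    counts[index] = 2
    stretched = np.repeat(img, counts, axis=axis)
    # Assumption: trim one border line so the result fits the original
    # pixel matrix, choosing the border that holds less ink.
    first_ink = np.take(stretched, 0, axis=axis).sum()
    last_ink = np.take(stretched, n, axis=axis).sum()
    drop = 0 if first_ink < last_ink else n
    return np.delete(stretched, drop, axis=axis)
```

For a 28x28 digit, `elongate_ocr(img, index=14, axis=0)` would repeat row 14 and return a 28x28 array. If the character touches both borders, the trimmed line removes some ink, so the index and axis should be chosen with the dataset's margins in mind.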

The LineEraseOCR method is similar to ElongateOCR in that it also targets a specific row or column of the pixel matrix. In contrast to ElongateOCR, however, LineEraseOCR completely removes the selected row or column from the original image. Importantly, LineEraseOCR does not change the meaning or content of the image.
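LineEraseOCR can be sketched in a few lines of NumPy. The function name `line_erase_ocr` and the `keep_shape` option are hypothetical; in particular, padding a blank line back onto the image so that its size stays fixed is an assumption made here for models with a fixed input size, and the text above does not say whether the original method does this.

```python
import numpy as np

def line_erase_ocr(image, index, axis=0, keep_shape=True):
    """Delete one row (axis=0) or column (axis=1) from a 2-D image."""
    img = np.asarray(image)
    # Remove the selected line; the remaining lines close the gap.
    erased = np.delete(img, index, axis=axis)
    if not keep_shape:
        return erased
    # Assumption: append one zero (background) line at the far end so the
    # result has the same shape as the input.
    pad_width = [(0, 0)] * img.ndim
    pad_width[axis] = (0, 1)
    return np.pad(erased, pad_width, mode="constant")
```

Removing a single line shortens each stroke crossing it by one pixel. A detail that is only one pixel tall or wide, such as a small dot, disappears if the erased line passes through it, so the choice of index matters for keeping the character's meaning intact.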

To evaluate the augmentation methods, we conducted experiments on well-documented datasets: MNIST, Kuzushiji-MNIST, and EMNIST. Each experiment ran for 30 epochs with a learning rate of 0.0005 and a batch size of 50. The reported accuracy is the highest test accuracy achieved across all epochs, that is, the best score obtained during the experiments.

The experimental results for the MNIST, English MNIST (EMNIST), and Kuzushiji-MNIST datasets are presented in the following tables.