
Not having enough training data is one of the biggest problems in deep learning today.
A promising solution for computer vision tasks is the automatic generation of synthetic images with annotations.
In this article, I will first give an overview of some image generation techniques for synthetic image data.
Then, we will generate a training dataset that requires zero manual annotations and use it to train a Faster R-CNN object detection model.
Finally, we will test the trained model on real images.
In theory, synthetic images are perfect. You can generate an almost infinite number of images with zero manual annotation effort.
Training datasets built from real images and manual annotations can contain a significant number of human labeling errors, and they are often imbalanced and biased (for example, images of cars are most likely taken from the front or side and on a road).
However, synthetic images suffer from a problem called the sim-to-real domain gap.
The sim-to-real domain gap arises because the model is trained on synthetic images but applied to real-world images at deployment.
There are several different image generation techniques that attempt to reduce the domain gap.
Cut-And-Paste
One of the simplest ways to create synthetic training images is the cut-and-paste approach.
As shown below, this technique requires some real images from which the objects to be recognized are cut out. These objects can then be pasted onto random background images to generate a large number of new training images.
Georgakis et al. [2] argue that the position of these objects should be realistic for better results (for example, an object should rest on a plausible supporting surface rather than float in mid-air).
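Below is a minimal sketch of the cut-and-paste idea in Python using Pillow. It assumes you already have RGBA object cut-outs with transparent backgrounds and a folder of background images; the directory names, scaling range, and the `paste_object` helper are illustrative assumptions, not the method of any specific paper.

```python
import random
from pathlib import Path

from PIL import Image


def paste_object(background: Image.Image, obj: Image.Image):
    """Paste `obj` at a random position on `background`; return image and box."""
    bg = background.convert("RGB")
    # Rescale the object so it covers at most half of the background.
    max_scale = 0.5 * min(bg.width / obj.width, bg.height / obj.height)
    scale = random.uniform(0.3, 1.0) * max_scale
    w = max(1, int(obj.width * scale))
    h = max(1, int(obj.height * scale))
    obj = obj.resize((w, h))
    # Random placement; following Georgakis et al., this could instead be
    # constrained to plausible supporting surfaces.
    x = random.randint(0, bg.width - w)
    y = random.randint(0, bg.height - h)
    bg.paste(obj, (x, y), mask=obj)  # the alpha channel masks the paste
    return bg, (x, y, x + w, y + h)  # box as (xmin, ymin, xmax, ymax)


if __name__ == "__main__":
    backgrounds = list(Path("backgrounds").glob("*.jpg"))  # illustrative paths
    objects = list(Path("object_crops").glob("*.png"))     # RGBA cut-outs
    Path("synthetic").mkdir(exist_ok=True)
    for i in range(1000):  # number of synthetic images to generate
        bg = Image.open(random.choice(backgrounds))
        obj = Image.open(random.choice(objects)).convert("RGBA")
        img, box = paste_object(bg, obj)
        img.save(f"synthetic/{i:05d}.jpg")
        # In practice, `box` and a class label would also be written to an
        # annotation file (e.g., in COCO or Pascal VOC format).
```

Because the bounding box is known at paste time, every generated image comes with a free, error-free annotation, which is exactly what makes this approach attractive.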