Object Detection in KerasCV from Scratch: Part 1 — Creating TFRecords from JSON-annotated images | by Slawomir Telega, PhD | Feb, 2024


https://unsplash.com/@robertbock

As has been pointed out many times before, while the other two main subjects in AI computer vision (classification and segmentation) are widely covered in the literature and on the Internet, tutorials devoted to object detection (OD) are rather sparse. This is mainly, I think, because OD is much heavier both conceptually and in terms of code, and (in contrast to the other advanced CV tasks) it practically cannot be done “at home”. What’s more, it requires a broad coding background plus at least an intermediate understanding of the TensorFlow environment — which seems odd when using Keras, conceived as a high-level interface that spares you the need to know TensorFlow’s inner workings.

My idea is to start from scratch and build our way up to fully functioning OD. And by “from scratch” I mean we have some photos that we want to annotate. I’ll try to provide explanations along the way and hope we’ll learn something useful. I won’t go into the details of TensorFlow — the idea is that the code works and everything is shown. You can use it as it is, or, if you get curious and want to understand everything, learn the details from other sources (say, the TensorFlow docs). That said, let’s get to work.

This tutorial wouldn’t have been possible were it not for the great work of Dimitre Oliveira (see reference [2]) — I highly recommend reading it.

So, we have some images and we want to annotate them with bounding boxes so we can perform OD later on. The most convenient way, in my opinion, is the VGG VIA online tool, kindly shared with the community by the Visual Geometry Group at the University of Oxford [1]. It is really simple to use — you load the images, draw rectangles around the chosen features and assign classes (even via a dropdown list). The important thing is to export the annotations as a JSON file — that way a single element can hold several objects of one image at once. Saving in the other available formats (COCO and CSV) enforces redundancy: if you have a satellite photo with, say, 60 objects on it, saving in COCO format will produce 60 lines, one per bounding box, while with JSON you get one element with a ragged list containing all the bboxes in the image. Neat, huh? Especially considering that in real-life OD scenarios we should have at least 1000–1500 annotations per class, and putting the same image into a TFRecord file more than once may cause storage problems (imagine the hi-res satellite photo with 60 bboxes packed 60 times instead of just once — and then imagine that we will usually have thousands of such photos…).

One more word of advice: if, for some reason, you decide to go with the CSV/COCO format, be sure to define the categories in VGG VIA as a dropdown list — otherwise the export won’t work (a bug in the tool, I presume).
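To make the parsing code below easier to follow, here is a minimal sketch of what a VIA JSON export roughly looks like — the filenames, keys, classes and coordinates are made up for illustration, so do inspect your own export before relying on the exact layout:

```python
import json

# Hypothetical VIA-style export: top-level keys identify images, each entry
# holds the filename and a list of annotated "regions" (one per object).
via_json = """
{
  "photo1.jpg-12345": {
    "filename": "photo1.jpg",
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20,
                            "width": 30, "height": 40},
       "region_attributes": {"class": "1"}},
      {"shape_attributes": {"name": "rect", "x": 50, "y": 60,
                            "width": 25, "height": 35},
       "region_attributes": {"class": "2"}}
    ]
  }
}
"""

annotations = json.loads(via_json)
for key, entry in annotations.items():
    # one entry per image, however many objects it contains
    print(entry["filename"], "objects:", len(entry["regions"]))
```

Note that the two rectangles live inside a single image entry — this is exactly the redundancy-free structure discussed above.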

The next step consists of reading the relevant data from the JSON file and the images, so that we have input for the TFRecord.

#imports
import json
import pprint
import tensorflow as tf
import keras_cv

# define directories and filenames
img_dir = "images/"
tfr_dir = "tfrecords/"
tfr_filename = "images.tfrec"
json_filename = "via_project.json"

We have to define some helper functions. The JSON parsing is tailored to the VGG VIA output format — if you plan on using another annotation format, please adjust accordingly. The *_feature/*_feature_list functions are based on the tutorial “Creating TFRecords” by Dimitre Oliveira [2] — I highly suggest looking it up, as it is a great article.

def read_json(json_filename):
    # read the whole annotation object from a JSON file
    with open(json_filename) as f:
        json_obj = json.load(f)
    return json_obj

def get_img_names(json_obj):
    # image names are the top-level keys of the VIA JSON hierarchy
    return list(json_obj)

# The next three functions define how features are encoded in the TFRecord.
def image_feature(value):
    # store the image tensor as a single JPEG-encoded byte string
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[tf.io.encode_jpeg(value).numpy()]))

def bytes_feature(value):
    # store a Python string as a single byte string
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode()]))

def float_feature_list(value):
    # store a flat list of floats, e.g. one bbox coordinate per object
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

As mentioned before, I won’t go too deep into TF’s inner workings — it is enough to know that an “example” is a single description of an image (the image itself plus bounding boxes, classes, path and whatever else you may need) that gets converted into byte form and written sequentially into the TFRecord file, element after element. You may have noticed that each bounding box gets split into four elements (x, y, width, height), which then form the respective lists. This is because a TFRecord is sequential and therefore not designed to store multidimensional structures (such as an n-by-m array/list). It is much easier to store flat lists for x, y, width and height and put the bounding boxes back together while reconstructing from the TFRecord file — this will be shown in detail in the second part of the tutorial. One also has to take into account that the size of these lists may vary from image to image, depending on how many objects an image contains; we’ll deal with this in the second part as well. For now, do not worry how. The important thing is that we have everything in one file, which makes transferring it extremely easy — e.g. to Google Colab (one file vs. thousands) — plus everything is neat and tidy, as you’ve only got ONE file.
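The split-and-reassemble idea described above can be shown in pure Python, with no TensorFlow involved — the coordinates here are made up:

```python
# Each bounding box is [x, y, width, height]; TFRecord stores flat lists,
# so we split the boxes into four per-coordinate lists...
bboxes = [[10.0, 20.0, 30.0, 40.0],
          [50.0, 60.0, 25.0, 35.0]]

xmin   = [b[0] for b in bboxes]
ymin   = [b[1] for b in bboxes]
width  = [b[2] for b in bboxes]
height = [b[3] for b in bboxes]

# ...and zip them back together when reconstructing from the TFRecord.
restored = [list(b) for b in zip(xmin, ymin, width, height)]
assert restored == bboxes
```

However many objects an image has, the four lists always stay the same length as each other, which is what makes the reconstruction trivial.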

# function creates a single example describing one image
def create_example(image, path, category, x, y, w, h):
    # define the features to be written into the example
    feature = {
        "image": image_feature(image),
        "path": bytes_feature(path),
        "class": float_feature_list(category),
        "xmin": float_feature_list(x),
        "ymin": float_feature_list(y),
        "width": float_feature_list(w),
        "height": float_feature_list(h),
    }
    # return an example
    return tf.train.Example(features=tf.train.Features(feature=feature))

# function writes the tfrecord file
def write_tfrecord(json_fn, jpg_dir, tfr_dir, tfr_filename):
    # get images' names from the annotation JSON
    json_obj = read_json(json_fn)
    img_names = get_img_names(json_obj)
    # start a TFRecordWriter:
    with tf.io.TFRecordWriter(tfr_dir + tfr_filename) as writer:
        # loop over all the images
        for img in img_names:
            # read and decode the jpeg
            jpeg_path = jpg_dir + json_obj[img]["filename"]
            image = tf.io.decode_jpeg(tf.io.read_file(jpeg_path))

            # define one flat list per bbox/class parameter
            img_class = []
            xmin = []
            ymin = []
            width = []
            height = []
            # loop over the "regions" (annotated objects) of this image
            for region in json_obj[img]["regions"]:
                # add the values to the respective lists
                img_class.append(float(region["region_attributes"]["class"]))
                xmin.append(float(region["shape_attributes"]["x"]))
                ymin.append(float(region["shape_attributes"]["y"]))
                width.append(float(region["shape_attributes"]["width"]))
                height.append(float(region["shape_attributes"]["height"]))
            # create an example using the above data
            example = create_example(image, jpeg_path, img_class,
                                     xmin, ymin, width, height)
            # write it to the tfrecord file
            writer.write(example.SerializeToString())

Running the code is easy — it’s a one-liner, you only have to provide the relevant directories and filenames. As output you get a *.tfrec file with your images and annotations.

write_tfrecord(json_filename, img_dir, tfr_dir, tfr_filename)
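As a quick sanity check, you can count the records in the resulting file without even loading TensorFlow. A TFRecord file is simply a sequence of length-prefixed entries — each record is a little-endian uint64 payload length, a 4-byte CRC of that length, the payload itself, and a 4-byte CRC of the payload — so a few lines of standard-library Python suffice (a sketch; the function name is my own):

```python
import struct

def count_tfrecord_records(path):
    # Each TFRecord entry on disk is laid out as:
    #   uint64 length | uint32 crc(length) | payload | uint32 crc(payload)
    # so we can count records by hopping from length field to length field.
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)            # uint64 payload length
            if len(header) < 8:
                break                     # end of file
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, 1)     # skip both CRCs and the payload
            count += 1
    return count
```

Calling `count_tfrecord_records(tfr_dir + tfr_filename)` should return exactly the number of annotated images in your JSON file — one example per image, however many bboxes each one carries.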

This is the end of part 1 of the tutorial. In the next installment we’ll read the tfrec file back and check that everything survives the round trip; if it does, we’ll convert the data into a KerasCV-friendly form, ready for part 3, where we’ll train a network using the YOLOv8 object detection algorithm.

  1. Abhishek Dutta and Andrew Zisserman. 2019. The VIA Annotation Software for Images, Audio and Video. In Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3343031.3350535.
  2. Dimitre Oliveira, Creating TFRecords, https://keras.io/examples/keras_recipes/creating_tfrecords/


