Get started with TensorFlow Lite Micro on Sony Spresense


Introduction

This project describes how to model a neural network using TensorFlow 2.8.0 (Keras), generate the trained model, and run it on Sony Spresense. TensorFlow has become much easier to understand since Keras 2.0, but porting it to microcontrollers remains a challenge, and many people may have been frustrated by how hard the TensorFlow Lite Micro code is to follow.

In this article, I introduce a customized Spresense Arduino board package I made that is very easy to program with the Arduino IDE, and I explain how to program with it.

This project is divided into two parts. The first explains how to make the trained model using TensorFlow in Python. The second explains how to run the trained model on Sony Spresense with the Arduino IDE.

Let’s start with the TensorFlow code in Python to get a trained model.

Set up the development environment for TensorFlow

TensorFlow can be installed with pip. The target version is 2.8.0, so please specify the version when installing:

pip install tensorflow==2.8.0

When modeling with TensorFlow, Jupyter Notebook is a convenient environment; you can find many guides on how to use it by searching.
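For example (an optional step, assuming the same pip environment as above), the classic notebook can be installed and launched with:

pip install notebook
jupyter notebook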

Explanation of the python code for getting the TFLite model

You can get the trained TFLite model by running “tf_mnist_training.py”. This chapter describes the Python code and how it works.

Import libraries

Import TensorFlow. With version 2.8.0 you will get a lot of warnings, so I suppress them with set_verbosity(0) and setLevel(logging.ERROR).

## TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

## Helper libraries
import numpy as np
import binascii
import logging

## Check the version of tensorflow (should be 2.8.0)
print(tf.__version__)

## Suppress verbose warnings
tf.autograph.set_verbosity(0)
logging.getLogger("tensorflow").setLevel(logging.ERROR)

Download the MNIST data and normalize it

MNIST is used as the dataset. Because categorical cross-entropy is used as the loss, the labels are converted to 10-element one-hot vectors. For example, the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

## MNIST download
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
## 60,000 training images and 10,000 test images of 28x28 pixels
print("train_images shape", train_images.shape)
print("train_labels shape", train_labels.shape)
print("test_images shape", test_images.shape)
print("test_labels shape", test_labels.shape)

## Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images/255.0
test_images = test_images/255.0
## Convert the labels to 10-element one-hot vectors.
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)
print('Datasets are normalized')

Define the model and run training

The model is defined as follows. It is a typical convolutional neural network, and the training is run right after. You should get about 98% accuracy with this code.

## Model definition
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=6, kernel_size=(5, 5),
                        padding='same', activation=tf.nn.relu, name="conv2d_6"),
    keras.layers.MaxPooling2D(pool_size=(2, 2), padding='same'),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation=tf.nn.relu, name="dense_32"),
    keras.layers.Dense(10),
    keras.layers.Activation(tf.nn.softmax)
])

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

## Output the summary of the model
model.summary()

## Training the model
model.fit(x=train_images, y=train_labels,
          batch_size=128, epochs=30, verbose=1, validation_split=0.1)

## Evaluate the model using all images in the test dataset.
test_loss, test_acc = model.evaluate(x=test_images, y=test_labels, verbose=1)
print('Accuracy = %f' % test_acc)

Convert the trained model to TensorFlow Lite format

Convert the trained model to TensorFlow Lite format, check the size, and save it to disk. This time the output model is quantized. For reference, here is a size comparison of the non-quantized and quantized versions of the same model.

Non-quantized model: 152 KB

Quantized model: 41 KB

The quantized model is very compact; quantization is an essential technique for microcontrollers.

## Convert the Keras model to a quantized TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

## Feed a subset of the test images so the converter can calibrate
## the quantization ranges.
def representative_dataset_gen():
    for i in range(100):
        input_image = tf.cast(test_images[i], tf.float32)
        input_image = tf.reshape(input_image, [1, 28, 28])
        yield ([input_image])

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

## Show the quantized model size in KBs.
tflite_model_size = len(tflite_model) / 1024
print('Quantized model size = %dKBs.' % tflite_model_size)
## Save the model to disk
open('qmodel.tflite', "wb").write(tflite_model)
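Before moving to the board, it can be worth sanity-checking the quantized model with the TFLite interpreter in Python. The following is a minimal sketch of such a check, not part of the original script; it reuses test_images and test_labels from above.

## Optional: check the quantized model with the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

correct = 0
num_samples = 100
for i in range(num_samples):
    ## Input/output stay float32 since only Optimize.DEFAULT is set.
    image = np.float32(test_images[i]).reshape(1, 28, 28)
    interpreter.set_tensor(input_details['index'], image)
    interpreter.invoke()
    scores = interpreter.get_tensor(output_details['index'])[0]
    if np.argmax(scores) == np.argmax(test_labels[i]):
        correct += 1
print('Quantized accuracy on %d samples: %.3f' % (num_samples, correct / num_samples))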

Output the trained model to a C-style header

Output the trained model in TensorFlow Lite format to a C-style header file. This header file must be placed in the same folder as the Arduino sketch described later.

## Output the quantized tflite model to a c-style header
def convert_to_c_array(bytes) -> str:
    hexstr = binascii.hexlify(bytes).decode("UTF-8")
    hexstr = hexstr.upper()
    array = ["0x" + hexstr[i:i + 2] for i in range(0, len(hexstr), 2)]
    array = [array[i:i+10] for i in range(0, len(array), 10)]
    return ",\n ".join([", ".join(e) for e in array])

## Read the quantized model saved above and write the header as
## "qmodel.h", which is the file the Arduino sketch includes.
tflite_binary = open('qmodel.tflite', 'rb').read()
ascii_bytes = convert_to_c_array(tflite_binary)
header_file = "const unsigned char model_tflite[] = {\n " + ascii_bytes + "\n};\nunsigned int model_tflite_len = " + str(len(tflite_binary)) + ";"
with open("qmodel.h", "w") as f:
    f.write(header_file)
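For reference, the generated qmodel.h looks like this. The bytes and the length shown are illustrative placeholders, not real model contents (a .tflite file carries the “TFL3” FlatBuffer identifier at bytes 4 to 7):

const unsigned char model_tflite[] = {
 0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33, 0x14, 0x00,
 ...
};
unsigned int model_tflite_len = 41984;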

Set up the environment for the Arduino sketch

This chapter describes how to incorporate the trained model into Sony Spresense and how to test it.

Install Spresense Arduino Board Package for Tensorflow

To use TensorFlow with Sony Spresense, you need to install the dedicated Arduino package I made. Please install it from the following site.

Spresense Arduino Board Package for Tensorflow

If you are interested in making a custom Arduino package for Spresense, the following document may be a useful reference.

How to create Spresense Arduino customized package

The test images are 28×28-pixel grayscale bitmaps.

Install Bitmap Image Library

This sketch uses an image library to handle the test images. Download the BMP library for Spresense from the following site, unzip it, and put it in the Arduino/libraries folder.

BmpImage_ArduinoLib

Flash test images to Spresense flash ROM

You can get the test images of “0” to “9” from the following site.

Test Bitmap Images

Download them and flash them to the Spresense flash ROM with the following command (run it once for each image):

$ xmodem_writer -c COM3 0009.bmp
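If you use a Unix-like shell, a hypothetical loop like the following flashes all ten images in turn (replace /dev/ttyUSB0 with your serial port, e.g. COM3 on Windows):

$ for f in 000?.bmp; do xmodem_writer -c /dev/ttyUSB0 $f; done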

xmodem_writer is provided as part of the Spresense SDK tools. You can get it from here:

xmodem_writer for Windows

xmodem_writer for Linux

xmodem_writer for macOS

Explanation of the sketch to run the trained model on Spresense

The sketch for Spresense can be found in “Spresense_tf_mnist.ino”. I will not walk through the whole program, but I will explain the key points.

“qmodel.h”, generated by the Python code in “tf_mnist_training.py”, should be placed in the same folder as the sketch.

kTensorArenaSize, which sets the amount of memory reserved for TensorFlow, must be tuned for each model. It is safer to set it larger at first; the amount of memory actually used can be checked with interpreter->arena_used_bytes().

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#include "qmodel.h" /* quantized model */
#define TEST_FILE "0003.bmp"

tflite::ErrorReporter* error_reporter = nullptr;
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
int inference_count = 0;

/* you have to adjust kTensorArenaSize for each model */
constexpr int kTensorArenaSize = 30000;
uint8_t tensor_arena[kTensorArenaSize];

#include <Flash.h>
#include <BmpImage.h>
BmpImage bmp;

To run the trained model, you need to build the TensorFlow interpreter. If “interpreter->AllocateTensors()” returns an error because the tensor_arena buffer is too small, increase kTensorArenaSize.

You can get the input and output buffers from the interpreter. The test image data will be written to the input buffer, and the inference results will appear in the output buffer. The code below also uses model and resolver, which are set up earlier in the sketch; a minimal sketch of that setup follows.
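This is standard TFLite Micro boilerplate; the actual code is in “Spresense_tf_mnist.ino”, so treat the details here as an assumption of its shape:

/* set up the error reporter for TFLite logging */
static tflite::MicroErrorReporter micro_error_reporter;
error_reporter = &micro_error_reporter;

/* map the model array from qmodel.h and check the schema version */
model = tflite::GetModel(model_tflite);
if (model->version() != TFLITE_SCHEMA_VERSION) {
  Serial.println("Model schema version mismatch");
  return;
}

/* register all kernel implementations (simple, at the cost of binary size) */
static tflite::AllOpsResolver resolver;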

/* Build an interpreter to run the model with. */
static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;

/* Allocate memory from the tensor_arena for the model's tensors. */
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
Serial.println("AllocateTensors() failed");
return;
} else {
Serial.println("AllocateTensor() Success");
}

size_t used_size = interpreter->arena_used_bytes();
Serial.println("Arena used bytes: " + String(used_size));

input = interpreter->input(0);
output = interpreter->output(0);
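As an optional check, which I add here and which is not in the original sketch, you can verify that the input tensor really is the 28×28 float tensor the following code assumes:

/* optional sanity check (my addition, not in the original sketch) */
if ((input->type != kTfLiteFloat32) ||
    (input->dims->data[1] != 28) ||
    (input->dims->data[2] != 28)) {
  Serial.println("unexpected input tensor format");
  return;
}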

The sketch reads a test image from Spresense’s flash ROM and normalizes the data before storing it in the interpreter’s input buffer.

/* read test data */
File myFile = Flash.open(TEST_FILE);
if (!myFile) { Serial.println(TEST_FILE " not found"); return; }

/* formatted to Bitmap image */
bmp.begin(myFile);

/* check the format of the bitmap image (should be GRAY8) */
BmpImage::BMP_IMAGE_PIX_FMT fmt = bmp.getPixFormat();
if (fmt != BmpImage::BMP_IMAGE_GRAY8) {
Serial.println("support format error");
return;
}

/* the width and height of the test image should be 28x28 pixels */
int width = bmp.getWidth();
int height = bmp.getHeight();
Serial.println("width: " + String(width));
Serial.println("height: " + String(height));
uint8_t* img = bmp.getImgBuff();

/* normalize the data */
for (int i = 0; i < width*height; ++i) {
input->data.f[i] = (float)(img[i]/255.0);
}

Now that everything is prepared, let’s run the trained model and get the inference results.

Serial.println("Do inference");
TfLiteStatus invoke_status = interpreter->Invoke();
if (invoke_status != kTfLiteOk) {
Serial.println("Invoke failed");
return;
}

/* show the result */
for (int n = 0; n < 10; ++n) {
float value = output->data.f[n];
Serial.println("[" + String(n) + "] " + String(value));
}

The result of the sketch

If there is no problem, you will see messages like the following on the serial monitor. In this case, I used the test image ‘0002.bmp’ (set TEST_FILE accordingly).
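The lines correspond to the Serial.println calls in the sketch; the arena size and the scores below are illustrative values rather than a captured log:

AllocateTensor() Success
Arena used bytes: 26784
width: 28
height: 28
Do inference
[0] 0.00
[1] 0.00
[2] 0.99
[3] 0.00
[4] 0.00
[5] 0.00
[6] 0.00
[7] 0.01
[8] 0.00
[9] 0.00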


