Activity Classification with TensorFlow | by Benjamin Griffiths


A note to the reader: this article is a high-level narrative of what happened, mixed with a splash of tech. For the nerds like me, the code is available in a GitHub repository.

Notion is my preferred tool when it comes to managing projects. It’s easy to customise pages to suit any application and has tonnes of built-in features for project management.

I used it throughout the project lifecycle to keep me on track and organised. Take a look at my project structure and the daily logs: Notion page

Given that I had 6 weeks to work on the project, I decided to break it down into the following stages:

  1. Getting started: Planning and exploring the problem.
  2. Creating the data processing software.
  3. Get a basic model working with TensorFlow.
  4. Create the best model possible using research papers.
  5. Deploy the model so that it can be used by others.
  6. Evaluate the work, clean up the code and write this article.

So, we’ve discussed planning and organising the project. Let’s start writing some code.

Like all good machine learning projects, it starts and ends with data.

Luckily I already had data collected from some willing friends who were happy for me to play around with it and present it for this article.

The data was collected from a 3-axis accelerometer worn on the shank and an activity monitor worn on the thigh. My plan was to use the thigh worn device to give me accurate posture measurements throughout the data collection period and try to predict the postures using the shank acceleration.

I needed to separate the data into 15-second windows where each window had 3 axes of acceleration data and a corresponding true posture code. This could have been 1 of 4 postures: sitting, standing, stepping or lying.
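The windowing step described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the repository: the function name `make_windows`, the per-sample label array and the majority-vote labelling rule are all assumptions, and the 20 samples per second figure is taken from the sampling rate mentioned later in the article.

```python
import numpy as np

SAMPLE_RATE_HZ = 20   # one sample every 20th of a second
WINDOW_SECONDS = 15   # window length used in the article


def make_windows(acc, labels, window_len):
    """Split (n_samples, 3) acceleration and per-sample labels into fixed windows.

    Each window is labelled with its most frequent posture code (an assumption).
    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = len(acc) // window_len
    usable = n_windows * window_len
    x = acc[:usable].reshape(n_windows, window_len, acc.shape[1])
    lab = labels[:usable].reshape(n_windows, window_len)
    # Majority vote per window: count each code and keep the most common one.
    y = np.array([np.bincount(row).argmax() for row in lab])
    return x, y


# Synthetic stand-in data: 70 seconds of 3-axis signal with integer posture codes.
rng = np.random.default_rng(0)
n = 70 * SAMPLE_RATE_HZ
acc = rng.normal(size=(n, 3))
labels = np.repeat([0, 1, 2, 3, 1], n // 5)  # five equal blocks of one posture each

X, y = make_windows(acc, labels, SAMPLE_RATE_HZ * WINDOW_SECONDS)
print(X.shape, y)
```

A majority vote is only one way to label a window that straddles two postures; discarding mixed windows would be another reasonable choice.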

To do this, I wanted to create an object-oriented system.

I’m a self-taught programmer. I learned to code so that I could process data for research projects and my passion for it grew from there.

So I’ve had to learn the hard way, discovering for myself the drawbacks of different programming methods and the common mistakes that you get taught to avoid through formal education.

But although I’ve dabbled in OOP before, I’ve never made a system that I thought truly needed OOP. This, however, was a great opportunity. I could picture the different ‘objects’ that would be responsible for the stages of the processing, their methods and their properties.

I created an object that represented the device itself, a ‘posture stack’ that would be responsible for the data labels (if they existed), a dataset that would be responsible for separating out the raw data and a model that would be able to make predictions and assess the accuracy.

These objects could easily be extended, for example, to create different length windows or to use different models to make predictions from the data.

Now all I needed to worry about was loading in the data.

Unfortunately, the files were exported by the device to CSV and the problem with 5 days of 3-axis acceleration data, sampled every 20th of a second, is that you’re left with some large CSVs.

I tried a few different ways of importing the data, but my only option was to read the data in chunks using pandas. This was slow… very slow.
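For readers who haven’t used it, chunked reading with pandas looks roughly like the sketch below. The column names and the tiny in-memory CSV are made up for illustration, and reading the values as `float32` is my own suggestion for reducing memory use, not something taken from the project code. Whether it would speed up files of this size is untested.

```python
import io

import numpy as np
import pandas as pd

# Tiny in-memory CSV standing in for a very large export.
# The column names "x", "y", "z" are hypothetical.
rows = "\n".join(f"{i},{i + 1},{i + 2}" for i in range(10))
fake_csv = io.StringIO("x,y,z\n" + rows)

chunks = []
# chunksize makes read_csv return an iterator of DataFrames instead of one big frame.
for chunk in pd.read_csv(fake_csv, chunksize=4, dtype=np.float32):
    chunks.append(chunk.to_numpy())

acc = np.concatenate(chunks)
print(acc.shape, acc.dtype)
```

This sketch leaves out any header clean-up and simply assumes a well-formed file.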

If anyone has a faster way of importing this data then let me know.

There’s also logic in there to deal with a strange issue I was observing in the CSV headers.

Anyway, after just over 12 hours of processing I was ready to start playing with TensorFlow.

I had saved my data as 2 numpy arrays; 1 contained the raw acceleration data with the shape…

(number of epochs, 295, 3)

Here, 295 is the number of samples in each epoch (15 seconds × 20 samples per second = 300, with 5 samples dropped to ensure all the epochs were the same shape) and 3 is the number of acceleration channels. The other array contained the corresponding posture codes, each of which could be 1 of 4 values (0–3).

To get things underway, I set up a Google Colab notebook and imported my data to start experimenting.

First, I needed to make sure my data was in the correct format. Luckily, after looking over a few tutorials, I knew that I needed to convert my data into tensors. As far as I understand it, tensors are n-dimensional arrays, much like numpy arrays, in a form that CPUs and GPUs can work with efficiently for deep learning, which is why I made sure I saved my data as numpy arrays for easy conversion.

I also needed to make sure my posture codes were one-hot encoded, my acceleration data was normalised and any posture codes (and corresponding data) I didn’t want in my model were removed. If you’re interested in the details of the additional processing or want to run your own experiment, check out the Colab notebook.
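As a rough idea of what those three steps involve, here is a numpy-only sketch. It is not the notebook’s code: the function name `preprocess`, the choice of per-axis standardisation as the normalisation, and the unwanted code `9` are all assumptions made for the example.

```python
import numpy as np


def preprocess(x, y, keep_codes):
    """Filter unwanted posture codes, normalise the signal and one-hot encode labels."""
    # 1. Keep only the windows whose posture code is wanted.
    mask = np.isin(y, keep_codes)
    x, y = x[mask], y[mask]

    # 2. Standardise each axis to zero mean and unit variance (one possible normalisation).
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    x = (x - mean) / std

    # 3. One-hot encode: map the kept codes to 0..k-1, then index an identity matrix.
    classes, y_idx = np.unique(y, return_inverse=True)
    y_onehot = np.eye(len(classes), dtype=np.float32)[y_idx]
    return x, y_onehot, classes


rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=(6, 295, 3))
y = np.array([0, 1, 2, 3, 9, 1])  # 9 is a made-up "unwanted" code

x_norm, y_onehot, classes = preprocess(x, y, keep_codes=[0, 1, 2, 3])
print(x_norm.shape, y_onehot.shape, classes)
```

When there is a separate test set, the mean and standard deviation would normally be computed on the training data only and then reused on the test data.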

Now it was time to experiment with different models.

I tried replicating a model I found within a TensorFlow tutorial and quickly learnt that ensuring your data has the correct shape and is fed into the input layer correctly is important. Not only will the program throw an error when it comes to training, but it could also impact the accuracy of the model.

INPUT_SHAPE = X_train.shape[1:]
INPUT_SHAPE_RESHAPED = X_train_reshaped.shape[1:]
OUTPUT_SHAPE = len(unique_classes_train)

input shape: (295, 3)
input shape reshaped: (1, 295, 3)
output shape: 4

Here I’ve created 2 different input datasets with different shapes and you’ll see why in a moment.

I managed to get my first basic model working.

dnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE_RESHAPED),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
    name='DNN-Model')

I wanted to compare all my models so I created a function that would train and save the model using the same parameters.

callback = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)
EPOCHS = 50

model_to_train.compile(loss='categorical_crossentropy',
                       optimizer='adam',
                       metrics=['accuracy'])
history = model_to_train.fit(X_train, y_train, epochs=EPOCHS, validation_split=0.2, batch_size=32, callbacks=[callback], verbose=1)

I used 50 epochs and an early stopping callback to prevent overfitting. Categorical cross-entropy is the standard loss function for multi-class classification and the adam optimizer is a typical optimisation function.
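To make the early stopping rule concrete, here is a plain-Python sketch of the idea behind `patience=3`: training stops once the monitored validation score has gone 3 epochs in a row without beating its best value. This is a simplified illustration of the logic, not the Keras implementation, and the accuracy values are invented.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return the 0-based epoch at which training would stop, or None if it never does.

    Simplified: a score must be strictly higher than the best so far to count
    as an improvement.
    """
    best = float("-inf")
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation accuracies for illustration only.
history_scores = [0.70, 0.80, 0.85, 0.84, 0.85, 0.83, 0.82]
print(early_stopping_epoch(history_scores))
```

With these numbers the best score is reached at epoch 2, the next three epochs fail to beat it, and training stops at epoch 5, well short of the 50-epoch limit.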

The model worked reasonably well, achieving 87% accuracy (f1-score) on a training set.

Not bad for a first try. Let’s take a look at the confusion matrix.

Image by author: Confusion matrix for first deep neural network

We can see that the model works well on lying and stepping, as we would expect because the signals are very different. However, it struggles more to differentiate between sitting and standing because both postures are static and the devices are orientated the same way.
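For anyone who wants to build the same kind of summary by hand, the sketch below computes a confusion matrix, overall accuracy and per-class F1 with numpy. The labels are invented and the class order (0 = sitting, 1 = standing, 2 = stepping, 3 = lying) is an assumption, since the article does not say which code maps to which posture.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_f1(cm):
    """F1 = 2 * TP / (2 * TP + FP + FN), computed for every class."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    return np.divide(2 * tp, denom, out=np.zeros(len(tp)), where=denom > 0)


# Invented labels: sitting and standing get confused, stepping and lying do not.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 0, 1, 0, 1, 2, 2, 3, 3])

cm = confusion_matrix(y_true, y_pred, n_classes=4)
accuracy = np.trace(cm) / cm.sum()
f1 = per_class_f1(cm)
print(cm)
print(accuracy, f1.round(2))
```

Accuracy and F1 are different measures, so it is worth reporting which one a headline figure refers to. The per-class F1 values make the sitting versus standing weakness visible even when overall accuracy looks healthy.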

What can we do to improve on this model? Well, I had a dig around for some research papers on human activity recognition and found that convolutional neural networks (CNN) are used frequently.

CNNs are often associated with image classification and perform convolution and pooling stages on pixel values to make it easier to identify specific features in images. It seems that this can be translated to multi-axis sensor data and makes better models than DNNs alone.

After having a look through some online tutorials I managed to get one up and running.

cnn_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(1, 12), strides=(1, 1), input_shape=INPUT_SHAPE_RESHAPED, padding='valid'),
    tf.keras.layers.MaxPooling2D((1, 4), (1, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(1, 4), strides=(1, 1), input_shape=(1, 295, 3), padding='valid'),
    tf.keras.layers.MaxPooling2D((1, 4), (1, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
    name='CNN-Model')

This is why I created a reshaped version of my training data. The convolutional layers require a 4-dimensional tensor, so I needed to arrange my data slightly differently.
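One way to produce that reshaped array is sketched below, using the variable names from the article. The use of `np.expand_dims` is my assumption about how the extra axis was added; the resulting shapes match the ones printed earlier.

```python
import numpy as np

# Stand-in for the training windows: (number of epochs, 295, 3).
X_train = np.zeros((10, 295, 3), dtype=np.float32)

# Insert a length-1 "height" axis so each window looks like a 1 x 295 image with 3 channels.
X_train_reshaped = np.expand_dims(X_train, axis=1)

print(X_train.shape[1:])
print(X_train_reshaped.shape[1:])
```

Counting the batch dimension, `X_train_reshaped` is 4-dimensional, which is what `Conv2D` expects.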

The rest of the parameters were taken from previous examples of CNN models and I’m still not sure how you come to a decision on these, outside of experimenting.

If anyone has information on this then I’d love to hear about it.
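This doesn’t say which values are best, but one way to get a feel for the kernel, pool and stride settings is to trace how each layer shrinks the 295-sample axis. The sketch below applies the standard output-length formula for unpadded (`'valid'`) convolution and pooling to the layer sizes in the CNN above; the helper name is my own.

```python
def valid_output_length(length, window, stride):
    """Output length of a 'valid' (no padding) convolution or pooling step."""
    return (length - window) // stride + 1


# (layer type, window size along the time axis, stride) for the CNN in the article.
layers = [("conv", 12, 1), ("pool", 4, 2), ("conv", 4, 1), ("pool", 4, 2)]

lengths = [295]  # samples per window going into the first layer
for _, window, stride in layers:
    lengths.append(valid_output_length(lengths[-1], window, stride))

# The height axis stays at 1, and the last Conv2D has 64 filters.
flattened_features = 1 * lengths[-1] * 64
print(lengths, flattened_features)
```

Larger kernels and strides shrink the time axis faster, which reduces the number of features reaching the dense layers but also discards finer temporal detail.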

Anyway, this model was slightly worse than my previous model with 84% accuracy.

Pretty disappointing, given I thought this would make my model better. I can only assume I’ve not optimised the parameters, or that my data doesn’t need this type of model.

However, there was 1 last thing I wanted to try.

On my hunt for papers, I came across some information on recurrent neural networks (RNN). These make use of temporal events to improve future predictions. They’re often used in natural language processing (NLP), where each previous word can be used to improve predictions for the next word, as they are closely related.

This made perfect sense for my model, as the previous activity is closely related to the next activity. If you’re sitting, you can’t start walking until you stand up and vice versa.
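The intuition about consecutive activities can be checked on labelled data by counting transitions between neighbouring windows. The sketch below does this with numpy on an invented label sequence, with an assumed coding of 0 = sitting, 1 = standing, 2 = stepping.

```python
import numpy as np


def transition_counts(codes, n_classes):
    """Count how often posture i in one window is followed by posture j in the next."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (codes[:-1], codes[1:]), 1)
    return counts


# Invented sequence of window labels for illustration.
sequence = np.array([0, 0, 0, 1, 2, 2, 2, 1, 0, 0])
counts = transition_counts(sequence, n_classes=3)
print(counts)
```

In this toy sequence the sitting-to-stepping entry is zero, as the argument predicts. Note that a model fed one 295-sample window at a time only sees the ordering of samples within that window, so exploiting structure across windows would need sequences of windows as input.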

I set to work building another model. Luckily, TensorFlow made it simple to create this one as it has a layer specifically for long short-term memory (LSTM) RNNs.

lstm_model = tf.keras.Sequential([
tf.keras.layers.LSTM(128, input_shape=INPUT_SHAPE),
tf.keras.layers.Dense(100, activation='relu'),
tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
name='LSTM-Model')

This model improved on the 2 previous models, with 88% accuracy. Success! 🥳

I’d managed to create 3 models, using 3 different types of deep learning.

But I was so close to the elusive 90% accuracy.

It was time to turn to the pros and see if they could help me improve my models.

I needed to find out how others had applied these different deep learning models to my application.

I had already looked over some previous research papers related to HAR, but these didn’t give the level of detail I needed to replicate their models with TensorFlow.

After trawling the internet for code examples and tutorials, I came across this article on Machine Learning Mastery.

Jason goes into a great amount of detail on the open-access dataset he used to train his activity classification model, as well as the different algorithms available, some information on how they work, and code examples.

This was perfect!

They were very similar to the algorithms I had already used but extended to combine features from each.

I got to work writing the code to try these models on my data. Check out the new notebook below.

I decided I would stick with the same training hyperparameters so that I could compare these to my previous models.

The new models included another LSTM model with different parameters, a combined CNN-LSTM model and a Convolutional LSTM model.

There was some additional preprocessing needed to get my data into the correct shape to perform the convolutions, which may have been why my previous CNN models were unsuccessful at improving on the best accuracy.
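The article doesn’t spell out this extra preprocessing, so the following is a guess at its general form rather than the notebook’s code. Both the CNN-LSTM and ConvLSTM approaches split each window into shorter sub-sequences; since 295 = 5 × 59, the sketch assumes 5 sub-sequences of 59 samples. The resulting shapes would presumably correspond to `INPUT_SHAPE_RESHAPED_1` and `INPUT_SHAPE_RESHAPED_2` in the models that follow.

```python
import numpy as np

N_SUBSEQ = 5     # assumed number of sub-sequences per window
SUBSEQ_LEN = 59  # 5 * 59 = 295 samples

X_train = np.zeros((10, 295, 3), dtype=np.float32)  # stand-in data

# CNN-LSTM: (windows, sub-sequences, samples per sub-sequence, channels)
X_cnn_lstm = X_train.reshape(-1, N_SUBSEQ, SUBSEQ_LEN, 3)

# ConvLSTM2D: (windows, sub-sequences, rows, columns, channels) with a single row
X_convlstm = X_train.reshape(-1, N_SUBSEQ, 1, SUBSEQ_LEN, 3)

print(X_cnn_lstm.shape[1:], X_convlstm.shape[1:])
```

The convolutions then extract features from each sub-sequence, and the LSTM part models how those features evolve across the 5 sub-sequences.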

mlm_lstm_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(100, input_shape=INPUT_SHAPE),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
    name='MLM-LSTM-Model')

mlm_cnn_lstm_model = tf.keras.Sequential([
    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu'), input_shape=INPUT_SHAPE_RESHAPED_1),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu')),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dropout(0.5)),
    # MaxPooling1D (not 2D) to match the 1D convolutions wrapped above
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling1D(pool_size=2)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
    tf.keras.layers.LSTM(100),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
    name='MLM-CNN-LSTM-Model')

mlm_convlstm_model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(filters=64, kernel_size=(1, 3), activation='relu', input_shape=INPUT_SHAPE_RESHAPED_2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(OUTPUT_SHAPE, activation='softmax')],
    name='MLM-ConvLSTM-Model')

The LSTM model had the same performance as my own version with 88% accuracy… No improvements there.

However, both the combined CNN-LSTM and the ConvLSTM made further improvements, reaching 90% and 91% respectively.

Source: Giphy. Machine learning master Frank The Tank

It worked! I managed to creep over the 90% accuracy mark. Looking at the confusion matrix, we can see big improvements in each class’s prediction accuracy.

Image by author: Final confusion matrix

I think there are still improvements I could make to this model. I could have explored further preprocessing of my acceleration data, or I could have tried tuning my hyperparameters more.

But I decided to call it a day and start thinking about how I can make this model accessible to others.

When I decided to start this project I knew that I couldn’t let my model die in a Colab notebook.

I’ve seen too many YouTube videos of machine learning engineers talking about how important it is to create end-to-end projects to stop here.

So I needed to think about how I could make this model accessible to others.

At the start of the project, when I was creating my OOP software, I thought about my potential users of this model and realised that they were likely to only include myself and other researchers.

The tools and data are far too specialist to worry about making this accessible to everyone.

So as much as I wanted to try out serving my model on a Streamlit app, I decided to incorporate my best model within my OOP software. This way, any researchers wanting to use the software for making predictions on unlabeled shank data would be able to download it from GitHub and run an example script.

As it goes, this was fairly straightforward: I could simply import TensorFlow into my model class, select the model I wanted to make predictions with and show it the new acceleration data.
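The last step of any such prediction is turning the softmax output into a posture per window. A minimal sketch, assuming a particular order of posture names (the article only says the codes run from 0 to 3) and using a hand-written array in place of real model output:

```python
import numpy as np

# Assumed mapping from class index to posture name.
POSTURES = np.array(["sitting", "standing", "stepping", "lying"])

# Stand-in for the model's output: one row of softmax probabilities per window.
probabilities = np.array([
    [0.80, 0.15, 0.03, 0.02],
    [0.10, 0.20, 0.65, 0.05],
    [0.05, 0.05, 0.10, 0.80],
])

class_index = probabilities.argmax(axis=1)  # most probable class per window
predicted = POSTURES[class_index]
print(predicted)
```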

However, the output of these predictions was pretty boring. I wanted to create something to visualise days of posture predictions in 1 simple plot.

A colleague of mine had recently completed a similar project looking at upper limb prosthesis usage and had explored using spiral plots for visualising acceleration data.

This made perfect sense: each cycle of the spiral could represent a single day and I could use different colours to represent the postures.
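The geometry behind such a plot is simple enough to sketch. The code below is my own illustration, not the plotting class from the project: the function names are hypothetical, and the choices of one unit of radius per day, midnight at the top and a clockwise direction are assumptions.

```python
import numpy as np


def spiral_coordinates(n_windows, windows_per_day):
    """Polar coordinates where one full turn (2*pi) corresponds to one day."""
    days_elapsed = np.arange(n_windows) / windows_per_day
    theta = 2 * np.pi * days_elapsed  # angle: time of day, wrapping every day
    radius = 1.0 + days_elapsed       # radius grows by 1 per day, giving the spiral
    return theta, radius


def plot_spiral(theta, radius, codes, colours):
    """Scatter the windows on a polar axis, coloured by posture code."""
    import matplotlib.pyplot as plt   # imported here so the maths runs without matplotlib

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.scatter(theta, radius, c=[colours[c] for c in codes], s=4)
    ax.set_theta_zero_location("N")   # midnight at the top
    ax.set_theta_direction(-1)        # clockwise, like a clock face
    return fig


WINDOWS_PER_DAY = 24 * 60 * 60 // 15  # number of 15-second windows in a day
theta, radius = spiral_coordinates(2 * WINDOWS_PER_DAY, WINDOWS_PER_DAY)
print(WINDOWS_PER_DAY, theta[WINDOWS_PER_DAY], radius[WINDOWS_PER_DAY])
```

Passing the predicted codes and a dictionary such as `{0: "green", 1: "yellow", 2: "red", 3: "blue"}` to `plot_spiral` would colour each window by posture; the code-to-colour mapping here is again only an example.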

I turned to trusty Stack Overflow and found a really useful post with examples of different spiral plot implementations using Matplotlib.

I added this to a new plotting class, that could be inherited by other classes for a range of plotting needs.

This worked a treat 🍰

Now I could view the output of my posture classifier and analyse the different postures over multiple days.

Image by author: Postures: blue = lying down, green = sitting, yellow = standing, red = stepping

I had done it! I achieved what I set out to at the start of this project.

I had a system for creating deep learning models and making better posture classification predictions than my previous research.

That was an interesting 6 weeks!

The 42-day project idea is a great outline, enabling you to dedicate enough time to complete a small project without having to invest months of time on a single idea. This is extremely useful for me, as I have loads of things I want to try and learn within machine learning.

Previously I’ve tried learning through tutorials and would focus on individual topics such as databases or libraries like pandas.

Project-based work is a much better way of learning all these things in an applied manner and helps you remember why they are useful and how they’re implemented.

Also, at the end of the project, you’re left with an original codebase and potentially a product that you’ve created. This shows off your skills much better than completing a tutorial where the code you produce is the same as everyone else that’s followed the tutorial.

I will definitely be using the 42-day project plan on my next idea and I might even try incorporating it into other areas of my life such as fitness training or learning new skills.

I’ve learned a huge amount during the past 6 weeks, including: project management, building OOP software, processing data, preparing data for deep learning, building TensorFlow models, assessing models, tuning models and finally implementing deep learning within a project.

I’ve also learned a lot about documenting and writing about my work and I’m planning to continue doing this for all my side projects.

I hope this has been an interesting read and it would be great to get your feedback on the work.

It’s time to move on to the next project!

Following the 42-day project guide, I’m going to take a week to cool off, clean up some bits of the code and start thinking about what I want to do for the next project.

I already have a few ideas.

I want to explore computer vision and how to use machine learning to interpret image and video data.

Really, I just want to buy one of these…

If anyone has any interesting ideas then let me know.

Cheers 🤙


