Shared Models and Custom Losses in Tensorflow 2 / Keras | by Charles Guan


Developing an advanced neural network model in Tensorflow 2 and Keras

Photo by Charles Guan

In this tutorial, I show how to share neural network layer weights and define custom loss functions. The example code assumes beginner knowledge of Tensorflow 2 and the Keras API.

For a recent project, I wanted to use Tensorflow 2 / Keras to re-implement DeepKoopman, an autoencoder-based neural network architecture described in “Deep learning for universal linear embeddings of nonlinear dynamics”. My end goal was to create a user-friendly version that I could eventually extend.

DeepKoopman embeds time series data x into a low-dimensional coordinate system y in which the dynamics are linear.

DeepKoopman neural network architecture. Source: Lusch, Kutz, and Brunton (Nature Communications 2018) / (CC-BY 4.0)

The DeepKoopman schematic shows that there are three main components:

  1. The encoder φ, which maps the input to the latent code
  2. The decoder φ-inverse, which reconstructs the input from the latent code
  3. The linear dynamics K, which describe how the latent code evolves over time

To start building the model, we can define the three sub-models as follows:
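The snippet below is a minimal sketch of what those three sub-models might look like. The layer sizes, latent dimension, and names (encoder, decoder, linear_dynamics) are illustrative choices for this tutorial, not the values used in the paper.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions -- the right values depend on the dataset
input_dim = 128   # dimension of each state x
latent_dim = 2    # dimension of the latent code y

# Encoder phi: x -> y
encoder = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(input_dim,)),
    layers.Dense(latent_dim),
], name="encoder")

# Decoder phi-inverse: y -> x_reconstructed
decoder = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(input_dim),
], name="decoder")

# Linear dynamics K: y0 -> y1, a single linear layer with no bias
linear_dynamics = keras.Sequential([
    layers.Dense(latent_dim, use_bias=False, input_shape=(latent_dim,)),
], name="linear_dynamics")
```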

We can connect the sub-models and then plot the overall architecture using Keras plot_model.
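A rough sketch of that wiring, reusing the sub-models defined above (the model and file names are arbitrary, and plot_model requires pydot and graphviz to be installed):

```python
from tensorflow.keras.utils import plot_model

# Autoencoder path: x0 -> y0 -> x0_reconstructed
x0 = keras.Input(shape=(input_dim,), name="x0")
y0 = encoder(x0)
x0_reconstructed = decoder(y0)

model = keras.Model(inputs=x0, outputs=x0_reconstructed, name="deep_koopman")

# Save a schematic of the architecture to disk
plot_model(model, to_file="deep_koopman.png", show_shapes=True)
```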

Keras-generated schematic of the main DeepKoopman components.

At this point, we are set up to train the autoencoder component, but we haven’t accounted for the time-series nature of the data. We still need to be able to input and compute over a second time-point, x1.

In basic use-cases, neural networks have a single input node and a single output node (although the corresponding tensors may be multi-dimensional). The original DeepKoopman schematic shows the encoder and decoder converting different inputs to different outputs, namely x sampled at different times.

Layer sharing turns out to be quite simple in Keras. We can share layers by calling the same encoder and decoder models on a new Input.

To recap, in the DeepKoopman example, we want to use the same encoder φ, decoder, and linear dynamics K for each time-point. To share models, we first define the encoder, decoder, and linear dynamics models. Then, we can use the models to connect different inputs and outputs as if they were independent.
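Here is one way this could look, continuing from the sub-models and the x0 branch defined earlier. The variable names mirror the schematic but are otherwise illustrative:

```python
# Second time-point input: the same encoder, decoder, and dynamics weights are reused
x1 = keras.Input(shape=(input_dim,), name="x1")

y1 = encoder(x1)               # shared encoder weights
y1_pred = linear_dynamics(y0)  # predict y1 from y0 with the linear dynamics K
x1_pred = decoder(y1_pred)     # shared decoder weights

# Redefine the model with both time-points as inputs
model = keras.Model(inputs=[x0, x1],
                    outputs=[x0_reconstructed, x1_pred],
                    name="deep_koopman")
```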

This approach of sharing layers can be helpful in other situations, too. For example, if we wanted to create neural networks with tied weights, we could call the same layer on two inputs.
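For instance, a single Dense layer called on two different inputs reuses one set of weights (the shapes here are arbitrary):

```python
# One layer, two call sites: both outputs use the same kernel and bias
shared_dense = layers.Dense(32, activation="relu")

input_a = keras.Input(shape=(16,))
input_b = keras.Input(shape=(16,))

output_a = shared_dense(input_a)
output_b = shared_dense(input_b)  # tied weights with output_a
```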

So far, we have defined the connections of our neural network architecture. But we haven’t yet defined the loss function, so Tensorflow has no way to optimize the weights.

The DeepKoopman loss function is composed of three terms:

  1. reconstruction accuracy: x0 vs x0_reconstructed
  2. future state prediction: x1 vs x1_pred
  3. linearity of dynamics: y1 vs y1_pred

Each loss is the mean squared error between two values. In a typical neural network setup, we would pass in ground-truth targets to compare against our model predictions. For example, many Tensorflow/Keras examples use something like:
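Something along these lines, with a throwaway single-output model and random placeholder arrays standing in for a real dataset:

```python
import numpy as np

# A typical single-input, single-output model
typical_model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(input_dim,)),
    layers.Dense(input_dim),
])

# The loss goes through compile(); ground-truth targets go through fit()
typical_model.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(256, input_dim)  # placeholder inputs
y_train = np.random.rand(256, input_dim)  # placeholder targets
typical_model.fit(x_train, y_train, epochs=5, batch_size=32)
```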

Typical Keras Model setup passing the loss function through model.compile() and target outputs through model.fit().

With DeepKoopman, we know the target values for losses (1) and (2), but y1 and y1_pred do not have ground truth values, so we cannot use the same approach to calculate loss (3). Instead, Keras offers a second interface to add custom losses, model.add_loss().

model.add_loss() takes a tensor as input, which means that you can create arbitrarily complex computations using Keras and Tensorflow, then simply add the result as a loss.
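Continuing the DeepKoopman example, the three mean-squared-error terms can be built directly from the tensors defined above and attached one by one. (How tf ops behave on symbolic Keras tensors can vary slightly between Tensorflow 2 versions, so treat this as a sketch.)

```python
# Each loss term is the mean squared error between two tensors in the graph
reconstruction_loss = tf.reduce_mean(tf.square(x0 - x0_reconstructed))
prediction_loss = tf.reduce_mean(tf.square(x1 - x1_pred))
linearity_loss = tf.reduce_mean(tf.square(y1 - y1_pred))

model.add_loss(reconstruction_loss)
model.add_loss(prediction_loss)
model.add_loss(linearity_loss)
```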

Adding the three components of the DeepKoopman loss function.

If you want to add arbitrary metrics, you can also use a similar API through model.add_metric():
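For example, to log each loss term separately during training (older Tensorflow 2 releases may additionally require an aggregation="mean" argument):

```python
# Track each term under its own name in the training logs
model.add_metric(reconstruction_loss, name="reconstruction_loss")
model.add_metric(prediction_loss, name="prediction_loss")
model.add_metric(linearity_loss, name="linearity_loss")
```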

The last step is to compile and fit the model:
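Roughly as follows, with random placeholder arrays standing in for consecutive time-points of a real time series:

```python
import numpy as np

# No loss argument: the losses were already attached with add_loss()
model.compile(optimizer="adam")

# Placeholder data for the two time-points
x0_train = np.random.rand(256, input_dim)
x1_train = np.random.rand(256, input_dim)

# Both time-points are inputs; no separate targets are needed
model.fit(x=[x0_train, x1_train], epochs=100, batch_size=32)
```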

Note: unfortunately, the model.add_loss() approach is not compatible with applying loss functions to outputs through model.compile(loss=...). The best solution for losses that combine model outputs and internal tensors may be to define a custom training loop.


