Implementation Differences in LSTM Layers: TensorFlow vs PyTorch | by Madhushan Buwaneswaran


At the time of writing, the TensorFlow version was 2.4.1.

In TF, we can create an LSTM layer with tf.keras.layers.LSTM. When initializing an LSTM layer, the only required parameter is units, which corresponds to the number of output features of that layer; that is, units = nₕ in our terminology. nₓ is inferred from the output of the previous layer, so the library can initialize all the weight and bias terms of the LSTM layer.
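For example, a layer with nₕ = 4 output features can be created as follows (the value 4 is just an illustrative choice, not one taken from the original article's gist):

```python
import tensorflow as tf

# units = n_h: the number of output features of the layer.
# n_x is inferred later, from the first input the layer receives.
lstm_layer = tf.keras.layers.LSTM(units=4)
```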

The TF LSTM layer expects a three-dimensional tensor as input during forward propagation. This input should be of the shape (batch, timesteps, input_features). This is shown in the code snippet below. Suppose we are using this LSTM layer to train a language model. Our input will be sentences. The first dimension corresponds to how many sentences we use as one batch to train the model. The second dimension corresponds to how many words are present in one such sentence. In a practical setting, the number of words varies from sentence to sentence. So, in order to batch these sentences, we can select the length of the longest sentence in the training corpus as this dimension and pad the shorter sentences with trailing zeros. The last dimension corresponds to the number of features used to represent each word. For simplicity, if we use one-hot encoding and there are 10,000 words in our vocabulary, then this dimension will be 10,000.
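A minimal sketch of such an input and forward pass (the batch size of 32 and the 4 units match the output shape discussed next; the 10 timesteps and 8 input features are assumed values for illustration):

```python
import tensorflow as tf

# (batch, timesteps, input_features) = (32, 10, 8)
inputs = tf.random.normal([32, 10, 8])

lstm = tf.keras.layers.LSTM(4)  # units = n_h = 4
output = lstm(inputs)

print(output.shape)  # (32, 4) -> (batch, output_features)
```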

However, if we set time_major=True when initializing the layer, the input will be expected in the shape (timesteps, batch, features).
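A sketch of the same layer in time-major mode:

```python
# With time_major=True, the same data would be fed as (timesteps, batch, features),
# e.g. a tensor of shape (10, 32, 8) instead of (32, 10, 8).
lstm_time_major = tf.keras.layers.LSTM(4, time_major=True)
```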

As seen from the first code snippet above, the output of the LSTM (with default parameters) is of shape (32, 4), which corresponds to (batch, output_features). Going back to the language-model example, the output has one vector per sentence, with nₕ features per sentence (nₕ = units = number of output features). This one vector per sentence is the output of the LSTM layer at the last timestep T (the last word of the sentence), i.e. hᵀ in our notation. This is depicted in fig. 3.

Figure 3 — Default output of Tensorflow LSTM layer (diagram by the author)

However, if we want to stack multiple LSTM layers, the next layer also expects a time series as input. In such situations, we can set return_sequences=True when initializing the layer. Then the output will be of shape (32, 10, 4), which corresponds to (batch, timesteps, output_features). If return_sequences is set to True, then hᵗ : ∀t = 1,2…T will be returned as output. This is shown in the code snippet below and in the first LSTM layer in fig. 4.
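A sketch, reusing the illustrative inputs of shape (32, 10, 8) from above:

```python
lstm_seq = tf.keras.layers.LSTM(4, return_sequences=True)
seq_output = lstm_seq(inputs)  # inputs: (32, 10, 8)

print(seq_output.shape)  # (32, 10, 4) -> (batch, timesteps, output_features)
```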

Figure 4 — A simple model with 2 LSTM layers and 2 fully connected layers. Note that the LSTM 1 layer outputs a sequence and the LSTM 2 layer outputs a single vector. (diagram by the author)

If we want to get the cell state (cᵗ) as an output, we need to set return_state=True when initializing the layer. Then we get a list of three tensors as output. According to the documentation, if we set both return_sequences=True and return_state=True, these three tensors will be whole_seq_output, final_memory_state, and final_carry_state. This is shown in the code snippet below.
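A sketch with the same illustrative inputs as before:

```python
lstm_full = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq_output, final_memory_state, final_carry_state = lstm_full(inputs)

print(whole_seq_output.shape)    # (32, 10, 4)
print(final_memory_state.shape)  # (32, 4)
print(final_carry_state.shape)   # (32, 4)
```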

In our notation,

  • whole_seq_output — output corresponding to all timesteps.
    hᵗ : ∀t = 1,2…T ; Shape — (batch, timesteps, output_features)
  • final_memory_state — output corresponding to the last timestep.
    hᵀ ; Shape — (batch, output_features)
  • final_carry_state — last cell state.
    cᵀ ; Shape — (batch, output_features)

If we set return_sequences=False and return_state=True, then these three tensors will be final_memory_state, final_memory_state, and final_carry_state; that is, the first two are identical, because the tensor returned in place of the whole sequence is just the final hidden state hᵀ.
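A sketch of this case, again with the illustrative inputs from above:

```python
# return_sequences defaults to False here, so the first returned tensor is h^T itself.
lstm_state = tf.keras.layers.LSTM(4, return_state=True)
output, final_memory_state, final_carry_state = lstm_state(inputs)

print(output.shape, final_memory_state.shape, final_carry_state.shape)  # (32, 4) each
# `output` and `final_memory_state` hold the same values: h^T
```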

A single LSTM layer has five places where activation functions are used. But if we look at the parameters, we see only two parameters for setting activation functions: activation and recurrent_activation. Setting the activation parameter changes the activation applied to the candidate vector and the activation applied to the cell state just before the element-wise multiplication with the output gate. Setting recurrent_activation changes the activation functions of the forget gate, update gate, and output gate.
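For example, spelling out these two parameters explicitly (tanh and sigmoid are the library defaults):

```python
lstm_act = tf.keras.layers.LSTM(
    4,
    activation="tanh",               # candidate vector, and c^t before the output-gate multiplication
    recurrent_activation="sigmoid",  # forget, update (input), and output gates
)
```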

Other parameters are quite self-explanatory or seldom used. One other thing to note is that we can set unroll=True, in which case the network will be unrolled. This can speed up training but is memory-intensive (since the same layer is copied multiple times), so it is only suitable for short sequences.
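A sketch of enabling unrolling:

```python
# unroll=True trades memory for speed; only suitable for short sequences.
lstm_unrolled = tf.keras.layers.LSTM(4, unroll=True)
```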

The following code snippet implements the model shown in fig. 4 using TF. Note the output shape of each layer and the number of trainable parameters in each layer.
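A sketch of such a model is given below; the layer sizes and input shape are illustrative assumptions, since fig. 4 does not fix exact values:

```python
import tensorflow as tf

# Assumed shapes: (batch, timesteps, features) = (32, 10, 8), both LSTM layers with 4 units,
# Dense layers with 8 and 1 units.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, return_sequences=True, input_shape=(10, 8)),  # LSTM 1: returns the whole sequence
    tf.keras.layers.LSTM(4),                                              # LSTM 2: returns only h^T
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.summary()  # output shape and number of trainable parameters per layer
```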

Tensorflow implementation of the model in fig. 4.


