We will define a utility function to analyze the predictions and the outputs. Each forward pass gives a different result, because the model parameters are sampled anew every time we predict; the model can therefore be interpreted as an ensemble classifier.
First we create empty arrays to store the outputs. I chose forward_passes as 10; you can increase it to any reasonable number, such as 100 or 200. forward_passes defines the number of samples that will be drawn from the model (Line 8). The model outputs a distribution object, so to get probabilities we call its mean() method (Line 10). The same applies to extracted_std.
There will be 10 different predictions for each class. Before plotting them, we compute a 95% prediction interval for each class independently (Lines 31 to 33). If we used 100 as forward_passes, there would be 100 different predictions.
The plotting process is straightforward, so I will not go into the details here.
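The original gist is not reproduced here, so below is a minimal sketch of what analyse_model_prediction might look like. It assumes a globally defined model whose final layer returns a TensorFlow Probability distribution object; the variable names are my own, and the line numbers mentioned above refer to the original notebook, not to this sketch.
import numpy as np
import matplotlib.pyplot as plt

def analyse_model_prediction(image, true_label=None, forward_passes=10):
    # Each forward pass samples a fresh set of weights, so the
    # predicted probabilities differ from pass to pass.
    predicted_probabilities = np.empty(shape=(forward_passes, 10))
    for i in range(forward_passes):
        distribution = model(image[np.newaxis, ...])  # a distribution object
        predicted_probabilities[i] = distribution.mean().numpy()[0]

    extracted_std = predicted_probabilities.std(axis=0)
    if true_label is not None:
        print(f"Label {extracted_std.argmax()} has the highest std "
              f"in this prediction with the value {extracted_std.max():.3f}")
    else:
        print("Std Array:", extracted_std)

    # 95% prediction interval for each class, computed independently.
    pct_2p5 = np.percentile(predicted_probabilities, 2.5, axis=0)
    pct_97p5 = np.percentile(predicted_probabilities, 97.5, axis=0)

    # Bars run from the lower to the upper percentile, so taller bars
    # mean wider intervals, i.e. higher uncertainty.
    fig, ax = plt.subplots(figsize=(9, 3))
    ax.bar(np.arange(10), pct_97p5 - pct_2p5, bottom=pct_2p5)
    if true_label is not None:
        ax.patches[true_label].set_facecolor("green")
    ax.set_xticks(np.arange(10))
    ax.set_ylim(0, 1)
    ax.set_xlabel("Class")
    ax.set_ylabel("Predicted probability (95% interval)")
    plt.show()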
Let’s test drive this function:
analyse_model_prediction(example_images[284], example_labels[284])
This will give the output:
Label 8 has the highest std in this prediction with the value 0.157
Before going into the details, consider the best case scenario. If the model output 1.0 for the given image at every forward pass, the 95% prediction interval would also be exactly 1.0 at both ends. In that case the bars would be very short, indicating that the uncertainty of the prediction is low. So the key takeaway is: the taller the bars, the higher the uncertainty!
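To make that concrete, here is a tiny numeric check for a single class (the sample values are illustrative, not from the article):
import numpy as np

confident = np.full(10, 1.0)  # the model outputs 1.0 on every pass
wavering = np.array([0.5, 1.0, 0.6, 0.9, 0.7,
                     1.0, 0.5, 0.8, 0.95, 0.55])

for name, samples in [("confident", confident), ("wavering", wavering)]:
    low, high = np.percentile(samples, [2.5, 97.5])
    print(f"{name}: 95% interval width = {high - low:.2f}")
# confident: width 0.00 -> a very short bar
# wavering:  width 0.50 -> a tall bar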
In this prediction we see that the model assigns most of the probability to label 8. However, the assigned probabilities vary noticeably between passes: in one forward pass the output was around 0.5, and in another it was close to 1.0. So we conclude that the model is not so sure about this prediction.
analyse_model_prediction(example_images[50], example_labels[50])
The output looks like this:
Label 0 has the highest std in this prediction with the value 0.001
Voilà! The model assigned a very high probability to label 0 in every forward pass. We can say that the model is sure about this prediction.
This model actually knows what it does not know!
Before jumping to a conclusion, let's add a random noise vector to the image to see how it affects the predictions.
noise_vector = np.random.uniform(size=(28, 28, 1), low=0, high=0.5)

# Make sure that values are in (0, 1).
noisy_image = np.clip(example_images[50] + noise_vector, 0, 1)

analyse_model_prediction(noisy_image, example_labels[50])
The probabilities are still high for class 0, but it might be worth checking the result. Imagine you are using a Bayesian CNN and you get an output like this: what would you do, or how would you interpret it?
My interpretation would be: the model thinks the image belongs to class 0, but there is some uncertainty, so I might take a closer look at it.
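One way to act on that interpretation (my own sketch, not from the article; the 0.1 threshold is arbitrary) is to auto-accept only low-variance predictions and route everything else to a human:
def triage_prediction(predicted_probabilities, std_threshold=0.1):
    # predicted_probabilities has shape (forward_passes, num_classes),
    # as collected inside analyse_model_prediction above.
    stds = predicted_probabilities.std(axis=0)
    label = predicted_probabilities.mean(axis=0).argmax()
    if stds.max() < std_threshold:
        return f"auto-accept: class {label}"
    return f"needs review: class {label} (max std {stds.max():.3f})"
For the noisy zero above, this rule would return a "needs review" result rather than silently accepting class 0.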
What happens if we increase the noise in the image and distort it even more?
noise_vector = np.random.uniform(size=(28, 28, 1), low=0, high=0.5)
noisy_image = np.clip(example_images[50] + noise_vector*2, 0, 1)

analyse_model_prediction(noisy_image, example_labels[50])
If we were using a regular CNN, the output might still be high for one particular class. That is why standard neural networks are known to be overconfident in their predictions.
But the Bayesian CNN says that it cannot classify this image properly. It is fine for a model not to know; that is better than assigning a wrong label. We can draw this conclusion from the tall bars: there are three very tall bars and three medium ones, which is enough to see that the uncertainty is high.
For the last sample, a pure noise vector:
analyse_model_prediction(np.random.uniform(size=(28, 28, 1), low=0, high=1))
This will yield the output:
Std Array: [0.27027504 0.22355586 0.19433676 0.08276099 0.1712302 0.14369398
0.31018993 0.13080781 0.47434729 0.18379491]
The results are as expected: there is not a single class with low uncertainty. The model says it cannot classify this vector properly.
In this article we:
- Looked into the Convolution2DReparameterization layer.
- Saw how to approximate the KL divergence with TensorFlow Probability when it cannot be computed analytically.
- Created a fully probabilistic Bayesian CNN.
You can get the code and the notebook from here.
In the next part, we will customize the model with custom prior and posterior functions, and we will use a real dataset.