Activation Functions & Non-Linearity: Neural Networks 101 | by Egor Howell | Oct, 2023

Explaining why neural networks can learn (nearly) anything and everything


In my previous article, we introduced the multi-layer perceptron (MLP), which is just a set of stacked, interconnected perceptrons. I highly recommend you check out my previous post if you are unfamiliar with the perceptron and the MLP, as we will discuss them quite a bit in this article:

An example MLP with two hidden layers is shown below:

A basic multi-layer perceptron with two hidden layers. Diagram by author.

However, the problem with the MLP is that it can only fit a linear classifier. This is because each individual perceptron applies a step function to a linear combination of its inputs, so every unit still carves the input space with a linear decision boundary:

The Perceptron, which is the simplest neural network. Diagram by author.

So although stacking our perceptrons may make the model look like a modern-day neural network, it is still a linear classifier and not that much different from ordinary linear regression!
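This collapse can be verified numerically. The sketch below (layer sizes and random weights are my own choices for illustration) shows that two stacked linear layers with no activation in between compute exactly the same mapping as a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.standard_normal((4, 3))   # "hidden layer" weights
W2 = rng.standard_normal((2, 4))   # "output layer" weights
x = rng.standard_normal(3)         # an arbitrary input

# Forward pass through two linear layers (no activation in between)
hidden = W1 @ x
out_stacked = W2 @ hidden

# The same mapping as ONE linear layer with combined weights W2 @ W1
out_single = (W2 @ W1) @ x

print(np.allclose(out_stacked, out_single))  # True
```

No matter how many linear layers we stack, the product of the weight matrices is just another single matrix, so the depth buys us nothing.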

Another problem is that the step function is not differentiable over its whole domain, which rules out gradient-based training.

So, what do we do about it?

Non-Linear Activation Functions!
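As a quick preview, here is a minimal sketch of three widely used non-linear activations, implemented with NumPy (the function names and the sample inputs are my own choices):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # roughly [0.119, 0.5, 0.881]
print(relu(z))     # [0. 0. 2.]
```

Each of these bends the output in a way no linear function can, which is exactly what lets stacked layers represent more than a linear classifier.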

What is Linearity?

Let’s quickly state what linearity means to build some context. Mathematically, a function f is considered linear if it satisfies additivity:

f(x + y) = f(x) + f(y)

There is also another condition, homogeneity:

f(a · x) = a · f(x)

But we will work with the first condition for this demonstration.

Take this very simple case:
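The original worked example did not survive here, so the following is a hedged stand-in (f(x) = 2x and ReLU are my own choices) showing the additivity check numerically:

```python
import numpy as np

def f_linear(x):
    # A linear function: scaling by a constant
    return 2.0 * x

def relu(x):
    # A non-linear function, for contrast
    return np.maximum(0.0, x)

x, y = 3.0, -5.0

# Additivity holds for the linear function...
print(f_linear(x + y) == f_linear(x) + f_linear(y))  # True

# ...but fails for ReLU: relu(-2) = 0, while relu(3) + relu(-5) = 3
print(relu(x + y) == relu(x) + relu(y))              # False
```

The failure of additivity is precisely what makes ReLU (and the other activations above) non-linear, and therefore useful inside a neural network.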
