Friday, December 21, 2018

Next, what is “relu” and why do we need it?

Intuitively, we know that each Dense layer involves a matrix multiplication. Thus, a series of Dense layers is simply a series of matrix multiplications.

But, a series of matrix multiplications reduces to a single matrix multiplication, because a matrix multiplied by a matrix is simply another matrix.

E.g.

y = ABCx is the same as y = Wx, where W = ABC
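
If you want to see this concretely, here is a small NumPy sketch (the matrix sizes are arbitrary, chosen just for illustration):

```python
import numpy as np

# Three weight matrices standing in for three Dense layers with no activations.
A = np.random.randn(4, 4)
B = np.random.randn(4, 4)
C = np.random.randn(4, 4)
x = np.random.randn(4)

# The chain of multiplications collapses into a single matrix W.
W = A @ B @ C
print(np.allclose(A @ B @ C @ x, W @ x))  # True: y = ABCx gives the same result as y = Wx
```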

Thus, a stack of Dense layers with nothing in between is no more expressive than a single matrix multiplication, which makes it a linear model, essentially Logistic Regression (once a sigmoid is applied at the output).

Why are linear models limited?

Recall the general picture of classification: we would like to find a boundary that separates the red dots from the blue dots.

The problem is, what if the red dots and blue dots cannot be separated by a line?

Then a linear classifier cannot work.
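
To make this concrete, here is a small sketch using scikit-learn's LogisticRegression on XOR-style data (the library choice is just for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-style data: the two classes sit on opposite diagonals,
# so no single straight line separates them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # stays well below 1.0 -- a line cannot separate XOR
```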

In this case, we need our function to be nonlinear. We can accomplish this by inserting a nonlinear function in between each matrix multiply (in other words, at the end of each Dense layer).
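
As a rough NumPy sketch (with tanh standing in for "some nonlinear function"), the forward pass becomes matrix multiply, nonlinearity, matrix multiply, which no longer collapses into a single matrix:

```python
import numpy as np

W1 = np.random.randn(8, 2)   # first Dense layer's weights
W2 = np.random.randn(1, 8)   # second Dense layer's weights
x = np.random.randn(2)

# Without an activation, W2 @ (W1 @ x) == (W2 @ W1) @ x -- still linear.
# With a nonlinear g in between, there is no single matrix W such that
# W2 @ g(W1 @ x) == W @ x for every x.
g = np.tanh
y = W2 @ g(W1 @ x)
```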

We call such functions nonlinear activation functions.

In modern deep learning, the most common nonlinear activation function is the ReLU, which stands for “rectified linear unit”. Don’t be scared by the name; it is just a very simple elbow-shaped curve.
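
In code, ReLU is one line; here is a minimal NumPy version (in Keras, you get it by passing activation='relu' to a Dense layer, which is where the lowercase "relu" string comes from):

```python
import numpy as np

def relu(x):
    # The "elbow": zero for negative inputs, the identity for positive inputs.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.0, 3.0])))  # [0. 0. 0. 1. 3.]
```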