Let's get active with “Activation Functions”

Jitesh Rawat
4 min read · Apr 6, 2021

If you are into the data science field, especially deep learning, or even if you are just a beginner with neural networks, you have surely run into lines like these in your code (if you are using TensorFlow or Keras) for an artificial neural network.
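Something like the snippet below. (This is a minimal sketch using tf.keras; the layer sizes and input shape here are made up, the point is just the activation= arguments.)

import tensorflow as tf

# A tiny model where each Dense layer takes an activation argument.
# The sizes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output layer
])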

Here we see an argument, activation="sigmoid" or activation="relu". These are activation functions. So what are these functions, and how do they affect our NN model? Let's see…


Activation Functions.

Activation functions are very important in building a non-linear model for a given problem. In this article we will cover different activation functions that are used while building a neural network, along with their pros and cons. But before that, let's pause and look at why activation functions are needed in the first place.

The activation function in a neural network squashes the output of a neuron into a range, so that you can make a decision for your classification. Basically, the activation function decides whether your neuron is firing or not. Now, this is how the activation function helps in the output layer. What about the hidden layers?

In a complex neural network with many hidden layers, each neuron consists of two parts: a weighted sum of its inputs and an activation function. Now let's assume we remove the activation functions from the hidden layers and the output layer. How do you think the neural network will behave?

If you try it, you will realise that you get a mere linear equation, where the output is just a weighted sum of the input features. Sad, because for that purpose you don't even need hidden layers 😂. We all know that not every problem under the sun can be solved with a linear equation, and that's why we need non-linearity, and thus activation functions.
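Here is a quick sketch (mine, not from the original article) showing the collapse with NumPy: two weight matrices with no activation in between behave exactly like a single linear layer.

import numpy as np

# Two "layers" with no activation: W2 @ (W1 @ x + b1) + b2
# algebraically collapses into a single linear map W @ x + b.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))  # True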

Activation functions are very important in building a non linear model for a given problem

So now that we know the purpose of activation functions, let's take a look at some of them.

1. Step function

Step functions are used for binary classification, and that's one reason they are not very popular. The step function has a threshold (boundary) that decides whether an input x maps to 0 or to 1. Sometimes the step function may lead to misclassification because of outliers in the data.
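A minimal version in plain Python (the threshold of 0 is just an assumption for illustration):

def step(x, threshold=0.0):
    # Fires (returns 1) only when x crosses the threshold, otherwise 0.
    return 1 if x >= threshold else 0

print(step(-2.3), step(0.7))  # 0 1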

2. Sigmoid or Logit function

The sigmoid function gives a smooth curve between zero and one. It outputs a value between 0 and 1 that you can read like a probability, and if you use one sigmoid output per class you can even handle multi-class problems by picking the class with the maximum value.
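In code, using Python's math library, it looks like this:

import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1 / (1 + math.exp(-x))

print(sigmoid(-4), sigmoid(0), sigmoid(4))  # ~0.018, 0.5, ~0.982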

3. Tanh function

The tanh function is similar to sigmoid, but instead of a range between 0 and 1 it gives an output between -1 and 1. Do not worry about the mathematical equation, tanh(x) = (e^x − e^-x) / (e^x + e^-x); all it is doing is taking an input and converting it into the range -1 to 1.
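Again with the math library (which also ships a built-in math.tanh):

import math

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x): squashes the input into (-1, 1).
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(tanh(-2), tanh(0), tanh(2))  # ~-0.96, 0.0, ~0.96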

4. ReLU function

ReLU is an extremely simple function. No wonder Mr. Andrew Ng smiles while talking about ReLU in the Neural Networks course on Coursera. It takes in a value; if it is negative it returns 0, and if it is positive it simply returns that value back. ReLU is a computationally very efficient and lightweight function, and the general guideline is: if you are not sure which function to use, go with ReLU!
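The whole function fits in one line:

def relu(x):
    # Returns 0 for negative inputs and the input itself otherwise.
    return max(0, x)

print(relu(-3.5), relu(2.0))  # 0 2.0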

5. Leaky ReLU function

Now this is another flavour of ReLU called Leaky ReLU, where negative inputs are pulled close to 0 (scaled by a small factor) instead of being zeroed out. The equation is max(0.1x, x), with 0.1 as the leak slope.
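And the leaky version, with the leak factor of 0.1 from the equation above:

def leaky_relu(x, leak=0.1):
    # Negative inputs keep a small slope instead of being cut off at 0.
    return max(leak * x, x)

print(leaky_relu(-3.0), leaky_relu(2.0))  # -0.3 2.0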

You can also try these functions yourself and observe how they work using the “math” library in Python, as in the snippets above.

Conclusion:

The general guideline is to use sigmoid in the output layer and to try tanh for the hidden layers. The issue with both of these functions is vanishing gradients, an important topic that one should read about. ReLU is considered very efficient for hidden layers. Sometimes there is no clear answer and you have to experiment yourself, because neural networks and machine learning are all about trial and error 😅.

Reference:

For trying these functions in a Python notebook, you can refer to the link below.
