“If we ignore the process, there is no value to the result.”
— unknown
Neural computing is one of the most valuable and interesting fields. Since its inception there have been constant updates in how networks are implemented, and new approaches keep arriving that save developers much time while building them. However many updates appear, the crux of neural networks remains the same, and understanding it is the most important step. People who start learning neural networks often make the mistake of looking for shortcuts to implement the network while ignoring the process happening behind the scenes.
Today, we will consider a simple "artificial neural network" and understand the process by breaking it down into steps. First, let's understand what a neural network is.
Neural Network
A neural network involves a series of underlying algorithms that try to learn the internal patterns of the data provided. It is designed by mimicking how the human brain operates; just as the brain needs time to process new information, a neural network must be polished over many iterations until it can predict new data correctly. Let's consider a multi-layer perceptron, which is also called an artificial neural network, and understand the internal mathematics of how a network learns the patterns in the data. Before that, we need to understand how a neuron works in the human brain.
Neuron
The above figure shows an actual neuron in the human brain. The job of this neuron is to receive some data and pass on the processed data. To make it clear, imagine it as an antenna that receives a signal, decodes it, and transmits it; the same process happens inside a neuron. But how does it receive and transmit the data? For this purpose each neuron has two elements, dendrites and an axon: dendrites act as receivers and the axon as the transmitter. But why are they needed? Imagine a group of people doing construction work together, each from a different country and speaking a different language. How do they communicate? They need some common medium to exchange their ideas, and this is what dendrites and the axon provide. Each neuron works on a piece of data, and once it completes its job it needs to communicate with other neurons and transfer the data; these two elements handle that communication between neurons.
A similar model is followed while building a neural network. A single neuron is called a node; it receives inputs, processes them, and sends an output. The above figure shows a node with five inputs and one output. The inputs are independent variables for one particular instance; you can think of this as a row from the dataset (a single row represents the data of a particular person, animal, or thing). Each input has a weight attached; these weights help the neural network learn by letting the neuron decide which data should be passed on and which should be held back. We will discuss why weights are added to the inputs in detail, with examples, further in the article. Inside the neuron, the inputs multiplied by their respective weights are summed (sum(input * weight)), and this sum is sent into an activation function (there are many types of activation function, but we will give an overview of what an activation function is and how it works).
This gives a brief overview of what a neuron is and how it works. Now let's break down the process and understand the reason behind each element used to build a neural network.
Node and weights
A node in a neural network layer is a single neuron; it can take multiple inputs and generate an output. For each input, a weight is attached (initially this weight is a random number). The reason behind adding weights is simple; let me explain with an example.
Consider house data, where we want to predict the price of a house from features such as area, number of bedrooms, location, locality, quality, and so on. When we provide these features as input to a neuron, how does it know which feature is important? As humans we can interpret them depending on our requirements, but how would an automated machine understand? This is where weights come in: each input has a weight, and the weights decide which features matter more. If a feature is very important, for example the area of the house, a higher weight is assigned to it, so the neuron understands that, yes, this needs to be processed. If features such as a swimming pool or a garden are not essential parts of a house (that depends on the customer), a minimal weight is assigned to them. This is how weights help the neural network understand the features and predict the output.
Y = x1*w1 + x2*w2 + … + xn*wn
Now, a doubt may arise: what if the weighted sum is zero? Then the weights alone have no effect on the output, so how could we resolve this?
Here comes another interesting attribute: bias. We add a bias to the weighted sum so that the neuron's output is not stuck at zero; the weights and bias can be positive or negative depending on what factor we want to adjust.
Y = x1*w1 + x2*w2 + … + xn*wn + b
Here the bias can be understood as a threshold: the weighted sum should surpass the value of b for the neuron to fire strongly. In practice each neuron carries one bias term, which is adjusted during training along with the weights.
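The formula above can be sketched in a few lines of Python; the input, weight, and bias values here are made up purely for illustration:

```python
# A single neuron's pre-activation output: weighted sum of inputs plus a bias.
# The inputs, weights, and bias below are illustrative values only.
inputs = [0.2, 0.1, 0.4]
weights = [0.01, 0.03, 0.05]
bias = 0.5

y = sum(x * w for x, w in zip(inputs, weights)) + bias
print(y)  # 0.002 + 0.003 + 0.02 + 0.5 = 0.525
```

Notice that even if every product `x * w` were zero, the bias would still let the neuron produce a non-zero output.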
This is what happens inside a node. Now this output is passed into an activation function. But why? What is this activation function?
While solving any problem there should be a limit on the output; if there is no limit, it is hard to interpret the results. To solve this issue, we make sure the results fall within a certain range, for example as probabilities for classification problems, and this is achieved through an activation function. The output produced by a neuron is passed into an activation function, and depending upon what type of activation function we are using, it produces the output.
Activation function
It is a function that takes in the weighted sum of inputs from the previous layer and produces an output value to feed into the next layer.
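As a sketch, here is the logistic (sigmoid) activation, one common choice, which squashes any weighted sum into the range (0, 1):

```python
import math

# The logistic (sigmoid) activation maps any real number into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5: a zero weighted sum maps to the midpoint
print(sigmoid(10.0))   # close to 1 for large positive sums
print(sigmoid(-10.0))  # close to 0 for large negative sums
```

Other activations (ReLU, tanh, etc.) bound or reshape the output differently, but the purpose is the same.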
We now have an idea of what happens inside a neuron and why weights, bias, and an activation function are used. Now let's consider a simple ANN and put all of these together.
The above figure shows a simple artificial neural network, which is also called a multilayer perceptron (MLP). It has more than one linear layer (a linear layer is a combination of neurons). The simplest form of MLP is a three-layer network: the first layer is the input layer, the middle layer is the hidden layer (hidden layers help extract the features that matter for solving the problem), and the last layer is the output layer. It is a supervised learning network, which means we have the labels defined. There are typically three phases involved in training the network: the forward pass, loss calculation, and the backward pass.
During the forward pass, we provide the inputs to the input layer (the first layer), multiply them by the weights, and add the bias (the weights and bias are adjusting factors; if the weighted sum is zero then the weights alone have no effect, so we add a bias). This adjusted sum goes into an activation function, and this happens at every layer. After this the loss is calculated: the model produces some output, and we compute the loss from the predicted output and the true values. This loss is propagated back into the model from the output layer using the back-propagation method. In this phase we update the weights using gradients (with optimisers such as SGD, Adam, etc.). Once training is completed, the model is ready to predict outputs, and depending on the error (MAE, RMSE) we decide on further modifications to the model (such as tuning the parameters).
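The three phases described above can be sketched end-to-end for a tiny 2-2-2 network with NumPy. The data, network size, epoch count, and learning rate here are illustrative assumptions, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-2-2 MLP: 2 inputs, 2 hidden neurons, 2 outputs (made-up data).
X = rng.random((4, 2))            # 4 instances with 2 features each
Y = rng.random((4, 2))            # 4 target rows
W1, b1 = rng.random((2, 2)), np.zeros(2)
W2, b2 = rng.random((2, 2)), np.zeros(2)
lr = 0.5                          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for epoch in range(200):
    # 1. forward pass: weighted sums plus bias, through the activation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # 2. loss calculation: squared error against the true values
    losses.append(0.5 * np.sum((Y - out) ** 2))
    # 3. backward pass: chain rule, then gradient-descent weight updates
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])  # the loss shrinks as training repeats
```

This is only a sketch of the mechanics; real code would use a framework, a train/test split, and a proper stopping criterion.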
Forward pass
This is the phase where the inputs are provided to the input layer. The neurons process them and send the result into an activation function, and depending upon what type of activation function we use, the output is sent to the next layer, the hidden layer. The same process happens there, and the output is sent to the output layer, where the loss is calculated by taking the difference between the actual and predicted values. The next phase is the most important in training a neural network.
The Backpropagation phase
Often people struggle to understand what actually happens in this phase. The goal is to optimise the weights so that the neural network can learn how to correctly predict or classify the inputs. Let's break this down.
Consider the above neural network, where weights and bias are assigned, here, l1 = 0.2, l2 = 0.1, w1 = 0.01, w2 = 0.03, w3 = 0.03, w4 = 0.04, w5 = 0.25, w6 = 0.15, w7 = 0.145, w8 = 0.3, b1 = 0.5 and b2 = 0.75
To see how the loss from the first pass is optimised, let's first walk through the forward pass.
Step1: Z1 = l1*w1 ==> 0.2 * 0.01 ==> 0.002
Z2 = l2*w2 ==> 0.1 * 0.03 ==> 0.003
Hidden layer 1 (h1) = Z1 + Z2 + b1 ==> 0.002 + 0.003 + 0.5 ==> 0.505
Step2: Z3 = l1*w3 ==> 0.2 * 0.03 ==> 0.006
Z4 = l2*w4 ==> 0.1 * 0.04 ==> 0.004
Hidden layer 2 (h2) = Z3 + Z4 + b1 ==> 0.006 + 0.004 + 0.5 ==> 0.51
Before these hidden-layer outputs are sent onward, they are passed into an activation function. Let's take the logistic function as our activation (there are many more, but for this example we are considering this function; whatever we use, the purpose is the same).
f(h1) = 1/(1 + e^(-h1)) ==> 1/(1 + 0.6035) ==> 0.623 → for h1
f(h2) = 1/(1 + e^(-h2)) ==> 1/(1 + 0.6005) ==> 0.625 → for h2
Now, the same process is carried out for the output layer too:
Step3: Z5 = h1*w5 ==> 0.623 * 0.25 ==> 0.156
Z6 = h2*w6 ==> 0.625 * 0.15 ==> 0.094
Output layer 1 (o1) = Z5 + Z6 + b2 ==> 0.156 + 0.094 + 0.75 ==> 1.0
Step4: Z7 = h1*w7 ==> 0.623 * 0.145 ==> 0.09
Z8 = h2*w8 ==> 0.625 * 0.3 ==> 0.187
Output layer 2 (o2) = Z7 + Z8 + b2 ==> 0.09 + 0.187 + 0.75 ==> 1.027
Passing through the activation function:
f(o1) = 1/(1 + e^(-o1)) ==> 1/(1 + 0.368) ==> 0.731 → for o1
f(o2) = 1/(1 + e^(-o2)) ==> 1/(1 + 0.358) ==> 0.736 → for o2
This completes one forward pass; now we need to calculate the loss. Taking the target outputs as 0.3 and 0.2, and comparing them against the activated outputs f(o1) and f(o2):
Loss = 1/2 (true - predicted)²
Loss1 = 1/2 (0.3 - 0.731)² ==> 0.093
Loss2 = 1/2 (0.2 - 0.736)² ==> 0.144
Total Loss = Loss1 + Loss2 ==> 0.093 + 0.144 ==> 0.237
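As a check, the forward pass can be recomputed in Python, taking the activated outputs f(o1) and f(o2) as the predictions and 0.3 and 0.2 as the targets; small rounding differences from the hand-worked figures are expected:

```python
import math

# Recomputing the worked forward pass with the stated inputs, weights,
# and biases; the targets 0.3 and 0.2 come from the loss step.
l1, l2 = 0.2, 0.1
w1, w2, w3, w4 = 0.01, 0.03, 0.03, 0.04
w5, w6, w7, w8 = 0.25, 0.15, 0.145, 0.3
b1, b2 = 0.5, 0.75

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h1 = sigmoid(l1 * w1 + l2 * w2 + b1)   # sigmoid(0.505) ≈ 0.624
h2 = sigmoid(l1 * w3 + l2 * w4 + b1)   # sigmoid(0.51)  ≈ 0.625
o1 = sigmoid(h1 * w5 + h2 * w6 + b2)   # ≈ 0.731
o2 = sigmoid(h1 * w7 + h2 * w8 + b2)   # ≈ 0.736
loss = 0.5 * (0.3 - o1) ** 2 + 0.5 * (0.2 - o2) ** 2   # ≈ 0.237
```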
After calculating the losses we need to back-propagate to minimise them. Once we reach the input layer, the weights are updated and we perform the forward pass again; this continues until training is completed.
Back propagation
Our main goal is to reduce the loss, so we need to know how a change in the weights will change the loss. For that we take the partial derivative of the total loss with respect to w5. The reason for starting with w5 is that it is a weight of the last layer.
∂Total Loss / ∂w5 = ∂Total Loss / ∂f(o1) * ∂f(o1) / ∂o1 * ∂o1 / ∂w5
∂Total Loss / ∂f(o1) = f(o1) - true1 = 0.731 - 0.3, which gives 0.431
f(o1) = 1/(1 + e^(-o1)), so ∂f(o1) / ∂o1 = f(o1) * (1 - f(o1)) = 0.731 * 0.269, which gives 0.197
o1 = Z5 + Z6 + b2, so ∂o1 / ∂w5 = h1, which gives 0.623
∂Total Loss / ∂w5 = 0.431 * 0.197 * 0.623 ==> 0.053
To reduce the error we need to update the weight (w5): we subtract the gradient, scaled by a learning rate, from the old weight (the learning rate controls how large a step each update takes).
w5new = w5 - lr * ∂Total Loss / ∂w5 ==> 0.25 - 0.1 * 0.053 (lr is 0.1) ==> 0.245
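The w5 update can be verified in a few lines of Python; the forward quantities are recomputed from the stated weights, so tiny rounding differences from the hand-worked numbers are expected:

```python
import math

# Gradient of the total loss with respect to w5, then the update step.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h1 = sigmoid(0.2 * 0.01 + 0.1 * 0.03 + 0.5)
h2 = sigmoid(0.2 * 0.03 + 0.1 * 0.04 + 0.5)
o1 = sigmoid(h1 * 0.25 + h2 * 0.15 + 0.75)

target1 = 0.3
d_loss_d_fo1 = o1 - target1          # ≈ 0.431
d_fo1_d_o1 = o1 * (1 - o1)           # sigmoid derivative, ≈ 0.197
d_o1_d_w5 = h1                       # ≈ 0.623

grad_w5 = d_loss_d_fo1 * d_fo1_d_o1 * d_o1_d_w5   # ≈ 0.053
w5_new = 0.25 - 0.1 * grad_w5                      # ≈ 0.245
```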
Similarly we update the rest of the weights w6, w7, and w8, and then go back to the hidden layer.
∂Total Loss / ∂w1 = ∂Total Loss / ∂f(h1) * ∂f(h1) / ∂h1 * ∂h1 / ∂w1
Weight w1 is updated in the same way; the only difference is that ∂Total Loss / ∂f(h1) sums the contributions flowing back from both outputs, since h1 feeds into both o1 and o2. Weights w2, w3, and w4 are updated similarly.
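One subtlety for the hidden-layer weights is that h1 feeds both outputs, so the gradient for w1 sums both paths back from the loss. A sketch, again recomputing the forward values:

```python
import math

# Gradient of the total loss with respect to w1: the loss reaches w1
# through BOTH outputs, so the two error signals are summed at h1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

l1, l2 = 0.2, 0.1
w5, w7 = 0.25, 0.145
h1 = sigmoid(l1 * 0.01 + l2 * 0.03 + 0.5)
h2 = sigmoid(l1 * 0.03 + l2 * 0.04 + 0.5)
o1 = sigmoid(h1 * w5 + h2 * 0.15 + 0.75)
o2 = sigmoid(h1 * w7 + h2 * 0.3 + 0.75)

d_o1 = (o1 - 0.3) * o1 * (1 - o1)    # error signal at output 1
d_o2 = (o2 - 0.2) * o2 * (1 - o2)    # error signal at output 2
d_fh1 = d_o1 * w5 + d_o2 * w7        # both paths summed at h1
grad_w1 = d_fh1 * h1 * (1 - h1) * l1
w1_new = 0.01 - 0.1 * grad_w1        # ≈ 0.0098 with learning rate 0.1
```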
This is what happens in back propagation. The first complete pass (forward + backward) will not reduce the loss by much, but repeating the training for a number of passes will decrease it. Depending on the losses, we tune our parameters and train again. The whole process looks simple while writing code, but a lot happens behind the network, and this article gave a glimpse of what exactly happens. There are many kinds of neural networks; the purpose of the network and the intermediate stages will differ, but the overall plot of how a neural network learns remains the same.
Conclusion
The idea behind this article is to get a brief idea of what a neural network is, how it works, and why we need to use it. Nobody can be a master in a field that is growing every day, so the best way to make ourselves comfortable with the subject is to stay excited and accept that reality, instead of pretending to be a master. Neural networks are very interesting and confusing at the same time, because each network has a purpose, and we need to know why we are using a certain number of layers and neurons. So instead of learning just the code, if we try to understand what is happening behind each line of code, we can easily build a network according to the requirement.