The end from priyasharma2427

Building a Neural Network from Scratch

1 : We will create a neural network

The network in the given image has:

Inputs : i1, i2
First hidden layer before activation : h1,h2
First hidden layer after activation : a_h1, a_h2
Output before activation function : o1, o2
Output after activation : a_o1, a_o2
Weights : w1, w2, w3, w4, w5, w6, w7, w8
Error at first output : E1
Error at second output : E2
Total error : E_total
Target : we have two targets, t1 = 0.01 and t2 = 0.99. A target is the desired value of an output, i.e. t1 is what we want a_o1 to be and t2 is what we want a_o2 to be.

Note: The network has no bias terms. The activation function is applied not only in the hidden layer but in the output layer as well.

Formulas :

h1 : w1 * i1 + w2 * i2
h2 : w3 * i1 + w4 * i2
a_h1 : σ(h1)
a_h2 : σ(h2)
o1 : w5 * a_h1 + w6 * a_h2
o2 : w7 * a_h1 + w8 * a_h2
a_o1 : σ(o1)
a_o2 : σ(o2)
E1 = ½*(t1-a_o1)²
E2 = ½*(t2-a_o2)²
E_total = E1 + E2

2 : Calculate all values and put them in the table.

Forward Propagation

Values Given,

inputs : i1 = 0.05, i2 = 0.1
targets : t1 = 0.01, t2 = 0.99
Weights : w1 = 0.15, w2 = 0.2, w3 = 0.25, w4 = 0.3, w5 = 0.4, w6 = 0.45, w7 = 0.5, w8 = 0.55

Values to be calculated,

We will calculate the following values in our Excel sheet using the formulas above:
h1, h2, a_h1, a_h2, o1, o2 , a_o1, a_o2 , E1, E2, E_total
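
The same forward pass can also be sketched in Python, as a cross-check for the spreadsheet. This is a minimal sketch, not the original Excel sheet: the function name forward, the list layout of the weights and the dictionary of results are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(i1, i2, t1, t2, w):
    """Forward pass of the 2-2-2 network; w is the list [w1, ..., w8], no biases."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    # Hidden layer before and after activation
    h1 = w1 * i1 + w2 * i2
    h2 = w3 * i1 + w4 * i2
    a_h1, a_h2 = sigmoid(h1), sigmoid(h2)
    # Output layer before and after activation
    o1 = w5 * a_h1 + w6 * a_h2
    o2 = w7 * a_h1 + w8 * a_h2
    a_o1, a_o2 = sigmoid(o1), sigmoid(o2)
    # Squared errors against the targets
    E1 = 0.5 * (t1 - a_o1) ** 2
    E2 = 0.5 * (t2 - a_o2) ** 2
    return {
        "h1": h1, "h2": h2, "a_h1": a_h1, "a_h2": a_h2,
        "o1": o1, "o2": o2, "a_o1": a_o1, "a_o2": a_o2,
        "E1": E1, "E2": E2, "E_total": E1 + E2,
    }

# Inputs, targets and initial weights as given above
values = forward(0.05, 0.1, 0.01, 0.99,
                 [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55])
for name, value in values.items():
    print(f"{name:8s} {value:.6f}")
```

Each printed value corresponds to one column of the table for the first row.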

Up to this point, we have been calculating forward propagation. Now we will calculate backward propagation.

Backward propagation

To do backward propagation, we start with ∂E_total/∂w5. w5 affects E_total only through E1: E2 depends on o2, which w5 does not feed into, so the E2 term drops out. Between w5 and E1 there are two intermediate quantities, o1 and a_o1, so we follow the route w5 → o1 → a_o1 → E1.

∂E_total/∂w5 = ∂(E1 + E2)/∂w5
∂E_total/∂w5 = ∂(E1)/∂w5

Chain rule,

∂(E1)/∂w5 = ∂(E1)/∂(a_o1) * ∂(a_o1)/∂(o1) * ∂(o1)/∂w5

Now we will evaluate each factor on the right-hand side.

∂(E1)/∂(a_o1) = ∂(½*(t1 - a_o1)²) / ∂(a_o1)
∂(E1)/∂(a_o1) = (t1 - a_o1) * (-1)
∂(E1)/∂(a_o1) = a_o1 - t1

∂(a_o1)/∂(o1) = ∂(σ(o1)) / ∂(o1)
∂(a_o1)/∂(o1) = σ(o1) * (1 - σ(o1))
∂(a_o1)/∂(o1) = a_o1 * (1 - a_o1)
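
The identity σ'(x) = σ(x) * (1 - σ(x)) used in this step can be checked numerically. The sketch below compares it against a central finite difference; the helper names sigmoid_derivative and numeric_derivative and the step size eps are illustrative choices, not part of the original walkthrough.

```python
import math

def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Closed form used in the derivation: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# The two columns should agree to many decimal places
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, sigmoid_derivative(x), numeric_derivative(sigmoid, x))
```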

∂(o1)/∂w5 = ∂(w5 * a_h1 + w6 * a_h2) / ∂w5
∂(o1)/∂w5 = a_h1

Now, the equation becomes:

∂E_total / ∂w5 = (a_o1 - t1) * a_o1 * (1 - a_o1) * a_h1

Similarly, we will find the equations for w6, w7 and w8.

∂E_total / ∂w6 = (a_o1 - t1) * a_o1 * (1 - a_o1) * a_h2
∂E_total / ∂w7 = (a_o2 - t2) * a_o2 * (1 - a_o2) * a_h1
∂E_total / ∂w8 = (a_o2 - t2) * a_o2 * (1 - a_o2) * a_h2
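
As a sketch, the four output-layer gradients can be evaluated in Python for the given inputs, targets and initial weights. The names delta_o1 and delta_o2 for the shared factor (a_o - t) * a_o * (1 - a_o) are introduced here for convenience and do not appear in the original notation.

```python
import math

def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Values given in the walkthrough
i1, i2 = 0.05, 0.1
t1, t2 = 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.2, 0.25, 0.3
w5, w6, w7, w8 = 0.4, 0.45, 0.5, 0.55

# Forward pass (needed because the gradients use the activations)
a_h1 = sigmoid(w1 * i1 + w2 * i2)
a_h2 = sigmoid(w3 * i1 + w4 * i2)
a_o1 = sigmoid(w5 * a_h1 + w6 * a_h2)
a_o2 = sigmoid(w7 * a_h1 + w8 * a_h2)

# Factor shared by both weights that feed the same output (hypothetical name)
delta_o1 = (a_o1 - t1) * a_o1 * (1 - a_o1)
delta_o2 = (a_o2 - t2) * a_o2 * (1 - a_o2)

# Gradients of E_total with respect to the output-layer weights
dE_dw5 = delta_o1 * a_h1
dE_dw6 = delta_o1 * a_h2
dE_dw7 = delta_o2 * a_h1
dE_dw8 = delta_o2 * a_h2

for name, grad in (("w5", dE_dw5), ("w6", dE_dw6), ("w7", dE_dw7), ("w8", dE_dw8)):
    print(f"dE_total/d{name} = {grad:.6f}")
```

The sign of each gradient shows the direction of the error: a_o1 starts above t1, so the gradients for w5 and w6 are positive, while a_o2 starts below t2, so the gradients for w7 and w8 are negative.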

Now, we will calculate values for w1, w2, w3, w4.

Before looking at w1, we will look at a_h1, because w1 reaches E_total only through a_h1, and a_h1 feeds both outputs. Its effect therefore travels along two routes, one through a_o1 and one through a_o2. We will first calculate ∂E_total/∂a_h1 and then move on to w1.

∂E_total / ∂a_h1 = ∂(E1 + E2) / ∂(a_h1)

This time both E1 and E2 contribute, because a_h1 reaches the error along two routes.

∂(E1) / ∂(a_h1) = ∂E1/∂a_o1 * ∂a_o1/∂o1 * ∂o1/∂a_h1
∂(E1) / ∂(a_h1) = (a_o1 - t1) * a_o1 * (1 - a_o1) * w5

∂(E2) / ∂(a_h1) = ∂E2/∂a_o2 * ∂a_o2/∂o2 * ∂o2/∂a_h1
∂(E2) / ∂(a_h1) = (a_o2 - t2) * a_o2 * (1 - a_o2) * w7

∂E_total / ∂a_h1 = ∂(E1 + E2) / ∂(a_h1)
∂E_total / ∂a_h1 = (a_o1 - t1) * a_o1 * (1 - a_o1) * w5 + (a_o2 - t2) * a_o2 * (1 - a_o2) * w7

Similarly, we will calculate ∂E_total / ∂a_h2:

∂E_total / ∂a_h2 = (a_o1 - t1) * a_o1 * (1 - a_o1) * w6 + (a_o2 - t2) * a_o2 * (1 - a_o2) * w8

Now we will calculate ∂E_total/∂w1

∂E_total/∂w1 = (∂E1/∂a_o1 * ∂a_o1/∂o1 * ∂o1/∂a_h1 + ∂E2/∂a_o2 * ∂a_o2/∂o2 * ∂o2/∂a_h1) * ∂a_h1/∂h1 * ∂h1/∂w1
∂E_total/∂w1 = ∂E_total/∂a_h1 * ∂a_h1/∂h1 * ∂h1/∂w1
∂E_total/∂w1 = ∂E_total/∂a_h1 * a_h1 * (1 - a_h1) * ∂h1/∂w1
∂E_total/∂w1 = ∂E_total/∂a_h1 * a_h1 * (1 - a_h1) * i1

Similarly, we calculate for w2, w3 and w4:

∂E_total/∂w2 = ∂E_total/∂a_h1 * a_h1 * (1 - a_h1) * i2
∂E_total/∂w3 = ∂E_total/∂a_h2 * a_h2 * (1 - a_h2) * i1
∂E_total/∂w4 = ∂E_total/∂a_h2 * a_h2 * (1 - a_h2) * i2
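
One way to gain confidence in all eight formulas is a gradient check: compute the gradients analytically and compare them with finite differences of E_total. The sketch below does this for the given values; the function names total_error, analytic_gradients and numeric_gradients and the step size eps are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, i1, i2):
    """Return the activations (a_h1, a_h2, a_o1, a_o2) for weights [w1..w8]."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    a_h1 = sigmoid(w1 * i1 + w2 * i2)
    a_h2 = sigmoid(w3 * i1 + w4 * i2)
    a_o1 = sigmoid(w5 * a_h1 + w6 * a_h2)
    a_o2 = sigmoid(w7 * a_h1 + w8 * a_h2)
    return a_h1, a_h2, a_o1, a_o2

def total_error(w, i1=0.05, i2=0.1, t1=0.01, t2=0.99):
    """E_total = E1 + E2 for the given weights."""
    _, _, a_o1, a_o2 = forward(w, i1, i2)
    return 0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2

def analytic_gradients(w, i1=0.05, i2=0.1, t1=0.01, t2=0.99):
    """Gradients of E_total for w1..w8, using the formulas derived above."""
    w5, w6, w7, w8 = w[4:]
    a_h1, a_h2, a_o1, a_o2 = forward(w, i1, i2)
    d_o1 = (a_o1 - t1) * a_o1 * (1 - a_o1)
    d_o2 = (a_o2 - t2) * a_o2 * (1 - a_o2)
    # Each hidden activation collects error from both outputs
    dE_da_h1 = d_o1 * w5 + d_o2 * w7
    dE_da_h2 = d_o1 * w6 + d_o2 * w8
    return [
        dE_da_h1 * a_h1 * (1 - a_h1) * i1,  # w1
        dE_da_h1 * a_h1 * (1 - a_h1) * i2,  # w2
        dE_da_h2 * a_h2 * (1 - a_h2) * i1,  # w3
        dE_da_h2 * a_h2 * (1 - a_h2) * i2,  # w4
        d_o1 * a_h1,                        # w5
        d_o1 * a_h2,                        # w6
        d_o2 * a_h1,                        # w7
        d_o2 * a_h2,                        # w8
    ]

def numeric_gradients(w, eps=1e-6):
    """Central finite differences of E_total, one weight at a time."""
    grads = []
    for k in range(len(w)):
        plus, minus = list(w), list(w)
        plus[k] += eps
        minus[k] -= eps
        grads.append((total_error(plus) - total_error(minus)) / (2.0 * eps))
    return grads

weights = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
analytic = analytic_gradients(weights)
numeric = numeric_gradients(weights)
for k, (a, n) in enumerate(zip(analytic, numeric), start=1):
    print(f"w{k}: analytic = {a:.8f}, numeric = {n:.8f}")
```

If the two columns agree closely, the chain-rule derivation and the spreadsheet formulas built from it are very likely correct.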

Now we are done with the derivation. Using these formulas, we can find the values of all the variables in the table. Once the table is filled in, the error column shows the error decreasing from row to row. Next, we will try different learning rates.

We will see how the learning rate affects the rate of convergence. If the learning rate is too small, training is slow; if it is too high, the loss can show undesirable divergent behavior. That is why we need to find a suitable learning rate.
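
The effect of the learning rate can be sketched with a short training loop. This assumes the plain gradient descent update w = w - learning_rate * ∂E_total/∂w, which the text implies but does not state explicitly. The learning rates tried and the number of steps are arbitrary illustrative choices, not values from the original table.

```python
import math

def sigmoid(x):
    """Logistic activation: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def step(w, lr, i1=0.05, i2=0.1, t1=0.01, t2=0.99):
    """One forward + backward pass; returns (updated weights, error before update)."""
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    # Forward propagation
    a_h1 = sigmoid(w1 * i1 + w2 * i2)
    a_h2 = sigmoid(w3 * i1 + w4 * i2)
    a_o1 = sigmoid(w5 * a_h1 + w6 * a_h2)
    a_o2 = sigmoid(w7 * a_h1 + w8 * a_h2)
    error = 0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2
    # Backward propagation, using the formulas derived above
    d_o1 = (a_o1 - t1) * a_o1 * (1 - a_o1)
    d_o2 = (a_o2 - t2) * a_o2 * (1 - a_o2)
    dE_da_h1 = d_o1 * w5 + d_o2 * w7
    dE_da_h2 = d_o1 * w6 + d_o2 * w8
    grads = [
        dE_da_h1 * a_h1 * (1 - a_h1) * i1,
        dE_da_h1 * a_h1 * (1 - a_h1) * i2,
        dE_da_h2 * a_h2 * (1 - a_h2) * i1,
        dE_da_h2 * a_h2 * (1 - a_h2) * i2,
        d_o1 * a_h1,
        d_o1 * a_h2,
        d_o2 * a_h1,
        d_o2 * a_h2,
    ]
    # Assumed update rule: plain gradient descent
    new_w = [wk - lr * gk for wk, gk in zip(w, grads)]
    return new_w, error

def train(lr, steps=100):
    """Run gradient descent from the initial weights; return the error per step."""
    w = [0.15, 0.2, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55]
    errors = []
    for _ in range(steps):
        w, error = step(w, lr)
        errors.append(error)
    return errors

for lr in (0.1, 0.5, 2.0):
    errors = train(lr)
    print(f"lr = {lr}: first error = {errors[0]:.6f}, last error = {errors[-1]:.6f}")
```

Comparing the last error for each rate after the same number of steps gives a rough picture of how quickly each setting converges, which is the same comparison the error column of the table supports.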