Neuroscience Meets Technology: What Happens When Computers Mimic the Brain?

Arafat Ashrafi Talha
3 min read · Feb 12, 2024


Understanding Neural Networks for a Smarter Tomorrow, and How They Can Supercharge Your Skills.

(Image by author.)

One important aspect of Deep Learning is the need to compute the derivatives of complex functions efficiently.

In neural network frameworks like PyTorch, this is handled by recording the operations and variables of a function on a computational graph. Nodes represent the variables, and edges represent operations: an incoming edge means a new variable has been created by applying an operation to other variables.
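As a minimal sketch in PyTorch (the tensor values are just illustrative), we can watch the graph being recorded as we apply operations:

```python
import torch

# Two leaf variables; requires_grad asks PyTorch to record operations on them.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x      # new variable created by a multiplication
z = y + 1.0    # new variable created by an addition

# Each intermediate variable keeps a reference to the operation that produced it.
print(z.grad_fn)                 # <AddBackward0 object ...>
print(z.grad_fn.next_functions)  # links back to the multiplication node
```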

Creating the graph and computing the values at the different nodes is called the forward pass.

For each operation we apply, we know the derivative rules for the variables involved, so we can easily compute the derivatives of the output variables with respect to the input variables.
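For example, for a simple multiplication y = w * x, the rules tell us that dy/dw = x and dy/dx = w. A quick sketch, with illustrative values, checks this against autograd:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
y = w * x  # known rules: dy/dw = x and dy/dx = w

# autograd applies exactly these per-operation derivative rules
dy_dw, dy_dx = torch.autograd.grad(y, (w, x))
print(dy_dw, dy_dx)  # tensor(2.) tensor(3.)
```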

For a specific node, we only need to compute the derivatives of the outgoing variable with respect to the incoming variables.

By applying the derivative chain rule, we can compute the derivatives of the output variables with respect to the input variables for the whole graph. At every point of the process, we only compute the elementary derivatives for a single operation and the variables it touches, but the chain rule allows us to compute derivatives for extremely complex functions.

Back-propagating the derivatives from the outputs of the graph down to its inputs is referred to as the backward pass. We can have complex computational blocks and still easily backpropagate the derivatives.
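Here is a small sketch of the backward pass on a toy composition; the values are illustrative, and the gradients follow from the chain rule applied one operation at a time:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

y = w * x   # forward pass builds the graph
z = y ** 2  # z = (w * x) ** 2

# Backward pass: the chain rule gives dz/dw = 2*(w*x)*x and dz/dx = 2*(w*x)*w.
z.backward()
print(w.grad)  # tensor(24.)  -> 2 * 6 * 2
print(x.grad)  # tensor(36.)  -> 2 * 6 * 3
```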

The goal of a neural network is to generate an estimate of the target we are trying to predict.

We use a loss function to compare the target to its estimate. The optimization problem is to minimize the loss function. To minimize it, we take the gradient of the loss function with respect to the network parameters and look for the zeros of the resulting function.

But solving this equation can be very hard, so instead we use optimization techniques like the gradient descent algorithm.
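As a toy illustration with a single parameter (the numbers below are assumptions for the example, not from a real model), we can compute a loss and its gradient directly:

```python
import torch

# A single parameter w; x and target are made-up data for the example.
w = torch.tensor(0.5, requires_grad=True)
x, target = torch.tensor(2.0), torch.tensor(3.0)

estimate = w * x                 # the network's estimate of the target
loss = (estimate - target) ** 2  # squared-error loss

loss.backward()  # gradient of the loss with respect to w
print(w.grad)    # 2 * (w*x - target) * x = -8.0
```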

(Image by author.)

We update the parameters by following the gradient in the direction where the loss function decreases, that is, by stepping against the gradient. To apply the gradient descent algorithm, we use the computational graph: we first compute the graph in the forward pass.

We can then back-propagate the gradients of the loss function with respect to each computational block and node in the graph by using the chain rule in the backward pass.

Now that we have the gradient of the loss function for all the parameters in the graph, we can apply one step of the gradient descent algorithm.
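Putting it together, a minimal gradient-descent loop might look like this sketch; the one-parameter model, the single data point, and the learning rate are illustrative assumptions:

```python
import torch

# One-parameter model, one data point, fixed learning rate: all illustrative.
w = torch.tensor(0.0, requires_grad=True)
x, target = torch.tensor(2.0), torch.tensor(3.0)
lr = 0.1

for step in range(20):
    estimate = w * x                 # forward pass builds the graph
    loss = (estimate - target) ** 2
    loss.backward()                  # backward pass: dloss/dw via the chain rule
    with torch.no_grad():
        w -= lr * w.grad             # one gradient-descent step, against the gradient
    w.grad.zero_()                   # reset the gradient before the next iteration

print(w)  # approaches 1.5, since 1.5 * 2.0 == 3.0
```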



Written by Arafat Ashrafi Talha

M.Sc. in Advanced Networking & Cyber Security | CEHv12 | Cybersecurity Enthusiast | Connect on LinkedIn: linkedin.com/in/arafat-ashrafi-talha
