Gradient Descent in 3D
Introduction
Previously, we talked about how to think about gradient descent when moving along a 3D cost curve.
We know that moving along the 3D cost curve means changing the slope and y-intercept of our regression line, and that each such change produces a different error, the RSS.
Objectives
You will be able to:
- Define a partial derivative
- Interpret visual representations of gradient descent in more than two dimensions
Review gradient descent in two dimensions
In this lesson, we'll learn about gradient descent in three dimensions, but let's first remember how it worked in two dimensions when we changed just one variable of our regression line.
In two dimensions, changing just one variable -- the slope -- looked like this:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(225)

def regression_formula(x):
    return 5 + 56*x

x = np.random.rand(30, 1).reshape(30)
y_randterm = np.random.normal(0, 3, 30)
y = 3 + 50*x + y_randterm

plt.plot(x, y, '.b')
plt.plot(x, regression_formula(x), '-')
plt.xlabel("x", fontsize=14)
plt.ylabel("y", fontsize=14);
As we adjust the regression line to different slopes, we get different errors, represented by the residual sum of squares (RSS).
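To make this concrete, here is a small sketch that recreates the data above and measures the RSS for several candidate slopes. The `rss` helper and the choice of slopes are ours, and we hold the intercept at the data's true value of 3 so that only one variable changes:

```python
import numpy as np

np.random.seed(225)
x = np.random.rand(30, 1).reshape(30)
y = 3 + 50 * x + np.random.normal(0, 3, 30)

def rss(m, b=3):
    """Residual sum of squares for the line y = b + m*x."""
    return float(np.sum((y - (b + m * x)) ** 2))

# RSS for a few candidate slopes, intercept held fixed
for m in [30, 40, 50, 60, 70]:
    print(m, round(rss(m), 2))
```

The RSS is smallest near the true slope of 50 and grows as we move away in either direction -- that is the two-dimensional cost curve gradient descent walks down.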
So that is how gradient descent is represented in two dimensions. How is gradient descent represented in three dimensions?
Gradient Descent in 3 dimensions
In three dimensions, we once again choose an initial regression line, which means that we are choosing a point on the graph below. Then we begin taking steps towards the minimum. But of course, we are now able to walk not just forwards and backwards but left and right as well -- as we now can alter two variables.
To get a sense of how this works, imagine our initial regression line places us at the back-left corner of the graph above, with a slope of 50 and a y-intercept of negative 20. Now imagine that we cannot see the rest of the graph, yet we still want to approach the minimum. How do we do this?
Once again, we feel out the slope of the graph with our feet. Only this time, as we shift our feet, we are preparing to walk in two-dimensional space.
So this is our approach. We shift horizontally a little bit to determine the change in output in the right-left direction, and then we shift forward and back to determine the change in output in that direction. From there we take the next step in the direction of the steepest descent.
So this is why our technique of gradient descent is so powerful. In moving towards our best-fit line we can now move anywhere in a two-dimensional space of parameters, so using the slope to guide us becomes even more important.
So how does this approach of shifting back and forth translate mathematically? It means we determine the slope in one dimension, then the other. Then we step in the direction where the combined slope descends most steeply, which moves us towards our minimum.
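That feel-with-our-feet procedure can be sketched numerically: nudge each parameter a tiny amount to estimate the slope in that direction, then step against both slopes at once. The `rss` helper, the learning rate, and the nudge size `h` below are all our own choices, not values from the lesson:

```python
import numpy as np

np.random.seed(225)
x = np.random.rand(30)
y = 3 + 50 * x + np.random.normal(0, 3, 30)

def rss(m, b):
    """Residual sum of squares for the line y = b + m*x."""
    return float(np.sum((y - (b + m * x)) ** 2))

def gradient_step(m, b, learning_rate=0.001, h=1e-6):
    """Estimate the slope in each direction with a small nudge,
    then step downhill in both directions at once."""
    dm = (rss(m + h, b) - rss(m, b)) / h   # slope in the m direction
    db = (rss(m, b + h) - rss(m, b)) / h   # slope in the b direction
    return m - learning_rate * dm, b - learning_rate * db

m, b = 50, -20            # the back-left starting point from the text
for _ in range(100):
    m, b = gradient_step(m, b)
print(round(rss(50, -20), 2), round(rss(m, b), 2))
```

Each iteration shifts both the slope and the intercept a little in the direction of steepest descent, so the error shrinks as we iterate.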
Partial Derivatives
To measure the slope in each dimension, one after the other, we'll take the derivative with respect to one variable, and then take the derivative with respect to another variable. Now let's be very explicit about what it means to take the partial derivative with respect to a variable.
Let's again talk about this procedure in general, and then we'll apply it to the cost curve. So let's revisit our multivariable function. Remember that the function looks like the following:

$$f(x, y) = y*x^2$$

To take a derivative with respect to $x$, we hold $y$ constant and measure how the output changes as we nudge only $x$; we write this partial derivative as $\frac{df}{dx}$. And to express the change in output with respect to $y$, we hold $x$ constant and write $\frac{df}{dy}$.
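One way to see what "with respect to one variable" means is a quick finite-difference sketch using our function $f(x, y) = y*x^2$. The helper names and the nudge size `h` are ours:

```python
def f(x, y):
    return y * x ** 2

def partial_x(f, x, y, h=1e-6):
    """Approximate df/dx: nudge x while holding y fixed."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Approximate df/dy: nudge y while holding x fixed."""
    return (f(x, y + h) - f(x, y)) / h

# At the point (x, y) = (3, 2), the analytic answers are
# df/dx = 2*y*x = 12 and df/dy = x**2 = 9
print(partial_x(f, 3, 2), partial_y(f, 3, 2))
```

At the point $(3, 2)$ the estimates come out very close to 12 and 9, matching the shortcut rule derived later in this lesson.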
Visualizing the partial derivative
So what does a partial derivative such as $\frac{df}{dx}$ look like visually?

Well, remember how we think of a standard derivative of a one-variable function: in two dimensions, to take the derivative at a given point, we simply calculate the slope of the function at that x value.

The partial derivative of a multivariable function is fairly similar. But here it's equal to the slope of the tangent line at a specific value of $x$, with the other variable, $y$, held at a fixed value.
Graphs for $\frac{df}{dx}$
Let's take a close look. The top-left graph shows a slice of the function $f(x, y)$ taken at a fixed value of $y$, so it is an ordinary two-dimensional curve in $x$.

So in taking the partial derivative $\frac{df}{dx}$, we hold $y$ constant and calculate the slope of that slice at a given value of $x$.

As you can see, that slope depends both on which slice we are on (the fixed value of $y$) and on where we are along it (the value of $x$).
One more example
This can be a little mind-bending, so let's go through this again, now for the partial derivative $\frac{df}{dy}$.

First, let's understand our plots below -- they may be surprising. Starting at the top-left quadrant, the graph shows a slice of the function $f(x, y)$ taken at a fixed value of $x$, so the output varies only with $y$.

So now, to think about taking the derivative, once again we move to a slice of the graph for a fixed value of $x$, and then nudge $y$ to see how the output changes.
Graphs for $\frac{df}{dy}$
So that is our technique for a partial derivative. For $\frac{df}{dy}$ we move to a slice of the curve at a specific value of $x$, and then calculate the slope of that slice as $y$ changes.

For $\frac{df}{dx}$ we do the opposite: we move to a slice at a specific value of $y$, and then calculate the slope as $x$ changes.
Graphs for $\frac{df}{dx}$
Our rule for partial derivatives
Ok, so now that you understand the slide, slide, nudge technique, maybe you can understand this little shortcut that we can pull. For any multivariable function, the variables that you are not taking the derivative with respect to can simply be treated as constants.
For example, with our function $f(x, y) = y*x^2$, when we take the partial derivative with respect to $y$, we treat $x^2$ as a constant:

$$\frac{df}{dy}f(x,y) = x^2*\frac{df}{dy}(y) = x^2$$

So that's all it means to take a partial derivative of something: look at what you are taking the derivative with respect to, and treat every other variable as a constant. And guess what, this result lines up with what we saw earlier.

We calculated that $\frac{df}{dy} = x^2$, which is exactly what slicing the curve at a fixed value of $x$ showed us.
Now let's try our rule one more time, this time taking the partial derivative with respect to $x$:
$$\frac{df}{dx}f(x,y) = y*\frac{df}{dx}(x^2) = 2yx$$
So this time, with $\frac{df}{dx}$, we treat the variable $y$ as a constant multiplier, take the derivative of $x^2$, and get $2yx$.
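If you have SymPy available (an assumption on our part; it isn't used elsewhere in this lesson), you can confirm both partial derivatives symbolically:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = y * x ** 2

# SymPy applies the same rule: treat y as a constant when
# differentiating with respect to x, and vice versa.
df_dx = sp.diff(f, x)   # 2*x*y
df_dy = sp.diff(f, y)   # x**2
print(df_dx, df_dy)
```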
Summary
In this section, we learned how to think about taking the partial derivative of a function. For the partial derivative, we say we are taking the derivative with respect to one variable while holding the other variables constant. So, for example, for the function $f(x, y) = y*x^2$, we can say that the partial derivative with respect to $y$ is $\frac{df}{dy} = x^2$, and the partial derivative with respect to $x$ is $\frac{df}{dx} = 2yx$.