Also in the documentation [1]:
>>> y = np.array([1, 2, 4, 7, 11, 16], dtype=float)
>>> j = np.gradient(y)
>>> j
array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])
Gradient is defined as (change in y)/(change in x). Here, x is the list index, so the difference between adjacent x values is 1. At the boundaries, the first difference is calculated. This means that at each end of the array, the gradient given is simply the difference between the two end values (divided by 1).
Away from the boundaries, the gradient for a particular index is given by taking the difference between the values on either side and dividing by 2.
So, the gradient of y, above, is calculated thus:
j[0] = (y[1]-y[0])/1 = (2-1)/1 = 1
j[1] = (y[2]-y[0])/2 = (4-1)/2 = 1.5
j[2] = (y[3]-y[1])/2 = (7-2)/2 = 2.5
j[3] = (y[4]-y[2])/2 = (11-4)/2 = 3.5
j[4] = (y[5]-y[3])/2 = (16-7)/2 = 4.5
j[5] = (y[5]-y[4])/1 = (16-11)/1 = 5
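The same arithmetic can be reproduced with array slicing. This is only a minimal sketch for a 1-D array with unit spacing (the name manual is made up for illustration), compared against what np.gradient returns:

```python
import numpy as np

y = np.array([1, 2, 4, 7, 11, 16], dtype=float)

manual = np.empty_like(y)
manual[0] = y[1] - y[0]                 # one-sided difference at the left end
manual[-1] = y[-1] - y[-2]              # one-sided difference at the right end
manual[1:-1] = (y[2:] - y[:-2]) / 2.0   # difference of the neighbours, divided by 2

print(manual)
print(np.allclose(manual, np.gradient(y)))
```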
You could find the minima of all the absolute values in the resulting array to find the turning points of a curve, for example.
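As a rough illustration of that idea, the sketch below samples a parabola (my own example, not from the docs) and picks the index where the absolute gradient is smallest. It assumes the curve has a single turning point inside the sampled range:

```python
import numpy as np

# Sample a parabola whose turning point is at x = 2.
x = np.linspace(0.0, 4.0, 9)       # 0, 0.5, ..., 4
y = (x - 2.0) ** 2

dy_dx = np.gradient(y, x)          # pass x so the spacing of 0.5 is used
turning_index = int(np.argmin(np.abs(dy_dx)))

print(x[turning_index], y[turning_index])
```

For a curve with several turning points you would look for all local minima of the absolute gradient (or sign changes of the gradient) rather than a single argmin.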
[1] The array is actually called x in the example in the docs; I've changed it to y to avoid confusion.
Here is what is going on. The Taylor series expansion tells us how to approximate the derivative from function values at nearby points. The simplest approximation comes from the first-order Taylor expansion of a C^2 function (two continuous derivatives)...
- f(x+h) = f(x) + f'(x)h + f''(xi)h^2/2, for some xi between x and x+h.
One can solve for f'(x)...
- f'(x) = [f(x+h) - f(x)]/h + O(h).
Can we do better? Yes indeed. If we assume C^3, then the Taylor expansions are
- f(x+h) = f(x) + f'(x)h + f''(x)h^2/2 + f'''(xi_1) h^3/6, and
- f(x-h) = f(x) - f'(x)h + f''(x)h^2/2 - f'''(xi_2) h^3/6,
where xi_1 lies between x and x+h and xi_2 lies between x-h and x.
Subtracting the second from the first (both the h^0 and h^2 terms drop out!) and solving for f'(x) gives:
- f'(x) = [f(x+h) - f(x-h)]/(2h) + O(h^2).
So, if we have a discretized function defined on an equally spaced partition x_0, x_1 = x_0 + h, ..., x_n = x_0 + n*h, then numpy gradient will yield a "derivative" array using the first-order estimate at the two ends and the better second-order estimate in the middle.
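To see the O(h) versus O(h^2) behaviour numerically, here is a small sketch. The helper names forward_diff and central_diff and the choice of sin as a test function are mine. Halving h should roughly halve the forward-difference error and roughly quarter the central-difference error:

```python
import numpy as np

def forward_diff(f, x, h):
    """First-order estimate [f(x+h) - f(x)] / h."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Second-order estimate [f(x+h) - f(x-h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 1.0
exact = np.cos(x0)   # derivative of sin at x0
forward_errors = []
central_errors = []
for h in (0.1, 0.05, 0.025):
    forward_errors.append(abs(forward_diff(np.sin, x0, h) - exact))
    central_errors.append(abs(central_diff(np.sin, x0, h) - exact))
    print(f"h={h}: forward error={forward_errors[-1]:.2e}, "
          f"central error={central_errors[-1]:.2e}")
```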
Example 1. If you don't specify any spacing, the spacing is assumed to be 1. So if you define
f = np.array([5, 7, 4, 8])
what you are saying is that f(0) = 5, f(1) = 7, f(2) = 4, and f(3) = 8. Then
np.gradient(f)
will be: f'(0) = (7 - 5)/1 = 2, f'(1) = (4 - 5)/(2*1) = -0.5, f'(2) = (8 - 7)/(2*1) = 0.5, f'(3) = (8 - 4)/1 = 4.
Example 2. If you specify a single spacing, the spacing is uniform but not 1.
For example, if you call
np.gradient(f, 0.5)
this is saying that h = 0.5, not 1, i.e., the function is really f(0) = 5, f(0.5) = 7, f(1.0) = 4, f(1.5) = 8. The net effect is to replace h = 1 with h = 0.5 and all the results will be doubled.
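Examples 1 and 2 can be checked directly. A short sketch (the variable names unit_spacing and half_spacing are just for illustration):

```python
import numpy as np

f = np.array([5, 7, 4, 8])

unit_spacing = np.gradient(f)        # h = 1:   values 2, -0.5, 0.5, 4
half_spacing = np.gradient(f, 0.5)   # h = 0.5: every value doubled

print(unit_spacing)
print(half_spacing)
```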
Example 3. Suppose the discretized function f(x) is not defined on uniformly spaced intervals, for instance f(0) = 5, f(1) = 7, f(3) = 4, f(3.5) = 8, then there is a messier discretized differentiation function that the numpy gradient function uses and you will get the discretized derivatives by calling
np.gradient(f, np.array([0,1,3,3.5]))
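For the curious, the sketch below spells out the interior-point formula as I understand it from the NumPy documentation, where hd is the distance to the previous point and hs the distance to the next one. Treat it as an illustration rather than the library's actual code. The ends still use plain one-sided differences:

```python
import numpy as np

f = np.array([5.0, 7.0, 4.0, 8.0])
x = np.array([0.0, 1.0, 3.0, 3.5])

manual = np.empty_like(f)
manual[0] = (f[1] - f[0]) / (x[1] - x[0])          # one-sided at the left end
manual[-1] = (f[-1] - f[-2]) / (x[-1] - x[-2])     # one-sided at the right end
for i in range(1, len(f) - 1):
    hd = x[i] - x[i - 1]    # spacing to the previous point
    hs = x[i + 1] - x[i]    # spacing to the next point
    manual[i] = (hd**2 * f[i + 1] + (hs**2 - hd**2) * f[i] - hs**2 * f[i - 1]) / (
        hs * hd * (hs + hd)
    )

print(manual)
print(np.gradient(f, x))
```

When hd equals hs, the formula reduces to the usual [f(x+h) - f(x-h)]/(2h).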
Lastly, if your input is a 2d array, then you are thinking of a function f of x, y defined on a grid. The numpy gradient will output the arrays of "discretized" partial derivatives in x and y.
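A small sketch of the 2d case, using a made-up linear function f(x, y) = 3x + 2y so the expected partial derivatives are obvious. Note the assumption that axis 0 of the array runs along x and axis 1 along y, which is what indexing="ij" arranges here:

```python
import numpy as np

x = np.arange(4.0)
y = np.arange(5.0)
X, Y = np.meshgrid(x, y, indexing="ij")   # axis 0 follows x, axis 1 follows y
F = 3.0 * X + 2.0 * Y

# One array per axis, each with the same shape as F.
dF_dx, dF_dy = np.gradient(F)

print(dF_dx.shape, dF_dy.shape)
print(dF_dx[0, 0], dF_dy[0, 0])
```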
Hi, I'm trying to expand my knowledge of machine learning. I came across the np.gradient function and wanted to understand how it relates to Taylor series for estimating values. The documentation seemed a bit confusing for a novice.
The problem is that numpy can't give you the derivatives directly, so you have two options:
With NUMPY
What you essentially have to do is define a grid in three dimensions and evaluate the function on this grid. Afterwards you feed this table of function values to numpy.gradient to get an array with the numerical derivative for every dimension (variable).
Example from here:
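import numpy as np
# evaluate the potential on a regular grid with a spacing of 25 along each axis
x, y, z = np.mgrid[-100:101:25., -100:101:25., -100:101:25.]
V = 2*x**2 + 3*y**2 - 4*z  # just a random function for the potential
# pass the grid spacing; without it np.gradient assumes a spacing of 1
Ex, Ey, Ez = np.gradient(V, 25.0)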
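One thing worth checking with this approach is the grid spacing: np.gradient assumes a spacing of 1 unless told otherwise, while this grid uses steps of 25. The sketch below (variable names are mine) passes the spacing explicitly and compares the interior points with the analytic partial derivatives 4x, 6y and -4:

```python
import numpy as np

spacing = 25.0
x, y, z = np.mgrid[-100:101:spacing, -100:101:spacing, -100:101:spacing]
V = 2 * x**2 + 3 * y**2 - 4 * z

# One derivative array per axis, scaled by the grid spacing.
dV_dx, dV_dy, dV_dz = np.gradient(V, spacing)

# Compare away from the edges, where central differences are used.
interior = (slice(1, -1),) * 3
print(np.allclose(dV_dx[interior], 4 * x[interior]))
print(np.allclose(dV_dy[interior], 6 * y[interior]))
print(np.allclose(dV_dz, -4.0))
```

The central difference happens to be exact for quadratics, which is why the interior values match; at the edges the one-sided estimate is less accurate.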
Without NUMPY
You could also calculate the derivative yourself by using the centered difference quotient:
f'(x) ≈ [f(x+h) - f(x-h)]/(2h)
This is essentially what numpy.gradient is doing for every interior point of your predefined grid.
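A minimal sketch of doing this by hand for a function of several variables. The helper central_gradient and the step size h are my own choices, and the potential is the same made-up one as in the grid example:

```python
def central_gradient(func, point, h=1e-5):
    """Approximate each partial derivative of func at point
    with the centered difference quotient."""
    grads = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h     # step forward along variable i only
        backward[i] -= h    # step backward along variable i only
        grads.append((func(*forward) - func(*backward)) / (2 * h))
    return grads

def potential(x, y, z):
    return 2 * x**2 + 3 * y**2 - 4 * z

# Analytic gradient is (4x, 6y, -4), i.e. (4, 12, -4) at this point.
grad = central_gradient(potential, (1.0, 2.0, 3.0))
print(grad)
```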
NumPy and SciPy are for numerical calculations. Since you want to calculate the gradient of an analytical function, you have to use the SymPy package, which supports symbolic mathematics. Differentiation is explained here (you can actually try it in the web console in the bottom left corner).
You can install Sympy under Ubuntu with
sudo apt-get install python-sympy
or under any Linux distribution with pip
sudo pip install sympy
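Once SymPy is installed, a symbolic gradient takes only a few lines. A short sketch, reusing the made-up potential from the NumPy example above:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
V = 2 * x**2 + 3 * y**2 - 4 * z

# Differentiate with respect to each variable in turn.
grad = [sp.diff(V, var) for var in (x, y, z)]
print(grad)

# Evaluate the symbolic gradient at a concrete point.
at_point = [g.subs({x: 1, y: 2, z: 3}) for g in grad]
print(at_point)
```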
I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all equations and operations. In the end this regression boils down to four operations:
- Calculate the hypothesis h = X * theta
- Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
- Calculate the gradient = X' * loss / m
- Update the parameters theta = theta - alpha * gradient
In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.
Let's have a look at my variation of your code:
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
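First I create a small random dataset, which should look like this:
[Figure: scatter plot of the generated dataset with the regression line and formula calculated by Excel]
As you can see, I also added the generated regression line and the formula that was calculated by Excel.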
Pay attention to the intuition behind regression with gradient descent. Because you do a complete batch pass over your data X, you need to reduce the m per-example losses to a single weight update. In this case, that is the average of the per-example gradients, hence the division by m.
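To make the "average of the per-example gradients" concrete, here is a small sketch on made-up data (the names per_example, averaged and vectorized are mine). It shows that averaging the individual gradients gives the same vector as X' * loss / m:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
x = np.column_stack([np.ones(m), np.arange(m)])   # bias column plus one feature
y = rng.normal(size=m)
theta = np.array([0.5, -1.0])

loss = x @ theta - y

per_example = x * loss[:, None]        # row i is loss_i * x_i
averaged = per_example.mean(axis=0)    # reduce the m gradients to one update
vectorized = x.T @ loss / m            # the form used in gradientDescent

print(averaged)
print(vectorized)
```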
The next thing to take care of is tracking convergence and adjusting the learning rate. For that reason you should always track your cost at every iteration, and maybe even plot it.
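One possible way to do that tracking is sketched below. The function name gradient_descent_tracked, the tolerance tol and the noise-free toy data are all my own assumptions, not part of the code above. It records the cost at every iteration and stops once the cost no longer drops by at least tol:

```python
import numpy as np

def gradient_descent_tracked(x, y, alpha, max_iters=50000, tol=1e-9):
    """Batch gradient descent that keeps the cost history.
    Stops when the cost drops by less than tol, which also triggers
    if the cost goes up (a hint that alpha is too large)."""
    m, n = x.shape
    theta = np.zeros(n)
    costs = []
    for _ in range(max_iters):
        loss = x @ theta - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        if len(costs) > 1 and costs[-2] - costs[-1] < tol:
            break
        theta = theta - alpha * (x.T @ loss) / m
    return theta, costs

# Noise-free toy line y = 2 + 3*i, so theta should approach [2, 3].
x = np.column_stack([np.ones(20), np.arange(20)])
y = 2.0 + 3.0 * np.arange(20)
theta, costs = gradient_descent_tracked(x, y, alpha=0.005)

print(len(costs), costs[0], costs[-1])
print(theta)
```

The costs list can be handed to a plotting library to see how quickly the curve flattens for a given alpha.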
If you run my example, the theta returned will look like this:
Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368 1.01108458]
This is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that because we passed the bias in the first column, the first theta value denotes the bias weight.
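If you don't have a spreadsheet at hand, a closed-form least-squares fit is another way to sanity-check the result. The sketch below generates data the same way as genData (but vectorized and with a fixed seed, which is my own choice) and solves it with np.linalg.lstsq. Gradient descent should approach roughly these values:

```python
import numpy as np

rng = np.random.default_rng(0)
num_points, bias, variance = 100, 25, 10

i = np.arange(num_points)
x = np.column_stack([np.ones(num_points), i])          # bias column, then feature
y = (i + bias) + rng.uniform(0, 1, num_points) * variance

# Closed-form least squares solution, used here only as a reference.
theta_ref, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
intercept, slope = theta_ref

print(intercept, slope)
```

The intercept lands near 30 rather than 25 because the uniform noise has a mean of variance/2 = 5.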
Below you can find my implementation of gradient descent for a linear regression problem.
First, you calculate the gradient as X.T * (X * w - y) / N, and then you update all of the current weights w with this gradient simultaneously.
- X: feature matrix
- y: target values
- w: weights/values
- N: size of training set
Here is the python code:
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001
    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    # prepend a column of ones for the bias term
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)

test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
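If you want to convince yourself that X.T * (X * w - y) / N really is the gradient of the squared-error cost, a numerical check with centered differences is a cheap test. This sketch uses plain arrays instead of np.matrix and assumes the cost is the sum of squared errors divided by 2N. The helper names are made up for illustration:

```python
import numpy as np

def cost(X, y, w):
    residual = X @ w - y
    return float(residual @ residual) / (2 * len(y))

def analytic_gradient(X, y, w):
    return X.T @ (X @ w - y) / len(y)

def numerical_gradient(X, y, w, h=1e-6):
    """Centered difference of the cost along each weight."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (cost(X, y, w + step) - cost(X, y, w - step)) / (2 * h)
    return grad

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), np.arange(1, 7)])   # bias column plus feature
y = rng.normal(size=6)
w = np.array([0.3, -0.2])

print(analytic_gradient(X, y, w))
print(numerical_gradient(X, y, w))
```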
