python gradient of function

stackoverflow.com › questions › 16078818 › calculating-gradient-with-numpy

The problem is, that numpy can't give you the derivatives directly and you have two options:

With NUMPY

What you essentially have to do, is to define a grid in three dimension and to evaluate the function on this grid. Afterwards you feed this table of function values to numpy.gradient to get an array with the numerical derivative for every dimension (variable).

Example from here:

from numpy import *

x,y,z = mgrid[-100:101:25., -100:101:25., -100:101:25.]

V = 2*x**2 + 3*y**2 - 4*z # just a random function for the potential

Ex,Ey,Ez = gradient(V)

Without NUMPY

You could also calculate the derivative yourself by using the centered difference quotient.

This is essentially, what numpy.gradient is doing for every point of your predefined grid.

Answer from Stefan on Stack Overflow

NumPy

numpy.org › doc › stable › reference › generated › numpy.gradient.html

numpy.gradient — NumPy v2.4 Manual

Gradient is calculated only along the given axis or axes The default (axis = None) is to calculate the gradient for all the axes of the input array.

Stack Overflow

stackoverflow.com › questions › 16078818 › calculating-gradient-with-numpy

python - Calculating gradient with NumPy - Stack Overflow

Top answer

1 of 4

27

The problem is, that numpy can't give you the derivatives directly and you have two options:

With NUMPY

What you essentially have to do, is to define a grid in three dimension and to evaluate the function on this grid. Afterwards you feed this table of function values to numpy.gradient to get an array with the numerical derivative for every dimension (variable).

Example from here:

from numpy import *

x,y,z = mgrid[-100:101:25., -100:101:25., -100:101:25.]

V = 2*x**2 + 3*y**2 - 4*z # just a random function for the potential

Ex,Ey,Ez = gradient(V)

Without NUMPY

You could also calculate the derivative yourself by using the centered difference quotient.

This is essentially, what numpy.gradient is doing for every point of your predefined grid.

2 of 4

14

Numpy and Scipy are for numerical calculations. Since you want to calculate the gradient of an analytical function, you have to use the Sympy package which supports symbolic mathematics. Differentiation is explained here (you can actually use it in the web console in the left bottom corner).

You can install Sympy under Ubuntu with

sudo apt-get install python-sympy

or under any Linux distribution with pip

sudo pip install sympy

Discussions

Gradient of a function in Python - Data Science Stack Exchange

How can I obtain the gradient of this function for only some of the elements (par [0:2]) in a specific point? More on datascience.stackexchange.com

datascience.stackexchange.com

May 8, 2020

python - What does numpy.gradient do? - Stack Overflow

So I know what the gradient of a (mathematical) function is, so I feel like I should know what numpy.gradient does. But I don't. The documentation is not really helpful either: Return the gradien... More on stackoverflow.com

stackoverflow.com

Need help in understanding np.gradient for calculating derivatives

One definition of the derivative is f'(x) = (f(x+h)-f(x))/h where h goes to 0. Computers cannot store infinitely small numbers, so they might set h=1e-6 (that is 0.000001). It's a tradeoff because while we want h to be as small as possible, at some point the errors due to computer precision begin to dominate. Given any function that the computer can calculate, it can approximate the derivative. def f(x): return np.sin(x) x = np.arange(-2,2,0.01) y = f(x) dfdx = (f(x+h)-f(x))/h plt.plot(x,y) plt.plot(x,dfdx) plt.show() Assuming that the function is reasonably smooth (i.e. the derivative above exists), another definition of the derivative is f'(x) = (f(x+h)-f(x-h))/(2h) where h goes to 0. Going from x-h to x+h means 2 steps, that's the reason for 2h. Which works just as well. These methods are named finite difference to contrast from the normal derivative definition where h is infinitely small. The first one is the forward difference and the second one is called central difference. The backward difference is (f(x)-f(x-h))/2. Let's assume we want to write a derivative function. It takes a function f and values of x, and gives back f'(x). def f(x): return np.sin(x) def d(fun, x): return (fun(x+h)-fun(x))/h x = np.arange(-2,2,0.01) y = f(x) dfdx = d(f,x) plt.plot(x,y) plt.plot(x,dfdx) plt.show() By passing the function into the function, the derivative function can just call fun wherever it wants/needs to get the derivative. Now things become a bit more inconvenient. For some reason we do not know f. We only know y, i.e. f(x) for some values of x. Let's say that x is evenly spaced as usual. Then our best guess for h is not really tiny but identical to the spacing between neighboring x values. With the forward difference we need to take care at the rightmost value because we cannot just add +h to get a value even further out. Instead we use the backward difference. For values in the middle we decide to use the central difference instead of the forward difference. def f(x): return np.sin(x) def d(y, h=1): dfdx = [(y[1]-y[0])/h] for i in range(1,len(y)-1): dfdx.append((y[i+1]-y[i-1])/2/h) dfdx.append((y[i]-y[i-1])/h) return dfdx h = 0.01 x = np.arange(-2,2,h) y = f(x) dfdx = d(y,h) plt.plot(x,y) plt.plot(x,dfdx) plt.show() The implementation above corresponds to np.gradient in the one-dimensional case where varargs is set to case 1 or 2. The case where varargs is set to 3 or 4 would use x directly in d instead of h. However at that point the formula is more complicated as they mention in the documentation. Effectively any point has a hd (the forward step size) and a hs (the backward step size) and the formula is not just (f(x+hd)-f(x-hs))/(hd+hs) but instead that bigger expression given in the documentation, where the values of hd,hs act as some kind of weights. np.gradient is basically backwards, central and forward difference combined. When you have values like f(1),f(2),f(2+h) and want the derivative at 2, the code notices that 2 and 2+h are very close together and puts greater weight on that (and mostly ignores f(1)). The important part so far is that np.gradient when given a vector with N elements calculates N one-dimensional derivatives, which is not the typical idea of a gradient. np.gradient does support more dimensions which might make things clearer. So in the 1D case, we essentially go through all values from left to right and then consider that value and its direct left and right neighbor to quantify the uptrend or downtrend. In the 2D case, np.gradient still does this, but additionally also walks from top to bottom and does the same. So in 2D it returns 2 arrays, one for left-right and one for top-bottom. The actual definition of the gradient by finite differences is [(f(x+h,y)-f(x,y))/h, (f(x,y+h)-f(x,y))/h] in 2D. These values are indeed returned by np.gradient, the left part is in the first array and the right part in the second array. Say we are in 2D and want the gradient at x=3 and y=0, then we can plug it into np.gradient like this: hx = 1e-6 hy = 1e-3 x = [3,3+hx] y = [0,0+hy] xx,yy = np.meshgrid(x,y) def f(x,y): return x**2-2*x*np.sin(y) + 1/x grad = np.gradient(f(xx,yy), y,x) # Note the order. print(grad[1][0,0], grad[0][0,0]) # Note the order. This is dfdx, dfdy. but if the function f can be calculated by a computer, it makes more sense to just use automatic differentiation instead of finite differences. Automatic differentiation has no h that needs to be chosen carefully. It's always as accurate is possible. import torch x = torch.tensor([3.],requires_grad=True) y = torch.tensor([0.],requires_grad=True) z = x**2-2*x*torch.sin(y) + 1/x z.backward() print(x.grad, y.grad) So what's the deal with the Taylor series? It's just a minor piece in the derivation of that more general expression used by np.gradient. We just start by claiming that we can express the gradient by adding together function values in the direct neighborhood. f'(x) = a f(x) + b f(x+hd) + c f(x-hs) Given that finite differences do work out, this approach should work as well and generalize the idea. Expand f(x+hd) and f(x-hs) with their series: f(x+hd) = f(x) + hd f'(x) + hd^2 f''(x)/2 + ... f(x-hs) = f(x) - hs f'(x) + hs^2 f''(x)/2 + ... Then plug it in and reshape: f'(x) = a f(x) + b f(x) + b hd f'(x) + b hd^2 f''(x)/2 + c f(x) - c hs f'(x) + c hs^2 f''(x)/2 = (a+b+c) f(x) + (b hd - c hs) f'(x) + (b hd^2 + c hs^2 )/2 f''(x) 0 = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd^2 + c hs^2 )/2 f''(x) The = in the middle is actually more of an approximately equal sign. We won't be able to reach 0 for all f(x) as claimed on the left hand size, but we can get pretty close. We do NOT want to minimize the right-hand-side. We want it to reach 0 (it can go below 0 right now). To turn this into a minimization problem, we square it. This way we get a positive number always and it really becomes a matter of minimization. We COULD also take the absolute value instead of squaring, but it's pain to work this through and the end result are exactly the same parameters anyway. To minimize: E2 with E = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd2 + c hs2 )/2 f''(x) One requirement for an optimum is that the gradient is 0. In this case we take the derivatives with respect to a,b,c because we want to find the optimal a,b,c. First a reminder of the chain rule: dE2 /dt = 2E dE/dt for whatever t is. It's optional to do this but a bit less messy than working it through individually. In particular we have dE^2/da = 2E dE/da = 2E f(x) dE^2/db = 2E dE/db = 2E (f(x) + hd f'(x) + hd^2 f''(x)/2) dE^2/dc = 2E dE/dc = 2E (f(x) - hs f'(x) + hs^2 f''(x)/2) We want ALL three of them to be 0 at the same time. This can only happen if E is 0. 0 := (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd2 + c hs2 )/2 f''(x) and we want this to be 0 for any f, f', f'' for any value of x. The only way for this to happen is if each coefficient is 0, i.e. a+b+c = 0 b hd - c hs = 1 b hd^2 + c hs^2 = 0 We would need to check the second derivative to make sure that this is a minimum, not a maximum, but given the problem it is fairly clear. So why did we stop exactly after f'' in the Taylor series? It's because this way we get exactly 3 unknowns and 3 equations, which is the most convenient to solve. Multiply the second equation by hd then subtract the third from it. (b hd^2 - c hs hd) - (b hd^2 + c hs^2) = hd -c hs^2 - c hs hd = hd c hs (hs + hd) = -hd c = -hd/hs/(hs+hd) = -hd^2 / (hs hd (hs+hd)) where the last step is just so it looks exactly like in np.gradient. Insert c into the second equation. b hd + hd/hs/(hs+hd) hs = 1 b hd + hd/(hs+hd) = 1 b + 1/(hs+hd) = 1/hd b = 1/hd - 1/(hs+hd) b = (hs(hs+hd) - hs hd) / [hs hd (hs+hd)] b = hs^2 / [hs hd (hs+hd)] From the first equation we know that a = -b-c = (hd2 - hs2 )/(hs hd (hs+hd)). So here's your summary: If you have a function that can be calculated by a computer, use torch or tensorflow or any other framework for automatic differentiation. If you have a function that can be calculated by a computer but such a framework is not available, np.gradient is still a bad idea because it is inefficient. Note for the 2D gradient we needed three values, f(x,y), f(x+dx,y), f(x,y+dy). But with np.gradient we would first need to set up arrays where it is almost natural to also include f(x+dx,y+dy) which is not needed for gradient calculations. It's more natural to set up some loop that increments x once, then y once, then z once, and so on. Many solvers in scipy.optimize work with finite differences. If you have a function that cannot be calculated by a computer, np.gradient may be useful. In practice this means that you have data from some experiment. Even there, the concept of a Taylor series plays no role here UNLESS the data was taken on an unevenly spaced grid. More on reddit.com

r/learnpython

4

2

June 30, 2023

numpy - Gradient calculation with python - Stack Overflow

But i got 2 arrays with 3 colums ... the sum of the two would give me the vector i were searchin for but the z component doesn't vanish. I hope i've been sufficiently clear in my explanation. I would like to know how numpy.gradient works and if it's the right choice for my problem. Otherwise i would like to know if there's any other python function i can ... More on stackoverflow.com

stackoverflow.com

Videos

youtube.com

Python Gradient Descent Algorithm - Gradient Function

08:50

YouTube

Numerical derivatives in python using numpy.gradient() function: ...

Gradient Descent from Scratch in Python - YouTube

Machine Learning Tutorial Python - 4: Gradient Descent and Cost ...

July 22, 2018

28:44

YouTube

Gradient Descent From Scratch in Python - Visual Explanation - YouTube

April 18, 2023

View all

GeeksforGeeks

geeksforgeeks.org › how-to-find-gradient-of-a-function-using-python

How to find Gradient of a Function using Python? | GeeksforGeeks

July 28, 2020 - The gradient of a function simply means the rate of change of a function. We will use numdifftools to find Gradient of a function.

Educative

educative.io › answers › how-to-use-the-numpygradient-function-for-a-2d-array-in-python

How to use the numpy.gradient function for a 2D array in Python

We can use the numpy.gradient() function to find the gradient of an N-dimensional array. For gradient approximation, the function uses either first or second-order accurate one-sided differences at the boundaries and second-order accurate central ...

NumPy

numpy.org › doc › 2.1 › reference › generated › numpy.gradient.html

numpy.gradient — NumPy v2.1 Manual

Gradient is calculated only along the given axis or axes The default (axis = None) is to calculate the gradient for all the axes of the input array.

Scaler

scaler.com › home › topics › what is the numpy.gradient() method in numpy?

What is the numpy.gradient() method in Numpy? - Scaler Topics

May 4, 2023 - The gradient is calculated using the numpy gradient() function by utilizing either the first or second-order correct one-sides (in either direction) differences at the boundaries and second-order accurate central differences in the interior ...

Medium

medium.com › @ilmunabid › how-to-find-a-gradient-slope-of-a-function-in-python-774f865467d2

What is Gradient/Slope? and How to Calculate One in Python (SymPy) | Medium

June 5, 2022 - derivative is used to find the gradient of a curve or to measure steepness. it is also called the rate of change. so we take the change in y and divide that with the change in x

Find elsewhere

Google Bing Mojeek

Finxter

blog.finxter.com › home › learn python blog › np.gradient() — a simple illustrated guide

np.gradient() - A Simple Illustrated Guide - Be on the Right Side of Change

June 24, 2022 - In Python, the numpy.gradient() function approximates the gradient of an N-dimensional array. It uses the second-order accurate central differences in the interior points and either first or second-order accurate one-sided differences at the ...

Stack Exchange

datascience.stackexchange.com › questions › 73807 › gradient-of-a-function-in-python

Gradient of a function in Python - Data Science Stack Exchange

Top answer

1 of 1

1

Several options:

You can use the defintion of the derivative to have an approximation....

def f(x):
    return x[0]**2 + 3*x[1]**3

def der(f, x, der_index=[]):
    # der_index: variable w.r.t. get gradient

    epsilon = 2.34E-10
    grads = []

   for idx in der_index:
       x_ = x.copy()
       x_[idx]+=epsilon

       grads.append((f(x_) - f(x))/epsilon)

   return grads

print(der(f, np.array([1.,1.]), der_index=[0, 1]))

If you can solve it analytically, it is better you write the derivative function by yourself
You can also use symbolic programming, like in Matlab, with the library sympy https://towardsdatascience.com/taking-derivatives-in-python-d6229ba72c64
Another way to do it is going for the "differentiable programming" paradigm or "software 2.0"

Kodeclik

kodeclik.com › numpy-gradient

Python numpy.gradient()

October 16, 2024 - numpy.gradient() computes the gradient of a function represented in an n-dimensional array using finite differences.

Stack Overflow

stackoverflow.com › questions › 24633618 › what-does-numpy-gradient-do

python - What does numpy.gradient do? - Stack Overflow

Top answer

1 of 4

196

Also in the documentation¹:

>>> y = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> j = np.gradient(y)
>>> j 
array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])

Gradient is defined as (change in y)/(change in x).
x, here, is the list index, so the difference between adjacent values is 1.
At the boundaries, the first difference is calculated. This means that at each end of the array, the gradient given is simply, the difference between the end two values (divided by 1)
Away from the boundaries the gradient for a particular index is given by taking the difference between the the values either side and dividing by 2.

So, the gradient of y, above, is calculated thus:

j[0] = (y[1]-y[0])/1 = (2-1)/1  = 1
j[1] = (y[2]-y[0])/2 = (4-1)/2  = 1.5
j[2] = (y[3]-y[1])/2 = (7-2)/2  = 2.5
j[3] = (y[4]-y[2])/2 = (11-4)/2 = 3.5
j[4] = (y[5]-y[3])/2 = (16-7)/2 = 4.5
j[5] = (y[5]-y[4])/1 = (16-11)/1 = 5

You could find the minima of all the absolute values in the resulting array to find the turning points of a curve, for example.

¹The array is actually called x in the example in the docs, I've changed it to y to avoid confusion.

2 of 4

32

Here is what is going on. The Taylor series expansion guides us on how to approximate the derivative, given the value at close points. The simplest comes from the first order Taylor series expansion for a C^2 function (two continuous derivatives)...

f(x+h) = f(x) + f'(x)h+f''(xi)h^2/2.

One can solve for f'(x)...

f'(x) = [f(x+h) - f(x)]/h + O(h).

Can we do better? Yes indeed. If we assume C^3, then the Taylor expansion is

f(x+h) = f(x) + f'(x)h + f''(x)h^2/2 + f'''(xi) h^3/6, and
f(x-h) = f(x) - f'(x)h + f''(x)h^2/2 - f'''(xi) h^3/6.

Subtracting these (both the h^0 and h^2 terms drop out!) and solve for f'(x):

f'(x) = [f(x+h) - f(x-h)]/(2h) + O(h^2).

So, if we have a discretized function defined on equal distant partitions: x = x_0,x_0+h(=x_1),....,x_n=x_0+h*n, then numpy gradient will yield a "derivative" array using the first order estimate on the ends and the better estimates in the middle.

Example 1. If you don't specify any spacing, the interval is assumed to be 1. so if you call

f = np.array([5, 7, 4, 8])

what you are saying is that f(0) = 5, f(1) = 7, f(2) = 4, and f(3) = 8. Then

np.gradient(f)

will be: f'(0) = (7 - 5)/1 = 2, f'(1) = (4 - 5)/(2*1) = -0.5, f'(2) = (8 - 7)/(2*1) = 0.5, f'(3) = (8 - 4)/1 = 4.

Example 2. If you specify a single spacing, the spacing is uniform but not 1.

For example, if you call

np.gradient(f, 0.5)

this is saying that h = 0.5, not 1, i.e., the function is really f(0) = 5, f(0.5) = 7, f(1.0) = 4, f(1.5) = 8. The net effect is to replace h = 1 with h = 0.5 and all the results will be doubled.

Example 3. Suppose the discretized function f(x) is not defined on uniformly spaced intervals, for instance f(0) = 5, f(1) = 7, f(3) = 4, f(3.5) = 8, then there is a messier discretized differentiation function that the numpy gradient function uses and you will get the discretized derivatives by calling

np.gradient(f, np.array([0,1,3,3.5]))

Lastly, if your input is a 2d array, then you are thinking of a function f of x, y defined on a grid. The numpy gradient will output the arrays of "discretized" partial derivatives in x and y.

reddit.com › r/learnpython › need help in understanding np.gradient for calculating derivatives

r/learnpython on Reddit: Need help in understanding np.gradient for calculating derivatives

June 30, 2023 -

Hi, I'm trying to expand my knowledge in Machine Learning, I came across the np.gradient function, I wanted to understand how it relates to Taylor's Series for estimating values. The documentation seemed a bit confusing for novice.

Top answer

1 of 2

7

One definition of the derivative is f'(x) = (f(x+h)-f(x))/h where h goes to 0. Computers cannot store infinitely small numbers, so they might set h=1e-6 (that is 0.000001). It's a tradeoff because while we want h to be as small as possible, at some point the errors due to computer precision begin to dominate. Given any function that the computer can calculate, it can approximate the derivative. def f(x): return np.sin(x) x = np.arange(-2,2,0.01) y = f(x) dfdx = (f(x+h)-f(x))/h plt.plot(x,y) plt.plot(x,dfdx) plt.show() Assuming that the function is reasonably smooth (i.e. the derivative above exists), another definition of the derivative is f'(x) = (f(x+h)-f(x-h))/(2h) where h goes to 0. Going from x-h to x+h means 2 steps, that's the reason for 2h. Which works just as well. These methods are named finite difference to contrast from the normal derivative definition where h is infinitely small. The first one is the forward difference and the second one is called central difference. The backward difference is (f(x)-f(x-h))/2. Let's assume we want to write a derivative function. It takes a function f and values of x, and gives back f'(x). def f(x): return np.sin(x) def d(fun, x): return (fun(x+h)-fun(x))/h x = np.arange(-2,2,0.01) y = f(x) dfdx = d(f,x) plt.plot(x,y) plt.plot(x,dfdx) plt.show() By passing the function into the function, the derivative function can just call fun wherever it wants/needs to get the derivative. Now things become a bit more inconvenient. For some reason we do not know f. We only know y, i.e. f(x) for some values of x. Let's say that x is evenly spaced as usual. Then our best guess for h is not really tiny but identical to the spacing between neighboring x values. With the forward difference we need to take care at the rightmost value because we cannot just add +h to get a value even further out. Instead we use the backward difference. For values in the middle we decide to use the central difference instead of the forward difference. def f(x): return np.sin(x) def d(y, h=1): dfdx = [(y[1]-y[0])/h] for i in range(1,len(y)-1): dfdx.append((y[i+1]-y[i-1])/2/h) dfdx.append((y[i]-y[i-1])/h) return dfdx h = 0.01 x = np.arange(-2,2,h) y = f(x) dfdx = d(y,h) plt.plot(x,y) plt.plot(x,dfdx) plt.show() The implementation above corresponds to np.gradient in the one-dimensional case where varargs is set to case 1 or 2. The case where varargs is set to 3 or 4 would use x directly in d instead of h. However at that point the formula is more complicated as they mention in the documentation. Effectively any point has a hd (the forward step size) and a hs (the backward step size) and the formula is not just (f(x+hd)-f(x-hs))/(hd+hs) but instead that bigger expression given in the documentation, where the values of hd,hs act as some kind of weights. np.gradient is basically backwards, central and forward difference combined. When you have values like f(1),f(2),f(2+h) and want the derivative at 2, the code notices that 2 and 2+h are very close together and puts greater weight on that (and mostly ignores f(1)). The important part so far is that np.gradient when given a vector with N elements calculates N one-dimensional derivatives, which is not the typical idea of a gradient. np.gradient does support more dimensions which might make things clearer. So in the 1D case, we essentially go through all values from left to right and then consider that value and its direct left and right neighbor to quantify the uptrend or downtrend. In the 2D case, np.gradient still does this, but additionally also walks from top to bottom and does the same. So in 2D it returns 2 arrays, one for left-right and one for top-bottom. The actual definition of the gradient by finite differences is [(f(x+h,y)-f(x,y))/h, (f(x,y+h)-f(x,y))/h] in 2D. These values are indeed returned by np.gradient, the left part is in the first array and the right part in the second array. Say we are in 2D and want the gradient at x=3 and y=0, then we can plug it into np.gradient like this: hx = 1e-6 hy = 1e-3 x = [3,3+hx] y = [0,0+hy] xx,yy = np.meshgrid(x,y) def f(x,y): return x**2-2*x*np.sin(y) + 1/x grad = np.gradient(f(xx,yy), y,x) # Note the order. print(grad[1][0,0], grad[0][0,0]) # Note the order. This is dfdx, dfdy. but if the function f can be calculated by a computer, it makes more sense to just use automatic differentiation instead of finite differences. Automatic differentiation has no h that needs to be chosen carefully. It's always as accurate is possible. import torch x = torch.tensor([3.],requires_grad=True) y = torch.tensor([0.],requires_grad=True) z = x**2-2*x*torch.sin(y) + 1/x z.backward() print(x.grad, y.grad) So what's the deal with the Taylor series? It's just a minor piece in the derivation of that more general expression used by np.gradient. We just start by claiming that we can express the gradient by adding together function values in the direct neighborhood. f'(x) = a f(x) + b f(x+hd) + c f(x-hs) Given that finite differences do work out, this approach should work as well and generalize the idea. Expand f(x+hd) and f(x-hs) with their series: f(x+hd) = f(x) + hd f'(x) + hd^2 f''(x)/2 + ... f(x-hs) = f(x) - hs f'(x) + hs^2 f''(x)/2 + ... Then plug it in and reshape: f'(x) = a f(x) + b f(x) + b hd f'(x) + b hd^2 f''(x)/2 + c f(x) - c hs f'(x) + c hs^2 f''(x)/2 = (a+b+c) f(x) + (b hd - c hs) f'(x) + (b hd^2 + c hs^2 )/2 f''(x) 0 = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd^2 + c hs^2 )/2 f''(x) The = in the middle is actually more of an approximately equal sign. We won't be able to reach 0 for all f(x) as claimed on the left hand size, but we can get pretty close. We do NOT want to minimize the right-hand-side. We want it to reach 0 (it can go below 0 right now). To turn this into a minimization problem, we square it. This way we get a positive number always and it really becomes a matter of minimization. We COULD also take the absolute value instead of squaring, but it's pain to work this through and the end result are exactly the same parameters anyway. To minimize: E2 with E = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd2 + c hs2 )/2 f''(x) One requirement for an optimum is that the gradient is 0. In this case we take the derivatives with respect to a,b,c because we want to find the optimal a,b,c. First a reminder of the chain rule: dE2 /dt = 2E dE/dt for whatever t is. It's optional to do this but a bit less messy than working it through individually. In particular we have dE^2/da = 2E dE/da = 2E f(x) dE^2/db = 2E dE/db = 2E (f(x) + hd f'(x) + hd^2 f''(x)/2) dE^2/dc = 2E dE/dc = 2E (f(x) - hs f'(x) + hs^2 f''(x)/2) We want ALL three of them to be 0 at the same time. This can only happen if E is 0. 0 := (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd2 + c hs2 )/2 f''(x) and we want this to be 0 for any f, f', f'' for any value of x. The only way for this to happen is if each coefficient is 0, i.e. a+b+c = 0 b hd - c hs = 1 b hd^2 + c hs^2 = 0 We would need to check the second derivative to make sure that this is a minimum, not a maximum, but given the problem it is fairly clear. So why did we stop exactly after f'' in the Taylor series? It's because this way we get exactly 3 unknowns and 3 equations, which is the most convenient to solve. Multiply the second equation by hd then subtract the third from it. (b hd^2 - c hs hd) - (b hd^2 + c hs^2) = hd -c hs^2 - c hs hd = hd c hs (hs + hd) = -hd c = -hd/hs/(hs+hd) = -hd^2 / (hs hd (hs+hd)) where the last step is just so it looks exactly like in np.gradient. Insert c into the second equation. b hd + hd/hs/(hs+hd) hs = 1 b hd + hd/(hs+hd) = 1 b + 1/(hs+hd) = 1/hd b = 1/hd - 1/(hs+hd) b = (hs(hs+hd) - hs hd) / [hs hd (hs+hd)] b = hs^2 / [hs hd (hs+hd)] From the first equation we know that a = -b-c = (hd2 - hs2 )/(hs hd (hs+hd)). So here's your summary: If you have a function that can be calculated by a computer, use torch or tensorflow or any other framework for automatic differentiation. If you have a function that can be calculated by a computer but such a framework is not available, np.gradient is still a bad idea because it is inefficient. Note for the 2D gradient we needed three values, f(x,y), f(x+dx,y), f(x,y+dy). But with np.gradient we would first need to set up arrays where it is almost natural to also include f(x+dx,y+dy) which is not needed for gradient calculations. It's more natural to set up some loop that increments x once, then y once, then z once, and so on. Many solvers in scipy.optimize work with finite differences. If you have a function that cannot be calculated by a computer, np.gradient may be useful. In practice this means that you have data from some experiment. Even there, the concept of a Taylor series plays no role here UNLESS the data was taken on an unevenly spaced grid.

2 of 2

2

You might enjoy this stackoverflow post on the same question

Aleksandar Haber

aleksandarhaber.com › automatic-computation-of-gradients-of-nonlinear-functions-in-python

Automatic Computation of Gradients of Multivariable Functions in Python – Fusion of Engineering, Control, Coding, Machine Learning, and Science

November 25, 2023 - First, let us explain how to compute the gradients of the function ... # -*- coding: utf-8 -*- """ Computation of Symbolic Gradients in Python and Automatic Generation of Python Functions for Gradients """ import numpy as np from sympy import * init_printing() # symbolic state vector x = MatrixSymbol('x',2,1) # nonlinear function f=Matrix([(x[0]-2)**2+(x[1]-2)**2]) # compute the gradient g=f.jacobian(x) # compute the transpose g=g.transpose()

Medium

surajsinghbisht054.medium.com › mastering-gradient-descent-math-python-and-the-magic-behind-machine-learning-d12a7791f24e

Mastering Gradient Descent: Math, Python, and the Magic Behind Machine Learning — Part 1 | by Suraj Singh Bisht | Medium

December 26, 2023 - If that describes you, then let’s proceed! Gradient Descent serves as an optimization technique employed to pinpoint the lowest point, known as a local minimum, in a function ...

AskPython

askpython.com › home › numpy gradient: returning the gradient of n-dimensional array

Numpy Gradient: Returning the Gradient of N-dimensional Array - AskPython

December 29, 2022 - Good Lord above heavens! But hey, we have got Python to carry out some of the stuff for us. Phew! This article explains on the deployment of the gradient( ) function within the numpy library of Python for usage against the arrays of N-dimensions.

TutorialsPoint

tutorialspoint.com › how-to-estimate-the-gradient-of-a-function-in-one-or-more-dimensions-in-pytorch

How to estimate the gradient of a function in one or more dimensions in PyTorch?

To estimate the gradient of a function, we can apply the torch.gradient() function. This function estimates the gradient using the second-order accurate central differences method. We can estimate the gradient in one or more dimen

Stack Overflow

stackoverflow.com › questions › 17901363 › gradient-calculation-with-python

numpy - Gradient calculation with python - Stack Overflow

Top answer

1 of 1

31

You need to give gradient a matrix that describes your angular frequency values for your (x,y) points. e.g.

def f(x,y):
    return np.sin((x + y))
x = y = np.arange(-5, 5, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)

gx,gy = np.gradient(Z,0.05,0.05)

You can see that plotting Z as a surface gives:

Here is how to interpret your gradient:

gx is a matrix that gives the change dz/dx at all points. e.g. gx[0][0] is dz/dx at (x0,y0). Visualizing gx helps in understanding:

Since my data was generated from f(x,y) = sin(x+y) gy looks the same.

Here is a more obvious example using f(x,y) = sin(x)...

f(x,y)

and the gradients

update Let's take a look at the xy pairs.

This is the code I used:

def f(x,y):
    return np.sin(x)
x = y = np.arange(-3,3,.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
xy_pairs = np.array([str(x)+','+str(y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
xy_pairs = xy_pairs.reshape(X.shape)

gy,gx = np.gradient(Z,.05,.05)

Now we can look and see exactly what is happening. Say we wanted to know what point was associated with the value atZ[20][30]? Then...

>>> Z[20][30]
-0.99749498660405478

And the point is

>>> xy_pairs[20][30]
'-1.5,-2.0'

Is that right? Let's check.

>>> np.sin(-1.5)
-0.99749498660405445

Yes.

And what are our gradient components at that point?

>>> gy[20][30]
0.0
>>> gx[20][30]
0.070707731517679617

Do those check out?

dz/dy always 0 check. dz/dx = cos(x) and...

>>> np.cos(-1.5)
0.070737201667702906

Looks good.

You'll notice they aren't exactly correct, that is because my Z data isn't continuous, there is a step size of 0.05 and gradient can only approximate the rate of change.

linkedin.com › pulse › understanding-gradient-descent-python-rany-elhousieny-phdᴬᴮᴰ

Understanding Gradient Descent in Python

February 7, 2024 - Compute the slope (gradient): Determine the direction in which the hill is steepest. In math, this is the derivative of the function at your current location. Take a step downhill: Move a small distance in the direction of the steepest slope.

SciPy

docs.scipy.org › doc › scipy › reference › generated › scipy.optimize.approx_fprime.html

approx_fprime — SciPy v1.17.0 Manual

If a function maps from \(R^n\) to \(R^m\), its derivatives form an m-by-n matrix called the Jacobian, where an element \((i, j)\) is a partial derivative of f[i] with respect to xk[j]. ... The coordinate vector at which to determine the gradient of f.

reddit.com › r/learnpython › how do i get the gradient of a multi-dimensional function in python?

r/learnpython on Reddit: How do I get the gradient of a multi-dimensional function in python?

January 7, 2023 -

I a trying to find a simple python function to get the gradient of a mathematical function. numpy gradient() doesn't get a function as an input so it doesn't seem like the right thing to used is there any other option? Than you!

Top answer

1 of 2

6

numpy is all about efficient numerical calculations. If you want to do more symbolic things like this, look into scipy (if you're in a data science, ML, or applied math class) or even SageMath (if you're doing pure math).

2 of 2

2

Do you need it at a given point, like in gradient descent, or do you need the functional form? Can you do a gradient of a single dimensional function? Say y=f(x). Assuming so, then to extend to multidimension, say y=f(x_1,x_2,x_n) ,then you just need to get the partial derivative of y w.r.t to x_1-x_n. The code to do the single dimensional case should work for any of the partial derivatives.