Also in the documentation [1]:

>>> y = np.array([1, 2, 4, 7, 11, 16], dtype=float)
>>> j = np.gradient(y)
>>> j 
array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])
  • Gradient is defined as (change in y)/(change in x).

  • x, here, is the list index, so the difference between adjacent values is 1.

  • At the boundaries, the first difference is calculated. This means that at each end of the array, the gradient given is simply the difference between the two end values (divided by 1).

  • Away from the boundaries, the gradient for a particular index is given by taking the difference between the values on either side and dividing by 2.

So, the gradient of y, above, is calculated thus:

j[0] = (y[1]-y[0])/1 = (2-1)/1  = 1
j[1] = (y[2]-y[0])/2 = (4-1)/2  = 1.5
j[2] = (y[3]-y[1])/2 = (7-2)/2  = 2.5
j[3] = (y[4]-y[2])/2 = (11-4)/2 = 3.5
j[4] = (y[5]-y[3])/2 = (16-7)/2 = 4.5
j[5] = (y[5]-y[4])/1 = (16-11)/1 = 5
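
The hand calculation above can be reproduced in a few lines. This is only a sketch of the default unit-spacing case; the name manual is made up for illustration and is not part of NumPy.

```python
import numpy as np

y = np.array([1, 2, 4, 7, 11, 16], dtype=float)

# Rebuild the gradient by hand, following the rules listed above.
manual = np.empty_like(y)
manual[0] = (y[1] - y[0]) / 1        # first difference at the left boundary
manual[1:-1] = (y[2:] - y[:-2]) / 2  # difference of both neighbours, divided by 2
manual[-1] = (y[-1] - y[-2]) / 1     # first difference at the right boundary

print(manual)
print(np.allclose(manual, np.gradient(y)))
```

The slices y[2:] and y[:-2] line up each interior point with its right and left neighbour, so no explicit loop is needed.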

You could find the minima of all the absolute values in the resulting array to find the turning points of a curve, for example.
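
As a rough illustration of that idea, here is a sketch that locates the turning point of a sampled parabola. The curve, the grid and the variable names are assumptions chosen for the example; real data with noise would need more care than a plain argmin.

```python
import numpy as np

# Sample a curve with a single turning point at x = 2 (illustrative choice).
x = np.linspace(0.0, 4.0, 41)
y = (x - 2.0) ** 2 + 1.0

# Numerical derivative with respect to x, then the index where |dy/dx| is smallest.
g = np.gradient(y, x)
i = int(np.argmin(np.abs(g)))

print(i, x[i], g[i])
```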


[1] The array is actually called x in the example in the docs; I've changed it to y to avoid confusion.

Answer from SiHa on Stack Overflow
NumPy
numpy.org › doc › 2.1 › reference › generated › numpy.gradient.html
numpy.gradient - NumPy v2.1 Manual
Durran D. R. (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. New York: Springer. ... Fornberg B. (1988) Generation of Finite Difference Formulas on Arbitrarily Spaced Grids, Mathematics of Computation 51, no. 184 : 699-706. PDF. ... >>> import numpy as np >>> f = np.array([1, 2, 4, 7, 11, 16]) >>> np.gradient(f) array([1.
NumPy
numpy.org › doc › stable › reference › generated › numpy.gradient.html
numpy.gradient - NumPy v2.4 Manual
Durran D. R. (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. New York: Springer. ... Fornberg B. (1988) Generation of Finite Difference Formulas on Arbitrarily Spaced Grids, Mathematics of Computation 51, no. 184 : 699-706. PDF. ... Try it in your browser! >>> import numpy as np >>> f = np.array([1, 2, 4, 7, 11, 16]) >>> np.gradient(f) array([1.
Discussions

python - Numerical gradient for nonlinear function in numpy/scipy - Stack Overflow
I'm trying to implement a numerical gradient calculation in numpy to be used as the callback function for the gradient in cyipopt. My understanding of the numpy gradient function is that it should More on stackoverflow.com
Need help in understanding np.gradient for calculating derivatives
More on reddit.com (r/learnpython)
4
2
June 30, 2023
python - Calculating gradient with NumPy - Stack Overflow
What you essentially have to do, is to define a grid in three dimension and to evaluate the function on this grid. Afterwards you feed this table of function values to numpy.gradient to get an array with the numerical derivative for every dimension (variable). More on stackoverflow.com
Higher order central differences using NumPy.gradient()
Hello everyone, I am new to Python and am still learning it. So my apologies if this is a basic question. I am given two arrays: X and Y. Where Y=2*(x^2)+x/2. I need to calculate the first and the fifth order central differences of Y with respect to X using the numpy.gradient function. More on discuss.python.org
0
0
September 5, 2022
2 of 4
32

Here is what is going on. The Taylor series expansion guides us on how to approximate the derivative, given the value at close points. The simplest comes from the first order Taylor series expansion for a C^2 function (two continuous derivatives)...

  • f(x+h) = f(x) + f'(x)h+f''(xi)h^2/2.

One can solve for f'(x)...

  • f'(x) = [f(x+h) - f(x)]/h + O(h).

Can we do better? Yes indeed. If we assume C^3, then the Taylor expansion is

  • f(x+h) = f(x) + f'(x)h + f''(x)h^2/2 + f'''(xi) h^3/6, and
  • f(x-h) = f(x) - f'(x)h + f''(x)h^2/2 - f'''(xi) h^3/6.

Subtract these (both the h^0 and h^2 terms drop out!) and solve for f'(x):

  • f'(x) = [f(x+h) - f(x-h)]/(2h) + O(h^2).
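
To see the O(h) versus O(h^2) behaviour in practice, here is a small sketch that compares both estimates for sin(x) at x = 1. The helper names forward_diff and central_diff and the test point are illustrative choices, not anything from NumPy.

```python
import numpy as np

def forward_diff(f, x, h):
    # First-order estimate: error shrinks roughly like h.
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # Second-order estimate: error shrinks roughly like h**2.
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.0
exact = np.cos(x0)  # exact derivative of sin at x0

for h in (1e-1, 1e-2, 1e-3):
    err_fwd = abs(forward_diff(np.sin, x0, h) - exact)
    err_cen = abs(central_diff(np.sin, x0, h) - exact)
    print(f"h={h:g}  forward error={err_fwd:.2e}  central error={err_cen:.2e}")
```

Each tenfold reduction in h should cut the forward error by about 10 and the central error by about 100.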

So, if we have a discretized function defined on an equidistant partition, x = x_0, x_1 = x_0 + h, ..., x_n = x_0 + n*h, then numpy gradient will yield a "derivative" array using the first order estimate on the ends and the better estimates in the middle.

Example 1. If you don't specify any spacing, the interval is assumed to be 1. So if you define

f = np.array([5, 7, 4, 8])

what you are saying is that f(0) = 5, f(1) = 7, f(2) = 4, and f(3) = 8. Then

np.gradient(f) 

will be: f'(0) = (7 - 5)/1 = 2, f'(1) = (4 - 5)/(2*1) = -0.5, f'(2) = (8 - 7)/(2*1) = 0.5, f'(3) = (8 - 4)/1 = 4.

Example 2. If you specify a single spacing, the spacing is uniform but not 1.

For example, if you call

np.gradient(f, 0.5)

this is saying that h = 0.5, not 1, i.e., the function is really f(0) = 5, f(0.5) = 7, f(1.0) = 4, f(1.5) = 8. The net effect is to replace h = 1 with h = 0.5 and all the results will be doubled.
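
Examples 1 and 2 can be checked side by side. A minimal sketch; the names g_unit and g_half are just labels for this illustration.

```python
import numpy as np

f = np.array([5, 7, 4, 8], dtype=float)

g_unit = np.gradient(f)        # spacing assumed to be 1
g_half = np.gradient(f, 0.5)   # uniform spacing h = 0.5

print(g_unit)
print(g_half)
print(np.allclose(g_half, 2 * g_unit))
```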

Example 3. Suppose the discretized function f(x) is not defined on uniformly spaced intervals, for instance f(0) = 5, f(1) = 7, f(3) = 4, f(3.5) = 8, then there is a messier discretized differentiation function that the numpy gradient function uses and you will get the discretized derivatives by calling

np.gradient(f, np.array([0,1,3,3.5]))
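
For the curious, the "messier" interior formula can be written out and compared with NumPy. This sketch assumes the weighted three-point formula given in the np.gradient documentation, with hs the step to the left neighbour and hd the step to the right neighbour; interior_derivative is a hypothetical helper, not a NumPy function.

```python
import numpy as np

f = np.array([5.0, 7.0, 4.0, 8.0])
x = np.array([0.0, 1.0, 3.0, 3.5])

def interior_derivative(f, x, i):
    hs = x[i] - x[i - 1]   # backward step
    hd = x[i + 1] - x[i]   # forward step
    return (hs**2 * f[i + 1] + (hd**2 - hs**2) * f[i] - hd**2 * f[i - 1]) / (
        hs * hd * (hs + hd)
    )

manual = np.array([
    (f[1] - f[0]) / (x[1] - x[0]),   # one-sided difference at the left end
    interior_derivative(f, x, 1),
    interior_derivative(f, x, 2),
    (f[3] - f[2]) / (x[3] - x[2]),   # one-sided difference at the right end
])

print(manual)
print(np.allclose(manual, np.gradient(f, x)))
```

When hs equals hd the interior formula collapses to the ordinary central difference.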

Lastly, if your input is a 2d array, then you are thinking of a function f of x, y defined on a grid. The numpy gradient will output the arrays of "discretized" partial derivatives in x and y.
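
A short sketch of the 2D case, using a plane so the expected partial derivatives are obvious. The grid sizes and the function 3x + 2y are arbitrary choices for illustration; indexing="ij" is used so that axis 0 corresponds to x and axis 1 to y.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 2.0, 9)
X, Y = np.meshgrid(x, y, indexing="ij")  # F[i, j] = F(x[i], y[j])

F = 3.0 * X + 2.0 * Y

# One output array per axis, in axis order.
dF_dx, dF_dy = np.gradient(F, x, y)

print(dF_dx.shape, dF_dy.shape)
print(dF_dx[0, 0], dF_dy[0, 0])
```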

Kodeclik
kodeclik.com › numpy-gradient
Python numpy.gradient()
October 16, 2024 - In numpy, the gradient function takes an array of values as input and computes the numerical gradient along a specified axis.
NumPy
numpy.org › doc › 2.0 › reference › generated › numpy.gradient.html
numpy.gradient - NumPy v2.0 Manual
Durran D. R. (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. New York: Springer. ... Fornberg B. (1988) Generation of Finite Difference Formulas on Arbitrarily Spaced Grids, Mathematics of Computation 51, no. 184 : 699-706. PDF. ... >>> f = np.array([1, 2, 4, 7, 11, 16], dtype=float) >>> np.gradient(f) array([1.
Scaler
scaler.com › home › topics › what is the numpy.gradient() method in numpy?
What is the numpy.gradient() method in Numpy? - Scaler Topics
May 4, 2023 - The numpy gradient() function calculates the gradient using first- or second-order accurate one-sided (forward or backward) differences at the boundaries and second-order accurate central differences in the interior ...
JAX Documentation
docs.jax.dev › en › latest › _autosummary › jax.numpy.gradient.html
jax.numpy.gradient - JAX documentation
>>> def f(x): ... return jnp.sin(x) * jnp.exp(-x / 4) ... >>> def gradf_exact(x): ... # exact analytical gradient of f(x) ... return -f(x) / 4 + jnp.cos(x) * jnp.exp(-x / 4) ... >>> x = jnp.linspace(0, 5, 10) >>> with jnp.printoptions(precision=2, suppress=True): ... print("numerical gradient:", jnp.gradient(f(x), x)) ...
Top answer
1 of 2
6

First of all, some warnings:

  • numerical-optimization is hard to do right
  • ipopt is very complex software
    • combining ipopt with numerical-differentiation sounds like you are asking for trouble, but that depends on your problem of course
    • ipopt is almost always based on automatic-differentiation tools and not numerical-differentiation!

And some more:

  • as this is a complex task and the state of python + ipopt is not as nice as in some other languages (julia + JuMP for example), it's a bit of work

And some alternatives:

  • use pyomo which wraps ipopt and has automatic-differentiation
  • use casadi which also wraps ipopt and has automatic-differentiation
  • use autograd to automatically calculate gradients on a subset of numpy-code
    • then use cyipopt to add those
  • scipy.optimize.minimize with the solvers SLSQP or COBYLA, which can do everything for you (SLSQP can use equality and inequality constraints; COBYLA only inequality constraints, where emulating an equality constraint x == y by the pair x >= y and x <= y can work)

Approaching your task with your tools

Your complete example-problem is defined in Test Examples for Nonlinear Programming Codes:

Here is some code, based on numerical-differentiation, solving your test-problem, including the official setup (function, gradients, start-point, bounds, ...)

import numpy as np
import scipy.sparse as sps
import ipopt
from scipy.optimize import approx_fprime


class Problem40(object):
    """ # Hock & Schittkowski test problem #40
            Basic structure  follows:
            - cyipopt example from https://pythonhosted.org/ipopt/tutorial.html#defining-the-problem
            - which follows ipopt's docs from: https://www.coin-or.org/Ipopt/documentation/node22.html
            Changes:
            - numerical-diff using scipy for function & constraints
            - removal of hessian-calculation
              - we will use limited-memory approximation
                - ipopt docs: https://www.coin-or.org/Ipopt/documentation/node31.html
              - (because i'm too lazy to reason about the math; lagrange and co.)
    """
    def __init__(self):
        self.num_diff_eps = 1e-8  # maybe tuning needed!

    def objective(self, x):
        # callback for objective
        return -np.prod(x)  # -x1 x2 x3 x4

    def constraint_0(self, x):
        return np.array([x[0]**3 + x[1]**2 -1])

    def constraint_1(self, x):
        return np.array([x[0]**2 * x[3] - x[2]])

    def constraint_2(self, x):
        return np.array([x[3]**2 - x[1]])

    def constraints(self, x):
        # callback for constraints
        return np.concatenate([self.constraint_0(x),
                               self.constraint_1(x),
                               self.constraint_2(x)])

    def gradient(self, x):
        # callback for gradient
        return approx_fprime(x, self.objective, self.num_diff_eps)

    def jacobian(self, x):
        # callback for jacobian
        return np.concatenate([
            approx_fprime(x, self.constraint_0, self.num_diff_eps),
            approx_fprime(x, self.constraint_1, self.num_diff_eps),
            approx_fprime(x, self.constraint_2, self.num_diff_eps)])

    def hessian(self, x, lagrange, obj_factor):
        return False  # we will use quasi-newton approaches to use hessian-info

    # progress callback
    def intermediate(
            self,
            alg_mod,
            iter_count,
            obj_value,
            inf_pr,
            inf_du,
            mu,
            d_norm,
            regularization_size,
            alpha_du,
            alpha_pr,
            ls_trials
            ):

        print("Objective value at iteration #%d is - %g" % (iter_count, obj_value))

# Remaining problem definition; still following official source:
# http://www.ai7.uni-bayreuth.de/test_problem_coll.pdf

# start-point -> infeasible
x0 = [0.8, 0.8, 0.8, 0.8]

# variable-bounds -> empty => np.inf-approach deviates from cyipopt docs!
lb = [-np.inf, -np.inf, -np.inf, -np.inf]
ub = [np.inf, np.inf, np.inf, np.inf]

# constraint bounds -> c == 0 needed -> both bounds = 0
cl = [0, 0, 0]
cu = [0, 0, 0]

nlp = ipopt.problem(
            n=len(x0),
            m=len(cl),
            problem_obj=Problem40(),
            lb=lb,
            ub=ub,
            cl=cl,
            cu=cu
            )

# IMPORTANT: need to use limited-memory / lbfgs here as we didn't give a valid hessian-callback
nlp.addOption(b'hessian_approximation', b'limited-memory')
x, info = nlp.solve(x0)
print(x)
print(info)

# CORRECT RESULT & SUCCESSFUL STATE

Output:

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:       12
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        4
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        3
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

Objective value at iteration #0 is - -0.4096
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 -4.0960000e-01 2.88e-01 2.53e-02   0.0 0.00e+00    -  0.00e+00 0.00e+00   0
Objective value at iteration #1 is - -0.255391
   1 -2.5539060e-01 1.28e-02 2.98e-01 -11.0 2.51e-01    -  1.00e+00 1.00e+00h  1
Objective value at iteration #2 is - -0.249299
   2 -2.4929898e-01 8.29e-05 3.73e-01 -11.0 7.77e-03    -  1.00e+00 1.00e+00h  1
Objective value at iteration #3 is - -0.25077
   3 -2.5076955e-01 1.32e-03 3.28e-01 -11.0 2.46e-02    -  1.00e+00 1.00e+00h  1
Objective value at iteration #4 is - -0.250025
   4 -2.5002535e-01 4.06e-05 1.93e-02 -11.0 4.65e-03    -  1.00e+00 1.00e+00h  1
Objective value at iteration #5 is - -0.25
   5 -2.5000038e-01 6.57e-07 1.70e-04 -11.0 5.46e-04    -  1.00e+00 1.00e+00h  1
Objective value at iteration #6 is - -0.25
   6 -2.5000001e-01 2.18e-08 2.20e-06 -11.0 9.69e-05    -  1.00e+00 1.00e+00h  1
Objective value at iteration #7 is - -0.25
   7 -2.5000000e-01 3.73e-12 4.42e-10 -11.0 1.27e-06    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 7

                                   (scaled)                 (unscaled)
Objective...............:  -2.5000000000225586e-01   -2.5000000000225586e-01
Dual infeasibility......:   4.4218750883118219e-10    4.4218750883118219e-10
Constraint violation....:   3.7250202922223252e-12    3.7250202922223252e-12
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   4.4218750883118219e-10    4.4218750883118219e-10


Number of objective function evaluations             = 8
Number of objective gradient evaluations             = 8
Number of equality constraint evaluations            = 8
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 8
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 0
Total CPU secs in IPOPT (w/o function evaluations)   =      0.016
Total CPU secs in NLP function evaluations           =      0.000

EXIT: Optimal Solution Found.
[ 0.79370053  0.70710678  0.52973155  0.84089641]
{'x': array([ 0.79370053,  0.70710678,  0.52973155,  0.84089641]), 'g': array([  3.72502029e-12,  -3.93685085e-13,   5.86974913e-13]), 'obj_val': -0.25000000000225586, 'mult_g': array([ 0.49999999, -0.47193715,  0.35355339]), 'mult_x_L': array([ 0.,  0.,  0.,  0.]), 'mult_x_U': array([ 0.,  0.,  0.,  0.]), 'status': 0, 'status_msg': b'Algorithm terminated successfully at a locally optimal point, satisfying the convergence tolerances (can be specified by options).'}

Remarks about the code

  • We use scipy's approx_fprime, which was basically added for all those gradient-based optimizers in scipy.optimize
  • As stated in the sources, I did not take care of ipopt's need for the Hessian; we used ipopt's Hessian approximation instead
    • the basic idea is described at wiki: LBFGS
  • I ignored ipopt's need for the sparsity structure of the Jacobian of the constraints
    • a default assumption is used: the default Hessian structure is that of a lower triangular matrix, and I won't give any guarantees on what can happen here (bad performance vs. breaking everything)
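
To get a feel for what approx_fprime returns, here is a small sketch comparing it with the analytic gradient of the objective -x1 x2 x3 x4 at the start point. The helper objective_grad_exact is written just for this check and is an assumption of the example, not part of the solution above.

```python
import numpy as np
from scipy.optimize import approx_fprime

def objective(x):
    return -np.prod(x)  # -x1 x2 x3 x4

def objective_grad_exact(x):
    # d/dx_i of -prod(x) is minus the product of all other components.
    return np.array([-np.prod(np.delete(x, i)) for i in range(len(x))])

x0 = np.array([0.8, 0.8, 0.8, 0.8])

numeric = approx_fprime(x0, objective, 1e-8)  # forward differences, step 1e-8
exact = objective_grad_exact(x0)

print(numeric)
print(exact)
print(np.max(np.abs(numeric - exact)))
```

The gap between the two is the price of the finite-difference step, which is why the epsilon in the class above is marked as possibly needing tuning.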
2 of 2
0

I think you have some kind of misunderstanding about what is a mathematical function and what is its numerical implementation.

You should define your function as:

def func(x1, x2, x3, x4):
    return -x1*x2*x3*x4

Now you want to evaluate your function at specific points, which you can do using the np.mgrid you provided.

If you want to compute your gradient, use scipy.misc.derivative (https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html) (watch out: the default value for dx is usually bad, change it to 1e-5). There is no difference between linear and non-linear functions for the numerical evaluation of the gradient, only that for a non-linear function the gradient won't be the same everywhere.

What you did with np.gradient was actually to compute the gradient from the points in your array, the definition of your function being hidden by your definition of f, thus not allowing for multiple gradient evaluations at different points. Also, using your method makes you dependent on your discretisation step.
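
If scipy.misc.derivative is not available in your SciPy version, the same idea is easy to sketch with plain NumPy. The function numerical_gradient below is a hypothetical helper written for this illustration; it uses central differences with the dx = 1e-5 suggested above.

```python
import numpy as np

def func(x1, x2, x3, x4):
    return -x1 * x2 * x3 * x4

def numerical_gradient(fun, point, dx=1e-5):
    # Central difference along each coordinate, evaluated at one point.
    point = np.asarray(point, dtype=float)
    grad = np.empty_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = dx
        grad[i] = (fun(*(point + step)) - fun(*(point - step))) / (2 * dx)
    return grad

g = numerical_gradient(func, [0.8, 0.8, 0.8, 0.8])
print(g)
```

Because the function itself is called at each shifted point, the gradient can be evaluated anywhere, with no dependence on a precomputed grid.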

Reddit
reddit.com › r/learnpython › need help in understanding np.gradient for calculating derivatives
r/learnpython on Reddit: Need help in understanding np.gradient for calculating derivatives
June 30, 2023

Hi, I'm trying to expand my knowledge in Machine Learning. I came across the np.gradient function and wanted to understand how it relates to Taylor series for estimating values. The documentation seemed a bit confusing for a novice.

Top answer
1 of 2
7
One definition of the derivative is f'(x) = (f(x+h)-f(x))/h where h goes to 0. Computers cannot store infinitely small numbers, so they might set h=1e-6 (that is 0.000001). It's a tradeoff: we want h to be as small as possible, but at some point the errors due to floating-point precision begin to dominate. Given any function that the computer can calculate, it can approximate the derivative.

import numpy as np
import matplotlib.pyplot as plt

h = 1e-6  # step size (left undefined in the original snippet)

def f(x):
    return np.sin(x)

x = np.arange(-2, 2, 0.01)
y = f(x)
dfdx = (f(x+h) - f(x))/h  # forward difference
plt.plot(x, y)
plt.plot(x, dfdx)
plt.show()

Assuming that the function is reasonably smooth (i.e. the derivative above exists), another definition of the derivative is f'(x) = (f(x+h)-f(x-h))/(2h) where h goes to 0. Going from x-h to x+h means 2 steps, which is the reason for 2h. This works just as well. These methods are called finite differences, in contrast to the normal derivative definition where h is infinitely small. The first one is the forward difference and the second one is the central difference. The backward difference is (f(x)-f(x-h))/h.

Let's assume we want to write a derivative function. It takes a function f and values of x, and gives back f'(x).

def f(x):
    return np.sin(x)

def d(fun, x):
    return (fun(x+h) - fun(x))/h

x = np.arange(-2, 2, 0.01)
y = f(x)
dfdx = d(f, x)
plt.plot(x, y)
plt.plot(x, dfdx)
plt.show()

By passing the function into the function, the derivative function can just call fun wherever it wants/needs to get the derivative.

Now things become a bit more inconvenient. For some reason we do not know f. We only know y, i.e. f(x) for some values of x. Let's say that x is evenly spaced as usual. Then our best guess for h is not really tiny but identical to the spacing between neighboring x values. With the forward difference we need to take care at the rightmost value because we cannot just add +h to get a value even further out. Instead we use the backward difference there. For values in the middle we decide to use the central difference instead of the forward difference.
def f(x):
    return np.sin(x)

def d(y, h=1):
    dfdx = [(y[1]-y[0])/h]  # forward difference at the left edge
    for i in range(1, len(y)-1):
        dfdx.append((y[i+1]-y[i-1])/2/h)  # central difference in the interior
    dfdx.append((y[-1]-y[-2])/h)  # backward difference at the right edge
    return dfdx

h = 0.01
x = np.arange(-2, 2, h)
y = f(x)
dfdx = d(y, h)
plt.plot(x, y)
plt.plot(x, dfdx)
plt.show()

The implementation above corresponds to np.gradient in the one-dimensional case where varargs is set to case 1 or 2 (scalar spacing). The case where varargs is set to 3 or 4 (coordinate arrays) would use x directly in d instead of h. However, at that point the formula is more complicated, as they mention in the documentation. Effectively any point has a hd (the forward step size) and a hs (the backward step size), and the formula is not just (f(x+hd)-f(x-hs))/(hd+hs) but instead that bigger expression given in the documentation, where the values of hd, hs act as some kind of weights.

np.gradient is basically backward, central and forward difference combined. When you have values like f(1), f(2), f(2+h) and want the derivative at 2, the code notices that 2 and 2+h are very close together and puts greater weight on that (and mostly ignores f(1)).

The important part so far is that np.gradient, when given a vector with N elements, calculates N one-dimensional derivatives, which is not the typical idea of a gradient. np.gradient does support more dimensions, which might make things clearer. In the 1D case, we essentially go through all values from left to right and consider each value and its direct left and right neighbors to quantify the uptrend or downtrend. In the 2D case, np.gradient still does this, but additionally also walks from top to bottom and does the same. So in 2D it returns 2 arrays, one per direction.

The actual definition of the gradient by finite differences is [(f(x+h,y)-f(x,y))/h, (f(x,y+h)-f(x,y))/h] in 2D. np.gradient returns these values, one array per axis in axis order (axis 0 first, then axis 1), so which array holds the x-derivative depends on how the grid is laid out.
Say we are in 2D and want the gradient at x=3 and y=0, then we can plug it into np.gradient like this:

hx = 1e-6
hy = 1e-3
x = [3, 3+hx]
y = [0, 0+hy]
xx, yy = np.meshgrid(x, y)

def f(x, y):
    return x**2 - 2*x*np.sin(y) + 1/x

grad = np.gradient(f(xx, yy), y, x)  # Note the order.
print(grad[1][0,0], grad[0][0,0])  # Note the order. This is dfdx, dfdy.

But if the function f can be calculated by a computer, it makes more sense to just use automatic differentiation instead of finite differences. Automatic differentiation has no h that needs to be chosen carefully. It's always as accurate as possible.

import torch

x = torch.tensor([3.], requires_grad=True)
y = torch.tensor([0.], requires_grad=True)
z = x**2 - 2*x*torch.sin(y) + 1/x
z.backward()
print(x.grad, y.grad)

So what's the deal with the Taylor series? It's just a minor piece in the derivation of that more general expression used by np.gradient. We just start by claiming that we can express the derivative by adding together function values in the direct neighborhood.

f'(x) = a f(x) + b f(x+hd) + c f(x-hs)

Given that finite differences do work out, this approach should work as well and generalize the idea. Expand f(x+hd) and f(x-hs) with their series:

f(x+hd) = f(x) + hd f'(x) + hd^2 f''(x)/2 + ...
f(x-hs) = f(x) - hs f'(x) + hs^2 f''(x)/2 + ...

Then plug it in and rearrange:

f'(x) = a f(x) + b f(x) + b hd f'(x) + b hd^2 f''(x)/2 + c f(x) - c hs f'(x) + c hs^2 f''(x)/2
      = (a+b+c) f(x) + (b hd - c hs) f'(x) + (b hd^2 + c hs^2)/2 f''(x)
    0 = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd^2 + c hs^2)/2 f''(x)

The = in the middle is actually more of an approximately equal sign. We won't be able to reach 0 for all f(x) as claimed on the left-hand side, but we can get pretty close. We do NOT want to minimize the right-hand side. We want it to reach 0 (it can go below 0 right now). To turn this into a minimization problem, we square it. This way we always get a non-negative number and it really becomes a matter of minimization.
We could also take the absolute value instead of squaring, but it is a pain to work through and the end result is exactly the same parameters anyway.

To minimize: E^2 with

E = (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd^2 + c hs^2)/2 f''(x)

One requirement for an optimum is that the gradient is 0. In this case we take the derivatives with respect to a, b, c because we want to find the optimal a, b, c. First a reminder of the chain rule: dE^2/dt = 2E dE/dt for whatever t is. Using it is optional, but a bit less messy than working each derivative through individually. In particular we have

dE^2/da = 2E dE/da = 2E f(x)
dE^2/db = 2E dE/db = 2E (f(x) + hd f'(x) + hd^2 f''(x)/2)
dE^2/dc = 2E dE/dc = 2E (f(x) - hs f'(x) + hs^2 f''(x)/2)

We want ALL three of them to be 0 at the same time. This can only happen if E is 0:

0 := (a+b+c) f(x) + (b hd - c hs - 1) f'(x) + (b hd^2 + c hs^2)/2 f''(x)

and we want this to be 0 for any f, f', f'' and for any value of x. The only way for this to happen is if each coefficient is 0, i.e.

a + b + c = 0
b hd - c hs = 1
b hd^2 + c hs^2 = 0

We would need to check the second derivative to make sure that this is a minimum, not a maximum, but given the problem it is fairly clear.

So why did we stop exactly after f'' in the Taylor series? Because this way we get exactly 3 unknowns and 3 equations, which is the most convenient to solve.

Multiply the second equation by hd, then subtract the third from it:

(b hd^2 - c hs hd) - (b hd^2 + c hs^2) = hd
-c hs^2 - c hs hd = hd
c hs (hs + hd) = -hd
c = -hd/hs/(hs+hd) = -hd^2 / (hs hd (hs+hd))

where the last step is just so it looks exactly like in np.gradient. Insert c into the second equation:

b hd + hd/hs/(hs+hd) hs = 1
b hd + hd/(hs+hd) = 1
b + 1/(hs+hd) = 1/hd
b = 1/hd - 1/(hs+hd)
b = (hs (hs+hd) - hs hd) / [hs hd (hs+hd)]
b = hs^2 / [hs hd (hs+hd)]

From the first equation we know that a = -b-c = (hd^2 - hs^2)/(hs hd (hs+hd)).
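The three equations and their closed-form solution can be checked numerically. The sketch below is illustrative only: the helper name weights, the step sizes hs = 1 and hd = 0.25, and the test function exp are my own choices. It solves the 3x3 system with np.linalg.solve, compares the result with the formulas for a, b, c derived above, and then compares the weighted sum with what np.gradient returns at the middle point of an unevenly spaced grid.

```python
import numpy as np

def weights(hs, hd):
    """Solve a+b+c=0, b*hd-c*hs=1, b*hd^2+c*hs^2=0 for (a, b, c)."""
    A = np.array([[1.0, 1.0, 1.0],
                  [0.0, hd, -hs],
                  [0.0, hd**2, hs**2]])
    rhs = np.array([0.0, 1.0, 0.0])
    return np.linalg.solve(A, rhs)

hs, hd = 1.0, 0.25            # backward and forward step sizes
a, b, c = weights(hs, hd)

# Closed-form solution from the derivation.
denom = hs * hd * (hs + hd)
closed_form = np.array([(hd**2 - hs**2) / denom,
                        hs**2 / denom,
                        -hd**2 / denom])

# Unevenly spaced samples: x - hs, x, x + hd with x = 2.
x = np.array([1.0, 2.0, 2.25])
y = np.exp(x)
estimate = a * y[1] + b * y[2] + c * y[0]
numpy_value = np.gradient(y, x)[1]
print(np.allclose([a, b, c], closed_form), estimate, numpy_value)
```

Passing the coordinate array x as the second argument is what makes np.gradient use the non-uniform formula at interior points.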
So here's your summary:

  • If you have a function that can be calculated by a computer, use torch or tensorflow or any other framework for automatic differentiation.

  • If you have a function that can be calculated by a computer but such a framework is not available, np.gradient is still a bad idea because it is inefficient. Note that for the 2D gradient we needed three values: f(x,y), f(x+dx,y), f(x,y+dy). But with np.gradient we would first need to set up arrays, where it is almost natural to also include f(x+dx,y+dy), which is not needed for gradient calculations. It's more natural to set up a loop that increments x once, then y once, then z once, and so on. Many solvers in scipy.optimize work with finite differences.

  • If you have a function that cannot be calculated by a computer, np.gradient may be useful. In practice this means that you have data from some experiment. Even there, the concept of a Taylor series plays no role UNLESS the data was taken on an unevenly spaced grid.
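The loop mentioned in the second point could look roughly like the sketch below. The function name forward_gradient and the default step h = 1e-6 are assumptions made for illustration; this is a plain forward-difference scheme, not what np.gradient or any particular scipy.optimize solver does internally. It evaluates the function once at the base point and once per coordinate, so N+1 evaluations for N variables.

```python
import numpy as np

def forward_gradient(func, point, h=1e-6):
    """Forward-difference gradient: one base evaluation plus one per coordinate."""
    point = np.asarray(point, dtype=float)
    base = func(point)
    grad = np.empty_like(point)
    for k in range(point.size):
        shifted = point.copy()
        shifted[k] += h                      # step along coordinate k only
        grad[k] = (func(shifted) - base) / h
    return grad

def f(p):
    # Same test function as in the 2D example of the answer.
    x, y = p
    return x**2 - 2*x*np.sin(y) + 1/x

g = forward_gradient(f, [3.0, 0.0])
print(g)  # analytic gradient at (3, 0) is [6 - 1/9, -6]
```

The step size h still has to be chosen with care: too large and the truncation error dominates, too small and rounding error does.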
2 of 2
You might enjoy this Stack Overflow post on the same question.
🌐
GeeksforGeeks
geeksforgeeks.org › how-to-find-gradient-of-a-function-using-python
How to find Gradient of a Function using Python? | GeeksforGeeks
July 28, 2020 - We will use numdifftools to find Gradient of a function. Examples: Input : x^4+x+1 Output :Gradient of x^4+x+1 at x=1 is 4.99 Input :(1-x)^2+(y-x^2)^2 Output :Gradient of (1-x^2)+(y-x^2)^2 at (1, 2) is [-4.
🌐
Readthedocs
numdifftools.readthedocs.io › en › latest › reference › generated › numdifftools.core.Gradient.html
5.1.1.2. numdifftools.core.Gradient - Numdifftools 0.9.41 documentation
Integrals of Derivatives. Numerische Mathematik. ... >>> import numpy as np >>> import numdifftools as nd >>> fun = lambda x: np.sum(x**2) >>> dfun = nd.Gradient(fun) >>> np.allclose(dfun([1,2,3]), [ 2., 4., 6.]) True
🌐
Medium
surajsinghbisht054.medium.com › mastering-gradient-descent-math-python-and-the-magic-behind-machine-learning-d12a7791f24e
Mastering Gradient Descent: Math, Python, and the Magic Behind Machine Learning - Part 1 | by Suraj Singh Bisht | Medium
December 26, 2023 - In this article, I'll explain what gradient descent is, why it's crucial to learn, the basic math behind it, and how it benefits machine learning, and I'll even provide a straightforward Python code example.
🌐
Finxter
blog.finxter.com › home › learn python blog › np.gradient() - a simple illustrated guide
np.gradient() - A Simple Illustrated Guide - Be on the Right Side of Change
June 24, 2022 - In Python, the numpy.gradient() function approximates the gradient of an N-dimensional array. It uses the second-order accurate central differences in the interior points and either first or second-order accurate one-sided differences at the ...
🌐
Aleksandar Haber
aleksandarhaber.com › automatic-computation-of-gradients-of-nonlinear-functions-in-python
Automatic Computation of Gradients of Multivariable Functions in Python – Fusion of Engineering, Control, Coding, Machine Learning, and Science
November 25, 2023 - That is why we need to transpose the result to obtain the gradient vector. Next, we create a Python function that will return the numerical value of the gradient for the given numerical value of the symbolic vector "x". The following Python script is used to achieve this task
🌐
SciPy
docs.scipy.org › doc › numpy-1.9.3 › reference › generated › numpy.gradient.html
numpy.gradient โ€” NumPy v1.9 Manual
>>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float) >>> np.gradient(x) array([ 1. , 1.5, 2.5, 3.5, 4.5, 5.
🌐
Svitla Systems
svitla.com › home › articles › numerical differentiation methods in python
Python for Numerical Differentiation: Methods & Tools
January 14, 2021 - For example, when finding the optimum of the values of functions. The calculation of the derivative is also used for gradient methods when training neural networks. In this post, we examine how you can calculate the value of the derivative using numerical methods in Python.
🌐
PyPI
pypi.org › project › numdifftools
numdifftools · PyPI
>>> xdata = np.reshape(np.arange(0,1,0.1),(-1,1)) >>> ydata = 1+2*np.exp(0.75*xdata) >>> fun = lambda c: (c[0]+c[1]*np.exp(c[2]*xdata) - ydata)**2 >>> Jfun = nda.Jacobian(fun, method='reverse') >>> np.allclose(np.abs(Jfun([1,2,0.75])), 0) # should be numerically zero True ... >>> fun = lambda x: np.sum(x**2) >>> dfun = nda.Gradient(fun) >>> np.allclose(dfun([1,2,3]), [ 2., 4., 6.]) True ... Numdifftools works on Python 2.7+ and Python 3.0+.
pip install numdifftools
Published: Dec 11, 2025
Version: 0.9.42
🌐
Python.org
discuss.python.org › python help
Higher order central differences using NumPy.gradient() - Python Help - Discussions on Python.org
September 5, 2022 - Hello everyone, I am new to Python and am still learning it. So my apologies if this is a basic question. I am given two arrays: X and Y. Where Y=2*(x^2)+x/2. I need to calculate the first and the fifth order central differences of Y with respect to X using the numpy.gradient function.