Wikipedia
en.wikipedia.org › wiki › Numerical_differentiation
Numerical differentiation - Wikipedia
February 22, 2026 - In numerical analysis, numerical differentiation algorithms estimate the derivative of a mathematical function or subroutine using values of the function. Unlike analytical differentiation, which provides exact expressions for derivatives, numerical differentiation relies on the function's ...
MathWorks
mathworks.com › matlab › mathematics › numerical integration and differential equations › numerical integration and differentiation
gradient - Numerical gradient - MATLAB
For the third output FZ and the outputs that follow, the Nth output is the gradient along the Nth dimension of F. ... The numerical gradient of a function is a way to estimate the values of the partial derivatives in each dimension using the known values of the function at certain points.
Discussions

c++ - Is there any "standard" way to calculate the numerical gradient? - Stack Overflow
I am trying to calculate the numerical gradient of a smooth function in C++. And the parameter value could vary from zero to a very large number (maybe 1e10 to 1e20?) I used the function f(x,y) = 1... More on stackoverflow.com
stackoverflow.com
vector analysis - How do we calculate the gradient from numerical data - Mathematics Stack Exchange
I have a 3D irregular mesh. Each coordinate point holds a specific data, say , for example, temperature. For a specific point, how do I calculate the temperature gradient ? More on math.stackexchange.com
math.stackexchange.com
March 25, 2016
Simple numerical differentiation
Hello there, I am wondering how to implement numerical differentiation in Julia simply. My MWE looks like, using ForwardDiff p = [1, 2, 3] f(x::Vector) = p[1] .+ p[2] .* x .+ p[3] .* x .^ 2 x=1:10 g = x -> ForwardDiff.gradient(f, x); g(x) The result looks like Array{Array{ForwardDiff.Dual{... More on discourse.julialang.org
discourse.julialang.org
December 23, 2020
Autograd function with numerical gradients
I have a non-differentiable loss function. Something that takes a few tensors that require gradients, copies them, computes some stuff, and then returns the cost as a tensor. Is there a way to force the autograd framewo… More on discuss.pytorch.org
discuss.pytorch.org
July 26, 2018
Medium
medium.com › @ayushjluhar › numerical-vs-analytical-gradients-in-machine-learning-7581c33c1db1
Numerical vs. Analytical Gradients in Machine Learning | by Luhar Ayush | Medium
April 1, 2025 - More limitations would be clear as we will make a deep dive in types of gradients. Continuing with the hill analogy. You want to find the quickest way down, but you can’t see the terrain. So, you use a simple trick: take a small step in each direction (north, south, east, west), feel how much the ground slopes in each direction, and then move in the direction where the ground drops the most steeply. This is essentially what numerical gradient calculation does in neural networks.
Top answer (1 of 6, score 12)

In the first place, you should use the central difference scheme, which is more accurate (one more term of the Taylor expansion cancels):

(f(x + h) - f(x - h)) / (2h)

rather than

(f(x + h) - f(x)) / h

Then the choice of h is critical, and using a fixed constant is the worst thing you can do: for small x, h will be too large, so the approximation formula no longer holds (truncation error), while for large x, h will be too small relative to x, resulting in severe round-off (cancellation) error.

A much better choice is to take a relative value, h = x√ε, where ε is the machine epsilon (1 ulp), which gives a good tradeoff.

(f(x(1 + √ε)) - f(x(1 - √ε))) / (2x√ε)

Beware that when x = 0, a relative value cannot work and you need to fall back to a constant step. But then, nothing tells you which constant to use!

Answer 2 of 6 (score 2)

You need to consider the precision needed.

At first glance, since |y| = 5.49756e14 and epsi = 1e-4, you need at least ⌈log2(5.49756e14) - log2(1e-4)⌉ = 63 bits of significand precision (the number of bits used to encode the digits of your number, also known as the mantissa) for y and y+epsi to be considered different.

The double-precision floating-point format only has 53 bits of significand precision (assuming it is 8 bytes). So, currently, f1, f2 and f3 are exactly the same because y, y+epsi and y-epsi are equal.

Now, let's consider the limit: y = 1e20, and the result of your function, 10x^3 + y^3. Let's ignore x for now, so take f = y^3. We can calculate the precision needed for f(y) and f(y+epsi) to differ down to the smallest term of the expansion: f(y) = 1e60, while epsi^3 = 1e-12. This gives a minimum significand precision of ⌈log2(1e60) - log2(1e-12)⌉ = 240 bits.

Even if you were to use the long double type, assuming it is 16 bytes, your results would not differ: f1, f2 and f3 would still be equal, even though y and y+epsi would not.

If we take x into account, the maximum value of f would be 11e60 (with x = y = 1e20). So the upper limit on precision is ⌈log2(11e60)-log2(1e-12)⌉ = 243 bits, or at least 31 bytes.

One way to solve your problem is to use another type, maybe a bignum used as fixed-point.

Another way is to rethink your problem and deal with it differently. Ultimately, what you want is f1 - f2. You can try to decompose f(y+epsi). Again, ignoring x, f(y+epsi) = (y+epsi)^3 = y^3 + 3*y^2*epsi + 3*y*epsi^2 + epsi^3, so f(y+epsi) - f(y) = 3*y^2*epsi + 3*y*epsi^2 + epsi^3, which sidesteps the subtraction of two nearly equal large numbers entirely.

R Project
search.r-project.org › CRAN › refmans › numDeriv › html › grad.html
R: Numerical Gradient of a Function
If a vector x produces a scalar result then grad returns the numerical approximation of the gradient at the point x (which has the same length as x). If a vector x produces a vector result then the result must have the same length as x, and it is assumed that this corresponds to applying the function to each of its arguments (for example, sin(x)).
Creativescala
creativescala.org › case-study-gradient-descent › numerical-differentiation.html
Numerical Differentiation
The gradient is just a fancy name for the slope, and the slope is "rise over run". ... where h is a small number. In this equation h is the run, and f(x+h) - f(x) is the rise. This is the essential idea behind numerical differentiation.
Mpg
orca-manual.mpi-muelheim.mpg.de › contents › essentialelements › numericalgradients.html
2.23. Numerical Gradients — ORCA 6.1.1 Manual
The time for one gradient calculation is equal to 6 × (number of atoms) × (time for one single point calculation). The numerical gradient can be performed in a multi-process run, doing each displacement as a parallel calculation (see section Parallel and Multi-Process Modules).
NumPy
numpy.org › doc › stable › reference › generated › numpy.gradient.html
numpy.gradient — NumPy v2.4 Manual
The gradient is computed using second order accurate central differences in the interior points and either first or second order accurate one-sides (forward or backwards) differences at the boundaries.
arXiv
arxiv.org › abs › 2206.12643
[2206.12643] Optimized numerical gradient and Hessian estimation for variational quantum algorithms
November 20, 2022 - In particular, there exists a critical sampling-copy number below which an optimized difference estimator gives a smaller average estimation error in contrast to the standard (analytical) parameter-shift estimator, which exactly computes gradient and Hessian components.
arXiv
arxiv.org › abs › 2504.12806
[2504.12806] A Numerical Gradient Inversion Attack in Variational Quantum Neural-Networks
May 7, 2025 - Our scheme is based on gradient inversion that works by combining gradients estimation with the finite difference method and adaptive low-pass filtering. The scheme is further optimized with Kalman filter to obtain efficient convergence.
YouTube
youtube.com › tilestats
Numerical differentiation - simply explained - YouTube
https://www.tilestats.com/ 1. How to calculate the slope of a line numerically 2. How to compute the first order numerical derivative (03:10) 3. How to compute ...
Published March 26, 2023
Views: 4K
Wikipedia
en.wikipedia.org › wiki › Gradient_descent
Gradient descent - Wikipedia
2 weeks ago - Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction of the gradient ...
YouTube
youtube.com › watch
Neural Networks Demystified [Part 5: Numerical Gradient Checking] - YouTube
When building complex systems like neural networks, checking portions of your work can save hours of headache. Here we'll check our gradient computations. Su...
Published   December 19, 2014
IntechOpen
intechopen.com › chapters › 84591
Numerical Gradient Computation for Simultaneous Detection of Geometry and Spatial Random Fields in a Statistical Framework | IntechOpen
November 1, 2022 - Accurate computation of gradients, w.r.t. parameters of interest, is a key aspect of deterministic algorithms like Gauss-Newton, Levenberg–Marquardt, Occam's inversion [1] as well as statistical algorithms like Hamiltonian Monte Carlo (HMC) [2]. Common nonintrusive methods like finite differences compute the gradient by taking differences between the response at the current model and at a perturbed model; such methods suffer from certain drawbacks. Two types of errors stand out in particular: numerical error involved with truncation of Taylor's series and round-off error involved with finite precision arithmetic of computers [3]. For more robust analysis, this chapter focuses on methods where the gradient is computed directly from the analytical/numerical model by enlisting the sensitivity equations.