Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction … Wikipedia
Gradient descent - Wikipedia
The idea is to take repeated steps in the opposite direction of the gradient (or approximate gradient) of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a trajectory that maximizes that function; the procedure is then known as gradient ascent.
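The update rule described here is short enough to sketch directly. A toy Python example (illustrative only, not taken from the article), minimizing f(x) = x² by stepping against its gradient:

```python
# Minimal gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)   # step opposite the gradient
    return x

x_min = gradient_descent(lambda x: 2 * x, x0=5.0)   # approaches the minimum at 0
```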
Gradient Descent in Linear Regression - GeeksforGeeks
Gradient Descent is an optimization algorithm used in linear regression to find the best-fit line for the data. It works by gradually adjusting the line’s slope and intercept to reduce the difference between actual and predicted values.
Published   December 12, 2025
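The slope-and-intercept adjustment the snippet describes can be sketched as plain-Python gradient descent on mean squared error (a hypothetical minimal example, not GeeksforGeeks' code):

```python
# Gradient descent for simple linear regression y ≈ m*x + b,
# minimizing mean squared error over the data points.
def fit_line(xs, ys, lr=0.01, steps=5000):
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE with respect to slope m and intercept b.
        dm = (-2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
        db = (-2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
        m -= lr * dm          # adjust the slope against its gradient
        b -= lr * db          # adjust the intercept against its gradient
    return m, b

# Points on the line y = 2x + 1; the fit should recover m ≈ 2, b ≈ 1.
m, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```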
Discussions

How does gradient descent know which values to pick next?

I'm not exactly sure this is what you mean, but in practice it can be left to your discretion.

You can recompute the derivatives for all of the parameters at every iteration, or you can use approximations. In particular, if the target function is not a nice clean function like f(x, y, z), I typically just find the partial derivatives for x, y, and z and pick the one that will decrease the target the most. If the function is convex, this shouldn't typically impact the runtime by much. If it's a clean function, you can compute the derivatives analytically.

Not sure if that's what you were asking though.
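A rough sketch of what the commenter seems to describe: estimate each partial derivative numerically, then step only along the coordinate that helps most. The function and step size here are made up for illustration:

```python
# Estimate each partial derivative by a forward finite difference, then step
# only along the coordinate with the largest-magnitude slope.
def coordinate_step(f, params, lr=0.05, h=1e-6):
    grads = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += h
        grads.append((f(bumped) - f(params)) / h)
    i_best = max(range(len(grads)), key=lambda j: abs(grads[j]))
    out = list(params)
    out[i_best] -= lr * grads[i_best]   # descend along that one coordinate
    return out

# f(x, y) = x^2 + 10*y^2: at (1, 1) the y-slope (20) dominates the x-slope (2).
p = coordinate_step(lambda v: v[0] ** 2 + 10 * v[1] ** 2, [1.0, 1.0])
```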

r/MachineLearning · February 10, 2014
[D] An Introduction to Gradient Descent

Gradient descent can be difficult to understand but I found that if you take small steps in the right direction you'll end up where you want to be.

r/MachineLearning · October 8, 2018
ELI5: Gradient Descent Algorithm

Imagine you're stranded in a mountainous area, blindfolded. You'll be rescued if you reach the lowest point in a valley. Your only knowledge of your immediate surroundings comes from placing your foot one step away from yourself and estimating which direction takes you the furthest downward. That's basically it, and it's quite accurate for a simple GD problem in three dimensions. Real problems complicate the picture: there can be multiple valleys, and there are usually many more dimensions (so replace "checking the direction that goes downward the most" with "taking a derivative and stepping in the direction that decreases the function fastest"). I'm a bit rusty on this topic, but I think that's accurate enough to get the idea.

r/explainlikeimfive · November 4, 2016
ELI5: How does gradient descent work in Machine learning?
"Gradient descent" is fancy math wording for "if you want to find the bottom, go downhill". Machine learning is a kind of optimization: keep changing numbers in a function (possibly randomly, possibly in a structured way) to make some output number as good as you can get it. You're either trying to minimize or maximize that output number, which measures how "good" the machine learning algorithm is. Frequently it will be the error rate for some decision problem, and you want to minimize the error rate (make as few mistakes as possible).

The gradient is a math operator that takes in a bunch of numbers and spits out a vector that points in the direction the numbers increase fastest. That's mathematically the "most uphill" direction. If you go *opposite* that, you're going downhill; that's gradient descent. Run a bunch of versions of your machine learning algorithm so you get a bunch of output numbers, figure out the gradient of those output numbers, then go the opposite direction: that will tend to be towards the minimum output value. Do that over and over again until you can't get the output number any smaller.

At that point you've found a "local minimum": anywhere you go from there is "up". This may or may not be the "global minimum", the best possible value you can find. It's analogous to standing in a valley and going downhill until you hit the stream at the bottom, while there's a valley next door that's even deeper; you'd have to have started *in that other valley* to find that minimum. Finding local minima is relatively easy; finding global minima (and proving they're global) is generally a giant pain in the ass in optimization problems. Hence we find the local minimum and, if it's good enough (particularly common in machine learning), we call that good.
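The two-valleys point can be made concrete with a tiny experiment (an illustrative sketch; the function and constants are invented). Gradient descent started on opposite sides of a double-well function settles into different minima:

```python
# f(x) = x^4 - 2x^2 + 0.5x has a deeper minimum near x ≈ -1.06 and a
# shallower one near x ≈ 0.93; descent only finds the valley it starts in.
def descend(x, lr=0.02, steps=1000):
    for _ in range(steps):
        x -= lr * (4 * x ** 3 - 4 * x + 0.5)   # f'(x)
    return x

left = descend(-2.0)    # lands in the deeper (global) valley
right = descend(2.0)    # lands in the shallower (local) valley
```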
r/explainlikeimfive · November 2, 2021
People also ask

What is the best gradient descent algorithm?
The best gradient descent algorithm depends on the task, but Adam is widely favored for its adaptive learning rates and quick convergence, especially in deep learning and NLP tasks. Its versatility makes it a top choice for many applications.
Gradient Descent Algorithm: Key Concepts and Uses in 2026 | Label ...
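For concreteness, here is a bare-bones sketch of the standard Adam update with its usual default hyperparameters (an illustration, not code from the linked article):

```python
import math

# One Adam update: exponential moving averages of the gradient (m) and its
# square (v) give each parameter an adaptive step size.
def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (mean of squares)
    m_hat = m / (1 - b1 ** t)             # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Minimize f(x) = x^2; early steps move roughly lr per iteration.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
```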
What are the three main types of gradient descent algorithm?
The three main types of a gradient descent algorithm are batch gradient descent, stochastic gradient descent (SGD), and mini-batch gradient descent. Each type differs in how much data it uses per iteration, balancing precision, speed, and computational efficiency.
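The three variants can be sketched with one routine whose batch size is the only knob (a toy example with made-up data; `batch_size=None` meaning full batch is this sketch's convention):

```python
import random

# Gradient of mean squared error for the one-parameter model y ≈ w * x.
def grad_mse(w, batch):
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

# One routine covers all three variants; batch_size picks which one.
def fit(data, w=0.0, lr=0.05, steps=200, batch_size=None):
    rng = random.Random(0)
    size = batch_size or len(data)       # None -> full-batch gradient descent
    for _ in range(steps):
        w -= lr * grad_mse(w, rng.sample(data, size))
    return w

data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]   # noiseless, true w = 3
w_batch = fit(data)                   # batch: every point, exact gradient
w_sgd = fit(data, batch_size=1)       # stochastic: one point per update
w_mini = fit(data, batch_size=2)      # mini-batch: a small subset per update
```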
What happens if the learning rate in the gradient descent algorithm is set too high?
A high learning rate can cause the model to overshoot the optimal point, leading to erratic parameter updates. This often disrupts convergence and creates instability in training.
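The overshoot effect is easy to reproduce on f(x) = x², where each update scales the distance from the minimum by (1 − 2·lr), so stable learning rates are exactly those below 1 (a toy illustration, not from the linked article):

```python
# On f(x) = x^2 (gradient 2x), each step multiplies x by (1 - 2*lr):
# below the threshold the error shrinks, above it each step overshoots
# the minimum by more than it gained.
def descend(x, lr, steps=25):
    for _ in range(steps):
        x -= lr * 2 * x
    return x

good = descend(5.0, lr=0.4)    # factor |1 - 0.8| = 0.2: shrinks toward 0
bad = descend(5.0, lr=1.1)     # factor |1 - 2.2| = 1.2: blows up
```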
Linear regression: Gradient descent | Machine Learning | Google for Developers
February 3, 2026 - Learn how gradient descent iteratively finds the weight and bias that minimize a model's loss. This page explains how the gradient descent algorithm works, and how to determine that a model has converged by looking at its loss curve.
Gradient Descent in Machine Learning: A Deep Dive | by Ishwarya S | GoPenAI
October 24, 2024 - The learning rate, α, is a crucial hyperparameter in the gradient descent algorithm. It determines the size of the steps we take towards the minimum of the cost function. In the gradient descent formula, the learning rate multiplies the gradient to control how much we adjust the parameters in each iteration.
Gradient Descent Algorithm: Key Concepts and Uses in 2026 | Label Your Data
A high learning rate disrupts convergence and creates instability in training. The basic formula for gradient descent is: θ = θ - α∇J(θ).
Gradient Descent Explained: The Engine Behind AI Training | by Abhay singh | Medium
January 5, 2025 - It helps models find the optimal set of parameters by iteratively adjusting them in the opposite direction of the gradient. This article will deeply dive into gradient descent, exploring its different flavors, applications, and challenges.
Gradient Descent Algorithm in Machine Learning - GeeksforGeeks
Trains the model: In each iteration, the model makes predictions, calculates the error and updates the parameters using Gradient Descent.
What is Gradient Descent? | IBM
November 17, 2025 - Similar to finding the line of best fit in linear regression, the goal of gradient descent is to minimize the cost function, that is, the error between predicted and actual y. Doing this requires two ingredients: a direction and a learning rate. These factors determine the partial derivative calculations of future iterations, allowing the algorithm to gradually arrive at the local or global minimum.
ML - Stochastic Gradient Descent (SGD) - GeeksforGeeks
Noisy Convergence: Since the gradient is estimated based on a single data point (or a small batch), the updates can be noisy, causing the cost function to fluctuate rather than steadily decrease. This makes convergence slower and more erratic than in batch gradient descent.
Published   September 30, 2025
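The noisy-update behavior can be seen in a small sketch (made-up data with Gaussian label noise): each single-example step wobbles, but the average drift still heads toward the least-squares solution:

```python
import random

rng = random.Random(0)
# Points near y = 2x with Gaussian label noise.
data = [(float(x), 2.0 * x + rng.gauss(0, 0.5)) for x in range(1, 21)]

def point_grad(w, x, y):
    return 2 * x * (w * x - y)      # single-example MSE gradient

w = 0.0
for _ in range(50):                 # 50 passes over the data
    for x, y in data:
        w -= 0.001 * point_grad(w, x, y)   # noisy per-example update
```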
Gradient Descent in Machine Learning: A Deep Dive | DataCamp
September 23, 2024 - Gradient descent is used in linear regression because of the computational complexity of solving directly. The general mathematical formula for gradient descent is x_{t+1} = x_t - η Δx_t, with η representing the learning rate and Δx_t the direction of descent. Gradient descent is an algorithm applicable ...
Lecture 11: Gradient Descent (and Beyond)
A good approximation can be to only compute its diagonal entries and multiply the update with a small step-size. Essentially you are then doing a hybrid between Newton's method and gradient descent, where you weigh the step-size for each dimension by the inverse Hessian.
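A sketch of the hybrid step the notes describe, scaling each coordinate's gradient by the inverse of the corresponding diagonal Hessian entry (the quadratic test function here is an invented example where the diagonal is exact):

```python
# Scale each coordinate's gradient by the inverse diagonal Hessian entry,
# then apply a small global step size, for f(x, y) = x^2 + 100*y^2.
def precond_step(p, grad, hess_diag, step=0.5):
    return [pi - step * g / h for pi, g, h in zip(p, grad, hess_diag)]

p = [1.0, 1.0]
for _ in range(20):
    grad = [2 * p[0], 200 * p[1]]    # gradient of x^2 + 100*y^2
    hess = [2.0, 200.0]              # diagonal Hessian (exact for a quadratic)
    p = precond_step(p, grad, hess)  # both coordinates now shrink at the same rate
```

Without the Hessian scaling, the steep y-direction would force a tiny step size and the flat x-direction would crawl; the preconditioning equalizes the two.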
The gradient descent function - Internal Pointers
February 5, 2017 - Your starting position on the hill corresponds to the initial values given to \(\theta_0, \theta_1\). The black route has a slightly different starting point compared to the white one, which reveals an interesting property of the gradient descent algorithm: changing the initial values of the thetas might lead you to a different minimum.
12.3. Gradient Descent — Dive into Deep Learning 1.0.3 documentation
Therefore, in gradient descent we first choose an initial value \(x\) and a constant \(\eta > 0\) and then use them to continuously iterate \(x\) until the stop condition is reached, for example, when the magnitude of the gradient \(|f'(x)|\) is small enough or the number of iterations has reached a certain value.
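Both stopping rules mentioned (small gradient magnitude, or an iteration cap) fit naturally into the loop; a minimal Python sketch (illustrative, not from the book):

```python
# Gradient descent with the two stop conditions described: quit when the
# gradient magnitude |f'(x)| drops below tol, or when the iteration cap hits.
def descend_until(grad, x, eta=0.1, tol=1e-8, max_iters=10_000):
    for i in range(max_iters):
        g = grad(x)
        if abs(g) < tol:            # gradient small enough: converged
            return x, i
        x -= eta * g
    return x, max_iters             # iteration budget exhausted

x_min, iters = descend_until(lambda x: 2 * x, 5.0)   # f(x) = x^2
```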
Gradient Descent Simplified
January 12, 2025 - Gradient descent works by iteratively moving in the direction opposite to the gradient of the function, with the step size determined by a hyperparameter called “learning rate”. Gradient descent is commonly used in machine learning to adjust the parameters of a model in order to minimize the loss function, especially in deep learning.
3 Gradient Descent – 6.390 - Intro to Machine Learning
We start by considering gradient descent in one dimension. Assume \(\Theta \in \mathbb{R}\), and that we know both \(J(\Theta)\) and its first derivative with respect to \(\Theta\), \(J'(\Theta)\). Here is pseudocode for gradient descent on an arbitrary function \(f\).
#4. A Beginner’s Guide to Gradient Descent in Machine Learning | by Yennhi95zz | Medium
May 31, 2023 - Implementing gradient descent involves updating the parameters iteratively. The update formula for parameter w is given by w = w - α * (dJ/dw), where α is the learning rate and dJ/dw is the derivative of the cost function with respect ...
Gradient Descent For Machine Learning
September 20, 2021 - When minimizing a function, gradient descent uses iterative movement in the direction of steepest descent, given by the negative of the gradient. In machine learning, this procedure is used to update the model's parameters.
A Simple Guide to Gradient Descent Algorithm | by Data Science Wizards | Medium
May 5, 2023 - In our series of articles we aim to provide guides to the different machine learning algorithms, and with this article we take a look at one of the most important concepts in machine learning: gradient descent, which is based on the idea of iteratively adjusting a model's parameters in the steepest-descent direction of the cost function.