🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › how-to-implement-a-gradient-descent-in-python-to-find-a-local-minimum
Implementing gradient descent in Python to find a local minimum - GeeksforGeeks
October 25, 2025 - Gradient Descent is an optimization algorithm used to find the local minimum of a function. It is used in machine learning to minimize a cost or loss function by iteratively updating parameters in the opposite direction of the gradient.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › gradient-descent-algorithm-and-its-variants
Gradient Descent Algorithm in Machine Learning - GeeksforGeeks
Trains the model: In each iteration, the model makes predictions, calculates the error and updates the parameters using Gradient Descent. Tracks the loss: Stores the loss value in every iteration to observe how the error changes. Plots the loss graph: Shows how the error decreases over iterations. Plots the fitted line: Displays the data points along with the final regression line learned by the model.
Published   16 hours ago
🌐
GeeksforGeeks
geeksforgeeks.org › how-to-implement-a-gradient-descent-in-python-to-find-a-local-minimum
How to implement a gradient descent in Python to find a local minimum ? - GeeksforGeeks
December 14, 2022 - To implement a gradient descent algorithm, we require a cost function that needs to be minimized, the number of iterations, a learning rate to determine the step size at each iteration while moving towards the minimum, partial derivatives for weight & bias to update the parameters at each iteration, and a prediction function. Till now we have seen the parameters required for gradient descent.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › ml-mini-batch-gradient-descent-with-python
ML | Mini-Batch Gradient Descent with Python - GeeksforGeeks
July 5, 2025 - Mini-batch gradient descent is an optimization method that updates model parameters using small subsets of the training data called mini-batches. This technique offers a middle path between the high variance of stochastic gradient descent and ...
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › gradient-descent-in-linear-regression
Gradient Descent in Linear Regression - GeeksforGeeks
Gradient Descent is an optimization algorithm used in linear regression to find the best-fit line for the data....
Published   December 12, 2025
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › what-is-gradient-descent
What is Gradient Descent - GeeksforGeeks
January 17, 2026 - 5. Run gradient descent loop: Predict values: y_pred = m*X_scaled + c. Compute error and Mean Squared Error (MSE). Calculate gradients (dm, dc) and update m and c. Track loss in loss_history. Print progress every 100 iterations.
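The loop this snippet outlines (predict, compute the MSE, compute the gradients dm and dc, update m and c) can be sketched as follows; the data and hyperparameters below are illustrative stand-ins, not taken from the article:

```python
import numpy as np

# Illustrative data: X_scaled stands in for the article's scaled feature.
rng = np.random.default_rng(0)
X_scaled = np.linspace(0.0, 1.0, 50)
y = 4.0 * X_scaled + 2.0 + rng.normal(0.0, 0.1, 50)

m, c = 0.0, 0.0                 # slope and intercept, both start at zero
lr, iterations = 0.5, 1000      # illustrative learning rate and iteration count
loss_history = []

for it in range(iterations):
    y_pred = m * X_scaled + c             # predict
    error = y_pred - y
    mse = np.mean(error ** 2)             # Mean Squared Error
    loss_history.append(mse)              # track loss
    dm = 2.0 * np.mean(error * X_scaled)  # gradient w.r.t. slope
    dc = 2.0 * np.mean(error)             # gradient w.r.t. intercept
    m -= lr * dm
    c -= lr * dc
    if it % 100 == 0:
        print(f"iter {it}: MSE = {mse:.6f}")
```

With these settings the loop recovers a slope near 4 and an intercept near 2, and the recorded loss decreases steadily as long as the learning rate is stable.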
🌐
GeeksforGeeks
geeksforgeeks.org › python › stochastic-gradient-descent-classifier
Stochastic Gradient Descent Classifier - GeeksforGeeks
July 23, 2025 - In summary, the Stochastic Gradient Descent (SGD) Classifier in Python is a versatile optimization algorithm that underpins a wide array of machine learning applications. By efficiently updating model parameters using random subsets of data, ...
🌐
GeeksforGeeks
geeksforgeeks.org › python › gradient-descent-optimization-in-tensorflow
Gradient Descent Optimization in Tensorflow - GeeksforGeeks
July 23, 2025 - Gradient descent is an optimization algorithm used to find the values of parameters (coefficients) of a function (f) that minimizes a cost function....
🌐
GeeksforGeeks
geeksforgeeks.org › python › applications-of-gradient-descent-in-tensorflow
Applications of Gradient Descent in TensorFlow - GeeksforGeeks
July 23, 2025 - ... descent optimization procedure. It entails incrementally changing the model's parameters in the direction of the cost function's steepest decline....
🌐
GeeksforGeeks
geeksforgeeks.org › videos › how-does-gradient-descent-work-in-linear-regression
How Does Gradient Descent Work in Linear Regression - GeeksforGeeks | Videos
Gradient Descent in Python works by moving towards the minimum of a cost function to find the best-fit parameters. It does so by taking the derivative of the cost function and adjusting parameters step-by-step.
Published   December 4, 2024
Views   60K
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › numpy-gradient-descent-optimizer-of-neural-networks
Numpy Gradient - Descent Optimizer of Neural Networks - GeeksforGeeks
March 29, 2023 - The idea is very simple: start from an arbitrary point and move towards the minimum (that is, in the direction of the negative gradient), returning a point that is as close to the minimum as possible. GD() is a user-defined function employed for this purpose.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › how-to-implement-adam-gradient-descent-from-scratch-using-python
How to Implement Adam Gradient Descent from Scratch using Python? - GeeksforGeeks
July 23, 2025 - Gradient Descent: An iterative optimization algorithm used to find the minimum of a function by iteratively adjusting the parameters in the direction of the steepest descent of the gradient.
🌐
PyPI
pypi.org › project › gradient-descent
gradient-descent · PyPI
April 5, 2020 - Optimization Techniques for Gradient Descent by the www.geeksforgeeks.org website · optimization_algos GitHub repository by Iain Carmichael · [Deep Learning](http://www.deeplearningbook.org) by Bengio, Goodfellow and Courville · Homepage · License: MIT License · Author: Daniel da Costa · Requires: Python >=3.6 ·
» pip install gradient-descent
Published   Apr 06, 2020
Version   0.0.3
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › ml-stochastic-gradient-descent-sgd
ML - Stochastic Gradient Descent (SGD) - GeeksforGeeks
Stochastic Gradient Descent (SGD) is an optimization algorithm in machine learning, particularly useful when dealing with large datasets....
Published   September 30, 2025
🌐
GeeksforGeeks
geeksforgeeks.org › python › stochastic-gradient-descent-regressor
Stochastic Gradient Descent Regressor - GeeksforGeeks
July 23, 2025 - The gradient descent step size is determined by the learning rate. The stability and rate of convergence of training can be affected by the choice of learning rate. random_state: To set the random seed for repeatability, use the optional random_state parameter. Setting a specific value for random_state keeps model initialization and data randomization constant across runs, making your experiments reproducible.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › different-variants-of-gradient-descent
Different Variants of Gradient Descent - GeeksforGeeks
September 29, 2025 - The update rule for batch gradient descent is: \theta = \theta - \eta \nabla J(\theta), where \theta represents the parameters of the model, \eta is the learning rate, and \nabla J(\theta) is the gradient of the loss function J(\theta) with respect to \theta. Computes the gradient using all training examples. Averages the gradient over the full dataset. Updates \theta once per epoch. Suitable for small to medium datasets.
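The quoted update rule \theta = \theta - \eta \nabla J(\theta) can be sketched for mean-squared-error linear regression; the data, learning rate, and iteration count below are illustrative choices, not from the article:

```python
import numpy as np

def batch_gd_step(theta, X, y, eta):
    # One batch update: the gradient is computed from ALL training examples
    # and averaged, then theta moves one step against it.
    grad = X.T @ (X @ theta - y) / len(y)
    return theta - eta * grad

X = np.c_[np.ones(4), np.arange(4.0)]  # bias column plus one feature
y = np.array([1.0, 3.0, 5.0, 7.0])     # exactly y = 1 + 2x
theta = np.zeros(2)
for _ in range(5000):                  # one update per epoch over the full set
    theta = batch_gd_step(theta, X, y, eta=0.1)
# theta converges to approximately [1, 2]
```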
🌐
GeeksforGeeks
geeksforgeeks.org › deep learning › mini-batch-gradient-descent-in-deep-learning
Mini-Batch Gradient Descent in Deep Learning - GeeksforGeeks
September 30, 2025 - Mini-batch gradient descent is a variant of the traditional gradient descent algorithm used to optimize the parameters, i.e. the weights and biases, of a neural network....
Top answer (1 of 6, score 146)

I think your code is a bit too complicated and needs more structure; otherwise you'll get lost in all the equations and operations. In the end, this regression boils down to four operations:

  1. Calculate the hypothesis h = X * theta
  2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
  3. Calculate the gradient = X' * loss / m
  4. Update the parameters theta = theta - alpha * gradient

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset, which should look like this:

(scatter plot of the generated data omitted)

As you can see, I also added the generated regression line and the formula that was calculated by Excel.

You need to keep the intuition of regression with gradient descent in mind: as you do a complete batch pass over your data X, you need to reduce the m per-example losses to a single weight update. In this case, that is the average of the summed gradients, hence the division by m.

The next thing to take care of is tracking convergence and adjusting the learning rate. To that end, you should always track your cost every iteration, and maybe even plot it.
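That cost-tracking advice can be sketched as a variant of the gradientDescent function above; the early-stopping tolerance tol is an addition of this sketch, not part of the original answer:

```python
import numpy as np

def gradientDescentTracked(x, y, theta, alpha, numIterations, tol=1e-10):
    # Same batch update as gradientDescent above, but the cost is recorded
    # every iteration and the loop stops once the cost stops improving.
    m = len(y)
    costHistory = []
    for i in range(numIterations):
        loss = np.dot(x, theta) - y
        cost = np.sum(loss ** 2) / (2 * m)
        costHistory.append(cost)
        if i > 0 and abs(costHistory[-2] - cost) < tol:
            break  # converged: the cost has plateaued
        theta = theta - alpha * np.dot(x.transpose(), loss) / m
    return theta, costHistory

# toy data: y = 2 * x with a bias column of ones
x = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 * np.arange(5.0)
theta, history = gradientDescentTracked(x, y, np.zeros(2), 0.05, 10000)
```

Plotting costHistory with matplotlib then shows at a glance whether alpha is too large (cost oscillates or diverges) or too small (cost creeps down slowly).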

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

This is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that because we passed the bias in as the first column, the first theta value denotes the bias weight.
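As a sanity check that is not part of the original answer, the same kind of dataset can also be fit with NumPy's closed-form least-squares solver, and gradient descent should land close to its solution (the seed is an illustrative choice for reproducibility):

```python
import numpy as np
import random

random.seed(0)  # illustrative seed so the check is reproducible

# Regenerate data the same way genData above does
numPoints, bias, variance = 100, 25, 10
x = np.ones((numPoints, 2))
x[:, 1] = np.arange(numPoints)
y = np.array([(i + bias) + random.uniform(0, 1) * variance
              for i in range(numPoints)])

# Closed-form least squares as a reference for the gradient-descent result
theta_ls, *_ = np.linalg.lstsq(x, y, rcond=None)
print(theta_ls)  # bias weight near 30, slope near 1
```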

Answer 2 of 6 (score 12)

Below you can find my implementation of gradient descent for a linear regression problem.

First, you calculate the gradient as X.T * (X * w - y) / N and update the current weights w with it simultaneously, where:

  • X: feature matrix
  • y: target values
  • w: weight vector
  • N: size of the training set

Here is the Python code:

import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    # column vector of x values 1..N and noisy targets around y = 10x + 900
    X = np.arange(1, N + 1, dtype=float).reshape(-1, 1)
    Y = np.array([random.random() * variance + i * 10 + 900
                  for i in range(N)]).reshape(-1, 1)
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x @ w - y
        gradient = x.T @ error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:, 1], y, "x")
    plt.plot(x[:, 1], x @ w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    # prepend a column of ones for the bias term
    X = np.hstack([np.ones((len(X), 1)), X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
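Several of the GeeksforGeeks results above cover mini-batch gradient descent. As a sketch that is not part of this answer, fitModel_gradient can be adapted to update on random row subsets; batch_size, seed, and the default hyperparameters here are illustrative choices:

```python
import numpy as np

def fitModel_minibatch(x, y, eta=0.01, maxIteration=30000, batch_size=4, seed=0):
    # Each update uses a random subset of the rows, trading per-step gradient
    # accuracy for much cheaper iterations on large datasets.
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.zeros((x.shape[1], 1))
    for _ in range(maxIteration):
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        error = x[idx] @ w - y[idx]
        gradient = x[idx].T @ error / len(idx)
        w = w - eta * gradient
    return w
```

Against noiseless data such as y = 3 + 2x, this converges to nearly the same weights as the full-batch version, only with noisier intermediate steps.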

🌐
GeeksforGeeks
geeksforgeeks.org › python › gradient
Gradient - GeeksforGeeks
October 25, 2025 - In machine learning, the gradient guides gradient descent, an optimization algorithm used to minimize loss functions.