I think your code is a bit too complicated, and it needs more structure; otherwise you'll get lost in all the equations and operations. In the end, this regression boils down to four operations:

  1. Calculate the hypothesis h = X * theta
  2. Calculate the loss = h - y and maybe the squared cost, sum(loss^2) / (2m)
  3. Calculate the gradient = X' * loss / m
  4. Update the parameters theta = theta - alpha * gradient
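These four steps can be sketched as a single iteration (a minimal sketch on a made-up two-example dataset; the values here are illustrative, not from the code below):

```python
import numpy as np

# Toy data: 2 examples, 2 features (first column is the bias term)
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])
theta = np.zeros(2)
alpha = 0.1
m = len(y)

h = X.dot(theta)                  # 1. hypothesis
loss = h - y                      # 2. loss (cost is just for monitoring)
cost = np.sum(loss ** 2) / (2 * m)
gradient = X.T.dot(loss) / m      # 3. gradient
theta = theta - alpha * gradient  # 4. parameter update
```

Repeating this block in a loop is all that batch gradient descent does.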

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations= 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset, which should look like this:

[scatter plot of the generated data, omitted here]

As you can see, I also added the generated regression line and the formula that was calculated by Excel.

You need to understand the intuition of regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, that is the average of the sum over the gradients, thus the division by m.
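To see that the division by m really is a per-example average, you can compare the vectorized gradient to an explicit average of per-example gradients (a small self-check on random data; the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
theta = rng.standard_normal(n)

loss = X.dot(theta) - y
vectorized = X.T.dot(loss) / m                           # gradient as in the code above
per_example = sum(X[i] * loss[i] for i in range(m)) / m  # explicit average over examples

print(np.allclose(vectorized, per_example))  # True
```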

The next thing you need to take care of is tracking convergence and adjusting the learning rate. To that end, you should track your cost every iteration, maybe even plot it.
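One simple way to do that (a sketch, not part of the original answer; the `tol` threshold is an arbitrary choice) is to record the cost history and stop early once it plateaus:

```python
import numpy as np

def gradient_descent_with_history(x, y, theta, alpha, num_iterations, tol=1e-10):
    m = len(y)
    costs = []
    for _ in range(num_iterations):
        loss = x.dot(theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * x.T.dot(loss) / m
        # stop early once the cost barely changes between iterations
        if len(costs) > 1 and abs(costs[-2] - costs[-1]) < tol:
            break
    return theta, costs

x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])  # exactly y = 1 + x
theta, costs = gradient_descent_with_history(x, y, np.zeros(2), 0.1, 10000)
```

Plotting `costs` (e.g. with `plt.plot(costs)`) makes divergence from a too-large learning rate immediately visible.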

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

That is actually quite close to the equation that Excel calculated (y = x + 30). Note that since we passed the bias into the first column, the first theta value denotes the bias weight.
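As a sanity check (not in the original answer), you can compare the gradient-descent result against the closed-form least-squares solution, which `np.linalg.lstsq` computes directly. Here I use a noiseless stand-in for `genData`'s output so the expected answer is exact:

```python
import numpy as np

# same layout as above: bias term in the first column
x = np.array([[1.0, i] for i in range(100)])
y = 25.5 + np.arange(100.0)  # exact line: bias 25.5, slope 1

theta_closed = np.linalg.lstsq(x, y, rcond=None)[0]
print(theta_closed)  # first entry is the bias weight, second the slope
```

With noisy data, gradient descent should land close to (but not exactly on) this closed-form solution.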

Answer 1 of 6 (score 146), from Thomas Jungblut on Stack Overflow
Answer 2 of 6 (score 12):

Below you can find my implementation of gradient descent for the linear regression problem.

First, you calculate the gradient as X.T * (X * w - y) / N and update your current weights w with this gradient, where:

  • X: feature matrix
  • y: target values
  • w: weights/values
  • N: size of training set

Here is the Python code:

import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
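A note on the code above: `np.matrix` is discouraged in modern NumPy in favor of plain ndarrays. My adaptation of the same fit with arrays (a sketch, not part of the original answer) could look like this:

```python
import numpy as np

def fit_model_gradient(x, y, eta=0.0001, max_iteration=100000):
    # x: (N, d) design matrix with a leading column of ones, y: (N,) targets
    n = len(x)
    w = np.zeros(x.shape[1])
    for _ in range(max_iteration):
        error = x.dot(w) - y           # residuals X * w - y
        gradient = x.T.dot(error) / n  # X.T * (X * w - y) / N
        w = w - eta * gradient
    return w
```

Usage mirrors the original: build `x` with `np.hstack([np.ones((N, 1)), features])` and pass 1-D `y`; the returned `w` is a plain 1-D array instead of a column matrix.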

March 27, 2025 - This article covers its iterative process of gradient descent in python for minimizing cost functions, various types like batch, or mini-batch and SGD , and provides insights into implementing it in Python. Learn about the mathematical principles behind gradient descent, the critical role of the learning rate, and strategies to overcome challenges such as oscillation and slow convergence.