I think your code is a bit too complicated and needs more structure, because otherwise you'll get lost in all the equations and operations. In the end this regression boils down to four operations:

  1. Calculate the hypothesis h = X * theta
  2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
  3. Calculate the gradient = X' * loss / m
  4. Update the parameters theta = theta - alpha * gradient
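
These four operations can be written down as one short NumPy function. This is a minimal sketch (the function name `gradient_step` and the tiny dataset are mine, not from the original code):

```python
import numpy as np

def gradient_step(X, y, theta, alpha):
    """One batch gradient-descent step for linear regression.

    X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters.
    Returns the updated theta and the current cost.
    """
    m = len(y)
    h = X.dot(theta)                      # 1. hypothesis
    loss = h - y                          # 2. loss per example
    cost = np.sum(loss ** 2) / (2 * m)    #    squared cost (for tracking)
    gradient = X.T.dot(loss) / m          # 3. average gradient
    theta = theta - alpha * gradient      # 4. parameter update
    return theta, cost

# tiny check: on y = 2*x the cost should shrink step by step
X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 * np.arange(5.0)
theta = np.zeros(2)
theta, c0 = gradient_step(X, y, theta, 0.01)
theta, c1 = gradient_step(X, y, theta, 0.01)
```

Running a few steps and watching the cost fall is the quickest way to convince yourself the four operations are wired up correctly.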

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

First, I create a small random dataset. (The original answer shows a scatter plot of it here, together with the regression line and formula that Excel calculated.)

You need to keep the intuition of regression via gradient descent in mind. As you do a complete batch pass over your data X, you need to reduce the m losses of all examples to a single weight update. In this case, that is the average of the summed gradients, hence the division by m.
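
The vectorized `np.dot(xTrans, loss) / m` in the code above is exactly the mean of the m per-example gradients. A small sketch (variable names are mine) makes this explicit:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
theta = np.array([0.5, 0.5])
y = np.array([0.0, 1.0, 2.0])
m = len(y)

loss = X.dot(theta) - y

# one gradient per example, then reduced to a single update by averaging
per_example = np.array([loss[i] * X[i] for i in range(m)])
gradient_loop = per_example.mean(axis=0)

# the vectorized form used in the answer above
gradient_vec = X.T.dot(loss) / m

print(np.allclose(gradient_loop, gradient_vec))  # True
```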

The next thing to take care of is tracking convergence and adjusting the learning rate. You should track your cost on every iteration, and maybe even plot it.
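
A lightweight way to do that (a hypothetical helper of my own, not part of the original answer) is to record the cost each iteration and stop early once the improvement drops below a tolerance:

```python
import numpy as np

def gradient_descent_tracked(X, y, theta, alpha, max_iter=10000, tol=1e-9):
    """Batch gradient descent that records the cost history and
    stops early once the cost improvement falls below tol."""
    m = len(y)
    costs = []
    for _ in range(max_iter):
        loss = X.dot(theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * X.T.dot(loss) / m
        if len(costs) > 1 and costs[-2] - costs[-1] < tol:
            break
    return theta, costs

X = np.c_[np.ones(50), np.arange(50.0)]
y = 3.0 + 2.0 * np.arange(50.0)
theta, costs = gradient_descent_tracked(X, y, np.zeros(2), 0.0005)
# inspect convergence with e.g. matplotlib: plt.plot(costs)
```

If the cost ever increases from one iteration to the next, the learning rate is too large for this data and should be reduced.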

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

This is actually quite close to the equation that Excel calculated (y = x + 30). Note that since we passed the bias in as the first column, the first theta value denotes the bias weight.
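
One way to sanity-check the result (my addition, not part of the original answer) is to compare against the closed-form least-squares solution, which gradient descent should approach on a problem of this shape:

```python
import numpy as np

# same shape of problem as genData above: column of ones plus a linear feature
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), np.arange(100.0)]
y = np.arange(100.0) + 25 + rng.uniform(0, 1, size=100) * 10

# closed-form solution via least squares
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# gradient descent with the same hyperparameters as in the answer
theta = np.ones(2)
for _ in range(100000):
    loss = X.dot(theta) - y
    theta -= 0.0005 * X.T.dot(loss) / 100

print(theta, theta_exact)  # the two should agree closely
```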

Answer from Thomas Jungblut on Stack Overflow

Answer 2 of 6 (score 12):

Below you can find my implementation of gradient descent for the linear regression problem.

At each iteration, you calculate the gradient as X.T * (X * w - y) / N and update all components of your current weight vector w with it simultaneously.

  • X: feature matrix
  • y: target values
  • w: weight vector (the parameters being fitted)
  • N: size of training set

Here is the Python code:

import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
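
Note that `np.matrix` has been deprecated in NumPy for a while now. The same algorithm with plain `ndarray`s would be a sketch like this (my translation; the name `fit_model_gradient` and the data-generation details are mine):

```python
import numpy as np

def fit_model_gradient(X, y, eta=0.0001, max_iteration=100000):
    """Batch gradient descent, same update rule as above, with ndarrays."""
    N = len(X)
    w = np.zeros(X.shape[1])
    for _ in range(max_iteration):
        error = X.dot(w) - y            # X * w - y
        gradient = X.T.dot(error) / N   # X.T * error / N
        w -= eta * gradient
    return w

# sample data analogous to generateSample: y grows linearly with noise
rng = np.random.default_rng(1)
N = 50
x = np.arange(1, N + 1, dtype=float)
y = rng.random(N) * 100 + np.arange(N) * 10 + 900
X = np.c_[np.ones(N), x]                # prepend the bias column
w = fit_model_gradient(X, y)
```

The matrix `*` operator becomes an explicit `dot`, which is the main translation step; everything else carries over unchanged.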

Top answer (score 3) to a related Stack Overflow question:

But I can't understand when the optimization happens, when gradient descent happens, and most importantly, what the relation is with the rounded bucket example.

For all machine learning problems, you have a loss function. The loss is higher the farther you are from a desirable solution. For example, in a classification problem you can calculate the error of your current classifier. You could take that error as a simple loss function: the more errors your classifier makes, the worse it is.

Now your model has parameters. Let's call those "weights" w. If you have n of them, you can write w \in R^n.

For each set of weights w, you can assign an error. If n = 2, you can plot a graph of this error function. (The original answer shows a 3-D plot of a bowl-shaped error surface here.)

Each position in the x-y plane is one set of parameters, and the value in the z direction is the error. You want to minimize the error, so your optimization problem is a minimization problem: you want to go down into that bowl. You don't know it is a bowl; that is just a visualization. But by looking at the gradient, you can calculate which direction will reduce the error. Hence gradient descent: reducing the error by optimizing the weights.
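
To make that picture concrete, here is a toy sketch (my example, not from the answer) that walks down a two-dimensional bowl E(w) = w1^2 + w2^2 by following the negative gradient:

```python
import numpy as np

def grad(w):
    # gradient of the bowl-shaped error E(w) = w1^2 + w2^2
    return 2 * w

w = np.array([3.0, -4.0])    # start somewhere up on the bowl's rim
eta = 0.1                    # learning rate
for _ in range(100):
    w = w - eta * grad(w)    # step in the direction that reduces the error

print(w)  # both weights end up very close to the minimum at (0, 0)
```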

Usually, you don't have n=2, but rather n=100 * 10^6 or something similar.

Alec Radford made a couple of great visualizations of this process for different kinds of gradient descent (linked as "Source" in the original answer).

Answer 2 of 4 (score 2):

For classical neural networks you have two steps:

  • Feeding inputs through the network
  • Backpropagation of the error and correction of the weights (synapses)

The second one is where gradient descent is used.

This is the example from your link http://iamtrask.github.io/2015/07/27/python-network-part2/

import numpy as np
X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
alpha,hidden_dim = (0.5,4)
synapse_0 = 2*np.random.random((3,hidden_dim)) - 1
synapse_1 = 2*np.random.random((hidden_dim,1)) - 1
for j in range(60000):  # xrange in the original Python 2 code
    layer_1 = 1/(1+np.exp(-(np.dot(X,synapse_0))))
    layer_2 = 1/(1+np.exp(-(np.dot(layer_1,synapse_1))))
    layer_2_delta = (layer_2 - y)*(layer_2*(1-layer_2))
    layer_1_delta = layer_2_delta.dot(synapse_1.T) * (layer_1 * (1-layer_1))
    synapse_1 -= (alpha * layer_1.T.dot(layer_2_delta))
    synapse_0 -= (alpha * X.T.dot(layer_1_delta))

In the forward step you apply the activation function f(x) = 1/(1+exp(-x)) to the weighted sum of each neuron's inputs (the dot product, a.k.a. scalar product, is shorthand for that weighted sum) to get the neuron's state.

Gradient descent is hidden in the backpropagation, in the lines where you calculate the layer_x_delta values:

  • layer_2*(1-layer_2) is the derivative of the f above, evaluated at layer_2. So the learning delta essentially follows this gradient in the right direction.
  • In layer_1_delta you take the calculated delta from the second layer, pull it backwards linearly with np.dot (again just a weighted sum), and then take the direction of the gradient as above with x*(1-x).
  • Then the weights are changed according to the delta (error) at the target neuron and the activation of the source neuron (np.dot(layer_1.T, layer_2_delta)). alpha is just a learning rate (usually 0 < alpha < 1) to avoid overcorrection.
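
The claim that layer_2*(1-layer_2) is the derivative of f can be checked numerically; a small sketch (my addition):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-3, 3, 7)
s = sigmoid(x)

# analytic derivative expressed in terms of the output, as in the code above
analytic = s * (1 - s)

# central finite-difference approximation of the same derivative
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```

Expressing the derivative in terms of the already-computed output s is exactly why the backpropagation code never needs to recompute the exponential.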

I hope you can get something out of this answer!