🌐
Medium
medium.com › @jaleeladejumo › gradient-descent-from-scratch-batch-gradient-descent-stochastic-gradient-descent-and-mini-batch-def681187473
Gradient Descent From Scratch- Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent. | by Jaleel Adejumo | Medium
April 12, 2023 - In this article, I will take you through implementing Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent from scratch in Python. This will be beginner friendly. Understanding the gradient descent method will help you optimise your loss during ML model training.
🌐
Kaggle
kaggle.com › code › bhatnagardaksh › gradient-descent-from-scratch
Gradient Descent from scratch
🌐
Spot Intelligence
spotintelligence.com › home › batch gradient descent in machine learning made simple & how to tutorial in python
Batch Gradient Descent In Machine Learning Made Simple & How To Tutorial In Python
May 22, 2024 - Below is the Python code for the batch gradient descent algorithm with a simple linear regression example for demonstration purposes.
🌐
Real Python
realpython.com › gradient-descent-algorithm-python
Stochastic Gradient Descent Algorithm With Python and NumPy – Real Python
October 21, 2023 - You’ll create a new function called sgd() that is very similar to gradient_descent() but uses randomly selected minibatches to move along the search space: ... import numpy as np · def sgd(gradient, x, y, start, learn_rate=0.1, batch_size=1, ...
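The snippet above is truncated; a minimal sketch of what such an sgd() might look like follows. The loop body and the parameter names past batch_size are assumptions, not the article's actual code:

```python
import numpy as np

def sgd(gradient, x, y, start, learn_rate=0.1, batch_size=1, n_iter=50, seed=0):
    # Hypothetical completion of the truncated signature above.
    rng = np.random.default_rng(seed)
    vector = np.array(start, dtype=float)
    # Stack observations and targets side by side so they shuffle together.
    xy = np.c_[x.reshape(len(x), -1), y.reshape(len(y), -1)]
    for _ in range(n_iter):
        rng.shuffle(xy)                      # new random order each epoch
        for start_idx in range(0, len(xy), batch_size):
            batch = xy[start_idx:start_idx + batch_size]
            x_b, y_b = batch[:, :-1], batch[:, -1:]
            vector -= learn_rate * gradient(x_b, y_b, vector)
    return vector
```

The caller supplies the gradient function, so the same loop serves any differentiable model.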
🌐
Medium
medium.com › @zhaoyi0113 › python-implementation-of-batch-gradient-descent-379fa19eb428
Python implementation of batch gradient descent | by Joey Yi Zhao | Medium
July 26, 2023 - The difference with the batch gradient descent algorithm is that it computes values over the whole dataset. This is great for convex, or relatively smooth, error manifolds. In this case, we move somewhat directly towards an optimum solution, either local or global. Additionally, batch gradient descent, given an annealed learning rate…
🌐
GeeksforGeeks
geeksforgeeks.org › ml-mini-batch-gradient-descent-with-python
ML | Mini-Batch Gradient Descent with Python | GeeksforGeeks
August 2, 2022 - Depending on the number of training ... descents: Batch Gradient Descent: Parameters are updated after computing the gradient of the error with respect to the entire training set...
🌐
Bogotobogo
bogotobogo.com › python › python_numpy_batch_gradient_descent_algorithm.php
Python Tutorial: batch gradient descent algorithm - 2020
Batch gradient descent algorithm · Single Layer Neural Network - Perceptron model on the Iris dataset using Heaviside step activation function · Batch gradient descent versus stochastic gradient descent · Single Layer Neural Network - Adaptive Linear Neuron using linear (identity) activation function with batch gradient descent method · Single Layer Neural Network: Adaptive Linear Neuron using linear (identity) activation function with stochastic gradient descent (SGD) · Logistic Regression · VC (Vapnik-Chervonenkis) Dimension and Shatter · Bias-variance tradeoff · Maximum Likelihood Estimation (MLE) · Neural
🌐
YouTube
youtube.com › watch
Gradient Descent Implementation from Scratch in Python - YouTube
In this video we show how you can implement the batch gradient descent and stochastic gradient descent algorithms from scratch in Python.
Published   January 21, 2019
🌐
Medium
medium.com › @ugurozcan108 › batch-gradient-descent-in-python-4d3b16d40755
Batch Gradient Descent in Python. The gradient descent algorithm… | by Uğur Özcan | Medium
March 17, 2022 - Batch Gradient Descent in Python. The gradient descent algorithm multiplies the gradient by a learning rate to determine the next point in the process of reaching a local minimum. In batch gradient …
🌐
Stack Abuse
stackabuse.com › gradient-descent-in-python-implementation-and-theory
Gradient Descent in Python: Implementation and Theory
November 16, 2023 - The gradient_descent() function can then be used as-is. Note that all training examples are processed together when computing the gradient. Hence, this version of gradient descent for updating weights is referred to as batch updating or batch learning:
🌐
The Land of Oz
ozzieliu.com › 2016 › 02 › 09 › gradient-descent-tutorial
Python Tutorial on Linear Regression with Batch Gradient Descent - The Land of Oz
February 10, 2016 - This method is called “batch” gradient descent because we use the entire batch of points X to calculate each gradient, as opposed to stochastic gradient descent, which uses one point at a time.
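The batch-versus-stochastic distinction described here can be sketched directly (hypothetical helper names, assuming a least-squares objective):

```python
import numpy as np

def batch_gradient(X, y, theta):
    # One gradient computed from the entire batch of points X.
    return X.T @ (X @ theta - y) / len(y)

def stochastic_gradient(X, y, theta, i):
    # One gradient computed from a single point (row i).
    xi = X[i]
    return xi * (xi @ theta - y[i])
```

Averaging the single-point gradients over all rows recovers the batch gradient exactly, which is why both variants head for the same minimum.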
Top answer
1 of 2
13

This function returns the mini-batches given the inputs and targets:

import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0] - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

and this tells you how to use that for training:

for n in range(n_epochs):
    for batch in iterate_minibatches(X, Y, batch_size, shuffle=True):
        x_batch, y_batch = batch
        l_train, acc_train = f_train(x_batch, y_batch)

    l_val, acc_val = f_val(Xt, Yt)
    logging.info('epoch %d, train_loss %f, acc %f, val_loss %f, acc %f',
                 n, l_train, acc_train, l_val, acc_val)

Obviously you need to define the f_train, f_val and other functions yourself given the optimisation library (e.g. Lasagne, Keras) you are using.

2 of 2
6

The following function returns (yields) mini-batches. It is based on the function provided by Ash, but correctly handles the last minibatch.

import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        if shuffle:
            excerpt = indices[start_idx:end_idx]
        else:
            excerpt = slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]
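For example, with 10 samples and a batch size of 4 this corrected version yields batches of sizes 4, 4 and 2 instead of silently dropping the last two samples (the function is repeated so the snippet runs standalone):

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    # Definition repeated from the answer above.
    assert inputs.shape[0] == targets.shape[0]
    if shuffle:
        indices = np.arange(inputs.shape[0])
        np.random.shuffle(indices)
    for start_idx in range(0, inputs.shape[0], batchsize):
        end_idx = min(start_idx + batchsize, inputs.shape[0])
        excerpt = indices[start_idx:end_idx] if shuffle else slice(start_idx, end_idx)
        yield inputs[excerpt], targets[excerpt]

inputs = np.arange(10).reshape(10, 1)
targets = np.arange(10)
sizes = [len(x_b) for x_b, _ in iterate_minibatches(inputs, targets, 4)]
# sizes == [4, 4, 2]: the smaller final minibatch is kept, not dropped
```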
Top answer
1 of 6
146

I think your code is a bit too complicated and needs more structure, because otherwise you'll get lost in all the equations and operations. In the end this regression boils down to four operations:

  1. Calculate the hypothesis h = X * theta
  2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
  3. Calculate the gradient = X' * loss / m
  4. Update the parameters theta = theta - alpha * gradient

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset which should look like this:

As you can see I also added the generated regression line and the formula that was calculated by Excel.

You need to take care about the intuition of the regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses, one per example, to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.

The next thing you need to take care about is to track the convergence and adjust the learning rate. For that matter you should always track your cost every iteration, maybe even plot it.

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

Which is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.
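As a sanity check (not part of the original answer), the same fit can be computed in closed form with NumPy's least-squares solver; the data generation below follows the genData(100, 25, 10) recipe above, with a seed added for reproducibility:

```python
import numpy as np
import random

random.seed(0)

# Same data recipe as genData(100, 25, 10) above.
x = np.zeros((100, 2))
y = np.zeros(100)
for i in range(100):
    x[i][0] = 1          # bias feature
    x[i][1] = i
    y[i] = (i + 25) + random.uniform(0, 1) * 10

# Closed-form ordinary least squares: the fixed point that
# gradient descent converges to with enough iterations.
theta_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
```

A large gap between the gradient-descent theta and theta_exact usually indicates a learning rate that is too high or too few iterations.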

2 of 6
12

Below you can find my implementation of gradient descent for linear regression problem.

First, you calculate the gradient as X.T * (X * w - y) / N and update your current weight vector w with this gradient.

  • X: feature matrix
  • y: target values
  • w: weight vector
  • N: size of training set

Here is the python code:

import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    # plain 2-D arrays with @ instead of the deprecated np.matrix
    X = np.arange(1, N + 1, dtype=float).reshape(-1, 1)
    Y = np.array([random.random() * variance + i * 10 + 900
                  for i in range(N)]).reshape(-1, 1)
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001

    maxIteration = 100000
    for i in range(maxIteration):
        error = x @ w - y
        gradient = x.T @ error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:, 1], y, "x")
    plt.plot(x[:, 1], x @ w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = modelFunction(X, Y)
    plotModel(X, Y, w)


test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
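One generic way to verify the analytic gradient X.T * (X * w - y) / N used above is a central finite-difference check (illustrative helper names, not from the answer):

```python
import numpy as np

def cost(X, y, w):
    # Mean squared error whose gradient is X.T @ (X @ w - y) / N
    return np.sum((X @ w - y) ** 2) / (2 * len(y))

def analytic_gradient(X, y, w):
    return X.T @ (X @ w - y) / len(y)

def numeric_gradient(X, y, w, eps=1e-6):
    # Perturb each weight in turn and difference the cost.
    g = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (cost(X, y, w + step) - cost(X, y, w - step)) / (2 * eps)
    return g
```

If the two gradients disagree beyond finite-difference error, the analytic formula (or its implementation) is wrong.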

🌐
GitHub
github.com › bhattbhavesh91 › gradient-descent-variants
GitHub - bhattbhavesh91/gradient-descent-variants: My implementation of Batch, Stochastic & Mini-Batch Gradient Descent Algorithm using Python
My implementation of Batch, Stochastic & Mini-Batch Gradient Descent Algorithm using Python - bhattbhavesh91/gradient-descent-variants
Starred by 21 users
Forked by 22 users
Languages   Jupyter Notebook 100.0%
🌐
Stack Overflow
stackoverflow.com › questions › 47593225 › batch-gradient-descent-algorithm-implementation-in-python
machine learning - Batch gradient descent algorithm implementation in python - Stack Overflow
data = open('Data_trial.txt','r')
import time
lines = data.readlines()
dataSet = []
for line in lines:
    dataSet.append(line.split())
original_output = []
features = []
for i in range(0, len(dataSet)):
    features.append([])
predict = []
grad = []
weights = [0, 0, 0, 0, 0]
learning_factor = 0.01
for i in range(0, len(dataSet)):
    for j in range(0, len(dataSet[i])):
        if j == 0:
            original_output.append(float(dataSet[i][j]))
        else:
            features[i].append(float(dataSet[i][j]))
def prediction(predict, weights, original_output, features):
    for count in range(0, len(original_output)):
        predict.append(sum(weights[i]*features[count][i] for i in r
🌐
Kenndanielso
kenndanielso.github.io › mlrefined › blog_posts › 13_Multilayer_perceptrons › 13_6_Stochastic_and_minibatch_gradient_descent.html
13.6 Stochastic and mini-batch gradient descent
Ideally we want all mini-batches to have the same size - a parameter we call the batch size - or be as equally-sized as possible when $J$ does not divide $P$. Notice, a batch size of $1$ turns mini-batch gradient descent into stochastic gradient descent, whereas a batch size of $P$ turns it into the standard or batch gradient descent. The code cell below contains Python implementation of the mini-batch gradient descent algorithm based on the standard gradient descent algorithm we saw previously in Chapter 6, where it is now slightly adjusted to take in the total number of data points as well as the size of each mini-batch via the input variables num_pts and batch_size, respectively.
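The idea in this passage can be sketched as follows: batch_size of 1 reduces to stochastic gradient descent, and batch_size equal to num_pts reduces to standard batch gradient descent. Only the names num_pts and batch_size come from the text; a least-squares gradient is assumed for concreteness:

```python
import numpy as np

def minibatch_gradient_descent(X, y, w, alpha, num_pts, batch_size, n_epochs=100, seed=0):
    # batch_size = 1       -> stochastic gradient descent
    # batch_size = num_pts -> standard (batch) gradient descent
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        order = rng.permutation(num_pts)           # reshuffle each epoch
        for start in range(0, num_pts, batch_size):
            idx = order[start:start + batch_size]  # last batch may be smaller
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w = w - alpha * grad
    return w
```

With batch_size equal to num_pts the inner loop runs once per epoch and every update uses all the data, matching the standard algorithm from Chapter 6 that the text refers to.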
🌐
Towards Data Science
towardsdatascience.com › home › latest › gradient descent, clearly explained in python, part 2: the compelling code.
Gradient Descent, clearly explained in Python, Part 2: The compelling code. | Towards Data Science
January 19, 2025 - Now, Gradient Descent comes in different versions, but the ones that you will come across the most are: ... We will now discuss, implement and analyse each of them in that order, so let’s begin! ... Batch Gradient Descent is probably the first type of Gradient Descent you will come across.