scikit-learn
scikit-learn.org › stable › modules › sgd.html
1.5. Stochastic Gradient Descent — scikit-learn 1.8.0 documentation
The advantages of Stochastic Gradient Descent are: Efficiency. Ease of implementation (lots of opportunities for code tuning). The disadvantages of Stochastic Gradient Descent include: SGD requires a number of hyperparameters such as the regularization parameter and the number of iterations. SGD is sensitive to feature scaling.
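The ease-of-use point from the scikit-learn docs can be seen in a minimal fit. The toy dataset below is invented for this sketch and is not from the documentation:

```python
# Minimal SGDClassifier sketch; the toy dataset is illustrative only.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable labels

clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```

Because the labels are linearly separable, a linear SGD classifier fits them almost perfectly in a few epochs.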
GitHub
github.com › CU-UQ › SGD
GitHub - CU-UQ/SGD: Implementation of Stochastic Gradient Descent algorithms in Python (cite https://doi.org/10.1007/s00158-020-02599-z) · GitHub
Implementation of Stochastic Gradient Descent algorithms in Python (cite https://doi.org/10.1007/s00158-020-02599-z) - CU-UQ/SGD
Starred by 11 users
Forked by 2 users
Languages Python
Videos
09:26
L25/4 Minibatch SGD in Python - YouTube
28:26
Machine Learning Tutorial Python - 4: Gradient Descent and Cost ...
09:55
Gradient Descent Implemented in Python - YouTube
27:23
Mini batch gradient descent implementation from scratch in python ...
13:12
Stochastic gradient descent code from scratch in python - YouTube
18:34
Gradient Descent Implementation from Scratch in Python - YouTube
VitalFlux
vitalflux.com › home › data science › stochastic gradient descent python example
Stochastic Gradient Descent Python Example - Analytics Yogi
April 20, 2022 - Another advantage of SGD is that it is relatively easy to implement, which has made it one of the most popular learning algorithms. SGD is also efficient in terms of storage, as only a small number of samples needs to be held in memory at each iteration. Here is the Python code which represents the learning of weights (the weight update) after each training example.
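A minimal sketch of the per-example weight update this snippet describes; the variable names and data are illustrative, not taken from the VitalFlux post:

```python
# Per-example SGD for simple linear regression (illustrative sketch).
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=100)
y = 3.0 * X + 0.5 + rng.normal(0, 0.05, size=100)  # true slope 3, intercept 0.5

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(50):
    for xi, yi in zip(X, y):
        err = (w * xi + b) - yi   # prediction error on one sample
        w -= lr * err * xi        # gradient of the squared error w.r.t. w
        b -= lr * err             # gradient w.r.t. b
print(w, b)
```

Updating after every sample is what distinguishes this from batch gradient descent, which would average the gradient over all 100 points before moving w and b.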
GitHub
github.com › arsenyturin › SGD-From-Scratch
GitHub - arsenyturin/SGD-From-Scratch: Stochastic gradient descent from scratch for linear regression · GitHub
In the function below I made it possible to change the sample size (batch_size), because sometimes it's better to use more than one sample at a time. def SGD(X, y, lr=0.05, epoch=10, batch_size=1): ''' Stochastic Gradient Descent for a single feature ...
Starred by 42 users
Forked by 17 users
Languages Jupyter Notebook
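The repository's snippet shows only the signature; here is a reconstruction of what such a single-feature minibatch SGD could look like (my sketch under those assumptions, not the repo's actual code):

```python
import numpy as np

def SGD(X, y, lr=0.05, epoch=10, batch_size=1):
    """Stochastic gradient descent for a single feature (sketch)."""
    m, b = 0.0, 0.0                       # slope and intercept
    n = len(X)
    for _ in range(epoch):
        idx = np.random.permutation(n)    # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            err = (m * Xb + b) - yb
            m -= lr * 2 * np.mean(err * Xb)  # MSE gradient w.r.t. slope
            b -= lr * 2 * np.mean(err)       # MSE gradient w.r.t. intercept
    return m, b

X = np.linspace(0.0, 1.0, 200)
y = 2.0 * X + 1.0                         # noise-free line: slope 2, intercept 1
np.random.seed(0)
m, b = SGD(X, y, lr=0.1, epoch=100, batch_size=8)
print(m, b)
```

With batch_size=1 this degenerates to per-sample SGD; larger batches average the gradient over several points, trading noisier updates for smoother convergence.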
Real Python
realpython.com › gradient-descent-algorithm-python
Stochastic Gradient Descent Algorithm With Python and NumPy – Real Python
October 21, 2023 - Python has the built-in random module, and NumPy has its own random generator. The latter is more convenient when you work with arrays. You’ll create a new function called sgd() that is very similar to gradient_descent() but uses randomly selected minibatches to move along the search space:
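A short sketch of the minibatch-selection idea the article attributes to NumPy's generator; the sgd() function itself is not reproduced here, and the shapes and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=1)      # NumPy's own random generator
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

batch_size = 4
idx = rng.permutation(10)[:batch_size]   # random sample of row indices
X_batch, y_batch = X[idx], y[idx]        # fancy indexing keeps X and y aligned
print(X_batch.shape, y_batch.shape)
```

Indexing both arrays with the same permuted index array is what makes the generator approach more convenient than the built-in random module when working with NumPy arrays.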
Medium
medium.com › biased-algorithms › stochastic-gradient-descent-from-scratch-in-python-81a1a71615cb
Stochastic Gradient Descent from Scratch in Python | by Amit Yadav | Biased-Algorithms | Medium
April 18, 2025 - Similarly, in SGD, we initialize the weights and biases randomly. Here’s the deal: when these weights are initialized randomly, they’ll be tweaked during training to fit the data as accurately as possible. For a linear regression problem, these weights determine the slope of your line, and the bias adjusts the line’s intercept. In Python, you can initialize these using random values from a normal distribution or just small random numbers.
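A sketch of the initialization the article describes, assuming NumPy and an illustrative feature count:

```python
import numpy as np

rng = np.random.default_rng(7)
n_features = 3                                        # illustrative size
w = rng.normal(loc=0.0, scale=0.01, size=n_features)  # small random weights
b = 0.0                                               # intercept often starts at zero
print(w.shape)
```

For a single-feature linear regression, w would be the slope and b the intercept; training then nudges both toward the values that fit the data.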
Kaggle
kaggle.com › code › marissafernandes › linear-regression-with-sgd-in-python-from-scratch
Linear Regression with SGD in Python from scratch
CodeSignal
codesignal.com › learn › courses › gradient-descent-building-optimization-algorithms-from-scratch › lessons › stochastic-gradient-descent-theory-and-implementation-in-python
Stochastic Gradient Descent: Theory and Implementation ...
... This plot visualizes the implementation of SGD on a simple linear regression problem, showcasing the resulting model. ... Today's lesson unveiled critical aspects of the Stochastic Gradient Descent algorithm. We explored its significance, advantages, disadvantages, mathematical formulation, ...
Stack Overflow
stackoverflow.com › questions › 48843721 › python-gd-and-sgd-implementation-on-linear-regression
machine learning - Python, GD and SGD implementation on Linear Regression - Stack Overflow
Model = linear_model.SGDRegressor(learning_rate = 'constant', alpha = 0, eta0 = 0.0001, shuffle=True, max_iter = 100000) My mistake! Now that I set it right, I get much better results: RMSE: 10.753194242863968, RMSE: 11.347666806771018, RMSE: 13.527890454048752, RMSE: 12.67379069336345, RMSE: 11.070171781078658...
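A self-contained sketch using the SGDRegressor settings quoted in the answer; the dataset below is synthetic and invented here, so the RMSE will not match the question's numbers:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, size=500)

# Settings taken from the Stack Overflow answer.
model = SGDRegressor(learning_rate="constant", alpha=0, eta0=0.0001,
                     shuffle=True, max_iter=100000)
model.fit(X, y)
rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
print(rmse)
```

With a constant learning rate this small, the model still ends up well below the baseline of predicting the mean of y; raising eta0 or using the default 'invscaling' schedule typically converges faster.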
DataCamp
datacamp.com › tutorial › stochastic-gradient-descent
Stochastic Gradient Descent in Python: A Complete Guide for ML Optimization | DataCamp
July 24, 2024 - One of the most popular algorithms for doing this process is called Stochastic Gradient Descent (SGD). In this tutorial, you will learn everything you should know about the algorithm, including some initial intuition without the math, the mathematical details, and how to implement it in Python.
Kaggle
kaggle.com › code › marissafernandes › logistic-regression-sgd-in-python-from-scratch
Logistic Regression + SGD in Python from scratch
GeeksforGeeks
geeksforgeeks.org › machine learning › ml-stochastic-gradient-descent-sgd
ML - Stochastic Gradient Descent (SGD) - GeeksforGeeks
In each epoch, the data is shuffled and for each mini-batch (or single sample) the gradient is calculated and the parameters are updated. The cost is calculated as the mean squared error, and the cost history is recorded to monitor convergence. Python ·

def sgd(X, y, learning_rate=0.1, epochs=1000, batch_size=1):
    m = len(X)
    theta = np.random.randn(2, 1)
    X_bias = np.c_[np.ones((m, 1)), X]
    cost_history = []
    for epoch in range(epochs):
        indices = np.random.permutation(m)
        X_shuffled = X_bias[indices]
        y_shuffled = y[indices]
        for i in range(0, m, batch_size):
            X_batch = X_shuffled[i:i + batch_size]
            y_batch = y_shuffled[i:i + batch_size]
            gradients = 2 / len(X_batch) * X_batch.T @ (X_batch @ theta - y_batch)
            theta -= learning_rate * gradients
        cost_history.append(np.mean((X_bias @ theta - y) ** 2))
    return theta, cost_history
Published September 30, 2025
TensorFlow
tensorflow.org › tensorflow v2.16.1 › tf.keras.optimizers.sgd
tf.keras.optimizers.SGD | TensorFlow v2.16.1
Gradient descent (with momentum) optimizer.
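The momentum update this optimizer applies (per the Keras docs: velocity = momentum * velocity - learning_rate * gradient, then weight += velocity) can be sketched in plain NumPy on a one-dimensional quadratic, without importing TensorFlow:

```python
# Plain-NumPy sketch of the momentum update tf.keras.optimizers.SGD applies.
# Minimizing f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, velocity = 0.0, 0.0
learning_rate, momentum = 0.1, 0.9
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    velocity = momentum * velocity - learning_rate * grad  # accumulate momentum
    w += velocity                                          # apply the velocity step
print(w)
```

The velocity term lets the iterate keep moving through flat or noisy regions; with momentum=0.0 this reduces to vanilla gradient descent.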