🌐
scikit-learn
scikit-learn.org › stable › modules › sgd.html
1.5. Stochastic Gradient Descent — scikit-learn 1.8.0 documentation
The advantages of Stochastic Gradient Descent are: Efficiency. Ease of implementation (lots of opportunities for code tuning). The disadvantages of Stochastic Gradient Descent include:
🌐
VitalFlux
vitalflux.com › home › data science › stochastic gradient descent python example
Stochastic Gradient Descent Python Example - Analytics Yogi
April 20, 2022 - Another advantage of SGD is that it is relatively easy to implement, which has made it one of the most popular learning algorithms. SGD is also efficient in terms of storage, as only a small number of samples need to be stored in memory at each iteration. Here is the Python code which represents the learning of weights (or weight update) after each training example.
🌐
PyImageSearch
pyimagesearch.com › home › blog › stochastic gradient descent (sgd) with python
Stochastic Gradient Descent (SGD) with Python - PyImageSearch
May 1, 2021 - Learn how to implement the Stochastic Gradient Descent (SGD) algorithm in Python for machine learning, neural networks, and deep learning.
🌐
GitHub
github.com › CU-UQ › SGD
GitHub - CU-UQ/SGD: Implementation of Stochastic Gradient Descent algorithms in Python (cite https://doi.org/10.1007/s00158-020-02599-z)
Implementation of Stochastic Gradient Descent algorithms in Python (cite https://doi.org/10.1007/s00158-020-02599-z) - CU-UQ/SGD
Starred by 11 users
Forked by 2 users
Languages   Python 100.0%
🌐
GitHub
github.com › arsenyturin › SGD-From-Scratch
GitHub - arsenyturin/SGD-From-Scratch: Stochastic gradient descent from scratch for linear regression · GitHub
In the function below I made it possible to change the sample size (batch_size), because sometimes it's better to use more than one sample at a time. def SGD(X, y, lr=0.05, epoch=10, batch_size=1): ''' Stochastic Gradient Descent for a single feature ...
Starred by 41 users
Forked by 17 users
Languages   Jupyter Notebook
🌐
Real Python
realpython.com › gradient-descent-algorithm-python
Stochastic Gradient Descent Algorithm With Python and NumPy – Real Python
October 21, 2023 - Python has the built-in random module, and NumPy has its own random generator. The latter is more convenient when you work with arrays. You’ll create a new function called sgd() that is very similar to gradient_descent() but uses randomly selected minibatches to move along the search space:
🌐
Medium
medium.com › @nikhilparmar9 › simple-sgd-implementation-in-python-for-linear-regression-on-boston-housing-data-f63fcaaecfb1
Simple SGD implementation in Python for Linear Regression on Boston Housing Data | by Nikhil Parmar | Medium
December 16, 2019 - Hello Folks, in this article we will build our own Stochastic Gradient Descent (SGD) from scratch in Python and then we will use it for Linear Regression on Boston Housing Dataset.
Top answer

There is only one small difference between gradient descent and stochastic gradient descent. Gradient descent calculates the gradient of the loss function over all training instances, whereas stochastic gradient descent calculates it from a single training example (or a small mini-batch) at a time. Both techniques are used to find the optimal parameters of a model.
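To make that difference concrete, here is a hedged sketch on a toy linear-regression problem; the data, learning rate, and epoch count are my own illustrative choices, not taken from any of the linked tutorials:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.5          # noiseless linear target

Xb = np.c_[np.ones(len(X)), X]               # prepend a bias column
w = np.zeros(3)
lr = 0.05

for epoch in range(200):
    # Batch GD would use the full gradient in one step:
    #   grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
    # SGD instead updates from one shuffled example at a time:
    for i in rng.permutation(len(y)):
        grad = 2 * Xb[i] * (Xb[i] @ w - y[i])
        w -= lr * grad

print(np.round(w, 2))   # ≈ [0.5, 2.0, -1.0], the true bias and weights
```

On this noiseless problem both variants recover the same parameters; SGD simply reaches them through many cheap, noisy steps rather than a few exact ones.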

Let us try to implement SGD on a small 2D dataset.

The algorithm

The dataset has 2 features; however, we also want a bias term, so we prepend a column of ones to the data matrix.

shape = x.shape 
x = np.insert(x, 0, 1, axis=1)

Then we initialize our weights; there are many strategies for this. For simplicity I will set them all to 1, although initializing the weights randomly is generally better, since it makes multiple restarts possible.

w = np.ones((shape[1]+1,))
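For reference, a random initialization with a few restarts might be sketched like this; the scale of 0.01, the seed, and the number of restarts are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
n_features = 2
n_restarts = 5

# One small random weight vector (bias + one weight per feature) per restart;
# you would train from each and keep the run with the lowest final loss.
inits = [rng.normal(scale=0.01, size=n_features + 1) for _ in range(n_restarts)]
print(len(inits), inits[0].shape)   # 5 (3,)
```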

Our initial line looks like this (plot omitted).

Now we will iteratively update the weights of the model whenever it misclassifies an example.

for ix, i in enumerate(x):
   pred = np.dot(i, w)
   if pred > 0: pred = 1
   elif pred < 0: pred = -1
   if pred != y[ix]:
      w = w - learning_rate * pred * i

The key step is the weight update w = w - learning_rate * pred * i: since a misclassified example (with nonzero activation) has pred = -y[ix], this moves the weights toward the true label.
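To make the update concrete, here is a single hand-checked step on toy numbers of my own (with learning_rate = 0.5):

```python
import numpy as np

learning_rate = 0.5
w = np.array([1.0, 1.0, 1.0])       # current weights (bias first)
i = np.array([1.0, 2.0, -1.0])      # one augmented example [1, x1, x2]
y_i = -1                            # its true label

pred = np.sign(np.dot(i, w))        # activation 1 + 2 - 1 = 2, so pred = 1
if pred != y_i:                     # misclassified, so update
    w = w - learning_rate * pred * i
print(w)                            # [0.5 0.  1.5]
```

The weights move away from the direction of the wrongly predicted class, which is the same as moving toward the true label.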

Doing this process repeatedly leads to convergence; snapshots of the decision boundary after 10, 20, 50, and 100 epochs (plots omitted) show the separator settling into its final position.


The code

The dataset for this code can be found here.

The function which will train the weights takes in the feature matrix $x$ and the targets $y$. It returns the trained weights $w$ and a list of historical weights encountered throughout the training process.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

def get_weights(x, y, verbose=0):
    shape = x.shape
    x = np.insert(x, 0, 1, axis=1)      # prepend the bias column
    w = np.ones((shape[1] + 1,))
    weights = []

    learning_rate = 10
    iteration = 0
    loss = None
    while iteration <= 1000 and loss != 0:
        for ix, i in enumerate(x):
            pred = np.dot(i, w)
            if pred > 0: pred = 1
            elif pred < 0: pred = -1
            if pred != y[ix]:           # update only on a misclassification
                w = w - learning_rate * pred * i
            weights.append(w)           # snapshot of the weights after this example
            if verbose == 1:
                print('X_i = ', i, '    y = ', y[ix])
                print('Pred: ', pred)
                print('Weights', w)
                print('------------------------------------------')

        # Count the misclassified examples: each contributes |pred - y| = 2
        loss = np.dot(x, w)
        loss[loss < 0] = -1
        loss[loss > 0] = 1
        loss = np.sum(np.abs(loss - y)) / 2

        if verbose == 1:
            print('------------------------------------------')
            print('Misclassified: ', loss)
            print('------------------------------------------')
        if iteration % 10 == 0:
            learning_rate = learning_rate / 2   # decay the step size
        iteration += 1
    print('Weights: ', w)
    print('Loss: ', loss)
    return w, weights

We will apply this SGD to our data in perceptron.csv.

df = np.loadtxt("perceptron.csv", delimiter = ',')
x = df[:,0:-1]
y = df[:,-1]

print('Dataset')
print(df, '\n')

w, all_weights = get_weights(x, y)
x = np.insert(x, 0, 1, axis=1)

pred = np.dot(x, w)
pred[pred > 0] =  1
pred[pred < 0] = -1
print('Predictions', pred)
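If perceptron.csv is not to hand, the same training loop can be checked end to end on synthetic, linearly separable data; the two Gaussian blobs below are my own stand-in, and the loop is a compact form of the update rule above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: two well-separated blobs with labels -1 / +1
x = np.vstack([rng.normal(-3.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]

xb = np.insert(x, 0, 1, axis=1)        # prepend the bias column
w = np.ones(3)
learning_rate = 1.0
for _ in range(200):
    for ix, xi in enumerate(xb):
        pred = 1 if xi @ w > 0 else -1
        if pred != y[ix]:
            w = w - learning_rate * pred * xi

accuracy = np.mean(np.where(xb @ w > 0, 1, -1) == y)
print(accuracy)
```

On separable data like this the perceptron convergence theorem guarantees the loop stops making mistakes, so the training accuracy should reach 1.0.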

Let's plot the decision boundary, which is the line where $w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e. $x_2 = (-w_0 - w_1 x_1) / w_2$.

x1 = np.linspace(np.amin(x[:,1]), np.amax(x[:,1]), 2)
x2 = np.zeros((2,))
for ix, i in enumerate(x1):
    x2[ix] = (-w[0] - w[1]*i) / w[2]

plt.scatter(x[y>0][:,1], x[y>0][:,2], marker = 'x')
plt.scatter(x[y<0][:,1], x[y<0][:,2], marker = 'o')
plt.plot(x1,x2)
plt.title('Perceptron Separator', fontsize=20)
plt.xlabel('Feature 1 ($x_1$)', fontsize=16)
plt.ylabel('Feature 2 ($x_2$)', fontsize=16)
plt.show()

To see the training process you can plot the weights as they changed during training (every tenth recorded snapshot here).

for ix, w_snap in enumerate(all_weights):
    if ix % 10 == 0:
        print('Weights:', w_snap)
        x1 = np.linspace(np.amin(x[:,1]), np.amax(x[:,1]), 2)
        x2 = np.zeros((2,))
        for jx, i in enumerate(x1):
            x2[jx] = (-w_snap[0] - w_snap[1]*i) / w_snap[2]
        print('$0 = {:.3f} + {:.3f}x_1 + {:.3f}x_2$'.format(w_snap[0], w_snap[1], w_snap[2]))

        plt.scatter(x[y>0][:,1], x[y>0][:,2], marker = 'x')
        plt.scatter(x[y<0][:,1], x[y<0][:,2], marker = 'o')
        plt.plot(x1, x2)
        plt.title('Perceptron Separator', fontsize=20)
        plt.xlabel('Feature 1 ($x_1$)', fontsize=16)
        plt.ylabel('Feature 2 ($x_2$)', fontsize=16)
        plt.show()
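The answer above updates on one example at a time; several of the tutorials listed here use mini-batches instead. A hedged sketch of a mini-batch variant of the same perceptron update, on toy data of my own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data (my own stand-in for perceptron.csv)
X = np.vstack([rng.normal(-3.0, 1.0, size=(32, 2)),
               rng.normal(3.0, 1.0, size=(32, 2))])
y = np.r_[-np.ones(32), np.ones(32)]

Xb = np.insert(X, 0, 1, axis=1)
w = np.ones(3)
learning_rate, batch_size = 0.5, 8

for epoch in range(100):
    idx = rng.permutation(len(y))            # shuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        pred = np.where(Xb[b] @ w > 0, 1.0, -1.0)
        mis = pred != y[b]                   # misclassified rows of this batch
        # Average the single-example updates over the mini-batch
        w = w - learning_rate * (pred[mis][:, None] * Xb[b][mis]).sum(axis=0) / batch_size

acc = np.mean(np.where(Xb @ w > 0, 1, -1) == y)
print(acc)
```

Averaging over a batch smooths the updates at the cost of more computation per step; with batch_size = 1 this reduces to the single-example loop in the answer.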
🌐
Kaggle
kaggle.com › code › marissafernandes › linear-regression-with-sgd-in-python-from-scratch
Linear Regression with SGD in Python from scratch
🌐
Medium
medium.com › biased-algorithms › stochastic-gradient-descent-from-scratch-in-python-81a1a71615cb
Stochastic Gradient Descent from Scratch in Python | by Amit Yadav | Biased-Algorithms | Medium
April 18, 2025 - Similarly, in SGD, we initialize the weights and biases randomly. Here’s the deal: when these weights are initialized randomly, they’ll be tweaked during training to fit the data as accurately as possible. For a linear regression problem, these weights determine the slope of your line, and the bias adjusts the line’s intercept. In Python, you can initialize these using random values from a normal distribution or just small random numbers.
🌐
CodeSignal
codesignal.com › learn › courses › gradient-descent-building-optimization-algorithms-from-scratch › lessons › stochastic-gradient-descent-theory-and-implementation-in-python
Stochastic Gradient Descent: Theory and Implementation ...
... This plot visualizes the implementation of SGD on a simple linear regression problem, showcasing the resulting model. ... Today's lesson unveiled critical aspects of the Stochastic Gradient Descent algorithm. We explored its significance, advantages, disadvantages, mathematical formulation, ...
🌐
Medium
medium.com › @dhirendrachoudhary_96193 › stochastic-gradient-descent-in-python-a-complete-guide-for-ml-optimization-c140de6119dc
Stochastic Gradient Descent in Python: A Complete Guide for ML Optimization | by Dhirendra Choudhary | Medium
November 29, 2024 - Stochastic Gradient Descent (SGD) is a cornerstone technique in machine learning optimization. This guide will walk you through the essentials of SGD, providing you with both theoretical insights and practical Python implementations.
🌐
Stack Overflow
stackoverflow.com › questions › 48843721 › python-gd-and-sgd-implementation-on-linear-regression
machine learning - Python, GD and SGD implementation on Linear Regression - Stack Overflow
Model = linear_model.SGDRegressor(learning_rate = 'constant', alpha = 0, eta0 = 0.0001, shuffle=True, max_iter = 100000) My mistake!! Now I set it right, but again I get a lot better results: RMSE: 10.753194242863968, RMSE: 11.347666806771018, RMSE: 13.527890454048752, RMSE: 12.67379069336345, RMSE: 11.070171781078658...
🌐
GeeksforGeeks
geeksforgeeks.org › python › stochastic-gradient-descent-classifier
Stochastic Gradient Descent Classifier - GeeksforGeeks
July 23, 2025 - In summary, the Stochastic Gradient Descent (SGD) Classifier in Python is a versatile optimization algorithm that underpins a wide array of machine learning applications. By efficiently updating model parameters using random subsets of data, ...
🌐
Kaggle
kaggle.com › code › marissafernandes › logistic-regression-sgd-in-python-from-scratch
Logistic Regression + SGD in Python from scratch
🌐
Sebastianvauth
sebastianvauth.github.io › gradient_descent_lesson_7_coding_implementing_mini_batch_sgd_in_python
Lesson 7 - Coding Lesson: Implementing Mini-Batch SGD in Python
Code commanders, prepare for an upgrade! 🚀 In this coding lesson, we're taking our Gradient Descent implementation to the next level by coding Mini-Batch Stochastic Gradient Descent (SGD) in Python! You'll build upon your Batch GD code from Lesson 3, adding the crucial elements of mini-batches ...
🌐
DataCamp
datacamp.com › tutorial › stochastic-gradient-descent
Stochastic Gradient Descent in Python: A Complete Guide for ML Optimization | DataCamp
July 24, 2024 - One of the most popular algorithms for doing this process is called Stochastic Gradient Descent (SGD). In this tutorial, you will learn everything you should know about the algorithm, including some initial intuition without the math, the mathematical details, and how to implement it in Python.
🌐
Medium
medium.com › @rebirth4vali › stochastic-gradient-descent-sgd-from-scratch-in-python-661480ddf5fa
Stochastic Gradient Descent (SGD) from scratch in Python | by Sadak Vali | Medium
March 26, 2023 - SGD is an iterative optimization algorithm that aims to minimize a cost function by updating the model parameters in the opposite direction of the gradient of the cost function. The cost function represents the difference between the predicted ...
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › ml-stochastic-gradient-descent-sgd
ML - Stochastic Gradient Descent (SGD) - GeeksforGeeks
In each epoch, the data is shuffled and for each mini-batch (or single sample), the gradient is calculated and the parameters are updated. The cost is calculated as the mean squared error and the history of the cost is recorded to monitor convergence. Python · def sgd(X, y, learning_rate=0.1, epochs=1000, batch_size=1): m = len(X) theta = np.random.randn(2, 1) X_bias = np.c_[np.ones((m, 1)), X] cost_history = [] for epoch in range(epochs): indices = np.random.permutation(m) X_shuffled = X_bias[indices] y_shuffled = y[indices] for i in range(0, m, batch_size): X_batch = X_shuffled[i:i + batch_
Published   September 30, 2025