Scikit-learn provides two approaches to linear regression:

  1. LinearRegression uses an Ordinary Least Squares solver from scipy; linear regression is one of the few models with a closed-form solution. So, despite what an ML course might suggest, you can actually fit this model just by inverting and multiplying a few matrices.

  2. SGDRegressor is a generic implementation of stochastic gradient descent in which you can choose your loss and penalty terms. To obtain linear regression you choose the loss to be squared error and the penalty to be none (plain linear regression) or L2 (Ridge regression).

There is no "typical gradient descent" solver because plain (batch) gradient descent is rarely used in practice. If you can decompose your loss function into additive terms, the stochastic approach is known to behave better (hence SGD), and if you can spare enough memory, the OLS method is faster and easier (hence the first option).

Answer from lejlot on Stack Overflow
scikit-learn — 1.5. Stochastic Gradient Descent — scikit-learn 1.8.0 documentation
scikit-learn.org › stable › modules › sgd.html
Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0, 1] or [-1, 1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results. This can be easily done using StandardScaler:

    from sklearn.preprocessing import StandardScaler
    scaler = StandardScaler()
    scaler.fit(X_train)  # Don't cheat - fit only on training data
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)  # apply same transformation to test data
    # Or better yet: use a pipeline!
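The pipeline the snippet alludes to ("Or better yet: use a pipeline!") can be sketched like this; pairing the scaler with SGDRegressor is my choice of estimator here, and the data is synthetic:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 3) * 100          # deliberately unscaled features
y = X @ np.array([1.0, 2.0, 3.0]) + 5

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted on the training fold only, inside the pipeline,
# so the test data never leaks into the scaling statistics.
model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-3, random_state=0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

On this noiseless linear data the test R² comes out essentially 1.0; without the scaler, SGD on features of this magnitude tends to converge far more slowly or diverge.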
scikit-learn — SGDRegressor — scikit-learn 1.8.0 documentation
scikit-learn.org › stable › modules › generated › sklearn.linear_model.SGDRegressor.html
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
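The "decreasing strength schedule" mentioned here is, per the SGDRegressor documentation, the 'invscaling' rule by default: eta = eta0 / t**power_t. A quick sketch of how the step size decays, using the documented defaults eta0=0.01 and power_t=0.25:

```python
import numpy as np

# Default 'invscaling' schedule of SGDRegressor:
# step size at update t is eta0 / t**power_t
eta0, power_t = 0.01, 0.25
t = np.arange(1, 6)
etas = eta0 / t ** power_t
print(etas)  # monotonically shrinking step sizes, starting at 0.01
```

The shrinking step size is what lets the per-sample updates settle down instead of bouncing around the optimum forever.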
scikit-learn — SGDClassifier — scikit-learn 1.8.0 documentation
scikit-learn.org › stable › modules › generated › sklearn.linear_model.SGDClassifier.html
This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka ...
scikit-learn — 1.3. Stochastic Gradient Descent — scikit-learn 0.15-git documentation
scikit-learn.org › 0.15 › modules › sgd.html
The class SGDClassifier implements a plain stochastic gradient descent learning routine which supports different loss functions and penalties for classification. As other classifiers, SGD has to be fitted with two arrays: an array X of size [n_samples, n_features] holding the training samples, ...
TutorialsPoint — Scikit Learn - Stochastic Gradient Descent
tutorialspoint.com › scikit_learn › scikit_learn_stochastic_gradient_descent.htm
Like other classifiers, Stochastic Gradient Descent (SGD) has to be fitted with the following two arrays: an array X holding the training samples, of size [n_samples, n_features], and an array Y holding the target values (class labels) for the training samples, of size [n_samples].

    import numpy as np
    from sklearn import linear_model
    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    Y = np.array([1, 1, 2, 2])
    SGDClf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3, penalty="elasticnet")
    SGDClf.fit(X, Y)
Simplilearn — Scikit Learn: Stochastic Gradient Descent (Complete Guide) | Sklearn Tutorial
simplilearn.com › home › resources › data science & business analytics › stochastic gradient descent in sklearn and other types of gradient descent
February 14, 2026 - The Stochastic Gradient Descent classifier class in the Scikit-learn API is used to carry out the SGD approach for classification problems. But how does it work? Let's discuss.
Python Guides — Scikit-Learn Gradient Descent
pythonguides.com › scikit-learn-gradient-descent
July 8, 2025 - Learn to implement and optimize gradient descent using Scikit-Learn in Python. A step-by-step guide with practical examples.
Medium — ML Fundamentals: Scikit Learn's Linear Regression + Gradient Descent From Scratch | by Alexander Rofail
medium.com › @agrofail › ml-fundamentals-scikit-learns-linear-regression-gradient-descent-from-scratch-78e27adbaab5
October 30, 2021 - In short, we want to compute the intercept and coefficients (thetas) using our training set, apply those parameters to the test set, find the errors of our predicted vs. actual values of Y, and optimize our parameters using gradient descent. ... The test_size parameter here indicates we want to reserve 30% of our data for testing. The output from the code snippet shows that x_train.shape is (30, 2) and y_train.shape is (30, 1), indicating that our training features are 30 rows and two columns, and our training targets are 30 rows and one column. Now we can compute our parameters. There are a couple of different avenues: we can use sklearn's built-in LinearRegression() class, or we can do it from scratch with numpy.
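The "from scratch with numpy" route mentioned here is the closed-form normal equation. A sketch on made-up data (the true intercept 0.5 and coefficients [1.5, -2.0] are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(30, 2)                      # 30 rows, two feature columns
y = X @ np.array([1.5, -2.0]) + 0.5      # noiseless linear target

# Prepend a bias column of ones, then solve the normal equation
# theta = (X^T X)^{-1} X^T y; solving the linear system directly
# avoids forming an explicit matrix inverse.
Xb = np.hstack([np.ones((len(X), 1)), X])
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(theta)  # ~ [0.5, 1.5, -2.0]: intercept first, then the coefficients
```

For ill-conditioned problems, np.linalg.lstsq (or sklearn's LinearRegression, which uses a least-squares solver) is the numerically safer choice.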
Codecademy — Python:Sklearn | Stochastic Gradient Descent
codecademy.com › docs › python:sklearn › stochastic gradient descent
December 22, 2024 - Unlike traditional gradient descent, which calculates the gradient using the entire dataset, SGD computes the gradient using a single training example at a time. This makes it computationally efficient for large datasets. Sklearn provides two primary classes for implementing SGD:
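The one-example-at-a-time updating described here is what partial_fit exposes directly: you can stream data through the model without ever holding the full dataset in one fit call. A sketch (the batch size and epoch count are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)
y = X @ np.array([2.0, -1.0])    # noiseless linear target

sgd = SGDRegressor(random_state=0)
# Stream the data in mini-batches: each partial_fit call runs the
# per-sample gradient updates on just that batch.
for epoch in range(5):
    for start in range(0, len(X), 100):
        sgd.partial_fit(X[start:start + 100], y[start:start + 100])
print(sgd.coef_)  # approaches [2, -1]
```

This is the same mechanism that makes SGD suitable for out-of-core learning, where batches are read from disk one at a time.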
LinkedIn — Gradient Descent | Demystified - with code using scikit-learn
linkedin.com › pulse › gradient-descent-demystified-code-using-samar-srivastava
May 15, 2020 - A lot of you must have heard about gradient descent during your machine learning journey. Let's take a closer look at what gradient descent is and how it works. We will be writing code using scikit-learn, so I assume that readers are well versed in the Python programming language.
Medium — Linear Regression with Gradient Descent: Maths, Implementation and Example Using Scikit-Learn | by Nitin
nitin9809.medium.com › linear-regression-with-gradient-descent-maths-implementation-and-example-using-scikit-learn-1ed1ed3440cc
April 20, 2020 - We can apply the gradient descent algorithm using the scikit-learn library, which provides the SGDClassifier and SGDRegressor estimators. Since this is a linear regression tutorial, I will show you how to use SGDRegressor to make predictions.
GeeksforGeeks — Stochastic Gradient Descent Regressor using Scikit-learn
geeksforgeeks.org › stochastic-gradient-descent-regressor-using-scikit-learn
May 18, 2024 - Unlike traditional gradient descent, which computes the gradient of the cost function using the entire dataset, stochastic gradient descent updates the model parameters iteratively using each training example. ... We will use the diabetes dataset to build and evaluate a linear regression model using SGD. ...

    from sklearn.datasets import load_diabetes
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
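The snippet above stops at the imports. A minimal sketch of the workflow it describes, using SGDRegressor on the bundled diabetes dataset (wrapping it in a pipeline with StandardScaler is my addition, following the scaling advice elsewhere on this page; the split seed is arbitrary):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# load_diabetes ships with scikit-learn, so no download is needed
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-3, random_state=42))
model.fit(X_train, y_train)
score = r2_score(y_test, model.predict(X_test))
print(score)
```

On this dataset a linear model typically reaches a test R² somewhere around 0.4-0.5, roughly matching what plain LinearRegression achieves.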
O'Reilly — Training Models - Hands-On Machine Learning with Scikit-Learn and TensorFlow [Book]
oreilly.com › library › view › hands-on-machine-learning › 9781491962282 › ch04.html
March 30, 2017 - Using an iterative optimization approach, called Gradient Descent (GD), that gradually ...
Author: Aurélien Géron · Published: 2017 · Pages: 572
LabEx — Stochastic Gradient Descent | Machine Learning | Python
labex.io › tutorials › implementing-stochastic-gradient-descent-71102
Learn how to implement Stochastic Gradient Descent (SGD), a popular optimization algorithm used in machine learning, using Python and scikit-learn.
Bogotobogo — scikit-learn: Batch gradient descent versus stochastic gradient descent - 2020
bogotobogo.com › python › scikit-learn › scikit-learn_batch-gradient-descent-versus-stochastic-gradient-descent.php
To find the weights that minimize our cost function, we can use an optimization algorithm called gradient descent (picture source: Python Machine Learning by Sebastian Raschka).
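The batch-versus-stochastic contrast can be made concrete with a from-scratch sketch on a one-feature problem; the learning rates and epoch counts below are arbitrary choices, and the data is noiseless so both variants converge to the same weights:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(200)
y = 4 * x + 1                      # true weight 4, true bias 1

def batch_gd(x, y, lr=0.1, epochs=100):
    """One update per epoch; gradient averaged over the whole dataset."""
    w, b = 0.0, 0.0
    n = len(y)
    for _ in range(epochs):
        err = w * x + b - y
        w -= lr * (err @ x) / n
        b -= lr * err.mean()
    return w, b

def stochastic_gd(x, y, lr=0.01, epochs=10):
    """One update per sample; gradient from a single example at a time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            err = w * xi + b - yi
            w -= lr * err * xi
            b -= lr * err
    return w, b

w_batch, b_batch = batch_gd(x, y)
w_sgd, b_sgd = stochastic_gd(x, y)
print(w_batch, b_batch)   # both pairs approach (4, 1)
print(w_sgd, b_sgd)
```

The batch version does 100 gradient computations over all 200 points; the stochastic version does 2000 cheap single-point updates, which is the trade-off that favors SGD on datasets too large to process in full each step.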