logistic regression l2 regularization

Math behind L2 Regularization for Logistic Regression

datascience.stackexchange.com › questions › 27258 › math-behind-l2-regularization-for-logistic-regression

Your question is really about the method of Lagrange multipliers in constrained optimization, not logistic regression per se. The gist of it is that a constrained optimization problem can be recast as an unconstrained optimization problem by adding a term, called the regularizer, and vice versa. The sphere comes from recasting the unconstrained problem into a constrained one; recall that a constant $L_2$ norm defines a hypersphere.

Answer from Emre on Stack Exchange

Medium

medium.com › @bneeraj026 › logistic-regression-with-l2-regularization-from-scratch-1bbb078f1e88

Logistic Regression with L2 Regularization from scratch | by Neeraj Bhatt | Medium

September 7, 2023 - Logistic Regression in many cases serves as a good baseline model that can be used as a benchmark to evaluate all subsequent Machine Learning models. As we saw, it's simple to implement, highly interpretable, and can handle cases like outliers and overfitting with L2 regularization.

Google

developers.google.com › machine learning › overfitting: l2 regularization

Overfitting: L2 regularization | Machine Learning | Google for Developers

April 9, 2026 - L2 regularization is a technique used to reduce model complexity and prevent overfitting by penalizing large weights.

Discussions

machine learning - Math behind L2 Regularization for Logistic Regression - Data Science Stack Exchange

I read that L2 regularization in logistic regression creates a sort of sphere that limits the choice of $w$ weight, but why does this happen? More on datascience.stackexchange.com

datascience.stackexchange.com

January 30, 2018

Implementing logistic regression with L2 regularization in Matlab - Stack Overflow

Matlab has built in logistic regression using mnrfit, however I need to implement a logistic regression with L2 regularization. I'm completely at a loss at how to proceed. I've found some good pap... More on stackoverflow.com

stackoverflow.com

Videos

youtube.com

Regularization in Logistic Regression | L1 & L2 Regularization ...

February 7, 2026

04:52

YouTube

L2 regularized logistic regression - YouTube

May 13, 2019

youtube.com

Ridge Regression - L2 regularization

24:25

YouTube

14 Regularization Techniques in Logistic Regression Models - YouTube

February 8, 2024

m.youtube.com

Machine Learning Tutorial Python - 17: L1 and L2 ...

21:14

YouTube

Regulaziation in Machine Learning | L1 and L2 Regularization | ...

April 19, 2022

Compgenomr

compgenomr.github.io › book › logistic-regression-and-regularization.html

5.13 Logistic regression and regularization | Computational Genomics with R

Therefore these types of methods within the framework of regression are also called “shrinkage” methods or “penalized regression” methods. One way to ensure shrinkage is to add the penalty term, $\lambda\sum{\beta_j}^2$, to the loss function. This penalty term is also known as the L2 norm or L2 penalty.

CodeSignal

codesignal.com › learn › courses › fixing-classical-models-diagnosis-regularization › lessons › tuning-l2-regularization-in-logistic-regression

Tuning L2 Regularization in Logistic Regression

L2 regularization works by adding a penalty to the loss function that discourages large coefficient values. The result is a model that favors simpler explanations and is less likely to overfit the training data. Example: Training and Evaluating with Different C Values · Let’s train multiple ...

reddit.com › r/learnmachinelearning › why do we need regularisation (l2 or l1 norm) in logistic regression?

r/learnmachinelearning on Reddit: Why do we need regularisation (L2 or L1 norm) in logistic regression?

December 1, 2021 -

While revising my logistic regression notes, I came across the loss-minimization interpretation of logistic regression:

$\arg\min_w \sum_{i=1}^{n} \log(1 + \exp(-z_i)) + \frac{\lambda}{2}\lVert w \rVert^2$, where $z_i = y_i\, w^\top x_i$

I know that the L2 regularisation in this objective is meant to strike a balance between a good separating hyperplane (decision surface) and weight coefficients that do not grow too large (tending to infinity). What I can't grasp intuitively is how regularisation balances the weight coefficients to avoid overfitting or underfitting.

I may also be misunderstanding something. Suppose we use no regularisation. For correctly separated points, minimising the loss should push the weights towards infinity, so that $z_i$ tends to infinity and $\log(1 + \exp(-z_i))$ tends to 0. But for the same plane with infinitely large weights, any misclassified point has a loss that tends to infinity, which works against the optimisation. So the weights should readjust to smaller values that minimise the total loss, without any regularisation term.

So I am really confused: do we even need regularisation in logistic regression? If yes, how does the regularisation term work towards balancing the weights?

Top answer

1 of 2

2

Regularization doesn't necessarily have anything to do with improving your accuracy or minimizing your loss. Rather, regularization is generally effective in situations where the coefficients exhibit high variance, since these cases are normally dominated by the noise of the data. Regularization allows us to bias our model towards "rules of thumb" rather than blindly following statistical noise, which can be desirable in some contexts depending on the goals.

High variance in the coefficients usually comes from either (1) very noisy data, or (2) correlation between variables. In either case, the loss landscape of the coefficients exhibits a very "broad" minimum, instead of a "sharp" minimum that is more stable under small perturbations. Regularization is a way of "sharpening" the broad minimum, injecting stability through bias. Instead of letting the exact minimum be dominated by random error (which is the case in high-variance contexts), we bias the value according to what basically amounts to rules of thumb.

L1 regularization biases towards sparsity, selecting the smallest set of coefficients that achieve similar accuracy (and selecting the smallest-valued coefficients among sets of equal size). L2 regularization does the opposite, "spreading out" the predictive weights among the coefficients as much as possible.

Illustrative simple example using linear regression: suppose X1 and X2 are highly correlated in a 2:1 ratio (X1 ≈ 2*X2). If the true model is y = 3X1 + 2X2, this will look a whole lot like (X1 + 6X2), or (2X1 + 4X2), etc., allowing basically all coefficient sets that satisfy 2*B1 + B2 = 8 (depending on the degree of collinearity). This is because the high degree of collinearity creates a "taco" shape in the loss landscape, relating model loss as a function of B1 and B2.

In this example L1 regularization would end up choosing y = 4X1 (retaining the smallest coefficient of the correlated set), while L2 regularization would end up choosing roughly y = 3.2X1 + 1.6X2 (the smallest-norm point on the line 2*B1 + B2 = 8, which keeps weight on both B1 and B2).
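The collinear example can be checked numerically. This is only a sketch under two assumptions that are not in the answer: X1 = 2*X2 holds exactly, so every equally good fit lies on the line 2*B1 + B2 = 8, and the penalty is weak enough that each regularizer simply picks the smallest-norm point on that line. It uses a plain grid search rather than a real solver.

```python
# Sketch: among all (B1, B2) with 2*B1 + B2 = 8, find the point with the
# smallest L1 norm and the point with the smallest L2 norm.
# Assumption: X1 = 2*X2 exactly, and the penalty is weak enough that the
# fit stays on the line; a stronger penalty would also shrink both points.

def on_line(b1):
    """Return the B2 that keeps (b1, B2) on the line 2*B1 + B2 = 8."""
    return 8.0 - 2.0 * b1

# Candidate B1 values from 0.00 to 4.00 in steps of 0.01.
candidates = [i / 100.0 for i in range(0, 401)]

# Pick the candidate minimizing |B1| + |B2| (L1) and B1^2 + B2^2 (L2).
l1_best = min(candidates, key=lambda b1: abs(b1) + abs(on_line(b1)))
l2_best = min(candidates, key=lambda b1: b1 ** 2 + on_line(b1) ** 2)

l1_solution = (l1_best, on_line(l1_best))
l2_solution = (l2_best, on_line(l2_best))
print("min L1 norm:", l1_solution)
print("min L2 norm:", l2_solution)
```

Under these assumptions the L1 choice lands on B2 = 0 (y = 4X1), while the L2 choice lands near (3.2, 1.6): it keeps both features in the model instead of zeroing one out.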

2 of 2

2

Someone correct me if I'm wrong, but the logistic regression decision boundary is w^TΦ(x)=0, i.e. w0+w1Φ1(x)+w2Φ2(x)+...+wMΦM(x)=0, which you can verify from σ(w^TΦ(x))=0.5. If you multiply both sides of w^TΦ(x)=0 by a constant, you get the same decision boundary. So you should be able to scale the magnitude of the weights up or down without changing the decision boundary, i.e. a larger w vector does not lead to a more complex decision boundary. It seems regularisation can only control how 'hard' the decision boundary is, i.e. how quickly the probability changes from one class to the other near the boundary. In Andrew Ng's regularised logistic regression videos, however, he clearly says that the point is to control the complexity of the decision boundary, which seems to be incorrect.
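The scaling argument in this answer can be illustrated with a few lines of Python. The weights, bias, and points below are made up for illustration; the sketch only shows that multiplying every parameter by a positive constant leaves the predicted labels unchanged while pushing the probabilities towards 0 or 1.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(class 1 | x) for a linear model with weights w and bias b."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

# Hypothetical parameters and points; the last point lies on the
# boundary because 0.5 + 1.0*1.5 - 2.0*1.0 = 0.
w, b = [1.0, -2.0], 0.5
points = [[3.0, 1.0], [0.0, 1.0], [1.5, 1.0]]

results = {}
for scale in (1.0, 10.0):
    ws = [scale * wi for wi in w]   # scale every weight...
    bs = scale * b                  # ...and the bias by the same factor
    probs = [predict_proba(ws, bs, x) for x in points]
    labels = [p >= 0.5 for p in probs]
    results[scale] = (probs, labels)
    print(scale, [round(p, 4) for p in probs], labels)
```

The labels agree for both scales and the boundary point stays at probability 0.5, but the off-boundary probabilities become far more extreme at the larger scale, which is the "hardness" the answer describes.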

Stack Exchange

datascience.stackexchange.com › questions › 27258 › math-behind-l2-regularization-for-logistic-regression

machine learning - Math behind L2 Regularization for Logistic Regression - Data Science Stack Exchange

Top answer

1 of 2

5

Your question is really about the method of Lagrange multipliers in constrained optimization, not logistic regression per se. The gist of it is that a constrained optimization problem can be recast as an unconstrained optimization problem by adding a term, called the regularizer, and vice versa. The sphere comes from recasting the unconstrained problem into a constrained one; recall that a constant $L_2$ norm defines a hypersphere.
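The recasting this answer describes can be written out as a short derivation. The notation is an assumption made here for illustration, not taken from the answer: $L(w)$ stands for the logistic loss, $t$ for a hypothetical radius, and $\lambda$ for the Lagrange multiplier.

```latex
% Constrained form: keep w inside a ball of radius t.
\min_{w} \; L(w) \quad \text{subject to} \quad \lVert w \rVert_2^2 \le t^2

% Lagrangian with multiplier \lambda \ge 0.
\mathcal{L}(w, \lambda) = L(w) + \lambda \left( \lVert w \rVert_2^2 - t^2 \right)

% For a fixed \lambda the term -\lambda t^2 does not depend on w,
% so minimizing over w gives the familiar penalized form.
\min_{w} \; L(w) + \lambda \lVert w \rVert_2^2
```

The boundary of the constraint set, $\lVert w \rVert_2 = t$, is the hypersphere the question asks about. For a convex loss, each penalty strength $\lambda$ corresponds to some radius $t$ (the norm of the penalized solution), which is the sense in which the two forms are interchangeable.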

2 of 2

1

A simple way to think about this is to appreciate that you are minimizing an objective function. L2 regularization alters the output of the objective function such that smaller values are favored. So you have this constant 'pressure' on the parameters aiming towards 0.
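The constant "pressure" towards 0 is visible in the gradient of the penalty itself. The sketch below assumes the penalty is written as (lam / 2) * sum(w_j ** 2), so its gradient is lam * w, and it ignores the data loss entirely to isolate the effect; the starting weights, lam, and lr are made-up values.

```python
def l2_penalty_grad(w, lam):
    """Gradient of (lam / 2) * sum(w_j ** 2) with respect to w."""
    return [lam * wj for wj in w]

w = [4.0, -2.0, 0.5]   # hypothetical starting weights
lam, lr = 0.1, 0.5     # hypothetical penalty strength and learning rate

history = [list(w)]
for _ in range(3):
    g = l2_penalty_grad(w, lam)
    # Each step multiplies every weight by (1 - lr * lam) = 0.95.
    w = [wj - lr * gj for wj, gj in zip(w, g)]
    history.append(list(w))

for step, weights in enumerate(history):
    print(step, [round(v, 4) for v in weights])
```

Every weight shrinks by the same factor each step and keeps its sign, so larger weights are pulled back harder in absolute terms. In real training this pull is balanced against the gradient of the data loss.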

GitHub

github.com › pickus91 › Logistic-Regression-Classifier-with-L2-Regularization

GitHub - pickus91/Logistic-Regression-Classifier-with-L2-Regularization: Logistic regression with L2 regularization for binary classification · GitHub

If the testing data follows this same pattern, a logistic regression classifier would be an advantageous model choice for classification. We now turn to training our logistic regression classifier with L2 regularization using 20 iterations of gradient descent, a tolerance threshold of 0.001, and a regularization parameter of 0.01.

Starred by 18 users

Forked by 9 users

Languages Python

Dataversity

dataversity.net › home › articles › regularization for logistic regression: l1, l2, gauss or laplace?

Regularization for Logistic Regression: L1, L2, Gauss or Laplace? - Dataversity

September 15, 2025 - The two common regularization terms, which are added to penalize high coefficients, are the l1 norm or the square of the norm l2 multiplied by ½, which motivates the names L1 and L2 regularization.

Stack Overflow

stackoverflow.com › questions › 9369379 › implementing-logistic-regression-with-l2-regularization-in-matlab

Implementing logistic regression with L2 regularization in Matlab - Stack Overflow

Top answer

1 of 1

3

Here is an annotated piece of code for plain gradient descent for logistic regression. To introduce regularisation, you will want to update the cost and gradient equations. In this code, theta are the parameters, X are the class predictors, y are the class labels, and alpha is the learning rate.

I hope this helps :)

function [theta, J_store] = logistic_gradientDescent(theta, X, y, alpha, numIterations)

% Initialize some useful values
m = length(y);                       % number of training examples
sigmoid = @(z) 1 ./ (1 + exp(-z));   % logistic function, defined here so the code is self-contained

J_store = zeros(numIterations, 1);   % cost at each iteration

for iter = 1:numIterations

    % Predict class probabilities using the current weights (theta)
    Z = X * theta;
    h = sigmoid(Z);

    % This is the normal (unregularised) cost function equation
    J_store(iter) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));

    % This is the gradient of the cost given the current weights, without regularisation
    grad = (1/m) * (X' * (h - y));

    % Gradient descent update
    theta = theta - alpha * grad;

end

end
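The answer leaves the regularised cost and gradient to the reader. The following is one way those two updates might look, sketched in plain Python rather than Matlab. It assumes the common convention of adding (lam / (2*m)) * sum(theta_j ** 2) to the cost and leaving the intercept unpenalised; the function name fit_logistic_l2, the lam parameter, and the tiny dataset are all made up for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_l2(X, y, alpha=0.1, lam=0.0, num_iterations=500):
    """Gradient descent on mean log loss + (lam / (2*m)) * sum(theta[1:]**2).

    X: list of rows whose first entry is 1.0 (intercept column, an assumption).
    y: list of 0/1 labels. The intercept theta[0] is not penalised.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    costs = []
    eps = 1e-12  # guards against log(0)
    for _ in range(num_iterations):
        h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
        loss = -sum(yi * math.log(hi + eps) + (1 - yi) * math.log(1 - hi + eps)
                    for yi, hi in zip(y, h)) / m
        penalty = lam / (2 * m) * sum(t * t for t in theta[1:])
        costs.append(loss + penalty)
        # Unregularised gradient, same formula as the Matlab code.
        grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
                for j in range(n)]
        # Regularisation adds (lam / m) * theta_j for every non-intercept weight.
        for j in range(1, n):
            grad[j] += lam / m * theta[j]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta, costs

# Tiny made-up dataset: intercept column plus one feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta_plain, _ = fit_logistic_l2(X, y, lam=0.0)
theta_reg, costs_reg = fit_logistic_l2(X, y, lam=1.0)
print("no penalty:", theta_plain)
print("lam = 1.0: ", theta_reg)
```

On this toy data the penalised fit ends with a noticeably smaller feature weight than the unpenalised one. Other scalings of the penalty (for example lam / 2 without the 1 / m factor) are equally valid and only change what a given value of lam means.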

GeeksforGeeks

geeksforgeeks.org › machine learning › regularization-in-machine-learning

Regularization in Machine Learning - GeeksforGeeks

A regression model that uses the L2 regularization technique is called Ridge regression. It adds the squared magnitude of the coefficient as a penalty term to the loss function(L).

Published April 30, 2026

AI/ML

helloaiml.com › regularization-in-logistic-regression-80f736fc79904cee8540c89dd073ed24

Regularization In Logistic Regression - AI/ML

Introduction to basic concepts of AI and Machine Learning with the python code

Opporture

opporture.org › lexicon › l2-regularization

L2-regularization | Opporture

April 7, 2023 - Linear regression uses L2 ... regression and neural networks use L2 regularization by adding the L2 penalty term to the loss function during training....

Daily Dose of DS

blog.dailydoseofds.com › daily dose of data science › a lesser-known advantage of using l2 regularization

A Lesser-known Advantage of Using L2 Regularization

September 17, 2024 - Out of nowhere, L2 regularization helped us eliminate multicollinearity. In fact, this is where “ridge regression” also gets its name from — it eliminates the ridge in the likelihood function when the L2 penalty is used.
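A narrower version of this claim is easy to check by hand for linear ridge regression: with perfectly collinear columns the matrix X^T X is singular, and adding lam * I makes it invertible. The sketch below uses a made-up 3x2 design matrix and an arbitrary lam; it shows only the invertibility point, not everything the article describes.

```python
# Made-up design matrix whose first column is exactly 2 * the second.
X = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]

def gram(X):
    """Return X^T X as a nested list."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)]
            for i in range(n)]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

G = gram(X)
lam = 0.1  # hypothetical ridge penalty
# Ridge adds lam to each diagonal entry of X^T X.
G_ridge = [[G[i][j] + (lam if i == j else 0.0) for j in range(2)]
           for i in range(2)]

print("det(X^T X)         =", det2(G))
print("det(X^T X + lam*I) =", det2(G_ridge))
```

A zero determinant means the unpenalised normal equations have no unique solution; the strictly positive determinant after adding lam * I means the ridge solution is unique.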

reddit.com › r/statistics › [d] l1 vs l2 regularization

r/statistics on Reddit: [D] L1 vs L2 regularization

May 29, 2020 -

Can anyone explain the differences/advantages for using L1 vs L2 regularization? Are there circumstances in which one of them is more advantageous than the other?

Thanks!

Top answer

1 of 5

31

The goal of both is to reduce the size of your coefficients, keeping them small to avoid or reduce overfitting. L2 regularization puts more emphasis on punishing larger coefficients, which also reduces the chance that a small subset of features disproportionately controls most of the output. L1 regularization is often also thought of as a form of feature selection, because it actually pushes the coefficients of less impactful features towards 0. This can be very useful if you want sparse models, either to serve predictions faster or when memory is a concern (e.g. when deploying to small remote IoT devices that can't store a bunch of parameters in RAM).

2 of 5

5

L2 regularization encourages all values to be small. L1 regularization encourages most values to be 0, with some higher values.
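The contrast in these answers shows up clearly in the simplest possible setting. The sketch below assumes a one-dimensional quadratic loss 0.5 * (w - a) ** 2 for each coefficient, where a is the unpenalised estimate; this is a simplification and not the logistic loss itself. Under that assumption both penalties have closed-form minimisers. The function names and numbers are made up.

```python
def shrink_l2(a, lam):
    """Minimiser of 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return a / (1.0 + lam)

def shrink_l1(a, lam):
    """Minimiser of 0.5*(w - a)**2 + lam*abs(w): soft-thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # small estimates are set exactly to zero

unpenalized = [3.0, -0.4, 0.2, -2.5]   # made-up per-coefficient estimates
lam = 0.5
after_l2 = [shrink_l2(a, lam) for a in unpenalized]
after_l1 = [shrink_l1(a, lam) for a in unpenalized]

print("L2:", [round(v, 4) for v in after_l2])
print("L1:", after_l1)
```

With these numbers L2 scales every coefficient down but leaves none at zero, while L1 zeroes the two small coefficients and keeps the two large ones at higher values than L2 does.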

Towards Data Science

towardsdatascience.com › home › latest › l1 vs l2 regularization in machine learning: differences, advantages and how to apply them in…

L1 vs L2 Regularization in Machine Learning: Differences, Advantages and How to Apply Them in... | Towards Data Science

January 19, 2025 - L2 regularization can improve model stability when training data is noisy or incomplete, by reducing the impact of outliers or noise on variables. In this example we will see how to apply regularization to a logistic regression model for a ...

XGBoost

xgboost.readthedocs.io › en › stable › parameter.html

XGBoost Parameters — xgboost 3.2.0 documentation

L2 regularization term on weights. Increasing this value will make model more conservative.

Rohan-paul

rohan-paul.com › p › ml-interview-q-series-l2-regularizations

ML Interview Q Series: L2 Regularization's Impact on Selecting the Optimal Logistic Regression Hyperplane

June 6, 2025 - The question then is which of these ... L2 regularization). L2 regularization penalizes large coefficient magnitudes by adding the squared norm of the parameter vector to the cost function....

Google

developers.google.com › machine learning › logistic regression: loss and regularization

Logistic regression: Loss and regularization | Machine Learning | Google for Developers

Log Loss is used in logistic regression ... such as L2 regularization or early stopping, is crucial in logistic regression to prevent overfitting due to the model's asymptotic nature....
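The "asymptotic nature" mentioned here can be seen on a toy problem. The sketch below assumes a single weight, four made-up points that are perfectly separable at x = 0, labels in {-1, +1}, and the loss log(1 + exp(-y*w*x)) plus an optional (lam / 2) * w ** 2 penalty; train_1d and its parameters are hypothetical names.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_1d(lam, steps, lr=0.5):
    """Gradient descent on mean log(1 + exp(-y*w*x)) + (lam/2)*w**2."""
    data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]  # separable at x = 0
    w = 0.0
    for _ in range(steps):
        # d/dw log(1 + exp(-y*w*x)) = -y * x * sigmoid(-y*w*x)
        grad = sum(-y * x * sigmoid(-y * w * x) for x, y in data) / len(data)
        grad += lam * w  # gradient of the L2 penalty
        w -= lr * grad
    return w

plain_short = train_1d(lam=0.0, steps=1000)
plain_long = train_1d(lam=0.0, steps=10000)
reg_short = train_1d(lam=0.1, steps=1000)
reg_long = train_1d(lam=0.1, steps=10000)

print("no penalty:", plain_short, "->", plain_long)
print("lam = 0.1: ", reg_short, "->", reg_long)
```

Without the penalty the weight keeps growing the longer training runs, because the loss on separable data can always be lowered a little more. With the penalty the weight settles at a finite value, and stopping the unpenalised run early has a similar limiting effect.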

Ed

mlpr.inf.ed.ac.uk › 2024 › notes › w9a_bayes_logistic_regression.html

Bayesian logistic regression

Curielrodrigo

curielrodrigo.com › home › how to classify data with ai? logistic regression

How to classify data with AI? Logistic Regression

March 5, 2024 - This can lead to some coefficients being reduced to zero, effectively performing feature selection. L2 regularization (Ridge): Adds the square of the magnitude of coefficients to the cost function.