Your question is really about the method of Lagrange multipliers in constrained optimization, not logistic regression per se. The gist is that a constrained optimization problem can be recast as an unconstrained one by adding a penalty term, called the regularizer, to the objective, and vice versa. The sphere comes from recasting the unconstrained problem as a constrained one: the constraint bounds the norm of $w$, and the set of points with constant norm is a hypersphere.

Answer from Emre on Stack Exchange
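
The correspondence described in the answer above can be sketched as follows. This is only an outline under the assumption of a convex, differentiable loss; the symbols $L(w)$, $\lambda$, $\mu$ and $t$ are notation chosen here for illustration, not taken from the answer.

```latex
\begin{align*}
\text{Penalized form:}\quad   & \min_w \; L(w) + \tfrac{\lambda}{2}\lVert w \rVert^2 \\
\text{Constrained form:}\quad & \min_w \; L(w) \quad \text{subject to} \quad \lVert w \rVert^2 \le t \\
\text{Lagrangian:}\quad       & \mathcal{L}(w, \mu) = L(w) + \mu\left(\lVert w \rVert^2 - t\right), \qquad \mu \ge 0 \\
\text{Stationarity:}\quad     & \nabla L(w) + 2\mu\, w = 0
\end{align*}
```

The stationarity condition has the same form as that of the penalized problem, $\nabla L(w) + \lambda w = 0$, with $\lambda = 2\mu$. When the constraint is active, the solution lies on $\lVert w \rVert^2 = t$, a hypersphere of radius $\sqrt{t}$.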
🌐
Medium
medium.com › @bneeraj026 › logistic-regression-with-l2-regularization-from-scratch-1bbb078f1e88
Logistic Regression with L2 Regularization from scratch | by Neeraj Bhatt | Medium
September 7, 2023 - Logistic Regression in many cases serves as a good baseline model that can be used as a benchmark to evaluate all subsequent Machine Learning models. As we saw, it's simple to implement, highly interpretable, and can handle cases like outliers & overfitting with L2 regularization.
🌐
Google
developers.google.com › machine learning › overfitting: l2 regularization
Overfitting: L2 regularization | Machine Learning | Google for Developers
April 9, 2026 - L2 regularization is a technique used to reduce model complexity and prevent overfitting by penalizing large weights.
Discussions

Why do we need regularisation (L2 or L1 norm) in logistic regression?
More on reddit.com
🌐 r/learnmachinelearning
2
8
December 1, 2021
machine learning - Math behind L2 Regularization for Logistic Regression - Data Science Stack Exchange
I read that L2 regularization in logistic regression creates a sort of sphere that limits the choice of the weight vector $w$, but why does this happen? More on datascience.stackexchange.com
🌐 datascience.stackexchange.com
January 30, 2018
Implementing logistic regression with L2 regularization in Matlab - Stack Overflow
Matlab has built in logistic regression using mnrfit, however I need to implement a logistic regression with L2 regularization. I'm completely at a loss at how to proceed. I've found some good pap... More on stackoverflow.com
🌐 stackoverflow.com
[D] L1 vs L2 regularization
The goal of both is to reduce the size of your coefficients, keeping them small to avoid or reduce overfitting. L2 regularization puts more emphasis on punishing larger coefficients, which also reduces the chance that a small subset of features disproportionately controls most of the output. L1 regularization is often also thought of as a form of feature selection, since it pushes the coefficients of less impactful features towards 0. This can be very useful if you want sparse models, either for serving faster predictions or when memory is a concern (e.g. when deploying to small remote IoT devices that can't store a bunch of parameters in RAM). More on reddit.com
🌐 r/statistics
26
20
May 29, 2020
🌐
Compgenomr
compgenomr.github.io › book › logistic-regression-and-regularization.html
5.13 Logistic regression and regularization | Computational Genomics with R
Therefore these types of methods within the framework of regression are also called “shrinkage” methods or “penalized regression” methods. One way to ensure shrinkage is to add the penalty term, \(\lambda\sum{\beta_j}^2\), to the loss function. This penalty term is the squared L2 norm of the coefficients scaled by \(\lambda\), and is known as the L2 penalty.
🌐
CodeSignal
codesignal.com › learn › courses › fixing-classical-models-diagnosis-regularization › lessons › tuning-l2-regularization-in-logistic-regression
Tuning L2 Regularization in Logistic Regression
L2 regularization works by adding a penalty to the loss function that discourages large coefficient values. The result is a model that favors simpler explanations and is less likely to overfit the training data. Example: Training and Evaluating with Different C Values · Let’s train multiple ...
🌐
Reddit
reddit.com › r/learnmachinelearning › why do we need regularisation (l2 or l1 norm) in logistic regression?
r/learnmachinelearning on Reddit: Why do we need regularisation (L2 or L1 norm) in logistic regression?
December 1, 2021 -

While revising my logistic regression notes, I came across the loss-minimization interpretation of logistic regression:

$\arg\min_w \sum_{i=1}^{n} \log(1 + \exp(-z_i)) + \frac{\lambda}{2}\lVert w \rVert^2$, where $z_i = y_i\, w^\top x_i$

I know that the L2 regularisation term in this objective is there to balance a good separating hyperplane (decision surface) against weight coefficients that grow too large (tending to infinity). What I can't understand intuitively is how regularisation balances the weights to avoid overfitting or underfitting. I may also be misunderstanding something, but consider the loss without any regularisation. For correctly separated points, the weights should tend to infinity so that $z_i$ tends to infinity and $\log(1 + \exp(-z_i))$ tends to 0, which minimises the sum over correctly classified points. But for the same plane with infinitely large weights, the loss of any incorrectly classified point tends to infinity, which works against the optimisation. So the weights should readjust to smaller values that minimise the total loss, without any need for a regularisation term. I am therefore confused: do we even need regularisation in logistic regression, and if so, how does the regularisation term balance the weights?
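
A minimal sketch of the situation the question describes, assuming a toy one-dimensional dataset that is perfectly separable, a single weight with no bias, and plain gradient descent. The function name `fit_weight`, the data, the step size and the value of `lam` are all illustrative choices, not part of the original post. With `lam=0` no point is misclassified, so nothing pulls the weight back and it keeps growing with more iterations; with a small L2 penalty it settles at a finite value.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weight(xs, ys, lam, steps, lr=0.5):
    """Gradient descent on sum_i log(1 + exp(-y_i*w*x_i)) + (lam/2)*w**2."""
    w = 0.0
    for _ in range(steps):
        grad = lam * w  # gradient of the L2 penalty
        for x, y in zip(xs, ys):
            # d/dw log(1 + exp(-y*w*x)) = -y * x * sigmoid(-y*w*x)
            grad += -y * x * sigmoid(-y * w * x)
        w -= lr * grad
    return w

# Toy separable data (assumed for illustration): negatives left of 0, positives right.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

w_plain_short = fit_weight(xs, ys, lam=0.0, steps=100)
w_plain_long = fit_weight(xs, ys, lam=0.0, steps=10000)
w_reg_short = fit_weight(xs, ys, lam=0.1, steps=100)
w_reg_long = fit_weight(xs, ys, lam=0.1, steps=10000)

print("no penalty:  ", w_plain_short, "->", w_plain_long)
print("with penalty:", w_reg_short, "->", w_reg_long)
```

On data that is not separable, the misclassified points do bound the weights as the question suggests; the unbounded growth shown here is specific to the separable case.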

Top answer
1 of 2
2
Regularization doesn't necessarily have anything to do with improving your accuracy or minimizing your loss. Rather, regularization is generally effective in situations where the coefficients exhibit high variance, since these cases are normally dominated by the noise of the data. Regularization allows us to bias our model towards "rules of thumb" rather than blindly following statistical noise, which can be desirable in some contexts depending on the goals. High variance in the coefficients usually comes from either (1) very noisy data, or (2) correlation between variables. In either case, the loss landscape of the coefficients exhibits a very "broad" minimum, instead of a "sharp" minimum, which is more stable under small perturbations. Regularization is a way of "sharpening" the broad minimum, injecting stability through bias. Instead of letting the exact minimum be dominated by random error (which is the case in high-variance contexts), we bias the value according to what basically amounts to rules of thumb. L1 regularization biases towards sparsity, selecting the smallest set of coefficients that achieve similar accuracy (and selecting the smallest-valued coefficients among sets of equal size). L2 regularization is the opposite, "spreading out" the predictive weights among the coefficients as much as possible. Illustrative simple example using linear regression: suppose X1 and X2 are highly correlated in a 2:1 ratio (X1 ≈ 2·X2). If the true model is y = 3X1 + 2X2, this will look a whole lot like (X1 + 6X2), or (2X1 + 4X2), etc, allowing basically all coefficient sets that satisfy 2*B1 + B2 = 8 (depending on the degree of collinearity). This is because the high degree of collinearity creates a "taco" shape in the loss landscape, relating model loss as a function of B1 and B2.
In this example L1 regularization would end up choosing y = 4X1 (the sparsest solution, with the smallest coefficient), while L2 regularization would end up choosing y = 16/5 X1 + 8/5 X2 (the smallest-norm point on the line 2*B1 + B2 = 8, which keeps weight on both B1 and B2).
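
The collinear example above can be checked numerically. The sketch below assumes exact collinearity (X1 = 2·X2) and unscaled features, so that every coefficient pair on the line 2*B1 + B2 = 8 fits equally well and the penalty alone decides the winner. It simply scans candidate points on that line; the helper name `on_ridge` and the grid range are illustrative. The minimum-L2-norm point on the line is proportional to (2, 1), so it puts more weight on X1 than on X2 rather than splitting the weight evenly.

```python
def on_ridge(b1):
    """Return the coefficient pair (b1, b2) lying on the line 2*b1 + b2 = 8."""
    return (b1, 8.0 - 2.0 * b1)

# Candidate solutions along the flat direction of the loss, b1 from -2 to 6.
candidates = [on_ridge(i / 1000.0) for i in range(-2000, 6001)]

# Smallest L1 norm |b1| + |b2| among equally good fits.
l1_best = min(candidates, key=lambda b: abs(b[0]) + abs(b[1]))

# Smallest squared L2 norm b1^2 + b2^2 among equally good fits.
l2_best = min(candidates, key=lambda b: b[0] ** 2 + b[1] ** 2)

print("L1 pick:", l1_best)
print("L2 pick:", l2_best)
```

With standardized features the ridge line would change and so would both picks, so the specific numbers depend on the scaling assumption.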
2 of 2
2
Someone correct me if I'm wrong, but the logistic regression decision boundary is w^TΦ(x)=0, i.e. w0+w1Φ1(x)+w2Φ2(x)+...+wMΦM(x)=0, which you can verify from σ(w^TΦ(x))=0.5. If you multiply both sides of w^TΦ(x)=0 by a positive constant you get the same decision boundary. So you should be able to scale the magnitude of the weights up or down without changing the decision boundary, i.e. a larger w vector does not lead to a more complex decision boundary. It seems regularisation can only control how ‘hard’ the decision boundary is, i.e. how quickly the probability changes from one class to the other near the decision boundary. If you watch Andrew Ng's regularised logistic regression videos, however, he clearly says that the point is to control the complexity of the decision boundary, which seems to be incorrect.
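
The scaling argument in this answer can be illustrated with a small sketch, assuming an identity feature map Φ(x) = x and a bias term scaled together with the weights. The weights, bias and points below are made up for illustration. Multiplying the weights by 10 leaves every predicted label unchanged but pushes each probability further from 0.5, which is the "harder" boundary the answer refers to.

```python
import math

def predict_proba(w, b, x):
    """Logistic model: sigma(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights, bias and query points (not from the original post).
w, b = [1.0, -2.0], 0.5
points = [[3.0, 1.0], [0.0, 1.0], [1.0, 0.8], [-1.0, -1.0]]

def scaled_predictions(scale):
    """Scale weights and bias by the same positive factor and predict."""
    ws = [scale * wi for wi in w]
    probs = [predict_proba(ws, scale * b, x) for x in points]
    labels = [p >= 0.5 for p in probs]
    return probs, labels

probs_1, labels_1 = scaled_predictions(1.0)
probs_10, labels_10 = scaled_predictions(10.0)

print("labels unchanged:", labels_1 == labels_10)
for p1, p10 in zip(probs_1, probs_10):
    print(round(p1, 4), "->", round(p10, 4))
```

This only shows that a uniform rescaling keeps the boundary fixed. An L2 penalty does not shrink all weights by the same factor in general, so it can also change the direction of w and hence the boundary itself.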
🌐
GitHub
github.com › pickus91 › Logistic-Regression-Classifier-with-L2-Regularization
GitHub - pickus91/Logistic-Regression-Classifier-with-L2-Regularization: Logistic regression with L2 regularization for binary classification · GitHub
If the testing data follows this same pattern, a logistic regression classifier would be an advantageous model choice for classification. We now turn to training our logistic regression classifier with L2 regularization using 20 iterations of gradient descent, a tolerance threshold of 0.001, and a regularization parameter of 0.01.
Starred by 18 users
Forked by 9 users
Languages   Python
Find elsewhere
🌐
Dataversity
dataversity.net › home › articles › regularization for logistic regression: l1, l2, gauss or laplace?
Regularization for Logistic Regression: L1, L2, Gauss or Laplace? - Dataversity
September 15, 2025 - The two common regularization terms, which are added to penalize high coefficients, are the l1 norm or the square of the norm l2 multiplied by ½, which motivates the names L1 and L2 regularization.
🌐
GeeksforGeeks
geeksforgeeks.org › machine learning › regularization-in-machine-learning
Regularization in Machine Learning - GeeksforGeeks
A regression model that uses the L2 regularization technique is called Ridge regression. It adds the squared magnitude of the coefficient as a penalty term to the loss function(L).
Published   April 30, 2026
🌐
Opporture
opporture.org › lexicon › l2-regularization
L2-regularization | Opporture
April 7, 2023 - Linear regression uses L2 ... regression and neural networks use L2 regularization by adding the L2 penalty term to the loss function during training....
🌐
Daily Dose of DS
blog.dailydoseofds.com › daily dose of data science › a lesser-known advantage of using l2 regularization
A Lesser-known Advantage of Using L2 Regularization
September 17, 2024 - Out of nowhere, L2 regularization helped us eliminate multicollinearity. In fact, this is where “ridge regression” also gets its name from — it eliminates the ridge in the likelihood function when the L2 penalty is used.
🌐
Towards Data Science
towardsdatascience.com › home › latest › l1 vs l2 regularization in machine learning: differences, advantages and how to apply them in…
L1 vs L2 Regularization in Machine Learning: Differences, Advantages and How to Apply Them in... | Towards Data Science
January 19, 2025 - L2 regularization can improve model stability when training data is noisy or incomplete, by reducing the impact of outliers or noise on variables. In this example we will see how to apply regularization to a logistic regression model for a ...
🌐
XGBoost
xgboost.readthedocs.io › en › stable › parameter.html
XGBoost Parameters — xgboost 3.2.0 documentation
L2 regularization term on weights. Increasing this value will make model more conservative.
🌐
Rohan-paul
rohan-paul.com › p › ml-interview-q-series-l2-regularizations
ML Interview Q Series: L2 Regularization's Impact on Selecting the Optimal Logistic Regression Hyperplane
June 6, 2025 - The question then is which of these ... L2 regularization). L2 regularization penalizes large coefficient magnitudes by adding the squared norm of the parameter vector to the cost function....
🌐
Google
developers.google.com › machine learning › logistic regression: loss and regularization
Logistic regression: Loss and regularization | Machine Learning | Google for Developers
Log Loss is used in logistic regression ... such as L2 regularization or early stopping, is crucial in logistic regression to prevent overfitting due to the model's asymptotic nature....
🌐
Ed
mlpr.inf.ed.ac.uk › 2024 › notes › w9a_bayes_logistic_regression.html
Bayesian logistic regression
🌐
Curielrodrigo
curielrodrigo.com › home › how to classify data with ai? logistic regression
How to classify data with AI? Logistic Regression
March 5, 2024 - This can lead to some coefficients being reduced to zero, effectively performing feature selection. L2 regularization (Ridge): Adds the square of the magnitude of coefficients to the cost function.