🌐
Medium
medium.com › @bneeraj026 › logistic-regression-with-l2-regularization-from-scratch-1bbb078f1e88
Logistic Regression with L2 Regularization from scratch | by Neeraj Bhatt | Medium
September 7, 2023 - Logistic Regression in many cases serves as a good baseline model that can be used as a benchmark to evaluate all subsequent Machine Learning models. As we saw, it's simple to implement, highly interpretable, and can handle cases like outliers & overfitting with L2 regularization.
🌐
scikit-learn
scikit-learn.org › stable › modules › generated › sklearn.linear_model.LogisticRegression.html
LogisticRegression — scikit-learn 1.8.0 documentation
Use l1_ratio instead. l1_ratio=0 for penalty='l2', l1_ratio=1 for penalty='l1', and l1_ratio set to any float between 0 and 1 for penalty='elasticnet'. ... Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. C=np.inf results in unpenalized logistic regression.
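A minimal sketch of how these scikit-learn parameters fit together; the synthetic data and the particular C and l1_ratio values below are illustrative choices, not recommendations from the documentation:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)   # scaling helps the saga solver converge

# L2 penalty; C is the inverse of the regularization strength, so C=0.1
# regularizes more strongly than the default C=1.0.
l2_model = LogisticRegression(penalty='l2', C=0.1).fit(X, y)

# Elastic net mixes the two penalties: l1_ratio=0 is pure L2, l1_ratio=1 is pure L1.
# It requires the 'saga' solver.
enet_model = LogisticRegression(penalty='elasticnet', l1_ratio=0.5, C=0.1,
                                solver='saga', max_iter=5000).fit(X, y)

print(l2_model.coef_.round(3))
print(enet_model.coef_.round(3))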
Discussions

python - Machine learning Logistic Regression L2 regularization - Stack Overflow
More on stackoverflow.com
🌐 stackoverflow.com
Why do we need regularisation (L2 or L1 norm) in logistic regression?
Regularization doesn't necessarily have anything to do with improving your accuracy or minimizing your loss. Rather, regularization is generally effective in situations where the coefficients exhibit high variance, since these cases are normally dominated by the noise of the data. More on reddit.com
🌐 r/learnmachinelearning
2
8
December 1, 2021
L1 vs L2 regularization. Which is "better"?
L1 regularization helps perform feature selection in sparse feature spaces, and that is a good practical reason to use L1 in some situations. However, beyond that particular reason I have never seen L1 perform better than L2 in practice. If you look at the LIBLINEAR FAQ on this issue you will see that they have not seen a practical example where L1 beats L2 and encourage users of the library to contact them if they find one. Even in a situation where you might benefit from L1's sparsity in order to do feature selection, using L2 on the remaining variables is likely to give better results than L1 by itself. More on reddit.com
🌐 r/learnmachinelearning
32
193
August 12, 2024
why does l2 regularization not make the weights to be exactly zero?
I’m a big fan of the Bayesian interpretation of L1 and L2 regularization. Under a Bayesian paradigm, L1 regularization is equivalent to a Laplace (double exponential) prior on the parameter. Intuitively, you can think about this as saying, “I really think this parameter should be zero, and unless we have compelling evidence otherwise it should stay at zero.” This is reflected by the pdf of the prior: a Laplace prior has a really sharp, pointy peak centered right at 0 that falls off exponentially as we move away from 0. A Laplace prior is considered sparsity inducing, which means lots of our parameters will end up being 0. Compare this to L2 regularization, which is equivalent to a Gaussian prior centered at 0. We’re still expecting the parameters to be close to 0, but we don’t have nearly as strong an assumption as we do with a Laplace prior. We essentially allow the parameters a little more wiggle room around 0, which leads to lots of our parameters being close to, but not actually, 0. More on reddit.com
🌐 r/learnmachinelearning
11
18
June 19, 2023
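The prior-to-penalty correspondence described in the answer above can be checked numerically. A small sketch (the variable names and the use of scipy.stats are my own) showing that, up to additive constants, the negative log-density of a Gaussian prior reproduces the L2 penalty and that of a Laplace prior reproduces the L1 penalty:

import numpy as np
from scipy.stats import norm, laplace

sigma, b = 2.0, 2.0                     # Gaussian std and Laplace scale, arbitrary values
w = np.linspace(-3, 3, 7)               # a few candidate coefficient values, including 0

# Gaussian N(0, sigma^2): -log p(w) = w^2 / (2*sigma^2) + const  ->  L2 penalty
gauss_neglog = -norm.logpdf(w, loc=0, scale=sigma)
l2_penalty = w**2 / (2 * sigma**2)
print(np.allclose(gauss_neglog - gauss_neglog.min(), l2_penalty))      # True

# Laplace(0, b): -log p(w) = |w| / b + const  ->  L1 penalty
laplace_neglog = -laplace.logpdf(w, loc=0, scale=b)
l1_penalty = np.abs(w) / b
print(np.allclose(laplace_neglog - laplace_neglog.min(), l1_penalty))  # True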
People also ask

How do I choose between L1 and L2 regularization?
Choose L1 (Laplace) regularization when you want feature selection and sparse models with many coefficients set to zero. Choose L2 (Gauss) regularization when you want to shrink all coefficients toward zero without eliminating features entirely. L1 is better for interpretability and feature selection, while L2 is better for overall coefficient shrinkage.
🌐
knime.com
knime.com › home › blog › understanding regularization for logistic regression
Understanding regularization for logistic regression | KNIME
What is the difference between L1 and L2 regularization?
L1 regularization (equivalent to Laplace prior) uses the sum of absolute values of coefficients and leads to sparse coefficient vectors with many coefficients becoming zero, effectively performing feature selection. L2 regularization (equivalent to Gauss prior) uses the sum of squared coefficients and leads to smaller coefficient values in general without making them zero.
🌐
knime.com
knime.com › home › blog › understanding regularization for logistic regression
Understanding regularization for logistic regression | KNIME
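A quick way to see the difference these answers describe is to fit the same scikit-learn model with penalty='l1' and penalty='l2' and count the zero coefficients. This is only an illustrative sketch; the synthetic data and the C value are arbitrary:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Many features, only a few informative, so a sparse solution is plausible.
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
l2 = LogisticRegression(penalty='l2', C=0.1).fit(X, y)

print("L1 zero coefficients:", np.sum(l1.coef_ == 0))   # typically many exact zeros
print("L2 zero coefficients:", np.sum(l2.coef_ == 0))   # typically none, just small values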
Are Gauss and L2 regularization the same thing?
Yes, Gauss prior and L2 regularization are equivalent approaches that produce the same results. Similarly, Laplace prior and L1 regularization are equivalent. The relationship in KNIME is: Gauss prior equals L2 if λ = 1/σ², and Laplace prior equals L1 if λ = √2/σ.
🌐
knime.com
knime.com › home › blog › understanding regularization for logistic regression
Understanding regularization for logistic regression | KNIME
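The stated λ–σ relations follow from taking the negative log of each prior and matching it against the penalty form quoted further down this page (one half times λ times the squared L2 norm, or λ times the L1 norm). A brief derivation, assuming those conventions: a Gaussian prior with standard deviation σ has p(β) ∝ exp(−β²/(2σ²)), so −log p(β) = β²/(2σ²) + const; matching (1/2)·λ·β² gives λ = 1/σ². A Laplace prior whose standard deviation is σ has scale b = σ/√2, so p(β) ∝ exp(−|β|/b) and −log p(β) = (√2/σ)·|β| + const; matching λ·|β| gives λ = √2/σ.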
🌐
Compgenomr
compgenomr.github.io › book › logistic-regression-and-regularization.html
5.13 Logistic regression and regularization | Computational Genomics with R
Therefore these types of methods within the framework of regression are also called “shrinkage” methods or “penalized regression” methods. One way to ensure shrinkage is to add the penalty term, λ∑βⱼ², to the loss function. This penalty term is also known as the L2 norm or L2 penalty.
🌐
CodeSignal
codesignal.com › learn › courses › fixing-classical-models-diagnosis-regularization › lessons › tuning-l2-regularization-in-logistic-regression
Tuning L2 Regularization in Logistic Regression
L2 regularization works by adding a penalty to the loss function that discourages large coefficient values. The result is a model that favors simpler explanations and is less likely to overfit the training data. Example: Training and Evaluating with Different C Values · Let’s train multiple ...
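The "different C values" exercise the snippet refers to can be sketched roughly as below; the loop, the C grid, and the use of cross-validated accuracy are my own choices, and the lesson's actual code may differ:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Smaller C = stronger L2 penalty; larger C = weaker penalty.
for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    model = LogisticRegression(penalty='l2', C=C, max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"C={C:>7}: mean CV accuracy = {scores.mean():.3f}")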
🌐
GitHub
github.com › pickus91 › Logistic-Regression-Classifier-with-L2-Regularization
GitHub - pickus91/Logistic-Regression-Classifier-with-L2-Regularization: Logistic regression with L2 regularization for binary classification · GitHub
If the testing data follows this same pattern, a logistic regression classifier would be an advantageous model choice for classification. We now turn to training our logistic regression classifier with L2 regularization using 20 iterations of gradient descent, a tolerance threshold of 0.001, and a regularization parameter of 0.01.
Starred by 18 users
Forked by 9 users
Languages   Python
🌐
Medium
medium.com › @aditya97p › l1-and-l2-regularization-237438a9caa6
L1 and L2 Regularization.. Logistic Regression basic intuition : | by Aditya .P | Medium
November 11, 2018 - If we use L1 regularization in Logistic Regression, all the less important features will become zero. If the hyperparameter (λ) is 0 there is no regularization term, so the model will overfit; if the hyperparameter (λ) is very large it gives too much weight to the penalty, which leads to underfitting. In L2 regularization ...
🌐
Stack Overflow
stackoverflow.com › questions › 69934443 › machine-learning-logistic-regression-l2-regularization
python - Machine learning Logistic Regression L2 regularization - Stack Overflow
import numpy as np

def logicalregP3(xtr, ytr, learning_rate, iteration, lamda):
    # xtr: (m, n) feature matrix, ytr: (m, 1) binary labels
    m = xtr.shape[0]                      # number of training examples
    n = xtr.shape[1]                      # number of features
    W = np.zeros((n, 1))
    B = 0.0
    cost_list = []
    for i in range(iteration):
        z = np.dot(W.T, xtr.T) + B        # linear scores, shape (1, m)
        a = 1 / (1 + np.exp(-z))          # sigmoid activation
        # cross-entropy loss plus the L2 penalty (lamda / (2*m)) * sum(W^2)
        cost = -(1 / m) * np.sum(ytr.T * np.log(a) + (1 - ytr.T) * np.log(1 - a)) \
               + (lamda / (2 * m)) * np.sum(W ** 2)
        # gradient descent; the L2 term contributes (lamda / m) * W to the gradient
        dW = (1 / m) * np.dot(a - ytr.T, xtr) + (lamda / m) * W.T   # shape (1, n)
        dB = (1 / m) * np.sum(a - ytr.T)
        W = W - learning_rate * dW.T
        B = B - learning_rate * dB
        print("cost", i, cost)
        cost_list.append(cost)
    return W, B, cost_list
🌐
Medium
medium.com › @vincefav › regularization-in-logistic-regression-14b50d7cc31
Regularization in Logistic Regression | by Vincent Favilla | Medium
June 6, 2023 - Convexity is an important property ... coefficients the algorithm starts with. L2 regularization induces convexity in the cost function by adding a quadratic penalty term, which makes the cost function smoother and easier to ...
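One way to make the convexity point in this snippet concrete: the Hessian of the unpenalized logistic loss is XᵀDX, where D is a diagonal matrix with entries pᵢ(1 − pᵢ) ≥ 0, so it is only guaranteed to be positive semidefinite. Adding the L2 term (λ/2)·||w||² adds λI to the Hessian, giving XᵀDX + λI, which is positive definite for any λ > 0. The penalized objective is therefore strictly convex, with a unique minimizer and better-conditioned optimization regardless of which coefficients the algorithm starts with.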
🌐
Dataversity
dataversity.net › home › articles › regularization for logistic regression: l1, l2, gauss or laplace?
Regularization for Logistic Regression: L1, L2, Gauss or Laplace? - Dataversity
September 15, 2025 - The two common regularization terms, which are added to penalize high coefficients, are the l1 norm or the square of the norm l2 multiplied by ½, which motivates the names L1 and L2 regularization.
🌐
KNIME
knime.com › home › blog › understanding regularization for logistic regression
Understanding regularization for logistic regression | KNIME
March 12, 2018 - If a feature occurs only in one ... The two common regularization terms, which are added to penalize high coefficients, are the l1 norm or the square of the norm l2 multiplied by ½, which motivates the names L1 and L2 regularization....
🌐
Towards Data Science
towardsdatascience.com › implement-logistic-regression-with-l2-regularization-from-scratch-in-python-20bd4ee88a59
Implement Logistic Regression with L2 Regularization from scratch in Python | by Tulrose Deori | Towards Data Science | Medium
December 15, 2021 - Regularization is a technique to solve the problem of overfitting in a machine learning algorithm by penalizing the cost function. It does so by using an additional penalty term in the cost function.
🌐
Reddit
reddit.com › r/learnmachinelearning › why do we need regularisation (l2 or l1 norm) in logistic regression?
r/learnmachinelearning on Reddit: Why do we need regularisation (L2 or L1 norm) in logistic regression?
December 1, 2021 -

As I was revising my logistic regression notes, I came across the loss-minimization interpretation of logistic regression, which is:

argmin_w  Σᵢ₌₁ⁿ log(1 + exp(-zᵢ)) + (λ/2)·||w||²,  where zᵢ = yᵢ·wᵀxᵢ

I know that the L2 regularisation in the above optimisation function is used to find a balance between a good separating hyperplane (decision surface) and weight coefficients that are not too large (not tending to infinity / overestimated). I can't seem to intuitively understand how regularisation works to balance the weight coefficients and avoid overfitting/underfitting.

Also, I might have a misunderstanding here, but consider the loss-minimisation part of the expression without any regularisation. To minimise the loss, for points that are correctly separated the weights corresponding to the features should tend to infinity, so that zᵢ tends to infinity and log(1 + exp(-zᵢ)) tends to 0; in that way we minimise the sum over correctly classified points. But for the same plane with infinitely large weights, if a point turns out to be incorrectly classified, its loss value tends to infinity, which works against the optimisation problem. So the weights should get readjusted to smaller values such that the sum of losses is minimised, without any need for a regularisation term. I am really confused: do we even need regularisation in logistic regression, and if yes, how does the regularisation term in the expression work towards balancing the weights?
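For reference, the objective in the post written out as a small sketch; the function name and the numpy implementation are mine, assuming labels yᵢ ∈ {-1, +1}:

import numpy as np

def regularized_logistic_loss(w, X, y, lam):
    # margins z_i = y_i * (w . x_i); correctly classified points have large positive z_i
    z = y * (X @ w)
    data_loss = np.sum(np.log1p(np.exp(-z)))     # sum_i log(1 + exp(-z_i))
    penalty = 0.5 * lam * np.dot(w, w)           # (lambda / 2) * ||w||^2
    return data_loss + penalty

# On linearly separable data the data_loss term alone keeps decreasing as ||w|| grows,
# so without the penalty the minimizer is at infinity; the L2 term is what keeps it finite.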

Top answer
1 of 2
2
Regularization doesn't necessarily have anything to do with improving your accuracy or minimizing your loss. Rather, regularization is generally effective in situations where the coefficients exhibit high variance, since these cases are normally dominated by the noise of the data. Regularization allows us to bias our model towards "rules of thumb" rather than blindly following statistical noise, which can be desirable in some contexts depending on the goals.

High variance in the coefficients usually comes from either (1) very noisy data, or (2) correlation between variables. In either case, the loss landscape of the coefficients exhibits a very "broad" minimum, instead of a "sharp" minimum which is more stable under small perturbations. Regularization is a way of "sharpening" the broad minimum, injecting stability through bias. Instead of letting the exact minimum value be dominated by random error (which is the case in high-variance contexts), we bias the value according to what basically amounts to rules of thumb. L1 regularization biases towards sparsity, selecting the smallest set of coefficients that achieve similar accuracy (and selecting the smallest-valued coefficients among sets of equal size). L2 regularization is the opposite, "spreading out" the predictive weights among the coefficients as much as possible.

Illustrative simple example using linear regression: suppose X1 and X2 are highly correlated in a 2:1 ratio. If the true model is y = 3X1 + 2X2, this will look a whole lot like (X1 + 6X2), or (2X1 + 4X2), etc., allowing basically all coefficient sets that satisfy 2*B1 + B2 = 8 (depending on the degree of collinearity). This is because the high degree of collinearity creates a "taco" shape in the loss landscape, relating model loss as a function of B1 and B2. In this example L1 regularization would end up choosing y = 4X1 (retaining the smallest coefficient of the correlated set), while L2 regularization would end up choosing something like y = 8/3 X1 + 8/3 X2 (balancing the B1 and B2 values).
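The correlated-feature example in this answer carries over directly to code. A rough sketch using linear regression with scikit-learn's Lasso (L1) and Ridge (L2); the data generation, noise level, and alpha values are arbitrary choices of mine, so the fitted coefficients will only approximate the idealized 4·X1 and "spread out" solutions:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 1000
x2 = rng.normal(size=n)
x1 = 2 * x2 + 0.01 * rng.normal(size=n)          # X1 and X2 nearly collinear, ~2:1 ratio
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + 0.1 * rng.normal(size=n)   # "true" model y = 3*X1 + 2*X2

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("L1 (Lasso) coefficients:", lasso.coef_.round(2))   # tends to zero out one of the pair
print("L2 (Ridge) coefficients:", ridge.coef_.round(2))   # tends to spread weight across both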
2 of 2
2
Someone correct me if I'm wrong, but the logistic regression decision boundary is w^TΦ(x) = 0, i.e. w0 + w1Φ1(x) + w2Φ2(x) + ... + wMΦM(x) = 0, which you can verify from σ(w^TΦ(x)) = 0.5. If you multiply both sides of w^TΦ(x) = 0 by a constant you get the same decision boundary, so you should be able to scale the magnitude of the weights up or down without changing the decision boundary; i.e. a larger w vector does not lead to a more complex decision boundary. It seems regularisation can only control how 'hard' the decision boundary is, i.e. how quickly the probability changes from one class to the other near the decision boundary. If you watch Andrew Ng's regularised logistic regression videos, however, he clearly says that the point is to control the complexity of the decision boundary, which seems to be incorrect.
🌐
Aptech
aptech.com › blog › classification-with-regularized-logistic-regression
Classification with Regularized Logistic Regression | Aptech
June 7, 2023 - Since the main purpose of regularization is to address overfitting the model to the training data, we don't have much reason to use it. However, for demonstration purposes, we'll show how to implement L2 regularization. To implement regularization with the logisticRegFit, we'll use a logisticRegControl structure.
🌐
DataCamp
campus.datacamp.com › courses › linear-classifiers-in-python › logistic-regression-3
Logistic regression and regularization | Python
As you can see, L1 regularization set many of the coefficients to zero, thus ignoring those features; in other words, it performed feature selection for us. On the other hand, L2 regularization just shrinks the coefficients to be smaller. This is analogous to what happens with Lasso and Ridge ...
🌐
Medium
ujangriswanto08.medium.com › a-beginners-guide-to-l1-and-l2-regularization-in-logistic-regression-ec93ed1dea4f
A Beginner’s Guide to L1 and L2 Regularization in Logistic Regression | by Ujang Riswanto | Medium
March 5, 2025 - Implementing L1 and L2 regularization in logistic regression is super easy with Scikit-learn, and tuning them is as simple as adjusting the C parameter. Play around with the regularization strengths to see what works best for your data.
🌐
Quora
quora.com › What-is-L2-regularization-in-logistic-regression
What is L2 regularization in logistic regression? - Quora
Answer: We want to penalize the high coefficients. Imagine a feature that occurs in only one of the classes. Consequently our logistic regression will assign it a very high coefficient. So we use regularization methods to penalize that high coefficient. I wrote how to implement it mathematically in image b...
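The scenario in this answer (a feature present in only one class, driving its coefficient toward infinity) is easy to reproduce. A hypothetical toy sketch, where the data layout and parameter choices are mine; note that penalty=None requires scikit-learn 1.2 or newer:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.zeros((200, 2))
y = np.repeat([0, 1], 100)
X[y == 1, 0] = 1.0                      # feature 0 occurs only in the positive class
X[:, 1] = rng.normal(size=200)          # an uninformative second feature

# Without a penalty, the coefficient on the separating feature grows very large
# (it diverges in theory; in practice the optimizer stops at max_iter).
# penalty=None needs scikit-learn >= 1.2; older versions use penalty='none'.
unpenalized = LogisticRegression(penalty=None, max_iter=10000).fit(X, y)
regularized = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
print("no penalty:", unpenalized.coef_[0].round(2))
print("L2 penalty:", regularized.coef_[0].round(2))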
🌐
Number Analytics
numberanalytics.com › blog › expert-tips-l1-l2-logreg-models
Expert Tips on L1 & L2 in LogReg Models
May 16, 2025 - Key takeaways from this exploration include:
- Choose L1 regularization when feature selection and model sparsity are priorities
- Opt for L2 regularization when dealing with multicollinearity and when all features are potentially relevant
- Consider elastic net for the best of both worlds, especially with high-dimensional data
- Always combine regularization with proper cross-validation and feature preprocessing
- Monitor regularization effects through techniques like coefficient path visualization
As datasets continue to grow in size and dimensionality, the role of regularization becomes inc...
🌐
scikit-learn
scikit-learn.org › 1.5 › modules › generated › sklearn.linear_model.LogisticRegression.html
Scikit-learn Logistic Regression
This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers. Note that regularization is applied by default. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied). The ‘newton-cg’, ‘sag’, and ‘lbfgs’ solvers support only L2 ...