L1 regularization helps perform feature selection in sparse feature spaces, and that is a good practical reason to use L1 in some situations. However, beyond that particular reason I have never seen L1 perform better than L2 in practice. If you take a look at the LIBLINEAR FAQ on this issue, you will see that they have not seen a practical example where L1 beats L2, and they encourage users of the library to contact them if they find one. Even in a situation where you might benefit from L1's sparsity in order to do feature selection, using L2 on the remaining variables is likely to give better results than L1 by itself. Answer from AhmedMostafa16 on reddit.com
GeeksforGeeks
Regularization in Machine Learning - GeeksforGeeks
The coefficients reflect the regularized feature weights. Elastic Net Regression is a combination of both L1 as well as L2 regularization. It combines both L1 (absolute values) and L2 (squared values) penalties on the coefficients.
Published April 30, 2026
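As a rough sketch of how the two penalties combine, the function below computes an Elastic Net penalty for a weight vector. It assumes the parameterization used by scikit-learn (an overall strength, here called `lam`, and a mixing weight `l1_ratio`, with the squared L2 term halved); other libraries weight the two terms differently, so treat that scaling as an assumption rather than the only definition.

```python
def elastic_net_penalty(weights, lam, l1_ratio):
    """Elastic Net penalty, scikit-learn-style scaling (assumed here):
    lam * (l1_ratio * sum|w| + 0.5 * (1 - l1_ratio) * sum w^2).
    """
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must be in [0, 1]")
    l1_term = sum(abs(w) for w in weights)  # L1 part: sum of absolute values
    l2_term = sum(w * w for w in weights)   # L2 part: sum of squared values
    return lam * (l1_ratio * l1_term + 0.5 * (1.0 - l1_ratio) * l2_term)


w = [1.0, -2.0, 0.5]
print(elastic_net_penalty(w, lam=0.1, l1_ratio=1.0))  # pure L1 (Lasso) penalty
print(elastic_net_penalty(w, lam=0.1, l1_ratio=0.0))  # pure L2 (Ridge) penalty
print(elastic_net_penalty(w, lam=0.1, l1_ratio=0.5))  # an even mix of both
```

Setting `l1_ratio` to 1 or 0 recovers the pure L1 or pure L2 penalty, which is a quick sanity check on the mixing.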
Built In
L1 and L2 Regularization Methods, Explained | Built In
The L1 regularization norm is calculated as the sum of absolute values of the vector. The L2 regularization norm is calculated as the square root of the sum of the squared vector values.
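The two norms in these definitions are easy to compute directly. One detail the snippet glosses over: the penalty used in Ridge regression is normally the squared L2 norm, so the square root is dropped. The sketch below, using an illustrative weight vector, shows all three quantities.

```python
import math


def l1_norm(v):
    # Sum of the absolute values of the entries.
    return sum(abs(x) for x in v)


def l2_norm(v):
    # Square root of the sum of the squared entries (Euclidean length).
    return math.sqrt(sum(x * x for x in v))


def l2_penalty(v, lam):
    # Ridge-style penalty: lam times the *squared* L2 norm (no square root).
    return lam * sum(x * x for x in v)


w = [3.0, -4.0]
print(l1_norm(w))              # 7.0
print(l2_norm(w))              # 5.0
print(l2_penalty(w, lam=1.0))  # 25.0
```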
Discussions

[D] Why is L2 preferred over L1 Regularization?

Just to add to what everyone else is saying:

If you have 2 extremely correlated features, you will get more understandable results with L2 regularization because the coefficients will be quite evenly distributed among the features. If you use L1, you can get coefficients that differ greatly in magnitude even though they will probably be directionally the same.

r/MachineLearning, October 12, 2019
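To make the correlated-features point concrete, here is a small sketch with two perfectly correlated (identical) features. It assumes no intercept, a made-up toy dataset, closed-form Ridge for the two-feature case, and a plain cyclic coordinate-descent Lasso. The objectives are scaled differently (Ridge here is `||y - Xw||^2 + lam * ||w||^2`, the Lasso is `0.5 * ||y - Xw||^2 + lam * ||w||_1`), so the `lam` values are not comparable between them. With identical features the Lasso optimum is not unique; coordinate descent simply puts all the weight on whichever feature it updates first.

```python
def soft_threshold(z, t):
    # Proximal step for t*|w|: shrink toward zero, exactly zero inside [-t, t].
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_2d(X, y, lam):
    # Closed-form Ridge for two features: solve (X^T X + lam*I) w = X^T y.
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]


def lasso_cd(X, y, lam, n_iter=100):
    # Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1.
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution left out.
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho, lam) / z
    return w


x = [1.0, 2.0, 3.0, 4.0]
X = [[v, v] for v in x]        # two identical (perfectly correlated) features
y = [2.0 * v for v in x]

print(ridge_2d(X, y, lam=15.0))  # [0.8, 0.8]: weight split evenly
print(lasso_cd(X, y, lam=15.0))  # [1.5, 0.0]: all weight on one feature
```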
How does regularization work (especially L1 and L2)?
L1 and L2 regularisation add a cost for large weights and have a hyper-parameter (lambda) for the regularisation strength. This effectively constrains the possible weight values that the model can learn, so it reduces the size of the hypothesis set, which means it lowers the model complexity. The fact that it favours small weights over large weights is what additionally reduces overfitting: in a linear model almost all weights represent a 'partial slope', and smaller slopes mean smoother surfaces which are harder to fit to irregular/noisy data points. For dropout, I only know of the intuitions. Of all those I have read/heard about, the one that makes most sense to me is that the effective number of neurons in a layer is reduced, thus also the effective number of parameters of the model, so that model complexity is reduced. That by itself may explain reduced overfitting, but why it is as good as it is (math/theory-wise) is not clear to me.
r/MLQuestions, September 22, 2019
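To see the "cost for large weights" and its strength hyper-parameter at work in the simplest setting, consider Ridge with a single feature and no intercept (an assumption made to keep the algebra to one line). Minimizing `sum((y_i - w*x_i)^2) + lam * w^2` gives a closed form, and increasing `lam` pulls the weight toward zero. The data are made up.

```python
def ridge_1d(x, y, lam):
    """Single-feature Ridge fit with no intercept.

    Minimizes sum((y_i - w * x_i)^2) + lam * w^2; setting the derivative to zero
    gives w = sum(x_i * y_i) / (sum(x_i^2) + lam).
    """
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


x = [1.0, 2.0, 3.0]
y = [1.0, 2.1, 2.9]  # toy data, roughly y = x
for lam in [0.0, 1.0, 10.0, 100.0]:
    # A larger lam makes big weights more expensive, so w shrinks toward 0.
    print(f"lam={lam:>6}: w={ridge_1d(x, y, lam):.4f}")
```

The weight shrinks smoothly as `lam` grows but stays nonzero for any finite `lam`, which is why L2 on its own does not perform feature selection.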
[Question] With L1/L2 Regularization in a neural network, why are the weights regularized, but not the biases?
It's not typical to regularize the biases, the probable reason being that doing so directly limits the amount of nonlinearity you can learn (edit: in a sigmoidal net, anyway). If you do regularize them, it would make sense to use a much smaller coefficient than for your weights.
r/MachineLearning, April 8, 2015
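A minimal sketch of that convention, assuming a linear model with mean-squared-error loss (the names `regularized_mse` and `bias_lam` are illustrative, not from any library): the L2 term sums over the weights only, and an optional, much smaller coefficient can be applied to the bias if you do want to regularize it.

```python
def regularized_mse(weights, bias, X, y, lam, bias_lam=0.0):
    """Mean squared error of a linear model plus an L2 penalty.

    The weights are penalized with strength lam. The bias is unpenalized by
    default (bias_lam=0.0); a small positive bias_lam regularizes it lightly.
    """
    preds = [sum(w * xi for w, xi in zip(weights, row)) + bias for row in X]
    mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
    weight_penalty = lam * sum(w * w for w in weights)
    bias_penalty = bias_lam * bias * bias
    return mse + weight_penalty + bias_penalty


X = [[1.0], [2.0]]
y = [100.0, 100.0]
# A large bias that fits the data exactly costs nothing when the bias is unpenalized.
print(regularized_mse([0.0], 100.0, X, y, lam=10.0))  # 0.0
```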
L1 vs L2 regularization. Which is "better"?
r/learnmachinelearning, August 12, 2024
Reddit
r/learnmachinelearning on Reddit: L1 vs L2 regularization. Which is "better"?
August 12, 2024

In plain English, can anyone explain situations where one is better than the other? I know L1 induces sparsity, which is useful for variable selection, but can L2 also do this? How do we determine which to use in certain situations, or is it just trial and error?
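One way to see the difference, under the simplifying assumption of an orthonormal design (`X^T X = I`): minimizing `0.5 * ||y - Xw||^2 + lam * ||w||_1` soft-thresholds each least-squares coefficient, while using `0.5 * lam * ||w||^2` instead scales each one by `1 / (1 + lam)`. So L2 shrinks coefficients but does not set them exactly to zero. The coefficient values below are hypothetical.

```python
def lasso_orthonormal(b, lam):
    # Soft-thresholding: shrink by lam, and set to exactly 0 when |b| <= lam.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_orthonormal(b, lam):
    # Proportional shrinkage: a nonzero input stays nonzero for any finite lam.
    return b / (1.0 + lam)


ols = [3.0, -0.5, 0.2, -2.0]  # hypothetical unpenalized least-squares coefficients
lam = 1.0
print([lasso_orthonormal(b, lam) for b in ols])  # [2.0, 0.0, 0.0, -1.0]
print([ridge_orthonormal(b, lam) for b in ols])  # [1.5, -0.25, 0.1, -1.0]
```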

Medium (@alejandro.itoaramendia)
L1 and L2 Regularization (Part 1): A Complete Guide
March 31, 2024 - L1 regularization, also known as LASSO regression, adds the absolute value of each coefficient as a penalty term to the loss function. L2 regularization, also known as Ridge regression, adds the squared value of each coefficient as a penalty term ...
Towards Data Science
Understanding l1 and l2 Regularization | Towards Data Science
January 16, 2025 - When overfitting occurs in linear regression, we can try to regularize our linear model. Regularization is the most widely used technique to penalize complex models in machine learning: it avoids overfitting by penalizing the regression coefficients ...
Google
Overfitting: L2 regularization | Machine Learning | Google for Developers
April 9, 2026 - Learn how the L2 regularization metric is calculated and how to set a regularization rate to minimize the combination of loss and complexity during model training, or to use alternative regularization techniques like early stopping.
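A minimal sketch of "minimize loss plus rate times complexity", assuming a single weight with no bias, mean squared error as the loss, and the sum of squared weights as the complexity term. The function name and the fixed learning rate `lr` are illustrative choices, not taken from the Google page. The L2 term contributes `2 * rate * w` to the gradient, which pulls the weight toward zero at every step.

```python
def train_with_l2(x, y, rate, lr=0.01, steps=5000):
    """Gradient descent on mean((w*x - y)^2) + rate * w^2 for one weight, no bias."""
    n = len(x)
    w = 0.0
    for _ in range(steps):
        grad_loss = 2.0 / n * sum((w * a - b) * a for a, b in zip(x, y))
        grad_complexity = 2.0 * rate * w  # derivative of the L2 term rate * w^2
        w -= lr * (grad_loss + grad_complexity)
    return w


x = [1.0, 2.0, 3.0]
y = [1.0, 2.1, 2.9]
for rate in [0.0, 0.5, 5.0]:
    # Higher regularization rates trade training loss for a smaller weight.
    print(rate, round(train_with_l2(x, y, rate), 4))
```

For this quadratic objective the result can be checked against the closed form `mean(x*y) / (mean(x^2) + rate)`.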
CCS NEU
Understanding L1 and L2 (Intuition: Machine Learning and Mathematics)
May 25, 2024 - importance of regularization, we use a degree-15 polynomial regression, meaning we use an overly complex function to predict data. ... Understanding L1 and L2 regularization with analytical and probabilistic views | by Yuki Shizuya | Intuition | Medium
E2E Networks
Regularization in Deep Learning: L1, L2 & Dropout | E2E Networks
August 24, 2022 - The penalty for L1 regularization is equal to the sum of the absolute values of the coefficients. With this form of regularization, sparse models with few nonzero coefficients may be produced.
Analytics Vidhya
Regularization in Machine Learning | Analytics Vidhya
October 29, 2024 - The most common regularization techniques are L1 regularization (Lasso), which adds the absolute values of the model weights to the loss function, and L2 regularization (Ridge), which adds the squared values of the weights.
GeeksforGeeks
How does L1 and L2 regularization prevent overfitting? - GeeksforGeeks
July 23, 2025 - Avoiding overfitting is crucial in developing robust and generalizable machine learning models. To improve a model's performance, various techniques can be applied. These include methods like dropout, which randomly removes neurons during training, adaptive regularization to adjust regularization strength based on data, and early stopping to halt training when performance plateaus, along with experimenting with different architectures and applying L1 or L2 regularization for controlling overfitting.
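The dropout technique mentioned here can be sketched in a few lines. This uses the common "inverted dropout" formulation (an assumption; the snippet does not specify one) and operates on a plain list of activations rather than a tensor.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged.
    At inference time the activations pass through untouched."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the mask is reproducible
out = dropout([1.0] * 10, p=0.5, rng=rng)
print(out)  # each entry is either 0.0 or 2.0
```

Rescaling during training, rather than at inference, means the trained network can be used as-is at test time.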
IBM
What Is Ridge Regression? | IBM
November 17, 2025 - Ridge regression—also known as L2 regularization—is one of several types of regularization for linear regression models. Regularization is a statistical method to reduce errors caused by overfitting on training data.
YouTube (codebasics)
Machine Learning Tutorial Python - 17: L1 and L2 Regularization | Lasso, Ridge Regression - YouTube
In this Python machine learning tutorial for beginners, we will look into: 1) what overfitting and underfitting are, 2) how to address overfitting using L1 and L2 r...
Published November 26, 2020
Views 269K
PubMed Central
Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features - PMC
Thus, L1 regularization combines efficient feature selection and model generation into one single optimization step. In recent years, considerable advances have been made in high-throughput techniques that generate the target values of interest for a large number of relevant molecular compounds.
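To illustrate "feature selection and model generation in one step", here is a sketch of the Lasso fitted by cyclic coordinate descent, a standard solver for this objective (the PMC article's own step-wise procedure is not reproduced here). The toy data, the `lasso_select` name and `lam=2.0` are assumptions chosen so the result can be checked by hand: the feature columns are orthogonal, so each coordinate update effectively depends only on its own column.

```python
def soft_threshold(z, t):
    # Shrink z toward zero by t; anything within [-t, t] becomes exactly 0.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_select(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    Returns the fitted weights and the indices of the kept (nonzero) features,
    so fitting and feature selection come out of the same optimization.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j's current contribution out.
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho, lam) / z
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return w, selected


# Toy data: three orthogonal feature columns; y depends strongly on
# features 0 and 1 and only weakly on feature 2.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [4.55, 1.45, -1.55, -4.45]
w, selected = lasso_select(X, y, lam=2.0)
print(selected)  # [0, 1]
```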
YouTube
L1 and L2 Regularization in Machine Learning: Easy Explanation for Data Science Interviews - YouTube
Regularization is a machine learning technique that introduces a regularization term to the loss function of a model in order to improve the generalization o...
Published November 28, 2022
GeeksforGeeks
L1/L2 Regularization in PyTorch - GeeksforGeeks
July 31, 2024 - L1 regularization fosters sparsity by driving some weights to zero, leading to simpler and more interpretable models. In contrast, L2 regularization reduces model complexity by shrinking weights, improving numerical stability and overall performance.
Pickl
Learn L1 and L2 Regularisation in Machine Learning
February 19, 2025 - Summary: L1 and L2 Regularisation in Machine Learning prevent overfitting by adding penalty terms to model parameters. L1 Regularisation selects important features by reducing some coefficients to zero, while L2 Regularisation smooths weight ...
Quora
What is the advantage of combining L2 and L1 regularizations? - Quora
Answer (1 of 5): The L2 penalty ... (leave one out cross-validation). Similarly the L1 penalty hyperparameter can be optimized efficiently using regularization path methods. However, optimizing both the L1 and L2 ...
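The point about tuning the L2 hyperparameter with leave-one-out cross-validation can be illustrated for the simplest case. For Ridge regression the leave-one-out residual has a closed form, `(y_i - yhat_i) / (1 - h_ii)`, where `h_ii` is the i-th diagonal entry of the hat matrix, so one fit per candidate penalty replaces n refits. The sketch below assumes a single feature and no intercept (so `h_ii = x_i^2 / (sum x^2 + lam)`) and checks the shortcut against brute-force refitting; the data and candidate grid are made up.

```python
def ridge_fit(x, y, lam):
    # Single-feature Ridge with no intercept: w = sum(x*y) / (sum(x^2) + lam).
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def loo_mse_shortcut(x, y, lam):
    # One fit, then rescale each residual by its leverage h_ii.
    w = ridge_fit(x, y, lam)
    denom = sum(a * a for a in x) + lam
    errs = [(b - w * a) / (1.0 - a * a / denom) for a, b in zip(x, y)]
    return sum(e * e for e in errs) / len(x)


def loo_mse_brute_force(x, y, lam):
    # Refit n times, each time holding out one point.
    errs = []
    for i in range(len(x)):
        w = ridge_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        errs.append(y[i] - w * x[i])
    return sum(e * e for e in errs) / len(x)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
candidates = [0.01, 0.1, 1.0, 10.0]
best = min(candidates, key=lambda lam: loo_mse_shortcut(x, y, lam))
print("best lam:", best)
```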
Dataheadhunters
Understanding Regularization: L1 vs. L2 Methods Compared
January 7, 2024 - Regularization works by limiting the complexity of a machine learning model. This is done by adding a regularization term to the loss function that gets minimized during training. The regularization term penalizes model complexity, acting as a tradeoff between fitting the training data perfectly and keeping the model simple enough to generalize well. There are two main types of regularization used in practice: L1 regularization and L2 regularization.
Turing
Ultimate Guidebook for Regularization Techniques in Deep Learning.
L2 regularization works best when all the weights are roughly of the same size, i.e., when the input features are on the same scale. This technique also helps the model learn more complex patterns from the data without overfitting easily.
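In practice, features are usually standardized before fitting so that a single penalty strength applies to weights on comparable scales. A minimal sketch using only the standard library (the helper name and the two-feature example are illustrative):

```python
import statistics


def standardize_columns(X):
    """Rescale each column to mean 0 and population standard deviation 1.

    Returns the scaled rows plus the per-column means and standard deviations,
    which should be reused to transform validation/test data.
    """
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


# Two features on very different ranges.
X = [[1.0, 1000.0],
     [2.0, 3000.0],
     [3.0, 2000.0]]
Z, means, stds = standardize_columns(X)
for row in Z:
    print([round(v, 3) for v in row])
```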