L1 regularization helps perform feature selection in sparse feature spaces, and that is a good practical reason to use L1 in some situations. Beyond that particular reason, however, I have never seen L1 perform better than L2 in practice. The LIBLINEAR FAQ on this issue says its authors have not seen a practical example where L1 beats L2, and it encourages users of the library to contact them if they find one. Even in a situation where you might benefit from L1's sparsity for feature selection, using L2 on the remaining variables is likely to give better results than L1 by itself. Answer from AhmedMostafa16 on reddit.com
GeeksforGeeks
geeksforgeeks.org › machine learning › regularization-in-machine-learning
Regularization in Machine Learning - GeeksforGeeks
The coefficients reflect the regularized feature weights. Elastic Net regression combines both penalties on the coefficients: L1 (sum of absolute values) and L2 (sum of squared values).
Published April 30, 2026
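As a rough sketch of how the Elastic Net penalty mixes the two terms, the function below computes it for a coefficient vector. The names `lam` and `l1_ratio` and the factor of one half on the squared term are assumptions made for illustration (modeled on the convention scikit-learn documents for its ElasticNet estimator); they are not taken from the article.

```python
import numpy as np

def elastic_net_penalty(w, lam=1.0, l1_ratio=0.5):
    """Weighted mix of the L1 and L2 penalties on a coefficient vector.

    lam      : overall regularization strength (hypothetical name)
    l1_ratio : 1.0 gives a pure L1 penalty, 0.0 a pure L2 penalty
    """
    w = np.asarray(w, dtype=float)
    l1_term = np.sum(np.abs(w))       # L1: sum of absolute values
    l2_term = 0.5 * np.sum(w ** 2)    # L2: half the sum of squares
    return float(lam * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term))

# Example: the same coefficients under pure L1, pure L2, and an even mix.
w = [1.0, -2.0]
pure_l1 = elastic_net_penalty(w, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(w, l1_ratio=0.0)
mixed = elastic_net_penalty(w, l1_ratio=0.5)
```

Setting `l1_ratio` between 0 and 1 is what lets Elastic Net keep some of L1's sparsity while retaining L2's smoother shrinkage.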
[D] Why is L2 preferred over L1 Regularization?
Just to add to what everyone else is saying:
If you have two extremely correlated features, you will get more understandable results with L2 regularization because the coefficient weight will be spread fairly evenly across those features. If you use L1, you can get coefficients that differ greatly in magnitude, even though they will probably have the same sign.
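The behaviour described above can be reproduced on a toy problem. The sketch below takes the extreme case of two identical features and fits them with a closed-form ridge solver and a plain coordinate-descent lasso solver. The data, the value of `lam`, and both helper functions are made up for illustration, and the lasso result depends on the update order, since with duplicated columns the lasso solution is not unique.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, clipping at zero."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_fit(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with feature j's own contribution added back in.
            r_j = y - X @ w + X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return w

# Two perfectly correlated (identical) features; the target is 2 * x.
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([x, x])
y = 2.0 * x

w_ridge = ridge_fit(X, y, lam=1.0)   # weight split evenly across both columns
w_lasso = lasso_fit(X, y, lam=1.0)   # weight piles onto the first column
print("ridge:", w_ridge)
print("lasso:", w_lasso)
```

With this setup ridge assigns the two columns equal coefficients, while the lasso solver keeps nearly all of the weight on whichever column it updates first.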
More on reddit.com
How does regularization work (especially L1 and L2)?
L1 and L2 regularisation add a cost for large weights and have a hyper-parameter (lambda) for the regularisation strength. This effectively constrains the possible weight values that the model can learn, so it reduces the size of the hypothesis set, which means it lowers the model complexity. The fact that it favours small weights over large weights is what additionally reduces overfitting: in a linear model almost all weights represent a 'partial slope', and smaller slopes mean smoother surfaces which are harder to fit to irregular/noisy data points. For dropout, I only know of the intuitions. Of all those I have read/heard about, the one that makes most sense to me is that the effective number of neurons in a layer is reduced, thus also the effective number of parameters of the model, so that model complexity is reduced. That by itself may explain reduced overfitting, but why it is as good as it is (math/theory-wise) is not clear to me. More on reddit.com
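To make the role of the strength hyper-parameter (lambda) concrete, here is a small sketch that fits an L2-regularised linear model for several lambda values and records the size of the learned weight vector. The synthetic data, the seed, and the `ridge_fit` helper are assumptions for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Synthetic data (made up for illustration): 30 samples, 5 features, noisy target.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=30)

# Larger lambda means a higher cost for large weights, so the fit shrinks them.
lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||w||={norm:.3f}")
```

The weight norm falls as lambda grows, which is the "smaller slopes" effect the answer describes: the model is pushed toward flatter, less flexible fits.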
[Question] With L1/L2 Regularization in a neural network, why are the weights regularized, but not the biases?
It's not typical to regularize the biases, the probable reason being that doing so directly limits the amount of nonlinearity you can learn (edit: in a sigmoidal net, anyway). If you do regularize them it would make sense to have a much smaller coefficient than for your weights. More on reddit.com
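A minimal sketch of what "regularize the weights but not the biases" looks like in a gradient update, using a single linear layer rather than a full neural network to keep it short. The function name, learning rate, and `lam` value are hypothetical.

```python
import numpy as np

def gradient_step(w, b, X, y, lr=0.1, lam=0.01):
    """One gradient step on mean squared error with an L2 penalty on w only.

    Loss: (1 / 2n) * ||X w + b - y||^2 + (lam / 2) * ||w||^2
    The bias b does not appear in the penalty term.
    """
    n = X.shape[0]
    err = X @ w + b - y                # prediction error per sample
    grad_w = X.T @ err / n + lam * w   # data gradient plus the penalty gradient
    grad_b = err.mean()                # data gradient only, no penalty
    return w - lr * grad_w, b - lr * grad_b

# With all-zero inputs and a bias that already fits the target, the data
# gradient vanishes: only the penalty moves the weights, and b stays put.
X = np.zeros((4, 2))
y = np.full(4, 3.0)
w_new, b_new = gradient_step(np.array([1.0, -2.0]), 3.0, X, y)
```

If you did want to regularize the biases with a much smaller coefficient, as the answer suggests, one option would be to add a separate, smaller penalty term to `grad_b`.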
Reddit
reddit.com › r/learnmachinelearning › l1 vs l2 regularization. which is "better"?
r/learnmachinelearning on Reddit: L1 vs L2 regularization. Which is "better"?
August 12, 2024
In plain English, can anyone explain situations where one is better than the other? I know L1 induces sparsity, which is useful for variable selection, but can L2 also do this? How do we determine which to use in a given situation, or is it just trial and error?
Top answer 1 of 10
90
L1 regularization helps perform feature selection in sparse feature spaces, and that is a good practical reason to use L1 in some situations. Beyond that particular reason, however, I have never seen L1 perform better than L2 in practice. The LIBLINEAR FAQ on this issue says its authors have not seen a practical example where L1 beats L2, and it encourages users of the library to contact them if they find one. Even in a situation where you might benefit from L1's sparsity for feature selection, using L2 on the remaining variables is likely to give better results than L1 by itself.
2 of 10
18
L1 Regularization (Lasso). Use when:
- You want feature selection, as L1 can shrink some coefficients to zero, effectively removing less important features.
- You have a sparse dataset and expect only a few features to be significant.
- Your model can benefit from simplicity and interpretability by reducing the number of features.

L2 Regularization (Ridge). Use when:
- You want to reduce the impact of multicollinearity by shrinking the coefficients, but not to zero.
- You have many correlated features and want to spread the weight across them.
- You need a smooth and stable model without completely eliminating features.
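The difference between "shrink to zero" and "shrink but not to zero" is easiest to see in the special case of orthonormal features, where both penalties have simple closed-form effects on the unregularized (least-squares) coefficients. The sketch below assumes that special case; the coefficient values and `lam` are made up.

```python
import numpy as np

def l1_shrink(w_ols, lam):
    """Soft-thresholding: the lasso solution when features are orthonormal."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

def l2_shrink(w_ols, lam):
    """Uniform rescaling: the ridge solution when features are orthonormal."""
    return w_ols / (1.0 + lam)

# Hypothetical least-squares coefficients: two large, two small.
w_ols = np.array([3.0, -0.4, 0.2, -1.5])
lam = 0.5

w_l1 = l1_shrink(w_ols, lam)   # small coefficients are cut to exactly zero
w_l2 = l2_shrink(w_ols, lam)   # every coefficient is scaled down, none removed

print("nonzero under L1:", np.count_nonzero(w_l1))
print("nonzero under L2:", np.count_nonzero(w_l2))
```

Here L1 removes the two coefficients whose magnitude is below `lam`, while L2 keeps all four and only scales them, which is why L1 doubles as a feature selector and L2 does not.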
CCS NEU
ccs.neu.edu › home › vip › teach › MLcourse › 1.1_LinearRegression › LectureNotes › L1_and_L2_reg_regression,pdf.pdf pdf
Intuition MACHINE LEARNING AND MATHEMATICS Understanding L1 and L2
May 25, 2024 - importance of regularization, we use 15 polynomial regression, meaning we use an overly complex function to predict data. ... Understanding L1 and L2 regularization with analytical and probabilistic views | by Yuki Shizuya | Intuition | Medium
GeeksforGeeks
geeksforgeeks.org › machine learning › how-does-l1-and-l2-regularization-prevent-overfitting
How does L1 and L2 regularization prevent overfitting? - GeeksforGeeks
July 23, 2025 - Avoiding overfitting is crucial in developing robust and generalizable machine learning models. To improve a model's performance, various techniques can be applied. These include methods like dropout, which randomly removes neurons during training, adaptive regularization to adjust regularization strength based on data, and early stopping to halt training when performance plateaus, along with experimenting with different architectures and applying L1 or L2 regularization for controlling overfitting.
YouTube
youtube.com › codebasics
Machine Learning Tutorial Python - 17: L1 and L2 Regularization | Lasso, Ridge Regression - YouTube
In this Python machine learning tutorial for beginners, we will look into: 1) What is overfitting, underfitting 2) How to address overfitting using L1 and L2 r...
Published November 26, 2020 Views 269K
PubMed Central
pmc.ncbi.nlm.nih.gov › articles › PMC3224215
Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features - PMC
Thus, L1 regularization combines efficient feature selection and model generation into one single optimization step. In recent years, high-throughput techniques have advanced considerably, making it possible to generate the target values of interest for a large number of relevant molecular compounds.
YouTube
youtube.com › watch
L1 and L2 Regularization in Machine Learning: Easy Explanation for Data Science Interviews - YouTube
Regularization is a machine learning technique that introduces a regularization term to the loss function of a model in order to improve the generalization o...
Published November 28, 2022
Dataheadhunters
dataheadhunters.com › academy › understanding-regularization-l1-vs-l2-methods-compared
Understanding Regularization: L1 vs. L2 Methods Compared
January 7, 2024 - Regularization works by limiting the complexity of a machine learning model. This is done by adding a regularization term to the loss function that gets minimized during training. The regularization term penalizes model complexity, acting as a tradeoff between fitting the training data perfectly and keeping the model simple enough to generalize well. There are two main types of regularization used in practice: L1 regularization and L2 regularization.
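Written out, the regularized objective described above takes roughly the following form. The symbols are assumptions introduced here for illustration: \mathcal{L} is the unregularized training loss, w the weight vector, and \lambda \ge 0 the regularization strength.

```latex
% Regularized objective: data fit plus a weighted complexity penalty
J(w) = \mathcal{L}(w) + \lambda \, R(w)

% L1 regularization: sum of absolute weight values
R_{\mathrm{L1}}(w) = \lVert w \rVert_1 = \sum_{j} \lvert w_j \rvert

% L2 regularization: sum of squared weight values
R_{\mathrm{L2}}(w) = \lVert w \rVert_2^2 = \sum_{j} w_j^2
```

Setting \lambda = 0 recovers the unregularized fit, and larger values of \lambda trade training accuracy for a simpler model.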