python - Machine learning Logistic Regression L2 regularization - Stack Overflow
Why do we need regularisation (L2 or L1 norm) in logistic regression?
While revising my logistic regression notes, I came across the loss-minimisation interpretation of logistic regression:

argmin_w  Σ_{i=1..n} log(1 + exp(-Z_i)) + (λ/2)·||w||²,   where Z_i = y_i · wᵀx_i
I understand that the L2 regularisation term in the objective above trades off a good separating hyperplane (decision surface) against weight coefficients that are not too large (i.e. not tending to infinity). What I can't grasp intuitively is how the regularisation term actually balances the weights to avoid overfitting/underfitting.

I may also be misunderstanding the unregularised loss itself. Suppose we drop the regularisation term. For points that are correctly separated, driving the corresponding weights toward infinity makes Z_i tend to infinity, so log(1 + exp(-Z_i)) tends to 0, and the sum over correctly classified points is minimised. But with those infinitely large weights, a single misclassified point has a loss tending to infinity, which works against the optimisation. So the weights should already get readjusted to smaller values such that the total loss is minimised, without any need for a regularisation term.

So I am really confused: do we even need regularisation in logistic regression, and if so, how does the regularisation term in the expression work to balance the weights?
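One way to probe the question numerically: on a *perfectly separable* toy dataset (no misclassified points, so the "misclassified point pulls the loss back up" argument never kicks in), the unregularised loss keeps decreasing as the weights are scaled up, so no finite minimiser exists; adding the L2 term makes the objective blow up for large weights, giving a finite optimum. A minimal sketch, assuming 1-D data and a single weight `w` (all names here are hypothetical, not from the question):

```python
import numpy as np

# Toy linearly separable 1-D data: sign(x) predicts y perfectly.
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])

def objective(w, lam):
    """Logistic loss sum_i log(1 + exp(-Z_i)) + (lam/2)*w^2, with Z_i = y_i*w*x_i."""
    z = y * w * X
    return np.sum(np.log1p(np.exp(-z))) + 0.5 * lam * w ** 2

scales = [1.0, 10.0, 100.0]
unreg = [objective(w, lam=0.0) for w in scales]  # strictly decreasing: scaling w up always helps
reg = [objective(w, lam=1.0) for w in scales]    # (lam/2)*w^2 dominates for large w
```

Here `unreg` decreases toward 0 as `w` grows (the optimiser would push `w` to infinity on separable data), while `reg` grows again for large `w`, so the regularised objective has a finite minimiser. On non-separable data the misclassified points do provide some pushback, as the question argues, but regularisation still controls *how large* the weights get, not just whether they diverge.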