Why do we need regularisation (L2 or L1 norm) in logistic regression?
[D] L1 vs L2 regularization
As I was revising my logistic regression notes, I came across the loss-minimization interpretation of logistic regression, which is:
$$\operatorname*{argmin}_{w} \; \sum_{i=1}^{n} \log\bigl(1 + \exp(-Z_i)\bigr) + \frac{\lambda}{2}\lVert w \rVert_2^2, \qquad \text{where } Z_i = Y_i\, w^\top X_i$$
I know that the L2 regularisation in the above objective is used to strike a balance between a good separating hyperplane (decision surface) and weight coefficients that do not grow too large (tending to infinity). But I can't intuitively understand how the regularisation term balances the weights to avoid overfitting or underfitting.

I may also be misunderstanding something. Suppose we use no regularisation. For correctly separated points, minimising the loss should push the weights towards infinity, so that Zi tends to infinity and log(1 + exp(-Zi)) tends to 0; that minimises the sum over the correctly classified points. But for the same plane with infinitely large weights, the loss of any incorrectly classified point tends to infinity, which works against the minimisation. So the weights should readjust to smaller values that minimise the total loss, without any need for a regularisation term.

So I am really confused: do we even need regularisation in logistic regression? If yes, how does the regularisation term in the expression work towards balancing the weights?
Your question is really about the method of Lagrange multipliers in constrained optimization, not logistic regression per se. The gist of it is that a constrained optimization problem can be recast as an unconstrained optimization problem by adding a term, called the regularizer, and vice versa. The sphere comes from recasting the unconstrained problem into a constrained one; recall that a constant norm defines a hypersphere.
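To spell out the recasting described above, here is a short sketch in symbols. The names $L(w)$ for the unregularized logistic loss and $c$ for the norm budget are my own notation, not the asker's. The constrained problem is

$$\min_{w} \; L(w) \quad \text{subject to} \quad \tfrac{1}{2}\lVert w \rVert_2^2 \le c,$$

and its Lagrangian, with multiplier $\lambda \ge 0$, is

$$\mathcal{L}(w, \lambda) = L(w) + \lambda\left(\tfrac{1}{2}\lVert w \rVert_2^2 - c\right) = L(w) + \frac{\lambda}{2}\lVert w \rVert_2^2 - \lambda c.$$

For a fixed $\lambda$, the term $\lambda c$ does not depend on $w$, so minimizing $\mathcal{L}$ over $w$ is the same as minimizing the penalized objective $L(w) + \frac{\lambda}{2}\lVert w \rVert_2^2$. Going the other way, if $w_\lambda$ minimizes the penalized objective, it also solves the constrained problem with $c = \tfrac{1}{2}\lVert w_\lambda \rVert_2^2$. The boundary of the constraint set, $\lVert w \rVert_2^2 = 2c$, is the hypersphere.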
A simple way to think about this is to remember that you are minimizing an objective function. L2 regularization adds a penalty to that objective that grows with the squared norm of the weights, so smaller weights are favored. The result is a constant 'pressure' pulling the parameters towards 0.
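To make the 'pressure' concrete, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the four-point toy dataset, the function names, the learning rate, and the use of plain gradient descent (not what any particular library does). The penalty $\frac{\lambda}{2}\lVert w \rVert^2$ contributes `lam * w` to the gradient, so every step pulls the weights towards 0. On linearly separable data with `lam = 0`, no point is ever misclassified strongly enough to stop the weights from growing, so their norm keeps increasing with more steps; with `lam > 0` it settles at a finite value.

```python
import numpy as np

def objective_grad(w, X, y, lam):
    """Gradient of sum_i log(1 + exp(-y_i * w.x_i)) + (lam / 2) * ||w||^2."""
    z = y * (X @ w)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)); the chain rule adds y_i * x_i
    coef = -y / (1.0 + np.exp(z))
    # The L2 penalty contributes lam * w: a constant pull towards zero
    return X.T @ coef + lam * w

def fit(X, y, lam, lr=0.1, steps=500):
    """Plain gradient descent from w = 0 (illustrative, not tuned)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * objective_grad(w, X, y, lam)
    return w

# Hypothetical toy data: two positive and two negative points, linearly separable
X = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Without regularization the weight norm keeps growing as training continues
norm_free_short = np.linalg.norm(fit(X, y, lam=0.0, steps=500))
norm_free_long = np.linalg.norm(fit(X, y, lam=0.0, steps=5000))

# With regularization the weights converge to a finite minimizer
norm_reg_short = np.linalg.norm(fit(X, y, lam=1.0, steps=500))
norm_reg_long = np.linalg.norm(fit(X, y, lam=1.0, steps=5000))

print("lam=0:", norm_free_short, "->", norm_free_long)
print("lam=1:", norm_reg_short, "->", norm_reg_long)
```

This also speaks to the original question: the argument that misclassified points will rein in the weights only applies when some points actually are misclassified. On separable data nothing opposes the growth, and the regularizer is what keeps the solution finite.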
Can anyone explain the differences/advantages for using L1 vs L2 regularization? Are there circumstances in which one of them is more advantageous than the other?
Thanks!