The general notation for the $p$-norm
of a vector $v = (v_1, \ldots, v_n)$
is this:
$$ \| v \|_p = \sqrt[p]{\sum^n_{i=1} |v_i|^p}. $$
It is easy to see that $\| v\|_2$ is indeed the Euclidean norm (let $p = 2$ in the formula above). That is, the Euclidean norm is the 2-norm.
Then squaring produces
$$ \| v\|_2^2 = (\|v\|_2)^2 = \sum^n_{i=1} v_i^2 = v_1^2 + v_2^2 + \ldots + v_n^2 $$
which is what you have specified.
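As a quick sanity check (my own snippet, not part of the original answer), the formula is easy to verify numerically; `p_norm` below is a hypothetical helper implementing it directly:

```python
import numpy as np

v = np.array([3.0, -4.0, 12.0])

def p_norm(v, p):
    """||v||_p = (sum_i |v_i|^p)^(1/p), the formula above."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

# Agrees with NumPy's built-in vector norm for several values of p.
for p in (1, 2, 3):
    assert np.isclose(p_norm(v, p), np.linalg.norm(v, p))

# The squared 2-norm is just the sum of squares: ||v||_2^2 = v_1^2 + ... + v_n^2.
assert np.isclose(p_norm(v, 2) ** 2, np.sum(v ** 2))
print(p_norm(v, 2))  # 13.0, the Euclidean length of v
```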
Answer from Nik Bren on Stack Exchange
If you read Boyd, chapter six covers regularization and least-squares problems. Regularization starts from the following problem:
$$ \textrm{minimize (w.r.t. } \mathbb{R}^2_{+} \textrm{)} \quad (\| Ax - b\|, \|x\|) $$
This is called the bi-criterion problem, and it is a convex optimization problem.
Regularization has a general pattern which looks like this:
$$ \textrm{minimize} \quad \| Ax - b\| + \gamma \|x\| $$
where $\gamma > 0$ is our regularization parameter. In the case of $\ell_2$ regularization we have
$$ \textrm{ minimize} \| Ax -b\|_{2} + \delta \|x \|_{2} $$
where the 2-norm here is $\|x \|_{2} = \left( \sum_{i=1}^{m} |x_{i} |^{2} \right)^{\frac{1}{2}}$.
The superscript $2$ simply means the square of the norm:
$$ \| x \|_{2}^{2} = \sum_{i=1}^{m} |x_{i} |^{2} $$
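A minimal sketch of how this plays out in practice (my own illustration, not from Boyd's text): the common squared variant of the scalarized problem, minimize $\|Ax-b\|_2^2 + \gamma \|x\|_2^2$, has the closed-form solution $x = (A^T A + \gamma I)^{-1} A^T b$, so the trade-off is easy to explore numerically:

```python
import numpy as np

# Random problem data (made-up sizes, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
gamma = 0.1  # regularization parameter; larger gamma shrinks x toward 0

# Solve (A^T A + gamma I) x = A^T b, the normal equations of the
# squared, regularized objective.
n = A.shape[1]
x = np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ b)

# Sweeping gamma from 0 to infinity traces out the trade-off curve
# between the two objectives ||Ax - b||_2 and ||x||_2.
print(np.linalg.norm(A @ x - b), np.linalg.norm(x))
```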
I know it reduces overfitting/model complexity.
In L1 and L2 regularization you add a term to the loss function ($\lambda$ times the L1 norm or L2 norm of the weights).
How does this lower complexity? Is it because it makes the weights as small as possible? (I got asked this in an interview and didn't know how to explain the process by which complexity gets reduced.)
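To make the setup concrete, here is a toy sketch (my own code, with made-up numbers) of the mechanism: the gradient of the L2 penalty is proportional to the weights themselves, so each gradient step multiplies them by a factor slightly less than 1 ("weight decay"):

```python
import numpy as np

def grad_step(w, grad_loss, lam, lr):
    # d/dw [loss + lam * ||w||_2^2] = grad_loss + 2 * lam * w
    return w - lr * (grad_loss + 2 * lam * w)

w = np.array([5.0, -3.0])
for _ in range(100):
    # With a zero data gradient, only the penalty acts: each step
    # multiplies w by (1 - 2*lam*lr) = 0.98.
    w = grad_step(w, grad_loss=np.zeros_like(w), lam=0.1, lr=0.1)
print(w)  # about 13% of the starting values: the penalty alone shrinks w
```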
I am also familiar with dropout: you train while randomly deactivating units. Mathematically, how does this reduce overfitting? I get the intuition somewhat.
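For reference, here is a minimal sketch of the mechanism I mean (inverted dropout, one common formulation; the code and names are my own illustration):

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Zero each unit with probability p during training; scale the
    survivors by 1/(1-p) so the expected activation is unchanged.
    At test time the layer is the identity."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p  # keep each unit with prob 1-p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(8)
print(dropout(a, p=0.5, rng=rng))  # roughly half the units zeroed, rest = 2.0
```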