As mentioned by others, the least-squares problem is much easier to solve. But there's another important reason: assuming IID Gaussian noise, the least-squares solution is the maximum-likelihood estimate.

The least-squares problem
$$\min_\beta\sum_i(y_i-x_i^T\beta)^2$$
has a simple analytical solution, whereas the least-absolute-deviations problem
$$\min_\beta\sum_i|y_i-x_i^T\beta|$$
is difficult. One of the reasons is that the absolute value is not differentiable.
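A minimal numpy sketch of this contrast, on made-up data: the least-squares fit drops out of the normal equations in one line, while the L1 fit needs an iterative scheme (here, one common approach: iteratively reweighted least squares with weights $1/|r_i|$).

```python
import numpy as np

# Toy data: y = 1 + 2x plus Gaussian noise (illustrative values only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 50)

# Least squares: closed-form solution of the normal equations X'X b = X'y.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Least absolute deviations: no closed form; iterate weighted least
# squares with weights 1/|residual| until the estimate settles.
beta_l1 = beta_ls.copy()
for _ in range(100):
    r = np.abs(y - X @ beta_l1)
    w = 1.0 / np.maximum(r, 1e-8)        # guard against zero residuals
    Xw = X.T * w                         # weighted normal equations
    beta_l1 = np.linalg.solve(Xw @ X, Xw @ y)
```

By construction the least-squares fit has the smaller sum of squared residuals and the L1 fit the smaller sum of absolute residuals.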
Why do we use least squares? Why not absolute values, or cubes, or something else? I understand visually that it is the square of the vertical distance... but why?
Both are done.
Least squares is easier, and the fact that for independent random variables "variances add" makes it considerably more convenient; for example, the ability to partition variances is particularly handy for comparing nested models. It's also somewhat more efficient at the normal (least squares is maximum likelihood there), which might seem to be a good justification -- however, some robust estimators with high breakdown can have surprisingly high efficiency at the normal.
But L1 norms are certainly used for regression problems, and these days relatively often.
If you use R, you might find the discussion in section 5 here useful:
https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Robust-Regression.pdf
(though the material before it on M-estimation is also relevant, since L1 regression is itself a special case of that)
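The M-estimation the appendix covers (R's `rlm` uses the Huber variant by default) can be sketched in a few lines of numpy via iteratively reweighted least squares. This is a hypothetical minimal version, not the library implementation; the tuning constant `k = 1.345` is the conventional choice giving roughly 95% efficiency at the normal.

```python
import numpy as np

def huber_irls(X, y, k=1.345, iters=50):
    """Huber M-estimation by iteratively reweighted least squares (sketch)."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)    # start from least squares
    for _ in range(iters):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for the normal.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-8)
        u = r / scale
        # Huber weights: 1 inside [-k, k], k/|u| outside.
        w = np.where(np.abs(u) <= k, 1.0, k / np.maximum(np.abs(u), 1e-12))
        Xw = X.T * w                            # weight each observation
        beta = np.linalg.solve(Xw @ X, Xw @ y)
    return beta

# Demo: clean linear data (y = 1 + 2x) plus a few gross outliers.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(0, 10, 100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 100)
y[:5] += 50.0                                   # contaminate 5 points
beta_huber = huber_irls(X, y)
```

The Huber weights leave well-fitting points at full weight and downweight large residuals, so the few contaminated points barely move the fit, whereas plain least squares would chase them.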
I can't help quoting from Huber, Robust Statistics, p.10 on this (sorry the quote is too long to fit in a comment):
Two time-honored measures of scatter are the mean absolute deviation
$$d_n=\frac{1}{n}\sum|x_i-\bar{x}|$$
and the mean square deviation
$$s_n=\left[\frac{1}{n}\sum(x_i-\bar{x})^2\right]^{1/2}$$
There was a dispute between Eddington (1914, p.147) and Fisher (1920, footnote on p. 762) about the relative merits of $d_n$ and $s_n$.[...] Fisher seemingly settled the matter by pointing out that for normal observations $s_n$ is about 12% more efficient than $d_n$.
By the relation between the conditional mean $\hat{y}$ and the unconditional mean $\bar{x}$, a similar argument applies to the residuals.
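Fisher's 12% figure is easy to check by simulation. A sketch, assuming both statistics are rescaled to consistently estimate $\sigma$ (for normal data $E[d_n] = \sigma\sqrt{2/\pi}$, so $d_n$ is multiplied by $\sqrt{\pi/2}$); the asymptotic relative efficiency of $d_n$ to $s_n$ at the normal is about 0.88, i.e. $s_n$ is roughly 12% more efficient.

```python
import numpy as np

# Monte Carlo comparison of d_n (mean absolute deviation, rescaled) and
# s_n (root mean square deviation) as estimators of sigma at the normal.
rng = np.random.default_rng(1)
n, reps = 200, 20000
samples = rng.normal(0.0, 1.0, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)

d_n = np.mean(np.abs(samples - xbar), axis=1) * np.sqrt(np.pi / 2)
s_n = np.sqrt(np.mean((samples - xbar) ** 2, axis=1))

# Relative efficiency of d_n: ratio of the estimators' variances.
eff = np.var(s_n) / np.var(d_n)
print(f"efficiency of d_n relative to s_n at the normal: {eff:.3f}")
```

At a heavier-tailed distribution the comparison reverses, which is the point of Huber's discussion: the 12% advantage of $s_n$ is fragile.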
