As mentioned by others, the least-squares problem is much easier to solve. But there’s another important reason: assuming IID Gaussian noise, the least-squares solution is the Maximum-Likelihood estimate.
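A quick sketch of why: with $y_i = x_i^\top\beta + \varepsilon_i$ and $\varepsilon_i \sim N(0,\sigma^2)$ IID, the log-likelihood is
$$\log L(\beta,\sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i\left(y_i - x_i^\top\beta\right)^2,$$
so for any fixed $\sigma$, maximizing over $\beta$ is exactly minimizing the sum of squared residuals. (Under IID Laplace noise, the same argument makes least absolute deviations the maximum-likelihood estimate instead.)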
Minimizing the sum of squared residuals has a simple analytical solution, while minimizing the sum of absolute residuals is difficult. One of the reasons is that the absolute value is not differentiable at zero.
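A minimal NumPy sketch of both points, on hypothetical toy data (the data and variable names here are purely illustrative): the least-squares fit drops straight out of the normal equations, while the absolute-value objective has no closed form and must be minimized numerically.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: y ≈ 1 + 2x with IID Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 50)

# Least squares: closed-form solution of the normal equations X'X b = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Least absolute deviations: no closed form, and the objective is not
# differentiable where a residual is zero, so use a derivative-free method.
lad_loss = lambda b: np.abs(y - X @ b).sum()
beta_lad = minimize(lad_loss, beta_ols, method="Nelder-Mead").x
```

Both fits recover coefficients near (1, 2) here; the point is only that one is a single linear solve while the other needs an iterative optimizer.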
Why squared residuals instead of absolute residuals in OLS estimation?
Why do we use least squares, and not the absolute value, or cubes, or whatever? I understand visually that it is the square of the vertical distance... but why?

I understand the idea is to minimize the sum of the squares of the errors compared to the y = mx + b regression, but why the squares? Why not minimize the sum of the absolute values of the errors? Or the fourth powers of the errors?
Both are done.
Least squares is easier, and the fact that for independent random variables "variances add" makes it considerably more convenient; for example, the ability to partition variances is particularly handy for comparing nested models. It's also somewhat more efficient at the normal (where least squares is maximum likelihood), which might seem to be a good justification -- however, some robust estimators with high breakdown points can have surprisingly high efficiency at the normal.
But L1-norm regression is certainly used for regression problems, and these days relatively often.
If you use R, you might find the discussion in section 5 here useful:
https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Robust-Regression.pdf
(though the material before it on M-estimation is also relevant, since L1 regression is itself a special case of an M-estimator)
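As a sketch of how such fits are actually computed, here is L1 (least absolute deviations) regression via iteratively reweighted least squares, a standard device for M-estimators; the function name, iteration count, and tolerance below are my own choices, not anything from the linked appendix.

```python
import numpy as np

def lad_irls(X, y, iters=50, eps=1e-8):
    """Least absolute deviations fit via iteratively reweighted least
    squares: each step solves a weighted LS problem with weights
    1/|residual|, so large residuals are progressively downweighted."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting point
    for _ in range(iters):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)  # eps guards division by zero
        XtW = X.T * w                          # X' @ diag(w)
        # Weighted least-squares step: solve (X' W X) b = X' W y
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta
```

With a single gross outlier in otherwise clean data, the L1 fit stays near the true line while OLS is pulled toward the outlier, which is the robustness property at issue here.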
I can't help quoting from Huber, Robust Statistics, p.10 on this (sorry the quote is too long to fit in a comment):
Two time-honored measures of scatter are the mean absolute deviation
$$d_n=\frac{1}{n}\sum|x_i-\bar{x}|$$
and the mean square deviation
$$s_n=\left[\frac{1}{n}\sum(x_i-\bar{x})^2\right]^{1/2}$$
There was a dispute between Eddington (1914, p.147) and Fisher (1920, footnote on p. 762) about the relative merits of $d_n$ and $s_n$.[...] Fisher seemingly settled the matter by pointing out that for normal observations $s_n$ is about 12% more efficient than $d_n$.
By the relation between the conditional mean $\hat{y}$ and the unconditional mean $\bar{x}$, a similar argument applies to regression residuals.
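That efficiency figure is easy to check by simulation (a sketch; the sample size and replication count below are arbitrary choices): scale $d_n$ by $\sqrt{\pi/2}$ so that both estimators are consistent for $\sigma$ under normality, then compare their sampling variances across many normal samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 20_000

samples = rng.normal(0.0, 1.0, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)

# Mean absolute deviation, scaled by sqrt(pi/2) to be consistent for sigma.
d_n = np.sqrt(np.pi / 2) * np.abs(samples - xbar).mean(axis=1)
# Root mean square deviation (Huber's s_n).
s_n = np.sqrt(((samples - xbar) ** 2).mean(axis=1))

# Relative efficiency of d_n vs s_n: ratio of their sampling variances.
# Asymptotically this is 1/(pi - 2), about 0.876.
eff = s_n.var() / d_n.var()
print(eff)
```

The estimated ratio comes out near 0.88, i.e. $d_n$ needs noticeably more observations than $s_n$ for the same precision at the normal, in line with Fisher's point.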
