Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based on minimizing the sum of the absolute deviations between observed and model-predicted values. (Wikipedia)
Wikipedia
en.wikipedia.org › wiki › Least_absolute_deviations
Least absolute deviations - Wikipedia
November 22, 2024 - $\tau=1/2$ gives the standard regression by least absolute deviations and is also known as median regression. The least absolute deviation problem may be extended to include multiple explanators, constraints and regularization, e.g., a linear model with ...
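The median-regression identity can be checked numerically in the simplest case of fitting a constant (the toy data and brute-force grid below are illustrative choices, not from the article):

```python
import numpy as np

# Median regression in the simplest case: fitting a constant c to data.
# The c minimizing the sum of absolute deviations is the sample median
# (hence "median regression"); the sum of squares is minimized by the mean.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one gross outlier

c_grid = np.linspace(0, 100, 100001)
lad = np.abs(y[:, None] - c_grid).sum(axis=0)
ols = ((y[:, None] - c_grid) ** 2).sum(axis=0)

c_lad = c_grid[lad.argmin()]  # ~3.0, the median: ignores the outlier
c_ols = c_grid[ols.argmin()]  # ~22.0, the mean: dragged by the outlier
```

The contrast also previews why LAD is called robust: the outlier moves the least-squares fit but not the LAD fit.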
Discussions

Least Absolute Value based regression
Hi, I want to use linear regression based on least absolute value deviation to find the coefficients of my model with the help of measured data and 3 independent variables. The number of measur...
mathworks.com
November 9, 2013
Why Least SQUARES Regression instead of Least ABSOLUTE VALUE Regression?
Least squares is traditionally used for lots of reasons. It's computationally simple; minimizing squares corresponds to fitting the expected response, rather than some quantile of the response; it's mathematically simple; and the least squares estimator corresponds to the maximum likelihood estimator if the errors are Normal.
r/AskStatistics
July 25, 2018
Why Do We Use Least *Squares* In Linear Regression?
The least squares solution to a linear regression falls directly out of a Maximum Likelihood Estimation of the data conditioned on it being normally distributed about the "curve of best fit". So, if you maximize the likelihood of observing the data conditioned on another distribution, the least squares solution will not, in general, give you the "correct" parameters. That being said, if you only need a qualitative description of the data given by a curve that has the same general behavior, there's no reason to prefer least squares over any other metric.
r/math
October 10, 2024
regression - Why squared residuals instead of absolute residuals in OLS estimation? - Cross Validated
@cardinal, I was ... with least absolute values you can get infinitely many solutions, which makes the results harder to interpret. ... I am referring to the least squares estimate of the regression coefficients being unique, though I was making the assumption (not originally stated) that the columns of the X matrix are linearly independent, ...
stats.stackexchange.com
Wayne State University
digitalcommons.wayne.edu › cgi › viewcontent.cgi pdf
Least Absolute Value vs. Least Squares Estimation and ...
Open Access research and scholarship produced by Wayne State University community and home of Wayne State University Press Journals.
Real-Statistics
real-statistics.com › multiple-regression › lad-regression
Least Absolute Deviation (LAD) Regression
Free downloadable statistics software (Excel add-in) plus comprehensive statistics tutorial for carrying out a wide range of statistical analyses in Excel.
Hong Kong University of Science and Technology
math.hkust.edu.hk › ~makchen › Paper › LAD.pdf pdf
Analysis of least absolute deviation, by Kani Chen
squares or L2 method for statistical analysis of linear regression models. Instead of minimizing the sum of squared errors, it minimizes the sum of absolute values of errors.
Taylor & Francis Online
tandfonline.com
Least absolute values estimation: an introduction: Communications in Statistics - Simulation and Computation: Vol 6, No 4
A special purpose linear programming algorithm for obtaining least absolute value estimators in a linear model with dummy variables (Source: Communications in Statistics - Simulation and Computation) ... Minimization Techniques for Piecewise Differentiable Functions: The $l_1$ Solution to an Overdetermined Linear System ... Erratum: A Note on Sharpe's Algorithm for Minimizing the Sum of Absolute Deviations in a Simple Regression Problem
MathWorks
mathworks.com › matlabcentral › answers › 105516-least-absolute-value-based-regression
Least Absolute Value based regression - MATLAB Answers - MATLAB Central
Mobook
mobook.github.io › MO-book › notebooks › 02 › 02-lad-regression.html
2.2 Least Absolute Deviation (LAD) Regression — Companion code for the book "Hands-On Mathematical Optimization with Python"
Suppose that we have a finite dataset consisting of \(n\) points \(\{({X}^{(i)}, y^{(i)})\}_{i=1,\dots,n}\) with \({X}^{(i)} \in \mathbb{R}^k\) and \(y^{(i)} \in \mathbb{R}\). A linear regression model assumes the relationship between the vector of \(k\) regressors \({X}\) and the dependent variable \(y\) is linear. This relationship is modeled through an error or deviation term \(e_i\), which quantifies how much each of the data points diverge from the model prediction and is defined as follows: \[ \begin{equation} e_i:= y^{(i)} - {m}^\top {X}^{(i)} - b = y^{(i)} - \sum_{j=1}^k X^{(i)}_j m_j - b, \end{equation} \] for some real numbers \(m_1,\dots,m_k\) and \(b\). The Least Absolute Deviation (LAD) is a possible statistical optimality criterion for such a linear regression.
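The LAD criterion for this model is solvable as a linear program once auxiliary variables $t_i \ge |e_i|$ are introduced. A minimal sketch using `scipy.optimize.linprog` (the synthetic data, and names like `m_hat`, are illustrative assumptions, not code from the book):

```python
import numpy as np
from scipy.optimize import linprog

# LAD as a linear program: minimize sum_i t_i subject to
#   -t_i <= y_i - m @ X_i - b <= t_i,
# with decision vector z = [m_1..m_k, b, t_1..t_n].
rng = np.random.default_rng(0)
n, k = 40, 2
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=n)

c = np.concatenate([np.zeros(k + 1), np.ones(n)])  # objective: sum of t_i
ones, I = np.ones((n, 1)), np.eye(n)
A_ub = np.block([[-X, -ones, -I],   #  (y_i - X_i m - b) <= t_i
                 [ X,  ones, -I]])  # -(y_i - X_i m - b) <= t_i
b_ub = np.concatenate([-y, y])
bounds = [(None, None)] * (k + 1) + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
m_hat, b_hat = res.x[:k], res.x[k]
# m_hat should land near (2, -1) and b_hat near 0.5
```

At an optimum each $t_i$ equals $|e_i|$, since any slack would only increase the objective, so the LP and the original LAD problem have the same minimizers.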
Bradthiessen
bradthiessen.com › html5 › docs › ols.pdf pdf
Why we use “least squares” regression instead of “least ...
Instead of using absolute values, let’s square all the values. If we need to, we can always take a square root at the end. ... We can now use (relatively) straightforward Calculus methods to find the regression line that will minimize this formula.
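The calculus step the excerpt alludes to can be sketched concretely: setting the partial derivatives of $\sum_i (y_i - m x_i - b)^2$ with respect to $m$ and $b$ to zero yields the familiar closed form (the five-point dataset is made up for illustration):

```python
import numpy as np

# Minimizing SSE(m, b) = sum (y_i - m*x_i - b)^2 by calculus:
# the zero-derivative (normal) equations solve to
#   m = cov(x, y) / var(x),   b = ybar - m * xbar.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b = y.mean() - m * x.mean()  # m = 1.96, b = 0.14 for this data

def sse(m_, b_):
    return ((y - m_ * x - b_) ** 2).sum()

# The stationary point is the minimum: nudging m or b only raises the SSE.
assert sse(m, b) <= min(sse(m + 0.01, b), sse(m - 0.01, b),
                        sse(m, b + 0.01), sse(m, b - 0.01))
```

No such closed form exists for the absolute-value loss, which is not differentiable at zero residuals; that is exactly why LAD is instead attacked with linear programming, as several of the results above describe.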
Reddit
reddit.com › r/askstatistics › why least squares regression instead of least absolute value regression?
r/AskStatistics on Reddit: Why Least SQUARES Regression instead of Least ABSOLUTE VALUE Regression?
July 25, 2018 - Why do we use least squares, why not absolute value, or cubes, or whatever? I understand visually that it is the square of the vertical distance... but why?

Readthedocs
gurobi-optimods.readthedocs.io › en › stable › mods › lad-regression.html
Least Absolute Deviation Regression - gurobi-optimods documentation v3.0.0
LADRegression chooses coefficients \(w\) of a linear model \(y = Xw\) so as to minimize the sum of absolute errors on a training dataset \((X, y)\). In other words, it aims to minimize the following loss function: ... The fitting algorithm of the LAD regression Mod is implemented by formulating the loss function as a Linear Program (LP), which is then solved using Gurobi. Here \(I\) is the set of observations and \(J\) the set of fields. Response values \(y_i\) are predicted from predictor values \(x_{ij}\) by fitting coefficients \(w_j\).
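The loss function elided by the snippet is the sum of absolute errors, $\sum_{i\in I} |y_i - \sum_{j\in J} x_{ij} w_j|$. A sketch of the standard LP reformulation in the same notation (the Mod's exact formulation may differ in details such as an intercept term):

```latex
\begin{aligned}
\min_{w,\,t}\;& \sum_{i \in I} t_i \\
\text{s.t.}\;& t_i \ge y_i - \sum_{j \in J} x_{ij} w_j, & i \in I, \\
& t_i \ge -\Bigl(y_i - \sum_{j \in J} x_{ij} w_j\Bigr), & i \in I.
\end{aligned}
```

Each auxiliary variable $t_i$ is forced to at least the absolute residual of observation $i$, and minimizing their sum makes every $t_i$ tight at the optimum.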
ScienceDirect
sciencedirect.com › science › article › abs › pii › 0169207094900221
Forecasting in least absolute value regression with autocorrelated errors: a small-sample study - ScienceDirect
April 19, 2002 - Least absolute value (LAV) regression is a robust alternative to ordinary least squares (OLS) and is particularly useful when model disturbances follow distributions that are nonnormal and subject to outliers.
ResearchGate
researchgate.net › publication › 288612305_Least_Squares_versus_Least_Absolute_Deviations_estimation_in_regression_models
(PDF) Least Squares versus Least Absolute Deviations estimation in regression models
December 1, 2009 - In regression problems alternative criteria of “best fit” to least squares are least absolute deviations and least maximum deviations. In this paper it is noted that linear programming techniques may be employed to solve the latter two problems. In particular, if the linear regression relation contains p parameters, minimizing the sum of the absolute value of the “vertical” deviations from the regression line is shown to reduce to a p equation linear programming model with bounded variables; and fitting by the Chebyshev criterion is exhibited to lead to a standard-form p+1 equation linear programming model.
SAPUB
article.sapub.org › 10.5923.j.statistics.20150503.02.html
Robust Regression by Least Absolute Deviations Method
Loss denotes the seriousness of the nonzero prediction error to the investigator, where prediction error is the difference between the predicted and the observed value of the response variable. Meyer & Glauber (1964) [9] stated that for at least certain economic problems absolute error may be a more satisfactory measure of loss than the squared error. The least absolute deviation errors regression (or for brevity, absolute errors regression) overcomes the aforementioned drawbacks of the least squares regression and provides an attractive alternative.
RDocumentation
rdocumentation.org › packages › Blossom › versions › 1.4 › topics › lad
lad function - Least absolute deviation
Least absolute deviation (LAD) regression is an alternative to ordinary least squares (OLS) regression that has greater power for thick-tailed symmetric and asymmetric error distributions (Cade and Richards 1996). LAD regression estimates the conditional median (a conditional 0.50 quantile) ...
Ampl
ampl.com › mo-book › notebooks › 02 › lad-regression.html
LAD Regression — Hands-On Mathematical Optimization with AMPL in Python
Springer
link.springer.com › home › annals of operations research › article
Estimation and testing in least absolute value regression with serially correlated disturbances | Annals of Operations Research
Least absolute value (LAV) regression provides a robust alternative to least squares, particularly when the disturbances follow distributions that are nonnormal and subject to outliers.
Top answer (1 of 4, score 24)

Both are done.

Least squares is easier, and the fact that for independent random variables "variances add" means that it's considerably more convenient; for example, the ability to partition variances is particularly handy for comparing nested models. It's somewhat more efficient at the normal (least squares is maximum likelihood), which might seem to be a good justification -- however, some robust estimators with high breakdown can have surprisingly high efficiency at the normal.

But L1 norms are certainly used for regression problems and these days relatively often.

If you use R, you might find the discussion in section 5 here useful:

https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Robust-Regression.pdf

(though the material before it on M-estimation is also relevant, since L1 regression is itself a special case of M-estimation)

Answer 2 of 4 (score 27)

I can't help quoting from Huber, Robust Statistics, p.10 on this (sorry the quote is too long to fit in a comment):

Two time-honored measures of scatter are the mean absolute deviation

$$d_n=\frac{1}{n}\sum|x_i-\bar{x}|$$

and the mean square deviation

$$s_n=\left[\frac{1}{n}\sum(x_i-\bar{x})^2\right]^{1/2}$$

There was a dispute between Eddington (1914, p.147) and Fisher (1920, footnote on p. 762) about the relative merits of $d_n$ and $s_n$.[...] Fisher seemingly settled the matter by pointing out that for normal observations $s_n$ is about 12% more efficient than $d_n$.

By the relation between the conditional mean $\hat{y}$ and the unconditional mean $\bar{x}$, a similar argument applies to the residuals.
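Fisher's efficiency claim is easy to check by simulation; a hedged sketch (the sample size, seed, and coefficient-of-variation comparison are choices made here, not Huber's):

```python
import numpy as np

# Simulate the Eddington-Fisher comparison at the normal: how much more
# variable is the mean absolute deviation d_n than the root mean square
# deviation s_n? Since they estimate different multiples of sigma,
# compare squared coefficients of variation.
rng = np.random.default_rng(42)
n, reps = 100, 20_000
x = rng.normal(size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)

d = np.abs(x - xbar).mean(axis=1)            # d_n for each sample
s = np.sqrt(((x - xbar) ** 2).mean(axis=1))  # s_n for each sample

def cv2(a):
    return (a.std() / a.mean()) ** 2

eff = cv2(s) / cv2(d)  # asymptotically 1/(pi - 2), about 0.876
```

The simulated `eff` lands near the asymptotic value 0.876, matching Fisher's point that $s_n$ is roughly 12% more efficient than $d_n$ when the observations really are normal; under heavier tails the ranking reverses, which is the opening move of robust statistics.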