Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based on … Wikipedia
Wikipedia
en.wikipedia.org › wiki › Least_absolute_deviations
Least absolute deviations - Wikipedia
November 22, 2024 - \(\tau = 1/2\) gives the standard regression by least absolute deviations and is also known as median regression. The least absolute deviation problem may be extended to include multiple explanators, constraints and regularization, e.g., a linear model with ...
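As a quick numerical illustration of the median-regression connection described in this snippet, the following sketch (made-up data; `lad_loss` is an illustrative helper, not from any cited source) checks that the constant minimizing the sum of absolute deviations is the sample median:

```python
import numpy as np

# Illustrative check (made-up data) that the constant c minimizing the
# LAD loss sum |y_i - c| is the sample median, i.e. the tau = 1/2 quantile.
y = np.array([1.0, 2.0, 3.0, 10.0, 100.0])

def lad_loss(c, y):
    """Sum of absolute deviations of y from the constant c."""
    return np.abs(y - c).sum()

candidates = np.linspace(0.0, 110.0, 1101)   # grid search with step 0.1
losses = [lad_loss(c, y) for c in candidates]
best = candidates[int(np.argmin(losses))]

# best coincides (numerically) with np.median(y); note how the outlier 100.0
# would pull the mean far away but leaves the LAD minimizer at the median.
print(best, np.median(y))
```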
Wayne State University
digitalcommons.wayne.edu › cgi › viewcontent.cgi (PDF)
Least Absolute Value vs. Least Squares Estimation and ...
Open Access research and scholarship produced by Wayne State University community and home of Wayne State University Press Journals.
Real-Statistics
real-statistics.com › multiple-regression › lad-regression
Least Absolute Deviation (LAD) Regression
Free downloadable statistics software (Excel add-in) plus comprehensive statistics tutorial for carrying out a wide range of statistical analyses in Excel.
Taylor & Francis Online
tandfonline.com › communications in statistics - simulation and computation › volume 6, issue 4 › least absolute values estimation: an int ....
Least absolute values estimation: an introduction: Communications in Statistics - Simulation and Computation: Vol 6, No 4
A special purpose linear programming algorithm for obtaining least absolute value estimators in a linear model with dummy variables · Source: Communications in Statistics - Simulation and Computation ... Minimization Techniques for Piecewise Differentiable Functions: The $l_1$ Solution to an Overdetermined Linear System ... Erratum—A Note on Sharpe's Algorithm for Minimizing the Sum of Absolute Deviations in a Simple Regression Problem
Hong Kong University of Science and Technology
math.hkust.edu.hk › ~makchen › Paper › LAD.pdf (PDF)
Analysis of least absolute deviation By KANI CHEN
… squares or L2 method for statistical analysis of linear regression models. Instead of minimizing the sum of squared errors, it minimizes the sum of absolute values of errors.
Bradthiessen
bradthiessen.com › html5 › docs › ols.pdf (PDF)
Why we use “least squares” regression instead of “least ...
Instead of using absolute values, let’s square all the values. If we need to, we can always take a square root at the end. ... We can now use (relatively) straightforward Calculus methods to find the regression line that will minimize this formula.
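The calculus route this snippet describes ends at the normal equations \(X^\top X \beta = X^\top y\); a minimal sketch with made-up data solving them directly:

```python
import numpy as np

# The calculus argument for least squares leads to the normal equations
# X^T X beta = X^T y; tiny made-up example solving them directly.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])  # intercept + slope columns
y = np.array([1.0, 3.0, 5.0, 7.0])  # lies exactly on y = 1 + 2x
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # recovers intercept 1 and slope 2 (up to floating point)
```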
Mobook
mobook.github.io › MO-book › notebooks › 02 › 02-lad-regression.html
2.2 Least Absolute Deviation (LAD) Regression — Companion code for the book "Hands-On Mathematical Optimization with Python"
Suppose that we have a finite dataset consisting of \(n\) points \(\{({X}^{(i)}, y^{(i)})\}_{i=1,\dots,n}\) with \({X}^{(i)} \in \mathbb{R}^k\) and \(y^{(i)} \in \mathbb{R}\). A linear regression model assumes the relationship between the vector of \(k\) regressors \({X}\) and the dependent variable \(y\) is linear. This relationship is modeled through an error or deviation term \(e_i\), which quantifies how much each of the data points diverges from the model prediction and is defined as follows: \[ \begin{equation} e_i:= y^{(i)} - {m}^\top {X}^{(i)} - b = y^{(i)} - \sum_{j=1}^k X^{(i)}_j m_j - b, \end{equation} \] for some real numbers \(m_1,\dots,m_k\) and \(b\). The Least Absolute Deviation (LAD) is a possible statistical optimality criterion for such a linear regression.
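To make the deviation terms \(e_i\) from this snippet concrete, here is a small sketch computing the LAD objective \(\sum_i |e_i|\); the data, `m_true`, and `b_true` are made up for illustration:

```python
import numpy as np

# Sketch of the deviation terms e_i = y_i - m^T X_i - b; the data and the
# parameter values m_true, b_true are made up for illustration.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))           # n points, k regressors
m_true = np.array([1.5, -2.0, 0.5])   # slopes m_1..m_k
b_true = 4.0                          # intercept b
y = X @ m_true + b_true + rng.laplace(scale=0.5, size=n)

def lad_objective(m, b, X, y):
    """Sum of |e_i| with e_i = y_i - m^T X_i - b."""
    e = y - X @ m - b
    return np.abs(e).sum()

# The objective at the true parameters is just the summed |noise|, and it
# beats a deliberately biased intercept.
print(lad_objective(m_true, b_true, X, y))
```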
MathWorks
mathworks.com › matlabcentral › answers › 105516-least-absolute-value-based-regression
Least Absolute Value based regression - MATLAB Answers - MATLAB Central
November 9, 2013 - Hi , I want to use linear regression based on least absolute value deviation to find the coefficients of my model with the help of measured data and 3 independent variables. The number of measur...
ResearchGate
researchgate.net › publication › 288612305_Least_Squares_versus_Least_Absolute_Deviations_estimation_in_regression_models
(PDF) Least Squares versus Least Absolute Deviations estimation in regression models
December 1, 2009 - In regression problems alternative criteria of “best fit” to least squares are least absolute deviations and least maximum deviations. In this paper it is noted that linear programming techniques may be employed to solve the latter two problems. In particular, if the linear regression relation contains p parameters, minimizing the sum of the absolute value of the “vertical” deviations from the regression line is shown to reduce to a p equation linear programming model with bounded variables; and fitting by the Chebyshev criterion is exhibited to lead to a standard-form p+1 equation linear programming model.
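The Chebyshev (least maximum deviation) criterion mentioned in this abstract can be sketched as a small linear program, minimizing \(t\) subject to \(-t \le y_i - x_i^\top w \le t\). The helper name and data below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of a Chebyshev (least maximum deviation) fit as an LP:
# minimize t subject to -t <= y_i - x_i^T w <= t. Illustrative data.
def chebyshev_fit(X, y):
    n, k = X.shape
    c = np.concatenate([np.zeros(k), [1.0]])    # decision vector [w_1..w_k, t]
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([X, -ones]),    #  Xw - t <= y
                      np.hstack([-X, -ones])])  # -Xw - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * k + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:k], res.x[k]

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 4.0])
w, t = chebyshev_fit(X, y)
print(w, t)  # slope 1.2 with worst-case deviation t = 0.4
```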
Reddit
reddit.com › r/askstatistics › why least squares regression instead of least absolute value regression?
r/AskStatistics on Reddit: Why Least SQUARES Regression instead of Least ABSOLUTE VALUE Regression?
March 23, 2018 -

Why do we use Least squares, why not absolute value, or cubes, or whatever? I understand visually that it is the square of the vertical distance... but why?

Readthedocs
gurobi-optimods.readthedocs.io › en › stable › mods › lad-regression.html
Least Absolute Deviation Regression - gurobi-optimods documentation v3.0.0
LADRegression chooses coefficients \(w\) of a linear model \(y = Xw\) so as to minimize the sum of absolute errors on a training dataset \((X, y)\). In other words, it aims to minimize the following loss function: ... The fitting algorithm of the LAD regression Mod is implemented by formulating the loss function as a Linear Program (LP), which is then solved using Gurobi. Here \(I\) is the set of observations and \(J\) the set of fields. Response values \(y_i\) are predicted from predictor values \(x_{ij}\) by fitting coefficients \(w_j\).
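A minimal sketch of the LP formulation described here, using `scipy.optimize.linprog` rather than Gurobi (the helper `lad_fit` and the data are illustrative, not the Mod's API): splitting each residual into nonnegative parts \(u_i, v_i\) turns the absolute values into a linear objective:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of LAD fitting as an LP (not the Gurobi Mod itself): split each
# residual via u_i - v_i = y_i - sum_j x_ij w_j with u_i, v_i >= 0, and
# minimize sum_i (u_i + v_i).
def lad_fit(X, y):
    n, k = X.shape
    c = np.concatenate([np.zeros(k), np.ones(2 * n)])  # [w, u, v]
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])       # Xw + u - v = y
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]

# Tiny example without an intercept: y = 2x exactly, plus one gross outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 100.0])
print(lad_fit(X, y))  # slope stays at 2: the outlier barely moves the LAD fit
```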
ScienceDirect
sciencedirect.com › science › article › abs › pii › 0169207094900221
Forecasting in least absolute value regression with autocorrelated errors: a small-sample study - ScienceDirect
April 19, 2002 - Least absolute value (LAV) regression is a robust alternative to ordinary least squares (OLS) and is particularly useful when model disturbances follow distributions that are nonnormal and subject to outliers.
SAPUB
article.sapub.org › 10.5923.j.statistics.20150503.02.html
Robust Regression by Least Absolute Deviations Method
Loss denotes the seriousness of the nonzero prediction error to the investigator, where prediction error is the difference between the predicted and the observed value of the response variable. Meyer & Glauber (1964) [9] stated that for at least certain economic problems absolute error may be a more satisfactory measure of loss than the squared error. The least absolute deviation errors regression (or for brevity, absolute errors regression) overcomes the aforementioned drawbacks of the least squares regression and provides an attractive alternative.
RDocumentation
rdocumentation.org › packages › Blossom › versions › 1.4 › topics › lad
lad function - Least absolute deviation
Least absolute deviation (LAD) regression is an alternative to ordinary least squares (OLS) regression that has greater power for thick-tailed symmetric and asymmetric error distributions (Cade and Richards 1996). LAD regression estimates the conditional median (a conditional 0.50 quantile) ...
Ampl
ampl.com › mo-book › notebooks › 02 › lad-regression.html
LAD Regression — Hands-On Mathematical Optimization with AMPL in Python
Springer
link.springer.com › home › annals of operations research › article
Estimation and testing in least absolute value regression with serially correlated disturbances | Annals of Operations Research
Least absolute value (LAV) regression provides a robust alternative to least squares, particularly when the disturbances follow distributions that are nonnormal and subject to outliers.
PubMed Central
pmc.ncbi.nlm.nih.gov › articles › PMC3762514
Least Absolute Relative Error Estimation - PMC - NIH
Khoshgoftaar et al (1992) gave sufficient conditions to ensure the strong consistency of the estimators minimizing the sum of squared relative errors: ... (MRE for minimum relative errors) for the nonlinear regression model \(Y_i = f(X_i,\beta) + \varepsilon_i\), where \(f(x,\beta)\) is the regression function and \(Y_i\), ...
Top answer, 1 of 4 (score 24)

Both are done.

Least squares is easier, and the fact that for independent random variables "variances add" means that it's considerably more convenient; for example, the ability to partition variances is particularly handy for comparing nested models. It's somewhat more efficient at the normal (least squares is maximum likelihood), which might seem to be a good justification -- however, some robust estimators with high breakdown can have surprisingly high efficiency at the normal.

But L1 norms are certainly used for regression problems and these days relatively often.

If you use R, you might find the discussion in section 5 here useful:

https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Robust-Regression.pdf

(though the stuff before it on M estimation is also relevant, since it's also a special case of that)

Answer 2 of 4 (score 27)

I can't help quoting from Huber, Robust Statistics, p.10 on this (sorry the quote is too long to fit in a comment):

Two time-honored measures of scatter are the mean absolute deviation

$$d_n=\frac{1}{n}\sum|x_i-\bar{x}|$$

and the mean square deviation

$$s_n=\left[\frac{1}{n}\sum(x_i-\bar{x})^2\right]^{1/2}$$

There was a dispute between Eddington (1914, p.147) and Fisher (1920, footnote on p. 762) about the relative merits of $d_n$ and $s_n$.[...] Fisher seemingly settled the matter by pointing out that for normal observations $s_n$ is about 12% more efficient than $d_n$.

By the relation between the conditional mean $\hat{y}$ and the unconditional mean $\bar{x}$, a similar argument applies to the residuals.
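The Eddington-Fisher efficiency comparison quoted above can be checked numerically; the simulation below is illustrative (not from Huber's book), rescaling \(d_n\) so that both statistics estimate \(\sigma\) under normality before comparing their sampling variances:

```python
import numpy as np

# Numerical check of the quoted Eddington-Fisher comparison: simulate normal
# samples and compare the variability of d_n and s_n as estimators of sigma.
rng = np.random.default_rng(42)

def d_n(x):
    """Mean absolute deviation from the sample mean."""
    return np.mean(np.abs(x - x.mean()))

def s_n(x):
    """Root mean square deviation from the sample mean."""
    return np.sqrt(np.mean((x - x.mean()) ** 2))

samples = rng.normal(size=(2000, 100))  # 2000 samples of size n = 100
# For N(0, sigma^2), E[d_n] is about sigma * sqrt(2/pi), so rescale d_n
# so that both statistics target sigma = 1.
d_vals = np.array([d_n(x) for x in samples]) / np.sqrt(2.0 / np.pi)
s_vals = np.array([s_n(x) for x in samples])
print(np.var(s_vals) / np.var(d_vals))  # ratio below 1: s_n is more efficient
```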