There are, in fact, two different formulas for standard deviation here: the population standard deviation and the sample standard deviation.
If $x_1, x_2, \ldots, x_N$ denote all $N$ values from a population, then the (population) standard deviation is
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2},$$
where $\mu$ is the mean of the population.
If $x_1, x_2, \ldots, x_n$ denote $n$ values from a sample, however, then the (sample) standard deviation is
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2},$$
where $\bar{x}$ is the mean of the sample.
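To make the two formulas concrete, here is a minimal Python sketch of both (the names population_sd and sample_sd are just labels for this illustration, not library functions):

```python
import math

def population_sd(values):
    """Population SD: divide the sum of squared deviations by N."""
    mu = sum(values) / len(values)               # population mean
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Sample SD: divide the sum of squared deviations by n - 1."""
    xbar = sum(values) / len(values)             # sample mean
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))
```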
The reason for the change in formula with the sample is this: When you're calculating $s$ you are normally using $s^2$ (the sample variance) to estimate $\sigma^2$ (the population variance). The problem, though, is that if you don't know $\sigma$ you generally don't know the population mean $\mu$, either, and so you have to use $\bar{x}$ in the place in the formula where you normally would use $\mu$. Doing so introduces a slight bias into the calculation: Since $\bar{x}$ is calculated from the sample, the values of $x_i$ are on average closer to $\bar{x}$ than they would be to $\mu$, and so the sum of squares $\sum_{i=1}^n (x_i - \bar{x})^2$ turns out to be smaller on average than $\sum_{i=1}^n (x_i - \mu)^2$. It just so happens that that bias can be corrected by dividing by $n-1$ instead of $n$. (Proving this is a standard exercise in an advanced undergraduate or beginning graduate course in statistical theory.) The technical term here is that $s^2$ (because of the division by $n-1$) is an unbiased estimator of $\sigma^2$.
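A quick Monte Carlo sketch makes the bias visible. Assuming (purely for illustration) a standard normal population, so $\sigma^2 = 1$, the divide-by-$n$ variance comes out too small on average, while the divide-by-$(n-1)$ version averages out to about 1:

```python
import random

random.seed(0)
n, trials = 5, 200_000
sum_divide_n = sum_divide_n_minus_1 = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]        # sample of size n from N(0, 1)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)              # sum of squared residuals
    sum_divide_n += ss / n
    sum_divide_n_minus_1 += ss / (n - 1)
print(sum_divide_n / trials)            # about 0.8, i.e. (n-1)/n of the true variance
print(sum_divide_n_minus_1 / trials)    # about 1.0, the true variance
```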
Another way to think about it is that with a sample you have $n$ independent pieces of information. However, since $\bar{x}$ is the average of those $n$ pieces, if you know $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_{n-1} - \bar{x}$, you can figure out what $x_n - \bar{x}$ is. So when you're squaring and adding up the residuals $x_i - \bar{x}$, there are really only $n-1$ independent pieces of information there. So in that sense perhaps dividing by $n-1$ rather than $n$ makes sense. The technical term here is that there are $n-1$ degrees of freedom in the residuals $x_i - \bar{x}$.
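The degrees-of-freedom point can be checked numerically: the residuals $x_i - \bar{x}$ always sum to zero, so the last one is determined by the other $n-1$. A small sketch with made-up numbers:

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example data
xbar = sum(xs) / len(xs)
residuals = [x - xbar for x in xs]
print(sum(residuals))                            # 0.0 (up to floating-point rounding)
print(residuals[-1], -sum(residuals[:-1]))       # last residual recovered from the others
```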
For more information, see Wikipedia's article on the sample standard deviation.
Answer from Mike Spivey on Stack Exchange

The two forms of standard deviation are relevant to two different types of variability. One is the variability of values within a set of numbers, and the other is an estimate of the variability of a population from which a sample of numbers has been drawn.
The population standard deviation is relevant where the numbers that you have in hand are the entire population, and the sample standard deviation is relevant where the numbers are a sample of a much larger population.
For any given set of numbers the sample standard deviation is larger than the population standard deviation because there is extra uncertainty involved: the uncertainty that results from sampling. See this for a bit more information: Intuitive explanation for dividing by $n-1$ when calculating standard deviation?
For an example, the population standard deviation of 1,2,3,4,5 is about 1.41 and the sample standard deviation is about 1.58.
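Those two numbers can be reproduced with Python's standard library, which implements exactly these two formulas:

```python
import statistics

data = [1, 2, 3, 4, 5]
print(statistics.pstdev(data))   # population SD: sqrt(10/5) = sqrt(2)   ≈ 1.414
print(statistics.stdev(data))    # sample SD:     sqrt(10/4) = sqrt(2.5) ≈ 1.581
```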
My question is similar to pnd1987's question. I wish to use a standard deviation to appraise the repeatability of a measurement. Suppose I'm measuring one stable thing over and over. A perfect measuring instrument (with a perfect operator) would give the same number every time. Instead there is variation, and let's assume there is a normal distribution about the mean.
We'd like to appraise the measurement repeatability by the SD of that normal distribution. But we take just N measurements at a time and hope the SD of those N can estimate the SD of the normal distribution. As N increases, sampleSD and populationSD both converge to the distribution's SD, but for small N, like 5, we get only weak estimates of the distribution's SD. PopulationSD gives an obviously worse estimate than sampleSD, because when N=1 populationSD gives the ridiculous value 0, while sampleSD is correctly indeterminate. However, sampleSD does not correctly estimate the distribution's SD either. That is, if we measure N times and take the sampleSD, then measure another N times and take the sampleSD, over and over, and average all the sampleSDs, that average does not converge to the distribution's SD. For N=5, it converges to around 0.94× the distribution's SD. (There must be a little theorem here.) SampleSD doesn't quite do what it is said to do.
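A simulation of the experiment described above (a sketch, assuming normally distributed measurement error with true SD equal to 1) reproduces that factor of roughly 0.94 for N=5:

```python
import random
import statistics

random.seed(0)
N, trials = 5, 200_000
total = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(N)]   # N repeated measurements, true SD = 1
    total += statistics.stdev(xs)                  # sample SD of this batch
print(total / trials)                              # about 0.94, not 1.0
```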
If the measurement variation is normally distributed, then it would be very nice to know the distribution's SD. For example, we can then determine how many measurements to take in order to tolerate the variation. Averages of N measurements are also normally distributed, but with a standard deviation 1/sqrt(N) times the original distribution's.
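That 1/sqrt(N) claim can also be checked by simulation (again assuming a true SD of 1 and N = 5, so the averages should have an SD of about 1/sqrt(5) ≈ 0.447):

```python
import random
import statistics

random.seed(1)
N, trials = 5, 200_000
# average of N measurements, repeated many times
means = [sum(random.gauss(0, 1) for _ in range(N)) / N for _ in range(trials)]
print(statistics.pstdev(means))    # about 0.447 ≈ 1 / sqrt(5)
```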
Note added: the theorem is not so little -- Cochran's Theorem
Why is the population standard deviation the square root of the sum of (value - mean)^2 divided by n, while the sample standard deviation is all that over n - 1? I don't understand why you have to subtract 1 from the number of things.