What does standard error of the mean ACTUALLY show?
My understanding is that standard error is essentially a measure of how much the means you obtain when you repeatedly sample from a population will vary. According to statistical theory, if you have a population and you take a sample of it, you can calculate the standard deviation by comparing each value to the mean of your sample. But then, when you take that number and simply divide it by the square root of your sample size, then voila, you magically know how spread out the mean of every single sample you could ever take of that population is.
To me, that seems like a HUGE stretch of an assumption. It is already a bit of a stretch to think that your sample is a decent representation of the actual population, and sure, I get that these formulas are estimates rather than exact math. But I never would have guessed that the deviation of a sample, divided by a modification of the sample size, could tell you how much any sample mean could ever vary.
Am I way off in assuming this? Am I missing something that should make me think more clearly about this all?
As I understand it, the standard error is the spread of many sample means in an attempt to gauge how precise (not accurate) our estimate of the population mean is, but what if there's just the one sample?
Very short
A sample is not just a single value; it contains many individual observations. Each of these observations can itself be considered a sample (is there a difference between '$n$ samples of size 1' and '1 sample of size $n$'?). So you actually have multiple samples that can help you estimate the standard error of sample means.
In order to estimate the variance of the mean of samples, would you rather have a sample of size one million or multiple (say a hundred) samples of ten?
A bit longer
A sample will almost never be picked such that it perfectly matches the population. Sometimes a sample will happen to contain relatively low values, sometimes relatively high values.
The variation in the sample mean, due to these random variations in picking the sample, is related to the variation in the population being sampled. If the population has a wide spread between high and low values, then the deviations of random samples with relatively high/low values will correspond to this wide spread, and they will be large.
The error/variation in the means of samples thus relates to the variance of the population, so we can estimate the former with the help of an estimate of the latter: we estimate the variance of sample means via the variance of the population. And for this estimate of the variance of the population, one single sample is sufficient.
In formula form
The standard deviation of the sample means, $\sigma_n$, where the samples are of size $n$, is related to the standard deviation of the population, $\sigma$: $$\sigma_n = \frac{\sigma}{\sqrt{n}}$$
So an estimate of $\sigma$, for which a single sample is sufficient, can also be used to estimate $\sigma_n$.
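(To make this concrete, here is a small simulation sketch in R; I assume a normal population with $\mu = 167$ and $\sigma = 17$, echoing the height example below. We draw many samples of size $n$, take each sample's mean, and compare the spread of those means with $\sigma/\sqrt{n}$.)
set.seed(1)
n <- 20
sigma <- 17
# Means of 10,000 independent samples of size n from the population
means <- replicate(10000, mean(rnorm(n, mean = 167, sd = sigma)))
sd(means)        # empirical standard deviation of the sample means
sigma / sqrt(n)  # theoretical value: 17/sqrt(20), about 3.8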
Let me add some visuals/intuition to your question, using an empirical approach (bootstrapping) to make things more concrete, especially in reference to the following:
Usually experiments can't be repeated, or just aren't, and we only have 1 sample from a population.
As you highlighted, we are talking about the standard error of a statistic (the mean, in our case). So, let's assume that you have a random sample of the heights of 20 people from a given country:
## [1] 192.3214 144.4797 151.3796 155.2519 147.5844 147.9056 171.1867 159.3074
## [9] 163.0097 190.9857 165.8155 198.2192 192.2418 165.3628 186.9498 167.3355
## [17] 148.6400 156.6933 160.8472 174.4827
From this sample, you get a mean of 167 and a standard deviation of 17.
You have only one random sample, but you can imagine that if you could take another one, you might get similar values, sometimes duplicates, sometimes more extreme values... but something that would look like your initial random sample.
So, from these initial sample values and without inventing new ones (only resampling with replacement), you can imagine many other samples. For example, we can imagine three as follows:
## [1] 165.8155 159.3074 148.6400 165.3628 155.2519 151.3796 192.2418 163.0097
## [9] 159.3074 192.2418 186.9498 163.0097 144.4797 198.2192 159.3074 190.9857
## [17] 165.3628 159.3074 167.3355 156.6933
## [1] 147.5844 147.9056 151.3796 163.0097 167.3355 159.3074 167.3355 156.6933
## [9] 156.6933 159.3074 147.9056 190.9857 192.2418 171.1867 198.2192 147.9056
## [17] 155.2519 167.3355 148.6400 165.8155
## [1] 192.2418 198.2192 156.6933 192.3214 148.6400 192.3214 198.2192 165.8155
## [9] 167.3355 144.4797 163.0097 148.6400 159.3074 163.0097 163.0097 174.4827
## [17] 165.3628 165.8155 174.4827 159.3074
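(For reference, each of these resamples can be produced with one line of R; this is a sketch, assuming the original 20 values are stored in a vector spl, as in the code further below:)
resample <- sample(spl, size = length(spl), replace = TRUE)  # one bootstrap resample
mean(resample)  # its mean will differ slightly from mean(spl)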
Their respective means will differ from the initial one... but what is interesting is that if we repeat this resampling exercise 10,000 times, for instance, and calculate the mean of each generated sample, we get a distribution of means centered around the initial sample mean (the R code is left here just to illustrate):
set.seed(007)
# Build a sample of 20 values forced to have exactly the same mean (167) and sd (17)
spl <- 167 + 17*scale(rnorm(20))[,1]
library(boot)
# Statistic to bootstrap: the mean of the resampled observations
myFunc <- function(data, i){
  return(mean(data[i]))
}
bootMean <- boot(spl, statistic=myFunc, R=10000)  # 10,000 bootstrap resamples
hist(bootMean$t, xlim=c(150,185), main="Sample size n=20")
abline(v=mean(spl), col="blue")  # mark the initial sample mean
So, the histogram above represents the distribution of the means of 10,000 samples... that we constructed from our initial sample. Empirically, we can determine the standard deviation of this (sampling) distribution, which is our standard error of the mean:
sd(bootMean$t)
## [1] 3.74095
Interestingly enough, if we compute the standard error with the formula $\frac{s}{\sqrt n}$, we get something very similar:
sd(spl)/sqrt(20)
## [1] 3.801316
The standard error of the mean tells us about the spread of the sample means around the population mean, not the spread of the individual data points.
To finish this intuitive overview, let's see what happens if we increase our initial sample size (to understand the impact of this $\sqrt{n}$).
So, if we increase the sample size (here to $n = 500$), the standard error unsurprisingly gets smaller: we reduce the error in estimating the population mean. Again, we can see empirically that the formula still holds:
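(The re-run for the larger sample was not shown; here is a sketch of it, reusing myFunc from above and the same mean/sd trick, now with $n = 500$:)
set.seed(007)
spl <- 167 + 17*scale(rnorm(500))[,1]  # new sample with the same mean and sd, n = 500
bootMean <- boot(spl, statistic=myFunc, R=10000)
hist(bootMean$t, main="Sample size n=500")
abline(v=mean(spl), col="blue")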
sd(bootMean$t)
## [1] 0.7740625
sd(spl)/sqrt(500)
## [1] 0.7602631
Yes, the standard error of the mean (SEM) is the standard deviation (SD) of the means. (Standard error is another way to say SD of a sampling distribution; in this case, the sampling distribution is that of means of samples of a fixed size, say $N$.)
There is a mathematical relationship between the SEM and the population SD: $\text{SEM} = \text{population SD} / \sqrt{N}$. This relationship is very helpful, since we almost never have a direct estimate of the SEM, but we do have an estimate of the population SD (namely, the SD of our sample).
As to your second question: if you were to collect multiple samples of size $N$ and calculate the mean of each sample, you could estimate the SEM simply by calculating the SD of those means. So the formula for the SEM does indeed mirror the formula for the SD of a single sample.
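(A quick sketch of that last point in R, assuming a population with SD 10: draw many samples of size $N$, and compare the SD of their means with the theoretical SEM.)
set.seed(1)
N <- 25
# Means of 5,000 samples of size N from a population with sd = 10
many_means <- replicate(5000, mean(rnorm(N, mean = 0, sd = 10)))
sd(many_means)  # SD of the means: a direct estimate of the SEM
10 / sqrt(N)    # population SD / sqrt(N) = 2, the theoretical SEM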
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed. This is the situation I am pretty sure you are referring to. Let their common mean be $\mu$ and their common variance be $\sigma^2$.
Now the sample mean is $\bar{X}=\sum_i X_i/n$. Linearity of expectation shows that the mean of $\bar{X}$ is also $\mu$. The independence assumption implies the variance of $\bar{X}$ is the sum of the variances of its terms. Each such term $X_i/n$ has variance $\sigma^2/n^2$ (because the variance of a constant times a random variable is the constant squared times the variance of the random variable), and there are $n$ such identically distributed terms to sum. As a result, the variance of the sample mean is $n \sigma^2/n^2 = \sigma^2/n$.
Usually we do not know $\sigma^2$, and so we must estimate it from the data. Depending on the setting, there are various ways to do this. The two most common, general-purpose estimates of $\sigma^2$ are the sample variance $s^2 = \frac{1}{n}\sum_i(X_i-\bar{X})^2$ and a small multiple of it, $s_u^2 = \frac{n}{n-1}s^2$ (which is an unbiased estimator of $\sigma^2$). Using either one of these in place of $\sigma^2$ in the preceding paragraph and taking the square root gives the standard error in the form $s/\sqrt{n}$ or $s_u/\sqrt{n}$.
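(A sketch of both estimates in R; note that R's built-in var() and sd() already use the $n-1$ denominator, so they correspond to $s_u^2$ and $s_u$. The sample values here are arbitrary.)
x <- c(192.3, 144.5, 151.4, 155.3, 147.6)  # any sample of observations
n <- length(x)
s2  <- sum((x - mean(x))^2) / n  # sample variance with the 1/n denominator
su2 <- var(x)                    # unbiased version, equal to (n/(n-1)) * s2
sqrt(s2) / sqrt(n)               # standard error using s
sqrt(su2) / sqrt(n)              # standard error using s_u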