In both scenarios $\sigma_1^2$ and $\sigma_2^2$ are unknown. The bottom formula uses the assumption that $\sigma_1^2 = \sigma_2^2$ and attempts to estimate that shared variance by pooling all observations together and calculating a weighted mean. Thus, the factor on the left plays the role of both $s_1^2$ and $s_2^2$ in the bottom equation. This method is usually used when you have small sample sizes and the equal-variance assumption is plausible.
There are two different versions of the two-sample t test in common usage.
Pooled. The assumption, often unwarranted in practice, is made that the two populations have the same variance, $\sigma^2 = \sigma_1^2 = \sigma_2^2.$ In that case one seeks to estimate the common population variance, using both of the sample variances, to obtain what is called a pooled estimate
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}.$$
If the two sample sizes are equal, then this is simply $s_p^2 = (s_1^2 + s_2^2)/2.$ But if sample sizes differ, then greater weight is put on the sample variance from the larger sample. The weights use the degrees of freedom $n_i - 1$ instead of the sample sizes $n_i$.
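As a concrete illustration, here is a minimal Python sketch (helper name is my own) computing the pooled estimate from data and checking the equal-sample-size special case:

```python
import numpy as np

def pooled_var(x, y):
    """Pooled variance: a weighted mean of the two sample variances,
    weighted by degrees of freedom (n_i - 1)."""
    n1, n2 = len(x), len(y)
    s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=20)
y = rng.normal(0, 1, size=20)

# With equal sample sizes the pooled variance is just the simple average.
s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)
print(pooled_var(x, y), (s1 + s2) / 2)  # identical
```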
The first factor under the radical in your second formula is $s_p^2.$ Under the assumption of equal population variances, the standard deviation of $\bar X_1 - \bar X_2$ (estimated standard error) is
$$SE = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
which is your second formula. Consequently, the $t$-statistic is
$$T = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{1/n_1 + 1/n_2}}.$$
Under the null hypothesis that the population means $\mu_1$ and $\mu_2$ are equal, this $t$-statistic has Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom.
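A hedged check of these formulas against library output (scipy's ttest_ind performs the pooled procedure when equal_var=True; the data here are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=15)
y = rng.normal(0.5, 1.0, size=25)

n1, n2 = len(x), len(y)
sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2)   # pooled standard error
t = (x.mean() - y.mean()) / se                  # pooled t-statistic

t_lib, p_lib = stats.ttest_ind(x, y, equal_var=True)
print(t, t_lib)  # should agree to floating-point precision
```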
Separate variances (Welch). The assumption of equal population variances is not made. Then the variance of $\bar X_1 - \bar X_2$ is
$$\operatorname{Var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
This variance is estimated by $s_1^2/n_1 + s_2^2/n_2.$ So the (estimated) standard error is
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
So your first formula has typos and is incorrect. This may account for the "ludicrous" difference you are getting. If $n_1 = n_2 = n$, then you should get the same standard error from both methods. But the two (estimated) standard errors will not necessarily be equal if sample sizes differ.
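To see the equal-n coincidence numerically, a small sketch comparing the two standard errors (helper names are my own):

```python
import numpy as np

def pooled_se(x, y):
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    return np.sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_se(x, y):
    return np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))

rng = np.random.default_rng(2)
a, b = rng.normal(size=20), rng.normal(size=20)   # equal n: SEs coincide
c, d = rng.normal(size=10), rng.normal(size=40)   # unequal n: SEs differ
print(pooled_se(a, b), welch_se(a, b))   # equal
print(pooled_se(c, d), welch_se(c, d))   # generally not equal
```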
A crucial difference between the pooled and Welch t tests is that the Welch test uses a rather complicated formula, involving both sample sizes and both sample variances, for the degrees of freedom (DF). The Welch DF is always between $\min(n_1 - 1,\, n_2 - 1)$ on the one hand and $n_1 + n_2 - 2$ on the other. So if both sample sizes are moderately large, both $t$-statistics will be nearly normally distributed when $H_0$ is true.
The Welch $t$-statistic is only approximate, but simulation studies have shown that it is a very accurate approximation over a large variety of sample sizes (equal and not) and population variances (equal or not).
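The "rather complicated formula" is the Welch–Satterthwaite approximation. A sketch computing it and checking the bounds stated above (data simulated for illustration):

```python
import numpy as np

def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom."""
    v1 = np.var(x, ddof=1) / len(x)
    v2 = np.var(y, ddof=1) / len(y)
    return (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))

rng = np.random.default_rng(3)
x = rng.normal(0, 1, size=12)
y = rng.normal(0, 3, size=30)   # unequal variances and sample sizes

df = welch_df(x, y)
lo = min(len(x) - 1, len(y) - 1)
hi = len(x) + len(y) - 2
print(lo, df, hi)               # lo <= df <= hi always holds
```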
The current consensus among applied statisticians is always to use the Welch t test and not worry about whether population variances are equal. Most statistical computer packages use the Welch procedure by default and the pooled procedure only if specifically requested.
You seem to be thinking that $\operatorname{Var}(X - Y) = \operatorname{Var}(X) - \operatorname{Var}(Y).$
This is not the case for independent variables.
For independent $X$ and $Y$,
$$\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y).$$
Further, $\operatorname{Var}(\bar X) = \sigma^2/n$ (if the $X_i$ are independent of each other).
http://en.wikipedia.org/wiki/Variance#Basic_properties
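A quick Monte Carlo check of these two properties (purely illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0, 2, size=1_000_000)   # Var(X) = 4
y = rng.normal(0, 3, size=1_000_000)   # Var(Y) = 9, independent of X

# Var(X - Y) = Var(X) + Var(Y) = 13 for independent X, Y (not 4 - 9).
print(np.var(x - y))            # approx 13

# Variance of a mean of n independent observations is Var(X)/n.
n = 50
means = rng.normal(0, 2, size=(100_000, n)).mean(axis=1)
print(np.var(means), 4 / n)     # both approx 0.08
```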
In summary, the correct term
$$SE_{\bar X_1 - \bar X_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
has $n_1$ and $n_2$ in the denominators because we're looking at averages, and that's the variance of an average of independent random variables; has a plus sign because the two samples are independent, so their variances (of the averages) add; and has a square root because we want the standard deviation of the distribution of the difference in sample means (the standard error of the difference in means). The part under the bar of the square root is the variance of the difference (the square of the standard error). Taking the square root of the squared standard error gives us the standard error.
The reason why we don't just add standard errors is that standard errors don't add: the standard error of the difference in means is NOT the sum of the standard errors of the sample means for independent samples; the sum will always be too large. The variances do add, though, so we can use that to work out the standard errors.
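A simulation making both claims concrete: the SD of the simulated difference in means matches the square-root-of-summed-variances formula, while simply adding the two standard errors overshoots (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, sigma1, sigma2 = 25, 40, 2.0, 3.0
reps = 100_000

diffs = (rng.normal(0, sigma1, size=(reps, n1)).mean(axis=1)
         - rng.normal(0, sigma2, size=(reps, n2)).mean(axis=1))

se_formula = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
se_sum = sigma1 / np.sqrt(n1) + sigma2 / np.sqrt(n2)

print(np.std(diffs))   # approx se_formula
print(se_formula)      # correct standard error
print(se_sum)          # always larger than se_formula
```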
Here's some intuition about why it's variances that add, rather than standard deviations.
To make things a little simpler, just consider adding random variables.
If $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ for independent $X$ and $Y$, why is
$$\operatorname{sd}(X + Y) \neq \operatorname{sd}(X) + \operatorname{sd}(Y)?$$
Imagine $Y = cX$ (for $c > 0$); that is, $X$ and $Y$ are perfectly linearly dependent. That is, they always 'move together' in the same direction and in proportion.
Then $X + Y = (1 + c)X$, which is simply a rescaling. Clearly
$$\operatorname{sd}(X + Y) = (1 + c)\operatorname{sd}(X) = \operatorname{sd}(X) + \operatorname{sd}(Y).$$
That is, when $X$ and $Y$ are perfectly positively linearly dependent, always moving up or down together, standard deviations add.
When they don't always move up or down together, sometimes they move in opposite directions. That means that their movements partly 'cancel out', yielding a smaller standard deviation than the direct sum.
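A sketch of this 'moving together' intuition: when Y is an exact positive multiple of X the standard deviations add; when X and Y are independent, the cancellation shows up as a smaller SD (simulated data, my own setup):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0, 1, size=1_000_000)
c = 2.0

y_dep = c * x                               # perfectly dependent: Y = cX
y_ind = rng.normal(0, c, size=x.size)       # independent, same SD as y_dep

print(np.std(x + y_dep), np.std(x) + np.std(y_dep))  # equal: SDs add
print(np.std(x + y_ind), np.std(x) + np.std(y_ind))  # left is smaller:
# sqrt(1 + 4) ~ 2.24 versus 1 + 2 = 3
```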
Algebraic intuition
The standard error of the mean for $n$ independent observations is
$$SE_{\bar X} = \frac{\sigma}{\sqrt{n}},$$
where $\sigma$ is the standard deviation.
So if we have two independent samples, we have the standard errors for the means of group 1 and group 2: $s_1/\sqrt{n_1}$ and $s_2/\sqrt{n_2}$.
If we square these values we get the variances of the means: $s_1^2/n_1$ and $s_2^2/n_2$.
The variance of the sum or difference of two independent random variables is the sum of the two variances. Thus,
$$\operatorname{Var}(\bar X_1 - \bar X_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}.$$
So if we want the standard error of the difference, we take the square root of the variance:
$$SE_{\bar X_1 - \bar X_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
So I imagine this is intuitive if the component steps are intuitive. In particular it helps if you find intuitive the idea that the variance of the sum of independent variables is the sum of the variances of the component variables.
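Following the same component steps with made-up numbers ($s_1 = 3$, $n_1 = 25$, $s_2 = 4$, $n_2 = 16$):

```python
import numpy as np

s1, n1 = 3.0, 25   # hypothetical group 1 SD and size
s2, n2 = 4.0, 16   # hypothetical group 2 SD and size

se1 = s1 / np.sqrt(n1)          # 0.6   standard error of mean 1
se2 = s2 / np.sqrt(n2)          # 1.0   standard error of mean 2
var_diff = se1**2 + se2**2      # 1.36  variances add
se_diff = np.sqrt(var_diff)     # ~1.17 standard error of the difference
print(se1, se2, var_diff, se_diff)
```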
Fuzzy Intuition
In terms of more general intuition, if $\sigma_1 = \sigma_2 = \sigma$ and $n_1 = n_2 = n$, then the standard error of the difference between means will be
$$\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sqrt{2}\,\frac{\sigma}{\sqrt{n}}.$$
It makes sense that this multiple of approximately 1.4 is greater than 1 (i.e., the standard error of a variable after adding a constant, equivalent to a one-sample t-test) and less than 2 (i.e., the standard deviation of the sum of two perfectly correlated variables with equal variance, and the standard error implied by the formula you mention: $\sigma/\sqrt{n} + \sigma/\sqrt{n}$).
Let's say I recover $B_1$ and $B_2$ from a regression ($y = B_0 + B_1 x_1 + B_2 x_2$), and a quantity of interest is the difference between the two: $B_2 - B_1$. What's the formula for the standard error of this quantity?
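Unlike the independent-samples case above, regression coefficient estimates are generally correlated, so the covariance term from the general variance rule stays: $\operatorname{Var}(\hat B_2 - \hat B_1) = \operatorname{Var}(\hat B_1) + \operatorname{Var}(\hat B_2) - 2\operatorname{Cov}(\hat B_1, \hat B_2).$ A minimal numpy-only sketch on simulated data (all names and numbers hypothetical, ordinary least squares assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(size=n)  # true B2 - B1 = 1.0

X = np.column_stack([np.ones(n), x1, x2])            # design matrix
beta = np.linalg.solve(X.T @ X, X.T @ y)             # OLS estimates
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])            # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)                # Cov(beta_hat)

# Var(B2 - B1) = Var(B1) + Var(B2) - 2 Cov(B1, B2)
var_diff = cov[1, 1] + cov[2, 2] - 2 * cov[1, 2]
print(beta[2] - beta[1], np.sqrt(var_diff))          # estimate and its SE
```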