Why can a null hypothesis not be accepted?
We can’t accept a null hypothesis because a lack of evidence against it does not prove that no effect exists. Instead, we fail to reject it.
Failing to reject the null indicates that the sample did not provide sufficient evidence to conclude that an effect exists.
If the p-value is greater than the significance level, then you fail to reject the null hypothesis.
What is the difference between a null hypothesis and an alternative hypothesis?
The alternative hypothesis is the claim that you expect or hope will be true, while the null hypothesis is the claim that no effect exists. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.
I have been thinking about why we do not accept a null hypothesis even when we fail to reject it, and I am not sure I understand it well enough. What I think is that we do not accept the null hypothesis because, when we fail to reject it, we are only saying that this particular alternative hypothesis is not supported; that does not make it impossible for another alternative hypothesis to appear and turn out to be correct. Please let me know if this is correct.
If the last paragraph is correct, then I do not know why we say that we do not accept the null hypothesis, given that this is based on how we think things are. Would it not be more appropriate to say that the null hypothesis is correct when we compare it to the alternative that we just rejected, since we do not know which other alternative hypothesis might make us reject the null?
Thank you
You can generally continue to improve your estimate of whatever parameter you might be testing with more data. Stopping data collection once a test achieves some semi-arbitrary degree of significance is a good way to make bad inferences. That analysts may misunderstand a significant result as a sign that the job is done is one of many unintended consequences of the Neyman–Pearson framework, according to which people interpret p values as cause to either reject or fail to reject a null without reservation depending on which side of the critical threshold they fall on.
Without considering Bayesian alternatives to the frequentist paradigm (hopefully someone else will), confidence intervals continue to be more informative well beyond the point at which a basic null hypothesis can be rejected. Assuming collecting more data would just make your basic significance test achieve even greater significance (and not reveal that your earlier finding of significance was a false positive), you might find this useless because you'd reject the null either way. However, in this scenario, your confidence interval around the parameter in question would continue to shrink, improving the degree of confidence with which you can describe your population of interest precisely.
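Here's a quick sketch of that point with purely illustrative simulated data (the true mean is arbitrarily set to .5): the 95% confidence interval keeps narrowing as the sample grows, long after the null of $\mu=0$ has been rejected.
set.seed(1)
for (n in c(30, 300, 3000)) {
  ci <- t.test(rnorm(n, mean = .5))$conf.int   # 95% CI for the mean
  cat(sprintf("n = %4d   CI: [%.3f, %.3f]   width: %.3f\n",
              n, ci[1], ci[2], ci[2] - ci[1]))
}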
Here's a very simple example in R, testing the null hypothesis that $\mu=0$ for a simulated variable:
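t.test(rnorm(99))   # the call used, as described below; without a seed, the exact numbers won't reproduce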
One Sample t-test
data: rnorm(99)
t = -2.057, df = 98, p-value = 0.04234
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.377762241 -0.006780574
sample estimates:
mean of x
-0.1922714
Here I just used t.test(rnorm(99)), and I happened to get a false positive (assuming I've defaulted to $\alpha=.05$ as my choice of acceptable false positive error rate). If I ignore the confidence interval, I can claim my sample comes from a population with a mean that differs significantly from zero. Technically the confidence interval doesn't dispute this either, but it suggests that the mean could be very close to zero, or even further from it than I think based on this sample. Of course, I know the null is actually literally true here, because the mean of the rnorm population defaults to zero, but one rarely knows with real data.
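As a side note, a quick (purely illustrative) simulation shows where that false positive rate comes from: repeatedly testing samples drawn from a population whose mean really is zero should flag roughly 5% of them as significant at $\alpha=.05$.
set.seed(8)
mean(replicate(10000, t.test(rnorm(99))$p.value < .05))   # proportion of false positives, near .05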
Running this again as set.seed(8);t.test(rnorm(99,1)) produces a sample mean of .91, a p = 5.3E-13, and a 95% confidence interval for $\mu=[.69,1.12]$. This time I can be quite confident that the null is false, especially because I constructed it to be by setting the mean of my simulated data to 1.
Still, say it's important to know how different from zero the mean is; maybe a mean of .8 would be too close to zero for the difference to matter. From both my confidence interval and a t-test with mu=.8 (which gives p = .33), I can see I don't have enough data to rule out the possibility that $\mu=.8$. My sample mean is high enough to seem meaningfully different from zero according to this .8 threshold though; collecting more data can help improve my confidence that the difference is at least this large, and not just trivially larger than zero.
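For concreteness, that test of $\mu=.8$ (presumably on the same seeded sample as above) would be:
set.seed(8);t.test(rnorm(99,1),mu=.8)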
Since I'm "collecting data" by simulation, I can be a little unrealistic and increase my sample size by an order of magnitude. Running set.seed(8);t.test(rnorm(999,1),mu=.8) reveals that more data continue to be useful after rejecting the null hypothesis of $\mu=0$ in this scenario, because I can now reject a null of $\mu=.8$ with my larger sample. The confidence interval of $\mu=[.90,1.02]$ even suggests I could've rejected null hypotheses up to $\mu=.89$ if I'd set out to do so initially.
I can't revise my null hypothesis after the fact, but without collecting new data to test an even stronger hypothesis after this result, I can say with 95% confidence that replicating my "study" would allow me to reject a $H_0:\mu=.9$. Again, just because I can simulate this easily, I'll rerun the code as set.seed(9);t.test(rnorm(999,1),mu=.9): doing so demonstrates my confidence wasn't misplaced.
Testing progressively more stringent null hypotheses, or better yet, simply focusing on shrinking your confidence intervals is just one way to proceed. Of course, most studies that reject null hypotheses lay the groundwork for other studies that build on the alternative hypothesis. E.g., if I was testing an alternative hypothesis that a correlation is greater than zero, I could test for mediators or moderators in a follow-up study next...and while I'm at it, I'd definitely want to make sure I could replicate the original result.
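To make that concrete, a one-sided test of such an alternative (with hypothetical variables x and y, simulated here) could look like this:
set.seed(8)
x <- rnorm(99); y <- .3*x + rnorm(99)      # hypothetical follow-up variables
cor.test(x, y, alternative = "greater")    # alternative: the correlation is greater than zero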
Another approach to consider is equivalence testing. If you want to conclude that a parameter is within a certain range of possible values, not just different from a single value, you can specify the range you'd want the parameter to lie within as your alternative hypothesis, and test it against a set of null hypotheses that together represent the possibility that the parameter lies outside that range. This last possibility might be most similar to what you had in mind when you wrote:
We have "some evidence" for the alternative to be true, but we can't draw that conclusion. If I really want to draw that conclusion conclusively...
Here's an example using data similar to the above (with set.seed(8), rnorm(99) is the same as rnorm(99,1)-1, so the sample mean is -.09). Say I want to test, with two one-sided t-tests, the joint null hypothesis that the population mean is not between -.2 and .2. This corresponds loosely to the previous example's premise, according to which I wanted to test whether $\mu=.8$. The difference is that I've shifted my data down by 1, and I'm now going to perform two one-sided tests of the alternative hypothesis that $-.2\le\mu\le.2$. Here's how that looks:
require(equivalence);set.seed(8);tost(rnorm(99),epsilon=.2)
tost sets the confidence level of the interval to 90%, so the confidence interval around the sample mean of -.09 is $\mu=[-.27,.09]$, and p = .17. However, running this again with rnorm(999) (and the same seed) shrinks the 90% confidence interval to $\mu=[-.09,.01]$, which lies entirely within the equivalence range specified in the alternative hypothesis, with p = 4.55E-07.
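For reference, that second run is just the same call with the larger sample:
set.seed(8);tost(rnorm(999),epsilon=.2)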
I still think the confidence interval is more interesting than the equivalence test result. It describes the population mean more specifically than the alternative hypothesis does, and it suggests I can be reasonably confident that the mean lies within an even smaller interval than the one I specified in the alternative hypothesis. To demonstrate, I'll abuse my unrealistic powers of simulation once more and "replicate" using set.seed(7);tost(rnorm(999),epsilon=.09345092): sure enough, p = .002.
Note first that @Nick Stauner makes some very important arguments regarding optional stopping. If you repeatedly test the data as samples come in, stopping once a test is significant, you're all but guaranteed a significant result. However, a guaranteed result is practically worthless.
In the following, I'll present my best attempt to elaborate on a deductivist, skeptical, falsificationist position. It's certainly not the only one, but I think it's a rather mainstream one, or at least one with a bit of tradition.
As far as I understand it, Fisher originally introduced significance tests as a first step in data exploration: establishing which factors might be worth investigating further. Unless the null hypothesis you've put under test actually was the critical hypothesis your favoured theory depended on (unlikely), your initial test was, in a way, rather exploratory in nature. Amongst the possible steps following exploration I see:
- Further exploration
- Parameter Estimation
- Prediction & Confirmation
Further exploration consists of follow-up tests in which you try to infer whether any variables you have information on moderate or interact with your effect. For example, maybe the age of the participants plays a role? Note that such analyses must be clearly labelled as exploratory, or they basically amount to lying. If you stumble upon something, it first requires confirmation. Generally, you should always be clear, both in your thoughts and in your writing, about when you're working in exploratory mode and when in confirmatory mode.
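As a sketch of what such an exploratory moderation check might look like (everything here is hypothetical and simulated), you could fit an interaction term and inspect it:
set.seed(1)
n   <- 200
age <- rnorm(n, mean = 40, sd = 10)          # hypothetical moderator
x   <- rnorm(n)                              # hypothetical predictor
y   <- .4*x + .01*x*age + rnorm(n)           # simulated outcome with a true interaction
summary(lm(y ~ x * age))                     # the x:age row is the moderation term of interest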
Next, once you have established that you have no confidence in one parameter's value being precisely zero - once you have decided you'll for now consider the factor under test to have some influence - one feasible next step could be to further estimate the precise value of the parameter. For example, for now, you've only excluded one value, 0 (assuming a two-sided test). However, your data also cast doubt on many further possible values.
A $100(1-\alpha)\%$ confidence interval (CI) contains the range of parameter values not rejected at $p<\alpha$, corresponding to the many more possible hypotheses your data also bear on beyond your initial $H_0$. Since your test is significant, the value associated with $H_0$ is not amongst them. But many extremely large and small values will also be excluded.
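You can see this correspondence directly in R (with illustrative data): null values just inside the 95% CI give p just above .05, and null values just outside give p just below .05.
set.seed(8)
x  <- rnorm(99, 1)
ci <- t.test(x)$conf.int                     # 95% CI for the mean
t.test(x, mu = ci[1] + .001)$p.value         # null value just inside the CI: p slightly above .05
t.test(x, mu = ci[1] - .001)$p.value         # null value just outside the CI: p slightly below .05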
Hume famously argued that we can never inductively prove a statement correct. Generally, non-trivial hypotheses are a lot easier to falsify than to support; being easy to falsify in principle (by being non-trivial and making precise predictions), and yet not having been falsified so far, is in fact one of the highest virtues of a theory.
So a CI won't let you prove a specific value. However, it narrows down the candidate set. Maybe the only candidates left alive help you decide between two theories that are both incompatible with $H_0$. For example, maybe 0 is excluded, but theory 1 predicts a value around 5, and theory 2 predicts a value around 15. If your 95% CI includes 5 but excludes 15, you have now also lost confidence in theory 2, while theory 1 remains in the game. Note that this is actually independent of your initial test being significant: even if 0 is amongst the values not rejected, many values will be rejected. Maybe for some other researchers, some of these values were of interest.
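As a toy version of that comparison (with entirely hypothetical numbers), suppose theory 1 predicts a value around 5 and theory 2 a value around 15; the CI does the deciding:
set.seed(8)
x <- rnorm(50, mean = 6, sd = 10)            # hypothetical data; true mean chosen near theory 1's prediction
t.test(x)$conf.int                           # if 5 is inside and 15 outside, theory 2 loses credibility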
After you have thus somewhat sharpened your understanding of the effect at hand, you could ideally make a more precise prediction for a follow-up confirmatory experiment, one that aims to test a more precise hypothesis derived from your current analysis. Admittedly, rejecting your initial statistical null hypothesis wasn't that severe a test of your original research hypothesis, was it? Many explanations other than the one you prefer are also compatible with rejecting $H_0$. Also, since you were never in danger of actually accepting $H_0$, you were in no position to falsify your favoured theory! So you need a more severe test. Arguably, this is actually what you want: you do not want to prove your theory, you want to put it under increasingly severe tests, attempting to falsify it. Withstanding such genuine (but fair) efforts to disprove it is the best a theory can deliver. But for a severe test, you need a more precise theory than "0 it ain't".
You have now learned multiple important facts that bear on a confirmatory study; for example, you have an idea of the variance and effect magnitude in question, allowing you to estimate the required sample size for a follow-up study via power analysis. You can also predict a specific value and assume a region of practical equivalence (ROPE) around it. You won't ever be able to prove that this specific value is the true value; however, if the CI from a follow-up experiment falls entirely within your ROPE, you have corroborating evidence for your theory (and have possibly made trouble for the competition).
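As a rough sketch of those last steps (all planning numbers hypothetical), base R's power.t.test gives the required sample size, and the ROPE check is just a comparison of the follow-up CI against the chosen region:
plan <- power.t.test(delta = .9, sd = 1, sig.level = .05, power = .9,
                     type = "one.sample")    # hypothetical effect size and SD taken from the first study
n <- ceiling(plan$n)                         # required sample size for the follow-up
set.seed(8)
ci <- t.test(rnorm(n, 1))$conf.int           # simulated follow-up "replication"
all(ci > .7 & ci < 1.1)                      # TRUE would mean the CI lies entirely within a ROPE of [.7, 1.1]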