Here's the idea: you have a hypothesis you want to test about a given population. How do you test it? You take data from a random sample, and then you determine how likely (this is the confidence level) it is that a population with that assumed hypothesis and an assumed distribution would produce such data. You decide: if this data has a probability less than, say, $5\%$ of coming from this population, then you reject at this confidence level--so $95\%$ is your confidence level. How do you decide how likely it is for the data to come from a given population? You use a certain assumed distribution of the data, together with any parameters of the population that you may know.

A concrete example: You want to test the claim that the average adult male weight is some claimed value $\mu_0$. You know that adult weight is normally distributed, with standard deviation, say, 10 pounds. You say: I will accept this hypothesis if the sample data I get comes from this population with probability at least $5\%$. How do you decide how likely the sample data is? You use the fact that the data is normally distributed, with (population) standard deviation $\sigma = 10$, and you assume the mean is $\mu_0$. How do you determine how likely it is for the sample data to come from this population? Through the $z$-value you get (since this is a normally-distributed variable), and a $z$-table allows you to determine the probability.

So, say the average of the random sample of adult male weights is $188$ lbs. Do you accept the claim that the population mean is $\mu_0$? Well, the decision comes down to: how likely (how probable) is it that a normally-distributed variable with mean $\mu_0$ and standard deviation $10$ would produce a sample value of $188$ lb? Since you have the necessary values for the distribution, you can test how likely this value of $188$ is in a population with mean $\mu_0$ by finding its $z$-value. If the corresponding $p$-value is less than the significance level, then the value you obtained is less likely than you're willing to accept, so you reject. Otherwise, you accept.
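To make the arithmetic concrete, here is a minimal sketch of that $z$-value calculation (my own illustration, not part of the answer above; the claimed mean of 180 lb is a made-up placeholder, since the claimed value isn't given):

```python
from scipy import stats

mu0 = 180.0      # hypothetical claimed population mean (placeholder, not stated above)
sigma = 10.0     # assumed known population standard deviation
x = 188.0        # observed sample average from the example
alpha = 0.05

# z-value: how many standard deviations the observation lies from the claimed mean.
# (For a sample of size n you would use the standard error sigma / sqrt(n) instead.)
z = (x - mu0) / sigma

# Two-sided p-value: probability of a value at least this extreme under the claim.
p = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.2f}, p = {p:.3f}")
print("reject the claim" if p < alpha else "do not reject the claim")
```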
You can reject whatever you want. Sometimes you will be wrong to do so, and some other times you will be wrong when you fail to reject.
But if your aim is to make Type I errors (rejecting the null hypothesis when it is true) occur less than a certain proportion of the time, then you need something like an $\alpha$; and given that approach, if you want to minimise Type II errors (failing to reject the null hypothesis when it is false), then you need to reject when you have extreme values of the test statistic, as shown by the $p$-value, which are suggestive of the alternative hypothesis.
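Here is a small simulation sketch of those two error rates (my own illustration; the true effect of 0.3 and the sample size are arbitrary choices): fixing $\alpha$ caps the Type I error rate, while the Type II error rate then depends on the test and the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 30, 10_000

def one_sample_p(true_mean):
    """Simulate one sample and return the two-sided p-value for H0: mu = 0."""
    x = rng.normal(loc=true_mean, scale=1.0, size=n)
    return stats.ttest_1samp(x, popmean=0.0).pvalue

# Type I error rate: H0 is true (true mean = 0); should be close to alpha.
type1 = np.mean([one_sample_p(0.0) < alpha for _ in range(n_sims)])

# Type II error rate: H0 is false (true mean = 0.3); depends on effect size and n.
type2 = np.mean([one_sample_p(0.3) >= alpha for _ in range(n_sims)])

print(f"Type I error rate  ~ {type1:.3f} (target alpha = {alpha})")
print(f"Type II error rate ~ {type2:.3f} (power ~ {1 - type2:.3f})")
```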
As you say, $0.05$ is an arbitrary number. It comes from R. A. Fisher, who initially thought that two standard deviations was a reasonable approach, then noted that for a two-sided test with a normal distribution this gave $\alpha \approx 0.0455$, and decided to round it to $0.05$.
Title. I get that you do, but my question is why.
Alpha seems to read as "the highest acceptable probability that we reject the null hypothesis based on the data we have when it's actually true if we had better data"
My understanding of the p-value is that it is the probability of getting a value more extreme than the sample statistic -- but I can't quite wrap my head around what that entails.
Similar question: why do we not reject when the p-value is greater than alpha? Basically the same question, but it might help me get at it from this angle.
Thank you in advance! :)
When do you reject the null hypothesis?
The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.
Remember, rejecting the null hypothesis doesn't prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.
The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.
Can a non-significant p-value indicate that there is no effect or difference in the data?
There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.
Other factors like sample size, study design, and measurement precision can influence the p-value. It's important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.
How does sample size affect the interpretation of p-values?
With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values.
In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.
Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
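A small simulation sketch of this point (my own illustration; the effect size of 0.2 standard deviations and the sample sizes are arbitrary): the same genuine effect tends to give smaller p-values as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2          # same genuine effect in every scenario (in SD units)
sample_sizes = [20, 100, 500, 2000]

for n in sample_sizes:
    # Distribution of two-sample t-test p-values over many repetitions at this n.
    pvals = []
    for _ in range(1000):
        a = rng.normal(0.0, 1.0, n)          # control group
        b = rng.normal(true_effect, 1.0, n)  # treatment group, shifted by the true effect
        pvals.append(stats.ttest_ind(a, b).pvalue)
    pvals = np.array(pvals)
    print(f"n = {n:5d}: median p = {np.median(pvals):.4f}, "
          f"share of p < 0.05 = {np.mean(pvals < 0.05):.2f}")
```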
The given definition of the p-value, "probability of obtaining a test result, or a more extreme one, given that H0 is true", is more or less fine. I'd say "probability of obtaining the observed test result or a more extreme one". This means that if the p-value is very low, we have seen something so extreme that we wouldn't normally expect such a thing under the null hypothesis, and therefore we reject the H0. The level $\alpha$ basically decides how small is too small.
$\alpha$ is chosen small, and therefore the probability of wrongly rejecting a true H0 is small.
"...how certain are we that when rejecting the H0, we're actually seeing difference in our samples and the samples are not at random?"
In case your H0 is chosen so that you interpret it as "samples are at random" (which is somewhat ambiguous), the only thing you can be certain about is that something has happened that was not to be expected and will rarely happen under the H0. The only "performance guarantee" is that if you run such tests often, you will only reject a true H0 a proportion $\alpha$ of the time.
"but we could very well take a group of individuals from population A that have a mean of 150 and another group from population B that have a mean of 180 and the resulting test statistic could be high enough so that the p-value would be < than alpha, thus leading to the rejection of H0, although the populations have the same mean (H0: uA = uB)"
That's true; however, if $\alpha$ is small, this will happen very rarely.
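A quick simulation sketch of that "performance guarantee" (my own illustration; the population parameters are arbitrary): when both groups really come from the same distribution, a test at $\alpha = 0.05$ rejects only about 5% of the time, even though individual samples can show sizeable mean differences by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n, n_sims = 0.05, 25, 20_000

rejections = 0
for _ in range(n_sims):
    # Both groups drawn from the *same* population, so H0 (equal means) is true.
    a = rng.normal(165, 15, n)
    b = rng.normal(165, 15, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(f"False rejection rate: {rejections / n_sims:.3f} (should be close to {alpha})")
```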
" So we want the risk of committing this type 1 error (if H0 is true) to be less than 5% and only then do we accept our test as statistical significant?"
There is some confusion of terminology here. The term "accept" is not normally used for the situation in which the test is significant. The test is not "accepted" but rather defined to be statistically significant in this case, which is just a different way of saying that the H0 is rejected, i.e., incompatible with the data in the sense explained above.
"I'm just trying to wrap my head around how p-value is actually useful (given that we can very well have a significant value even though it's false - I believe this would be quite bad in clinical trials and I'm assuming the alpha would be even less)"
There's no way around this problem though if we have random variation. You may observe means 150 and 180 even if the distribution within the two groups is actually the same. This is just how it is. You can hardly do better than having a decision rule that occasionally but rarely gets things wrong.
(I should say that there are other criticisms of p-values and statistical tests, but I will not comment on them here. Also, as said in a comment, if you want a probability that the H0 is true or not, you need a Bayesian approach, and you need to specify a prior distribution first. The basic idea of a test is very simple and intuitive and hard to replace: If something happens that is very unexpected under H0, this counts as evidence against H0, made more precise by the p-value.)
In other words, if alpha = 0.05, we know that if H0 is true, there's a 5% risk that we will incorrectly reject it (committing a type 1 error if H0 is true). So we want the risk of committing this type 1 error (if H0 is true) to be less than 5%, and only then do we accept our test as statistically significant?
Correct (aside from some infelicity of phrasing).
We have to arrive at some decision rule to use to decide when a test statistic is 'too discrepant' to continue to accept that $H_0$ is a reasonable description of the data.

If a suitable test statistic is chosen, it will tend to behave differently when $H_0$ is true from how it behaves when $H_1$ is the case. We can then choose to reject $H_0$ in those situations that most clearly suggest $H_1$ over $H_0$.

We then are left to choose the boundary between the two alternative decisions (usually this critical value is the least discrepant case for which we'd still reject), which tells us the rejection region for the test statistic. This is usually done by choosing an upper limit, $\alpha$, on the rate at which we would wrongly reject $H_0$.

Given the usual approach is to fix $\alpha$, we then would like to choose other aspects of the test to give a good chance to reject $H_0$ when it's false (choice of test statistic, choice of rejection region, sample size).
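To illustrate the last point, here is a small sketch (my own addition; the effect size and $\sigma$ are arbitrary) showing how, with $\alpha$ fixed, the chance of rejecting a false $H_0$ for a simple one-sided $z$-test grows with the sample size:

```python
import numpy as np
from scipy import stats

alpha = 0.05
sigma = 10.0        # assumed known population standard deviation
effect = 3.0        # true difference from the hypothesized mean (H0 is false by this much)

# One-sided z-test: reject H0 when z = (xbar - mu0) / (sigma / sqrt(n)) > z_crit.
z_crit = stats.norm.ppf(1 - alpha)

for n in (10, 25, 50, 100):
    se = sigma / np.sqrt(n)
    # Power = P(reject H0 | true mean = mu0 + effect)
    power = 1 - stats.norm.cdf(z_crit - effect / se)
    print(f"n = {n:3d}: power = {power:.3f} at alpha = {alpha}")
```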
Doesn't the fact that the p-value is non-zero mean there's always a chance of the null being true? Why do we reject it, then? I can't quite understand the reasoning behind it.
We set some value, called $\alpha$, as our maximum tolerance for type I error rate. That is, we accept that our work could reject true null hypotheses $100\alpha\%$ of the time the null hypothesis is true. In the common situation of $\alpha=0.05$, we accept that to be $5\%$. In fact, $\alpha=0.05$ is so common that it typically is implied when no $\alpha$ is specified, and we consider p-values of $0.05$ or smaller to be “small” p-values.
Then we run the test and calculate a p-value. If $p\le\alpha$, we reject the null hypothesis in favor of the alternative hypothesis.
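For concreteness, a minimal sketch of that decision rule with a one-sample t-test on made-up data:

```python
from scipy import stats

alpha = 0.05
# Hypothetical measurements; H0: the population mean equals 5.0
data = [5.2, 4.8, 5.6, 5.1, 4.9, 5.4, 5.3, 5.0]

result = stats.ttest_1samp(data, popmean=5.0)
print(f"p-value = {result.pvalue:.3f}")
if result.pvalue <= alpha:
    print("Reject H0 in favor of the alternative hypothesis.")
else:
    print("Fail to reject H0.")
```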
The outcome of a hypothesis test is reported in two ways:
- The p-value is p where p is a given small number.
- The null hypothesis is rejected at the α significance level; usually α = 0.05.
If the p-value p is smaller than α, then the null hypothesis is rejected at the α level. And if the null hypothesis is rejected, we know the corresponding p-value is < α. However, we don't know the exact p-value. It might be 0.049, it might be 0.000001.
The first statement is preferred because it presents more information (the strength of evidence against the null hypothesis). Note that we don't say that p-value is significant or not; it's enough to report the p-value since it's obvious that the null hypothesis will be rejected at all significance levels α > p.
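A tiny sketch of that last point (my own illustration, with a made-up p-value): reporting the p-value gives the decision at every significance level at once.

```python
p_value = 0.012  # hypothetical reported p-value

for alpha in (0.001, 0.01, 0.05, 0.10):
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"alpha = {alpha:>5}: {decision}")
# Rejected at alpha = 0.05 and 0.10, not at 0.001 or 0.01:
# the p-value (0.012) is the smallest alpha at which H0 would be rejected.
```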
I got into a discussion with someone who arguably deals with statistics much more than I do, but I nevertheless think they are wrong on the subject. He was explaining p-values and alpha values and how one should reject the null hypothesis if the p-value was less than the alpha value and accept it otherwise. Given everything he explained earlier about what the p-value and alpha values mean, it seems to me that this interpretation is wrong, and that rather you should reject the null hypothesis (as "statistically likely", of course), i.e. accept the test hypothesis as statistically likely, if the p-value is less than alpha, but not reject the null hypothesis otherwise. The difference is subtle, but if we switched the test hypothesis and the null hypothesis, then the p-value would become (1 - original p-value) while alpha would remain unchanged. By the same logic, we wouldn't be rejecting the original test hypothesis / accepting the original null hypothesis as statistically likely unless the original p-value were above (1 - alpha). So in my mind, there are really 3 areas to think about:
p-value <= alpha: Accept test hypothesis as statistically likely.
alpha < p-value < 1-alpha: Not enough statistical evidence to strongly suggest one hypothesis over another
1-alpha <= p-value: Accept the null hypothesis as statistically likely.
This person said I was simply wrong and we had to either reject or accept the null hypothesis -- that we had to make a binary choice. I argued that in that case we should either "reject the null hypothesis" or "not reject the null hypothesis" instead of "reject the null hypothesis" or "accept the null hypothesis". He still claimed I was wrong.
Anyway, it's been bugging me and that's why I'm here. Who's right? Thank you for helping me with my understanding.
I have seen this alternative explanation of the p-value in Schweser's study guide and in Mark Meldrum's videos, but I still do not get the explanation.
I have an intuitive understanding of the p-value, that it is simply an alternative metric to the test statistic for rejecting/failing to reject the null hypothesis (i.e., p-value < alpha -> reject null hypothesis).
I understand it intuitively, as when the p-value is less than alpha, it must mean that the test statistic is greater than (or less than) the critical values/rejection points, and hence the null hypothesis may be rejected.
What I am struggling to understand is why the p-value is also known as "the smallest level of significance at which the null hypothesis is rejected".
If anyone would have an understanding of this explanation, I would greatly appreciate the help.