Here's the idea: you have a hypothesis you want to test about a given population. How do you test it? You take data from a random sample, and then you determine how likely it is that a population satisfying that hypothesis, with an assumed distribution, would produce such data. You decide: if this data has a probability less than, say, 5% of coming from this population, then you reject the hypothesis at that level--so 95% is your confidence level. How do you decide how likely it is for the data to come from a given population? You use an assumed distribution of the data, together with any parameters of the population that you may know.

A concrete example: you want to test the claim that the average adult male weight is some value $\mu_0$. You know that adult weight is normally distributed, with standard deviation, say, 10 pounds. You say: I will accept this hypothesis if the sample data I get comes from this population with probability at least 5%. How do you decide how likely the sample data is? You use the fact that the data is normally distributed, with (population) standard deviation 10, and you assume the mean is $\mu_0$. How do you determine how likely it is for the sample data to come from this population? From the $z$-score of the value you get (since this is a normally-distributed variable), a table allows you to determine the probability.

So, say the average of the random sample of adult male weights is 188 lbs. Do you accept the claim that the population mean is $\mu_0$? Well, the decision comes down to: how likely (how probable) is it that a normally-distributed variable with mean $\mu_0$ and standard deviation 10 would produce a sample value of 188 lbs? Since you have the necessary values for the distribution, you can test how likely this value is by finding its $z$-score. If this $z$-value is more extreme than the critical value, then the value you obtained is less likely than you're willing to accept, so you reject. Otherwise, you accept.
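The arithmetic in this example can be sketched numerically. This is only an illustration: the answer does not state the claimed mean, so 170 lb below is a made-up stand-in, and, following the answer's framing, the sample average is treated as a single normal draw with the population standard deviation.

```python
import math

def two_sided_p(x, mu0, sigma):
    """Two-sided p-value: probability that a Normal(mu0, sigma) draw
    lands at least as far from mu0 as x did."""
    z = (x - mu0) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|), Z standard normal

# Hypothetical numbers: the claimed mean is not given in the answer, so
# 170 lb is a stand-in; sigma = 10 lb and the 188 lb sample value come
# from the example.
z_score = (188 - 170) / 10     # 1.8 standard deviations above the claim
p = two_sided_p(188, 170, 10)  # ~0.072: not below 0.05, so do not reject
```

With these assumed numbers the observation is unusual but not unusual enough, so the claim would not be rejected at the 5% level.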

Answer from user99680 on Stack Exchange
Top answer
1 of 7
5


2 of 7
3

You can reject whatever you want. Sometimes you will be wrong to do so, and some other times you will be wrong when you fail to reject.

But if your aim is to keep Type I errors (rejecting the null hypothesis when it is true) below a certain proportion of the time, then you need something like an $\alpha$, and given that approach, if you want to minimise Type II errors (failing to reject the null hypothesis when it is false), then you need to reject when you have extreme values of the test statistic, as shown by the $p$-value, which are suggestive of the alternative hypothesis.

As you say, $0.05$ is an arbitrary number. It comes from R. A. Fisher, who initially thought that two standard deviations was a reasonable cutoff, then noted that for a two-sided test with a normal distribution this gave $\alpha \approx 0.0455$, and decided to round it to $0.05$.
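Fisher's figure is easy to reproduce: the two-sided tail area beyond two standard deviations of a normal distribution comes straight out of the complementary error function.

```python
import math

# Tail area outside ±2 standard deviations of a normal distribution:
# P(|Z| >= 2) = erfc(2 / sqrt(2)) ≈ 0.0455, which Fisher rounded to 0.05.
p_two_sd = math.erfc(2 / math.sqrt(2))
```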

Reddit
r/learnmath on Reddit: Why do you reject the null hypothesis when the p value is less than alpha?
May 9, 2019 -

Title. I get that you do, but my question is why.

Alpha seems to read as "the highest acceptable probability that we reject the null hypothesis based on the data we have when it's actually true if we had better data"

My understanding of the p value is that it is the probability of getting a value more extreme than the sample statistic, but I can't quite wrap my head around what that entails.

Similar question: Why is a p value that is greater than alpha not rejected? Basically the same just might help my head from this angle idk.

Thank you in advance! :)

Top answer
1 of 8
86
Suppose we have a coin, and the null hypothesis is that this coin is fair: it will come up 50% heads and 50% tails on average. Note that this does not mean it will come up exactly 50% heads and 50% tails in any given sampling. So if we test it with 100 flips and get 52 heads and 48 tails, this corresponds to a fairly high p value. If you flip a genuinely fair coin 100 times, 52 heads and 48 tails is reasonably likely to occur, so we do not reject the null hypothesis.

We haven't actually proven that this is a fair coin. It could be a coin which is biased to come up heads 52% of the time and tails 48% of the time, or a coin which is biased to come up heads 55% of the time and happened to produce 52 heads in this sample, but we don't have much proof of that either. The p value was high, which means what actually happened seems likely. Nothing surprising happened, so we have no reason to suspect the coin is unfair.

If we flip a coin 100 times and get 81 heads and 19 tails, then this corresponds to a low p value. If you flip a genuinely fair coin 100 times, you are extremely unlikely to get 81 heads and 19 tails, so this is strong evidence that you didn't flip a fair coin; you flipped an unfair coin. We reject the null hypothesis because the evidence we've observed and the predictions made by the hypothesis do not match very well. Technically it is possible to get 81 heads and 19 tails with a fair coin; it's just extremely unlikely, so the more likely explanation is that you flipped a biased coin that actually gives heads more often.

High p values mean the evidence agrees with the null hypothesis; low p values mean the evidence disagrees with it. The higher or lower they are, the stronger this agreement or disagreement is.
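The coin example can be checked with an exact binomial computation (a sketch using only the standard library, treating "at least as far from 50/50 as observed" as the two-sided p-value):

```python
from math import comb

def fair_coin_p(heads, n=100):
    """Exact two-sided p-value under a fair-coin null: the probability of
    a head count at least as far from n/2 as the one observed."""
    dev = abs(heads - n / 2)
    return sum(comb(n, k) for k in range(n + 1)
               if abs(k - n / 2) >= dev) / 2 ** n

p52 = fair_coin_p(52)  # high: 52/100 heads is unremarkable for a fair coin
p81 = fair_coin_p(81)  # tiny: 81/100 heads is wildly unlikely under fairness
```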
2 of 8
13
You can think of the p value as how likely you would be to get your observed sample estimate if the null hypothesis were true. I sometimes like to think of it using the analogy of a court trial. The null hypothesis (what we start off believing in the absence of any evidence) is that the defendant is not guilty. The alternative hypothesis (the claim we're trying to prove) is that the defendant is guilty.

At the beginning of the trial, your p value was 100%. In other words, if the prosecutor comes in and says "I have no evidence to present", you walk out of the court 100% of the time. As evidence is introduced in the trial, however, your p value decreases. That is, the probability that there could be this much evidence against you if you were actually not guilty begins to decrease. (The evidence in statistics is how far away your estimate is from the parameter value under the null hypothesis.)

Eventually, if enough evidence is introduced, the probability that you could be not guilty given all this evidence against you gets so low that the jury flips and decides for the alternative (i.e., that you're guilty). The point at which we flip from the null hypothesis to the alternative is called the significance level. In other words, it's the p value at which we claim something significant happened; that is to say, the difference between our observed sample estimate and the parameter value under the null hypothesis is too unlikely to occur just by random chance.
Simply Psychology
Understanding P-Values and Statistical Significance
August 11, 2025 - Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.
People also ask

When do you reject the null hypothesis?
In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test.

The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn't prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.
Can a non-significant p-value indicate that there is no effect or difference in the data?
No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It's important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.
How does sample size affect the interpretation of p-values?
Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values.

In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
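A small sketch of this sample-size effect, assuming a one-sample z test with known standard deviation and the same underlying effect of 0.2 standard deviations in both cases:

```python
import math

def z_test_p(effect_sd, n):
    """Two-sided p-value when the sample mean sits `effect_sd` population
    standard deviations above the null mean (one-sample z test)."""
    z = effect_sd * math.sqrt(n)  # standard error shrinks as 1/sqrt(n)
    return math.erfc(z / math.sqrt(2))

p_small = z_test_p(0.2, 25)   # z = 1.0 -> p ≈ 0.32: not significant
p_large = z_test_p(0.2, 400)  # z = 4.0 -> p ≈ 6e-5: highly significant
```

The effect size is identical; only the sample size changes, yet the p-value moves from clearly non-significant to highly significant.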
Open Textbook BC
Understanding Null Hypothesis Testing – Research Methods in Psychology – 2nd Canadian Edition
October 13, 2015 - A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that the sample result would be likely if the null hypothesis were true and ...
WMed
P-VALUES SIMPLIFIED Preface
Hence, larger p-values result in failure to reject the null hypothesis. Conversely, a small p-value means that there is a lesser chance that the data support the null hypothesis, thereby lending acceptance to the alternative hypothesis.
Top answer
1 of 2
2

The given definition of the p-value, "probability of obtaining a test result, or a more extreme one, given that H0 is true", is more or less fine. I'd say "probability of obtaining the observed test result or a more extreme one". This means that if the p-value is very low, we have seen something so extreme that we wouldn't normally expect such a thing under the null hypothesis, and therefore we reject the H0. The level $\alpha$ basically decides how small is too small. $\alpha$ is chosen small, and therefore the probability of wrongly rejecting a true H0 is small.

"...how certain are we that when rejecting the H0, we're actually seeing difference in our samples and the samples are not at random?"

In case your H0 is chosen so that you interpret it as "samples are at random" (which is somewhat ambiguous), the only thing you can be certain about is that something has happened that was not to be expected and will rarely happen under the H0. The only "performance guarantee" is that if you run such tests often, you will only reject a true H0 a proportion $\alpha$ of times.
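That "performance guarantee" is easy to see in a quick simulation. This sketch runs many two-sided z tests on samples drawn with H0 actually true; the fraction of (wrong) rejections should land near $\alpha$.

```python
import random
import statistics

random.seed(0)
alpha, trials, n = 0.05, 2000, 30
crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05

rejections = 0
for _ in range(trials):
    # H0 is true by construction: samples really do come from N(0, 1).
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.mean(sample) * n ** 0.5  # known sigma = 1
    if abs(z) >= crit:
        rejections += 1

type1_rate = rejections / trials  # hovers near alpha
```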

"but we could very well take a group of individuals from population A that have a mean of 150 and another group from population B that have a mean of 180 and the resulting test statistic could be high enough so that the p-value would be < than alpha, thus leading to the rejection of H0, although the populations have the same mean (H0: uA = uB)"

That's true; however, if $\alpha$ is small, this will happen very rarely.

"So we want the risk of committing this type 1 error (if H0 is true) to be less than 5%, and only then do we accept our test as statistically significant?"

There is some confusion of terminology here. The term "accept" is not normally used for the situation in which the test is significant. The test is not "accepted" but rather defined to be statistically significant in this case, which is just a different way of saying that the H0 is rejected, i.e., incompatible with the data in the sense explained above.

"I'm just trying to wrap my head around how p-value is actually useful (given that we can very well have a significant value even though it's false - I believe this would be quite bad in clinical trials and I'm assuming the alpha would be even less)"

There's no way around this problem though if we have random variation. You may observe means 150 and 180 even if the distribution within the two groups is actually the same. This is just how it is. You can hardly do better than having a decision rule that occasionally but rarely gets things wrong.

(I should say that there are other criticisms of p-values and statistical tests, but I will not comment on them here. Also, as said in a comment, if you want a probability that the H0 is true or not, you need a Bayesian approach, and you need to specify a prior distribution first. The basic idea of a test is very simple and intuitive and hard to replace: If something happens that is very unexpected under H0, this counts as evidence against H0, made more precise by the p-value.)

2 of 2
0

In other words, if alpha = 0.05, we know that if H0 is true, there's a 5% risk that we will incorrectly reject it (committing a type 1 error if H0 is true). So we want the risk of committing this type 1 error (if H0 is true) to be less than 5%, and only then do we accept our test as statistically significant?

Correct (aside from some infelicity of phrasing).

We have to arrive at some decision rule to use to decide when a test statistic is 'too discrepant' to continue to accept that $H_0$ is a reasonable description of the data.

If a suitable test statistic is chosen, it will tend to behave differently when $H_0$ is true from how it behaves when $H_1$ is the case. We can then choose to reject $H_0$ in those situations that most clearly suggest $H_1$ over $H_0$.

We are then left to choose the boundary between the two alternative decisions (usually this critical value is the least discrepant case for which we'd still reject), which tells us the rejection region for the test statistic. This is usually done by choosing an upper limit $\alpha$ on the rate at which we would wrongly reject $H_0$.

Given that the usual approach is to fix $\alpha$, we would then like to choose the other aspects of the test to give a good chance of rejecting $H_0$ when it's false (choice of test statistic, choice of rejection region, sample size).
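For a two-sided z test, the boundary of that rejection region follows directly from $\alpha$ (a sketch using the standard library's normal distribution):

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """|z| boundary of the rejection region for a two-sided z test:
    reject H0 exactly when |z| >= this value."""
    return NormalDist().inv_cdf(1 - alpha / 2)

c05 = two_sided_critical_value(0.05)  # ~1.96
c01 = two_sided_critical_value(0.01)  # ~2.576 (stricter alpha, wider margin)
```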

Reddit
r/AskStatistics on Reddit: why do we reject null hypothesis for small p value
March 1, 2023 -

Doesn't the fact that p value being non zero mean there's always a chance of the null being true? Why do we reject then? Can't quite understand the reasoning behind it

Investopedia
P-Value: What It Is, How to Calculate It, and Examples
October 3, 2025 - The p-value hypothesis test does ... there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis....
Penn State Statistics
S.3.2 Hypothesis Testing (P-Value Approach) | STAT ONLINE
The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than ...
Quora
Why do we reject the null hypothesis if the p value is less than the significance level? - Quora
Answer (1 of 4): We do not reject the null hypothesis because we’re certain it’s not true. We reject it because we are are balancing the costs of rejecting it when it’s true versus the costs of not rejecting it when it’s false. If I use a conventional 5% significant level, then up to 5% of the n...
Quora
What is the reason for ignoring small P values in hypothesis testing? Why aren't null hypotheses rejected for all small P values? - Quora
Answer: It’s not the case of ignoring small P values, but just the Level of Confidence (LoC) that the researcher decided to set up. For example, if LoC is 90%, with a P value lower than 10%, the Level of Significance (LoS), will be significant ...
Wikipedia
p-value - Wikipedia
3 weeks ago - The lower the ... if it allows us to reject the null hypothesis. All other things being equal, smaller p-values are taken as stronger evidence against the null hypothesis....
PubMed Central
Misleading P-Value:Do You Recognise It? - PMC
Moreover, the P-value does not ... small P-value signifies that the evidence in favour of the null hypothesis is weak and that the likelihood of the observed differences due to chance is so small that the null hypothesis is unlikely to be true.3,8–10 The rejection of the ...
Scribbr
Understanding P-values | Definition and Examples
June 22, 2023 - The p value is a number, calculated ... to help decide whether to reject the null hypothesis. The smaller the p value, the more likely you are to reject the null hypothesis....
Reddit
r/statistics on Reddit: P-values and "not rejecting" vs. "accepting" null hypothesis.
February 10, 2019 -

I got into a discussion with someone who arguably deals with statistics much more than I do, but I nevertheless think they are wrong on the subject. He was explaining p-values and alpha values and how one should reject the null hypothesis if the p-value was less than the alpha value and accept it otherwise. Given everything he explained earlier about what the p-value and alpha values mean, it seems to me that this interpretation is wrong, and that rather you should reject the null hypothesis (as "statistically likely", of course) if the p-value is less than alpha (i.e., accepting the test hypothesis as statistically likely), but not reject the null hypothesis otherwise.

The difference is subtle, but if we switched the test hypothesis and null hypothesis, then the p-value would become (1 - original p-value) while alpha would remain unchanged. By the same logic, we wouldn't be rejecting the original test hypothesis / accepting the original null hypothesis as statistically likely unless the original p-value were above (1 - alpha). So in my mind, there are really 3 areas to think about:

  • p-value <= alpha: Accept test hypothesis as statistically likely.

  • alpha < p-value < 1-alpha: Not enough statistical evidence to strongly suggest one hypothesis over another

  • 1-alpha <= p-value: Accept the null hypothesis as statistically likely.

This person said I was simply wrong and we had to either reject or accept the null hypothesis -- that we had to make a binary choice. I argued that in that case we should either "reject the null hypothesis" or "not reject the null hypothesis" instead of "reject the null hypothesis" or "accept the null hypothesis". He still claimed I was wrong.

Anyway, it's been bugging me and that's why I'm here. Who's right? Thank you for helping me with my understanding.

Quora
Why the p-value needs to be small than the a (the significant value) to reject the null hypothesis? - Quora
Answer (1 of 6): Essentially the significance value (we use the cut off 0.05 usually) is COMPLETELY arbitrary and essentially represents how extreme we want our evidence to be before we reject the null hypothesis. 0.05 means that IF the null were true, we would only get evidence that is extreme ...
Simply Psychology
What Is The Null Hypothesis & When To Reject It
July 31, 2023 - The level of statistical significance is often expressed as a p -value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
Reddit
r/CFA on Reddit: Why is the p-value the "smallest level of significance at which the null hypothesis is rejected"?
August 11, 2023 -

I have seen this alternative explanation of the p-value using Schweser's study guide and on Mark Meldrum's videos, but I still do not get the explanation.

I have an intuitive understanding of the p-value, that it is simply an alternative metric to the test statistic for rejecting/failing to reject the null hypothesis (i.e., p-value < alpha -> reject null hypothesis).

I understand it intuitively, as when the p-value is less than alpha, it must mean that the test statistic is greater than (or less than) the critical values/rejection points, and hence the null hypothesis may be rejected.

What I am struggling to understand is why the p-value is also known as "the smallest level of significance at which the null hypothesis is rejected".

If anyone would have an understanding of this explanation, I would greatly appreciate the help.

Top answer
1 of 5
20
If the p value < alpha, we reject the null hypothesis. Think about it the other way: any value of alpha that is greater than p would mean that we reject the null hypothesis. We can keep decreasing the value of alpha, but as soon as it goes below the p value, we can't reject anymore. Hence the p value is the smallest such level, since at anything smaller we can't reject the null.
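This "smallest alpha" reading can be demonstrated directly. In this sketch the observed statistic $z = 2.1$ is hypothetical; sweeping alpha across the p-value flips the test's decision exactly at p.

```python
from statistics import NormalDist
import math

z_obs = 2.1  # hypothetical observed test statistic
p = math.erfc(abs(z_obs) / math.sqrt(2))  # two-sided p-value, ~0.036

def rejects(alpha, z=z_obs):
    """Decision of a two-sided z test at level alpha."""
    return abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)

# The decision flips exactly at alpha = p:
reject_05 = rejects(0.05)         # rejects: alpha above p
reject_01 = rejects(0.01)         # does not reject: alpha below p
reject_above = rejects(p + 1e-6)  # any alpha just above p still rejects
reject_below = rejects(p - 1e-6)  # any alpha just below p does not
```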
2 of 5
6
It's hypothetical. I'll try to explain with an example. Assume you calculate the avg height of men in your country to be 5.5 ft. Now you ask yourself: what is the probability that the avg height of men is 4 ft? This becomes our null.

Now you clearly know it cannot be 4 ft (since you're aware of the avg), but still assume the probability of it being right is around 5% (we set our alpha to be 0.05), and anything below that we can reject, saying that it's almost improbable for the avg height of men to be 4 ft.

You calculate the p value and it comes out at 0.03 (a 3% probability of choosing a sample that gives an avg height of 4 ft). What this implies is that there's a 97% probability of the avg height being greater than 4 ft, and hence we reject the idea that the avg height of the population is 4 ft. Since this is less than our required rejection criterion (5%), we reject the null hypothesis of the avg height being 4 ft.

The p value is the smallest level of significance at which the null hypothesis (avg height is 4 ft) is rejected because, in our scenario, if we chose our alpha to be 0.02, our p-value would not have been significant: we would have a higher chance (3%) than our rejection criterion (2%).
PubMed Central
Are Only p-Values Less Than 0.05 Significant? A p-Value Greater Than 0.05 Is Also Significant! - PMC
In addition, whether to reject the null hypothesis or not is determined according to whether the calculated significance probability value is greater than or less than the significance level value set in step 2. Therefore, the sentences "The value of the test statistic belongs to the rejection region generated by the value of the significance level" and "The value of the significance probability is smaller than the value of the significance level" are sentences with the same meaning.