Here's the idea: you have a hypothesis you want to test about a given population. How do you test it? You take data from a random sample, and then you determine how likely (this is the confidence level) it is that a population with that assumed hypothesis and an assumed distribution would produce such data. You decide: if this data has a probability less than, say, $5\%$ of coming from this population, then you reject at this confidence level--so $95\%$ is your confidence level. How do you decide how likely it is for the data to come from a given population? You use a certain assumed distribution of the data, together with any parameters of the population that you may know.

A concrete example: You want to test the claim that the average adult male weight is some claimed value $\mu_0$. You know that adult weight is normally distributed, with standard deviation, say, 10 pounds. You say: I will accept this hypothesis if the sample data I get comes from this population with probability at least $5\%$. How do you decide how likely the sample data is? You use the fact that the data is normally distributed, with (population) standard deviation $\sigma = 10$, and you assume the mean is $\mu_0$. How do you determine how likely it is for the sample data to come from this population? Through the $z$-value you get (since this is a normally-distributed variable), and a $z$-table allows you to determine the probability.

So, say the average of the random sample of adult male weights is $188$ lbs. Do you accept the claim that the population mean is $\mu_0$? Well, the decision comes down to: how likely (how probable) is it that a normally-distributed variable with mean $\mu_0$ and standard deviation $10$ would produce a sample value of $188$ lb? Since you have the necessary values for the distribution, you can test how likely this value of $188$ is in a population with mean $\mu_0$ by finding its $z$-value. If the corresponding $p$-value is less than the significance level, then the value you obtained is less likely than you're willing to accept, so you reject. Otherwise, you accept.
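To make the arithmetic concrete, here is a minimal sketch of that $z$-value calculation (my own illustration, not part of the answer above; the claimed mean of 180 lb is a made-up placeholder, since the claimed value isn't given):

```python
from scipy import stats

mu0 = 180.0      # hypothetical claimed population mean (placeholder, not stated above)
sigma = 10.0     # assumed known population standard deviation
x = 188.0        # observed sample average from the example
alpha = 0.05

# z-value: how many standard deviations the observation lies from the claimed mean.
# (For a sample of size n you would use the standard error sigma / sqrt(n) instead.)
z = (x - mu0) / sigma

# Two-sided p-value: probability of a value at least this extreme under the claim.
p = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.2f}, p = {p:.3f}")
print("reject the claim" if p < alpha else "do not reject the claim")
```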
You can reject whatever you want. Sometimes you will be wrong to do so, and some other times you will be wrong when you fail to reject.
But if your aim is to make Type I errors (rejecting the null hypothesis when it is true) occur less than a certain proportion of the time, then you need something like an $\alpha$; and given that approach, if you want to minimise Type II errors (failing to reject the null hypothesis when it is false), then you need to reject when you have extreme values of the test statistic, as shown by the $p$-value, which are suggestive of the alternative hypothesis.
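Here is a small simulation sketch of those two error rates (my own illustration; the true effect of 0.3 and the sample size are arbitrary choices): fixing $\alpha$ caps the Type I error rate, while the Type II error rate then depends on the test and the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 30, 10_000

def one_sample_p(true_mean):
    """Simulate one sample and return the two-sided p-value for H0: mu = 0."""
    x = rng.normal(loc=true_mean, scale=1.0, size=n)
    return stats.ttest_1samp(x, popmean=0.0).pvalue

# Type I error rate: H0 is true (true mean = 0); should be close to alpha.
type1 = np.mean([one_sample_p(0.0) < alpha for _ in range(n_sims)])

# Type II error rate: H0 is false (true mean = 0.3); depends on effect size and n.
type2 = np.mean([one_sample_p(0.3) >= alpha for _ in range(n_sims)])

print(f"Type I error rate  ~ {type1:.3f} (target alpha = {alpha})")
print(f"Type II error rate ~ {type2:.3f} (power ~ {1 - type2:.3f})")
```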
As you say, $0.05$ is an arbitrary number. It comes from R. A. Fisher, who initially thought that two standard deviations was a reasonable approach, then noted that for a two-sided test with a normal distribution this gave $\alpha \approx 0.0455$, and decided to round it to $0.05$.
Title. I get that you do, but my question is why.
Alpha seems to read as "the highest acceptable probability that we reject the null hypothesis based on the data we have when it's actually true if we had better data"
My understanding of the p-value is that it is the probability of getting a value more extreme than the sample statistic -- but I can't quite wrap my head around what that entails.
Similar question: why do we not reject when the p-value is greater than alpha? Basically the same question, but it might help me get at it from this angle.
Thank you in advance! :)
When do you reject the null hypothesis?
The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.
Remember, rejecting the null hypothesis doesn't prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.
The p-value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.
Can a non-significant p-value indicate that there is no effect or difference in the data?
There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.
Other factors like sample size, study design, and measurement precision can influence the p-value. It's important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.
How does sample size affect the interpretation of p-values?
With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values.
In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.
Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
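A small simulation sketch of this point (my own illustration; the effect size of 0.2 standard deviations and the sample sizes are arbitrary): the same genuine effect tends to give smaller p-values as the sample size grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.2          # same genuine effect in every scenario (in SD units)
sample_sizes = [20, 100, 500, 2000]

for n in sample_sizes:
    # Distribution of two-sample t-test p-values over many repetitions at this n.
    pvals = []
    for _ in range(1000):
        a = rng.normal(0.0, 1.0, n)          # control group
        b = rng.normal(true_effect, 1.0, n)  # treatment group, shifted by the true effect
        pvals.append(stats.ttest_ind(a, b).pvalue)
    pvals = np.array(pvals)
    print(f"n = {n:5d}: median p = {np.median(pvals):.4f}, "
          f"share of p < 0.05 = {np.mean(pvals < 0.05):.2f}")
```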
The given definition of the p-value, "probability of obtaining a test result, or a more extreme one, given that H0 is true", is more or less fine. I'd say "probability of obtaining the observed test result or a more extreme one". This means that if the p-value is very low, we have seen something so extreme that we wouldn't normally expect such a thing under the null hypothesis, and therefore we reject the H0. The level $\alpha$ basically decides how small is too small.
$\alpha$ is chosen small, and therefore the probability of wrongly rejecting a true H0 is small.
"...how certain are we that when rejecting the H0, we're actually seeing difference in our samples and the samples are not at random?"
In case your H0 is chosen so that you interpret it as "samples are at random" (which is somewhat ambiguous), the only thing you can be certain about is that something has happened that was not to be expected and will rarely happen under the H0. The only "performance guarantee" is that if you run such tests often, you will only reject a true H0 a proportion $\alpha$ of the time.
"but we could very well take a group of individuals from population A that have a mean of 150 and another group from population B that have a mean of 180 and the resulting test statistic could be high enough so that the p-value would be < than alpha, thus leading to the rejection of H0, although the populations have the same mean (H0: uA = uB)"
That's true; however, if $\alpha$ is small, this will happen very rarely.
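A quick simulation sketch of that "performance guarantee" (my own illustration; the population parameters are arbitrary): when both groups really come from the same distribution, a test at $\alpha = 0.05$ rejects only about 5% of the time, even though individual samples can show sizeable mean differences by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n, n_sims = 0.05, 25, 20_000

rejections = 0
for _ in range(n_sims):
    # Both groups drawn from the *same* population, so H0 (equal means) is true.
    a = rng.normal(165, 15, n)
    b = rng.normal(165, 15, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

print(f"False rejection rate: {rejections / n_sims:.3f} (should be close to {alpha})")
```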
" So we want the risk of committing this type 1 error (if H0 is true) to be less than 5% and only then do we accept our test as statistical significant?"
There is some confusion of terminology here. The term "accept" is not normally used for the situation in which the test is significant. The test is not "accepted" but rather defined to be statistically significant in this case, which is just a different way of saying that the H0 is rejected, i.e., incompatible with the data in the sense explained above.
"I'm just trying to wrap my head around how p-value is actually useful (given that we can very well have a significant value even though it's false - I believe this would be quite bad in clinical trials and I'm assuming the alpha would be even less)"
There's no way around this problem though if we have random variation. You may observe means 150 and 180 even if the distribution within the two groups is actually the same. This is just how it is. You can hardly do better than having a decision rule that occasionally but rarely gets things wrong.
(I should say that there are other criticisms of p-values and statistical tests, but I will not comment on them here. Also, as said in a comment, if you want a probability that the H0 is true or not, you need a Bayesian approach, and you need to specify a prior distribution first. The basic idea of a test is very simple and intuitive and hard to replace: If something happens that is very unexpected under H0, this counts as evidence against H0, made more precise by the p-value.)
In other words, if alpha = 0.05, we know that if H0 is true, there's a 5% risk that we will incorrectly reject it (committing a type 1 error if H0 is true). So we want the risk of committing this type 1 error (if H0 is true) to be less than 5%, and only then do we accept our test as statistically significant?
Correct (aside from some infelicity of phrasing).
We have to arrive at some decision rule to use to decide when a test statistic is 'too discrepant' to continue to accept that $H_0$ is a reasonable description of the data.

If a suitable test statistic is chosen, it will tend to behave differently when $H_0$ is true from how it behaves when $H_1$ is the case. We can then choose to reject $H_0$ in those situations that most clearly suggest $H_1$ over $H_0$.

We then are left to choose the boundary between the two alternative decisions (usually this critical value is the least discrepant case for which we'd still reject), which tells us the rejection region for the test statistic. This is usually done by choosing an upper limit, $\alpha$, on the rate at which we would wrongly reject $H_0$.

Given the usual approach is to fix $\alpha$, we then would like to choose other aspects of the test to give a good chance to reject $H_0$ when it's false (choice of test statistic, choice of rejection region, sample size).
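To illustrate the last point, here is a small sketch (my own addition; the effect size and $\sigma$ are arbitrary) showing how, with $\alpha$ fixed, the chance of rejecting a false $H_0$ for a simple one-sided $z$-test grows with the sample size:

```python
import numpy as np
from scipy import stats

alpha = 0.05
sigma = 10.0        # assumed known population standard deviation
effect = 3.0        # true difference from the hypothesized mean (H0 is false by this much)

# One-sided z-test: reject H0 when z = (xbar - mu0) / (sigma / sqrt(n)) > z_crit.
z_crit = stats.norm.ppf(1 - alpha)

for n in (10, 25, 50, 100):
    se = sigma / np.sqrt(n)
    # Power = P(reject H0 | true mean = mu0 + effect)
    power = 1 - stats.norm.cdf(z_crit - effect / se)
    print(f"n = {n:3d}: power = {power:.3f} at alpha = {alpha}")
```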
Doesn't the fact that the p-value is non-zero mean there's always a chance of the null being true? Why do we reject it, then? I can't quite understand the reasoning behind it.
We set some value, called $\alpha$, as our maximum tolerance for type I error rate. That is, we accept that our work could reject true null hypotheses $100\alpha\%$ of the time the null hypothesis is true. In the common situation of $\alpha=0.05$, we accept that to be $5\%$. In fact, $\alpha=0.05$ is so common that it typically is implied when no $\alpha$ is specified, and we consider p-values of $0.05$ or smaller to be “small” p-values.
Then we run the test and calculate a p-value. If $p\le\alpha$, we reject the null hypothesis in favor of the alternative hypothesis.
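For concreteness, a minimal sketch of that decision rule with a one-sample t-test on made-up data:

```python
from scipy import stats

alpha = 0.05
# Hypothetical measurements; H0: the population mean equals 5.0
data = [5.2, 4.8, 5.6, 5.1, 4.9, 5.4, 5.3, 5.0]

result = stats.ttest_1samp(data, popmean=5.0)
print(f"p-value = {result.pvalue:.3f}")
if result.pvalue <= alpha:
    print("Reject H0 in favor of the alternative hypothesis.")
else:
    print("Fail to reject H0.")
```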
The outcome of a hypothesis test is reported in two ways:
- The p-value is p where p is a given small number.
- The null hypothesis is rejected at the α significance level; usually α = 0.05.
If the p-value p is smaller than α, then the null hypothesis is rejected at the α level. And if the null hypothesis is rejected, we know the corresponding p-value is < α. However, we don't know the exact p-value. It might be 0.049, it might be 0.000001.
The first statement is preferred because it presents more information (the strength of evidence against the null hypothesis). Note that we don't say that p-value is significant or not; it's enough to report the p-value since it's obvious that the null hypothesis will be rejected at all significance levels α > p.
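A tiny sketch of that last point (my own illustration, with a made-up p-value): reporting the p-value gives the decision at every significance level at once.

```python
p_value = 0.012  # hypothetical reported p-value

for alpha in (0.001, 0.01, 0.05, 0.10):
    decision = "reject H0" if p_value <= alpha else "do not reject H0"
    print(f"alpha = {alpha:>5}: {decision}")
# Rejected at alpha = 0.05 and 0.10, not at 0.001 or 0.01:
# the p-value (0.012) is the smallest alpha at which H0 would be rejected.
```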
I got into a discussion with someone who arguably deals with statistics much more than I do, but I nevertheless think they are wrong on the subject. He was explaining p-values and alpha values and how one should reject the null hypothesis if the p-value was less than the alpha value and accept it otherwise. Given everything he explained earlier about what the p-value and alpha values mean, it seems to me that this interpretation is wrong, and that rather you should reject the null hypothesis (as "statistically likely", of course), i.e. accept the test hypothesis as statistically likely, if the p-value is less than alpha, but not reject the null hypothesis otherwise. The difference is subtle, but if we switched the test hypothesis and the null hypothesis, then the p-value would become (1 - original p-value) while alpha would remain unchanged. By the same logic, we wouldn't be rejecting the original test hypothesis / accepting the original null hypothesis as statistically likely unless the original p-value were above (1 - alpha). So in my mind, there are really 3 areas to think about:
p-value <= alpha: Accept test hypothesis as statistically likely.
alpha < p-value < 1-alpha: Not enough statistical evidence to strongly suggest one hypothesis over another
1-alpha <= p-value: Accept the null hypothesis as statistically likely.
This person said I was simply wrong and we had to either reject or accept the null hypothesis -- that we had to make a binary choice. I argued that in that case we should either "reject the null hypothesis" or "not reject the null hypothesis" instead of "reject the null hypothesis" or "accept the null hypothesis". He still claimed I was wrong.
Anyway, it's been bugging me and that's why I'm here. Who's right? Thank you for helping me with my understanding.
I have seen this alternative explanation of the p-value in Schweser's study guide and in Mark Meldrum's videos, but I still do not get the explanation.
I have an intuitive understanding of the p-value, that it is simply an alternative metric to the test statistic for rejecting/failing to reject the null hypothesis (i.e., p-value < alpha -> reject null hypothesis).
I understand it intuitively, as when the p-value is less than alpha, it must mean that the test statistic is greater than (or less than) the critical values/rejection points, and hence the null hypothesis may be rejected.
What I am struggling to understand is why the p-value is also known as "the smallest level of significance at which the null hypothesis is rejected".
If anyone would have an understanding of this explanation, I would greatly appreciate the help.