Does the software convert the respective distribution to a normal distribution for calculating the p-value?

No. It handles each one in its own way. Occasionally a normal approximation of the null distribution is involved for large sample sizes, but that's on a case-by-case basis, not the usual thing.

For example, with a Wilcoxon-Mann-Whitney two-sample test, the null distribution of the test statistic is discrete, but we can also look at a standardized version of the statistic (U - μ₀)/σ₀ (where μ₀ and σ₀ are the mean and standard deviation of the null distribution when H0 is true, and are both functions of the two sample sizes). Asymptotically (as both sample sizes go to infinity) the distribution of that standardized statistic goes to a standard normal when H0 is true, and at sufficiently large sample sizes a normal approximation gives perfectly useful approximations to any but the most extreme p-values (those may be inaccurate, but given they should be much smaller than anything you'd care to compare them with, it's not practically a big issue). Nevertheless, when there are no ties, R uses the exact discrete distribution until the sample sizes are very large; it has a special function for doing that (and you can use that function directly, it's pwilcox).

By contrast, an F statistic has an F distribution and R won't be using any normal approximation there. While I haven't checked, I presume it computes the F tail probability via the regularized incomplete beta function (the beta cdf), and the F quantile via the beta quantile function, except when the denominator df are extremely large, when it might use the chi-squared instead.

Edit: just checked the help (you can do that for almost all the tests and distribution functions built into R; it usually documents exactly what functions - with references - it uses). It said this:

For ‘pf’, via ‘pbeta’ (or for large ‘df2’, via ‘pchisq’). For ‘qf’, via ‘qchisq’ for large ‘df2’, else via ‘qbeta’.

Yeah, so essentially exactly what I said. When distributions are simply and closely related (on which see the Wikipedia pages on the various distributions, which usually have a section pointing out the main such relationships), it makes sense not to get (or write from scratch) and then test a whole new function for the cdf or quantile function, but instead to take a related one that's already well tested and stable and reuse it. There's much less scope for mistakes if your code is a couple of lines of simple transformation and a call to something you already know works. There are a number of those relationships that get used in practice for calculating p-values.

Could someone explain the concept behind the null distribution, with a simulated example if possible?

Sure. The first thing is to forget p-values to start with -- they're not part of the Neyman-Pearson framework; you don't need them for anything in it. Indeed p-values are from Fisher's approach, although the concept - albeit less clearly expressed - is older. Everything is easier to explain without them, and then we can bring them back in at the end if needed.

You choose some test statistic that behaves differently when H0 is true and when H1 is true. What you want to do is pick the values of the test statistic that are most consistent with H1 (e.g. in the sense that they're much more likely to occur under H1 than under H0) and reject H0 when you are in that region (or those regions -- you might have several disjoint parts of the distribution that are like that).
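To make "behaves differently under H0 and H1" concrete, here's a minimal sketch in R simulating the two-sample t statistic under both hypotheses (the sample size of 20 and the shift of 1 under H1 are just arbitrary choices for illustration):

set.seed(42)                # arbitrary seed, for reproducibility
nsim <- 10000               # number of simulated datasets per hypothesis
n <- 20                     # per-group sample size (arbitrary choice)

# t statistics when H0 is true: both groups drawn from the same normal distribution
t_null <- replicate(nsim,
  t.test(rnorm(n), rnorm(n), var.equal = TRUE)$statistic)

# t statistics when H1 is true: second group shifted up by 1 (arbitrary effect size)
t_alt <- replicate(nsim,
  t.test(rnorm(n), rnorm(n, mean = 1), var.equal = TRUE)$statistic)

# Large |t| is rare when H0 is true but common when H1 is true,
# which is why the rejection region sits in the right tail of |t|
mean(abs(t_null) > 2)   # around 0.05
mean(abs(t_alt)  > 2)   # much larger

The first collection of simulated values is (an estimate of) the null distribution of t; the second shows where the statistic tends to land when H1 is true.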
Clearly there's some boundary between the part where you reject and the part where you don't, and the most usual convention includes the boundary in the rejection region (though that distinction doesn't matter for a continuously distributed statistic). You can move those boundaries to make the overall rejection region smaller or larger.

Now, under the usual framework we choose some type I error rate, alpha. This allows us to fix our boundary. For the moment I'll assume the test statistic is set up so that the rejection region is on one side of a single boundary value (the critical value) and the non-rejection region is on the other side. For example, for a two-tailed t-test, we'd look at the distribution of |t|, the absolute value of t, and then the rejection region would be in the right tail only. (It can potentially get more complicated than that ... but this will cover almost every case of actual hypothesis tests in practice.)

If you move the critical value further toward the "most strongly consistent with H1" parts, the rejection region is smaller, so the type I error rate is smaller; if you move it out into the "less strongly consistent with H1" parts, the region is larger and the type I error rate is larger. You should get a collection of nested rejection regions, where any larger region has at least the type I error rate of any of the regions within it.

So, given nested rejection regions, you simply move your critical value to the least extreme value it can have without making the type I error rate exceed alpha. That way you reject in as many cases as you can without exceeding your type I error budget. This makes your power as high as it can be for the test statistic being used.

Often you don't literally move the critical value back and forth, because if you can calculate the inverse cdf (the quantile function) of the test statistic when H0 is true (i.e. the quantile function of the null distribution of the test statistic), then you can directly compute that critical value. However, in some cases you don't have a neat inverse cdf to work with, and then you may in fact be engaged in searching for where the "tail" probability is no more than alpha (literal root-finding, in that you're solving F(T) - (1 - alpha) = 0, which may involve bisection, for example, or some more sophisticated root-finder; 'vanilla' R offers Brent's algorithm via uniroot, which is quite decent).

In some cases it's not possible (or possible but difficult) to directly compute the density/pmf or the cdf (let alone its inverse). In those cases it may be possible to simulate the distribution of the test statistic under the assumptions, for example by drawing samples that satisfy the assumptions and then computing the test statistic. By repeating that many times you can obtain an estimate of the cdf of the distribution of the test statistic when H0 is true, and so get critical values (and hence p-values).

Now that we know how to obtain critical values and alphas (either can be found from the other), let's define a p-value. The p-value is simply the smallest alpha for which you'd still reject H0.

I'll try to think of a simple example to discuss, but if not I've at least covered the process in a reasonably general way.
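As a rough stab at such an example, here's a minimal sketch of the simulation route just described (the statistic, the sample sizes, and the Exponential(1) null model are all just illustrative assumptions, not anything special):

set.seed(101)                 # arbitrary seed, for reproducibility
nsim  <- 100000               # number of simulated datasets under H0
n1 <- 15; n2 <- 20            # sample sizes (arbitrary choices)
alpha <- 0.05

# Test statistic: absolute difference in sample means.
# H0 here (for illustration): both samples come from the same Exponential(rate = 1) distribution.
stat <- function(x, y) abs(mean(x) - mean(y))

# Simulate the null distribution of the statistic by repeatedly drawing
# data that satisfy H0 and recomputing the statistic
null_dist <- replicate(nsim, stat(rexp(n1, 1), rexp(n2, 1)))

# Critical value: the least extreme value whose rejection region
# has type I error rate no larger than alpha
crit <- quantile(null_dist, 1 - alpha)

# A made-up "observed" dataset (second sample deliberately has a larger mean),
# its statistic, and its p-value: the proportion of simulated null statistics
# at least as extreme as the observed one
x_obs <- rexp(n1, 1); y_obs <- rexp(n2, 0.5)
t_obs <- stat(x_obs, y_obs)
p_val <- mean(null_dist >= t_obs)

t_obs >= crit    # reject H0 at level alpha?
p_val            # smallest alpha at which you'd still reject (up to simulation error)

The vector null_dist is the (estimated) null distribution; everything else - the critical value, the rejection decision, the p-value - is read off from it exactly as described above.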