What is the importance of mode in grouped data?
Can grouped data have more than one mode?
What is the formula for finding the mode of grouped data?
I wanted to practice finding the mode of grouped data, but my teacher's study materials are a mess, so I went on YouTube. Most of the videos I found use a completely different formula from the one I learned in class (the first picture shows the formula I learned in class; the second shows the one most videos use). I tried both and got very different results. Can someone explain why there are two different formulas, and whether they are used in different contexts? I couldn't find much about this on my own, unfortunately.
I'm in 10th grade, we were taught the formulae to find the mean, median and mode of a grouped distribution of data today, and I was wondering how these formulae were derived. Forget the formula for a sec, how can you find the median of some data if you don't even know the exact values? All you have are the frequencies of groups of data that have an equal class interval.
we can take their value as 0. The frequency of the class succeeding the modal class is taken as 0 if the modal class is the last class.
You can also check this from the formula
$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h,$$
where
$l =$ lower limit of the modal class,
$h =$ size of the class interval (assuming all class sizes to be equal),
$f_1 =$ frequency of the modal class,
$f_0 =$ frequency of the class preceding the modal class,
$f_2 =$ frequency of the class succeeding the modal class.
Even if $f_2$ is $0$, the mode can be easily found by using the above expression.
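The expression can be sketched in R. This is only an illustration: it assumes the usual textbook form, Mode $= l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\, h$, with the symbols defined above. The function name `grouped_mode`, its arguments, and the frequencies in the example are made up for this sketch.

```r
# Mode of grouped data from the modal class and its two neighbours.
# `breaks`: class boundaries (length k + 1); `freq`: class frequencies (length k).
# Both names are hypothetical, chosen for this sketch only.
grouped_mode <- function(breaks, freq) {
  stopifnot(length(breaks) == length(freq) + 1)
  i  <- which.max(freq)                  # modal class (the first one if tied)
  l  <- breaks[i]                        # lower limit of the modal class
  h  <- breaks[i + 1] - breaks[i]        # class width
  f1 <- freq[i]                          # frequency of the modal class
  f0 <- if (i > 1) freq[i - 1] else 0    # no preceding class: take 0
  f2 <- if (i < length(freq)) freq[i + 1] else 0   # no succeeding class: take 0
  l + (f1 - f0) / (2 * f1 - f0 - f2) * h
}

# Made-up classes 0-10, 10-20, 20-30, 30-40 with the modal class in the middle
m1 <- grouped_mode(c(0, 10, 20, 30, 40), c(5, 8, 12, 7))
# Same classes, but the modal class is the last one, so f2 is taken as 0
m2 <- grouped_mode(c(0, 10, 20, 30, 40), c(2, 5, 8, 12))
c(m1, m2)
```

In the second call $l = 30$, $h = 10$, $f_1 = 12$, $f_0 = 8$ and $f_2 = 0$, so the formula gives $30 + \frac{4}{16}\times 10 = 32.5$, which still lies inside the modal class.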
Here is an elementary example of the use of a density estimator in R.
First we generate a thousand observations from the gamma distribution $\mathsf{Gamma}(\mathsf{shape}=\alpha=2, \mathsf{rate} = \lambda = 1/3)$ and plot their histogram in such a way that the 'modal bin' includes the smallest values.
set.seed(327)                  # fix the seed so the simulation is reproducible
x = rgamma(1000, 2, 1/3)       # shape = 2, rate = 1/3
hist(x, prob=TRUE, breaks=7, col="skyblue2")
Then we find the default density estimator in R. It consists of 512 points. Plotting them imitates a smooth curve.
den.est = density(x)
hist(x, prob=TRUE, breaks=7, ylim=c(0,.15), col="skyblue2")
lines(den.est, type="l", lwd=2, col="red")
Here is a summary of the $(x,y)$ points of the density estimator. We can use these points to find where the estimated density curve is highest. Thus, we can locate the 'mode' of the data, as defined by the density estimator. For our simulated data it is about $3.64.$ (We take the 'mean' of the x's with the highest y-value because there may be ties.)
den.est
Call:
density.default(x = x)
Data: x (1000 obs.); Bandwidth 'bw' = 0.8625
       x                y
 Min.   :-2.480   Min.   :6.260e-06
 1st Qu.: 4.383   1st Qu.:2.507e-03
 Median :11.247   Median :1.828e-02
 Mean   :11.247   Mean   :3.639e-02
 3rd Qu.:18.110   3rd Qu.:6.596e-02
 Max.   :24.974   Max.   :1.203e-01
mean(den.est$x[den.est$y == max(den.est$y)])
[1] 3.644313
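As a small aside, the tie handling in the last command can be made explicit. The sketch below regenerates the same data and compares the averaged location with the first maximiser returned by `which.max`; the names `peak`, `mode.first` and `mode.avg` are introduced here for illustration only.

```r
set.seed(327)
x <- rgamma(1000, 2, 1/3)          # same simulated data as above
den.est <- density(x)              # default estimator, evaluated on 512 grid points

# All grid points at which the estimated density reaches its maximum
peak <- which(den.est$y == max(den.est$y))
length(peak)                       # number of tied maximisers on the grid

mode.first <- den.est$x[which.max(den.est$y)]   # first maximiser only
mode.avg   <- mean(den.est$x[peak])             # average over any ties
c(first = mode.first, averaged = mode.avg)
```

When the maximum is attained at a single grid point, the two values coincide; they differ only if several grid points share exactly the same highest y-value.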
Usually the point of finding the mode of a histogram is to estimate the mode of the population distribution. We did pretty well in this example: the gamma distribution $\mathsf{Gamma}(\alpha=2,\lambda=1/3),$ from which we simulated the data, has its mode at $(\alpha-1)/\lambda = 1/(1/3) = 3.$
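The population mode can also be checked numerically. The following is a minimal sketch that maximises the gamma density with `optimize`; the search interval `c(0, 25)` is an assumption chosen to cover the range of the simulated data, and the variable names are illustrative.

```r
alpha  <- 2      # shape
lambda <- 1/3    # rate

# Closed-form mode of a gamma density with shape > 1
mode.exact <- (alpha - 1) / lambda

# Numerical check: maximise the density over an interval containing the mode
mode.num <- optimize(dgamma, interval = c(0, 25),
                     shape = alpha, rate = lambda, maximum = TRUE)$maximum
c(exact = mode.exact, numerical = mode.num)
```

Both values should agree to within the tolerance of `optimize`.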
Note: By way of full disclosure: (1) With as many as $n = 1000$ observations, we might have used more bins in our original histogram so that the traditional formula could be used. Here is a frequency histogram of the data with more bins. (I will leave it to you to see what value the traditional method gives.)
hist(x, ylim=c(0,260), labels=TRUE, col="skyblue2")
(2) Also, if the population distribution has its mode at one end of its support, a modification of the default kernel density estimator in R may be required for a good estimate of the mode. (An exponential distribution, with its 'mode' at $0,$ would be an example.)
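For point (1), one way to try the traditional method is to take the bin counts from `hist` itself. This is a sketch, not a definitive implementation: it assumes the traditional formula is Mode $= l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\, h$ and that the default bins of `hist` are the ones intended; the names `hh`, `w` and `trad.mode` are introduced here.

```r
set.seed(327)
x <- rgamma(1000, 2, 1/3)          # same simulated data as above

hh <- hist(x, plot = FALSE)        # default bins; counts and breaks without plotting
i  <- which.max(hh$counts)         # index of the modal bin
f1 <- hh$counts[i]                 # frequency of the modal bin
f0 <- if (i > 1) hh$counts[i - 1] else 0                   # preceding bin, or 0
f2 <- if (i < length(hh$counts)) hh$counts[i + 1] else 0   # succeeding bin, or 0
w  <- hh$breaks[i + 1] - hh$breaks[i]                      # bin width

# Assumed traditional formula applied to the histogram counts
trad.mode <- hh$breaks[i] + (f1 - f0) / (2 * f1 - f0 - f2) * w
trad.mode
```

The result always falls inside the modal bin, so it can be compared directly with the density-based estimate and with the population mode of $3.$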