When you group data into intervals, information is lost. So assumptions are made in order to make reasonable estimates of the sample mean, median, etc.
The assumption of this formula for estimating the median from grouped data is that the data are spread roughly uniformly throughout the interval. Clearly, this assumption is not met in your situation, because all ten observations lie at the lower endpoint of the interval.
The idea of the formula is to estimate the median by interpolation, putting the estimate somewhere within the interval. In your case the estimated value is in the middle of the 'median interval' (the interval known to contain the median).
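A minimal Python sketch of that interpolation, assuming the common textbook form median ≈ L + (n/2 − F)/f · w, where L is the lower boundary of the median interval, w its width, F the cumulative frequency below it, and f its own frequency. The function name `interpolated_median` and the example numbers are hypothetical, not taken from the question.

```python
def interpolated_median(lower, width, cum_below, freq, n):
    """Estimate the median by linear interpolation inside the median interval.

    Assumes the `freq` observations in [lower, lower + width) are spread
    uniformly across the interval.
    """
    # Fraction of the way through the interval at which the (n/2)-th
    # observation would sit if the data were spread uniformly.
    fraction = (n / 2 - cum_below) / freq
    return lower + fraction * width


# Hypothetical case: all ten observations fall in [10, 20) and nothing
# lies below that interval, so cum_below = 0.
estimate = interpolated_median(lower=10, width=10, cum_below=0, freq=10, n=10)
print(estimate)  # 15.0, the middle of the interval
```

If all ten values actually equal 10, the true median is 10; it is only the uniform-spread assumption that moves the estimate to the middle of the interval.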
If you were trying to contrive a situation in which the estimate is even farther from the truth, you could put your ten observations at the left end of a wide interval. With no other data, your estimate of the median would then be the midpoint of that interval, even though the true median is its left endpoint.
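The contrived case can be checked numerically. This sketch assumes the same textbook interpolation formula and uses made-up data: ten identical values at the left end of an interval starting at 0, for several hypothetical widths. The true median stays at 0 while the grouped estimate grows in proportion to the width.

```python
import statistics


def interpolated_median(lower, width, cum_below, freq, n):
    # Linear interpolation within the median interval (uniform-spread assumption).
    return lower + (n / 2 - cum_below) / freq * width


# Hypothetical data: ten identical values sitting at the left end of [0, width).
data = [0.0] * 10
true_median = statistics.median(data)

for width in (10, 100, 1000):
    estimate = interpolated_median(lower=0.0, width=width, cum_below=0,
                                   freq=len(data), n=len(data))
    # The error (estimate - true_median) is half the interval width.
    print(width, true_median, estimate)
```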
There is nothing wrong with the formula, provided the assumption of data spread evenly throughout the interval is close to the truth. But any formula for estimating the median from grouped data will have to depend on assumptions. All that can be said for sure is that the median lies somewhere in the median interval. You have to recognize that the information lost in grouping data into intervals cannot be precisely recovered (unless the original data are saved and used).
Note: By contrast, the assumption usually made when trying to estimate the
sample mean from grouped data is that each observation lies precisely at the
midpoint of the interval that contains it. This idea gives rise to the
formula $\bar X \approx \frac{1}{n} \sum_{j=1}^{k} f_j m_j,$ where there are
$k$ intervals (usually of equal width), with midpoints $m_j$
and frequencies $f_j$, and $n = \sum_{j=1}^{k} f_j$.
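A direct translation of the midpoint formula into Python, as a sketch; the function name `grouped_mean` and the class midpoints and frequencies are hypothetical.

```python
def grouped_mean(midpoints, freqs):
    """Approximate the sample mean, treating every observation as its class midpoint."""
    n = sum(freqs)
    # (1/n) * sum over classes of f_j * m_j
    return sum(f * m for f, m in zip(freqs, midpoints)) / n


# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25.
print(grouped_mean([5, 15, 25], [2, 5, 3]))  # 16.0
```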
Because this is essentially a duplicate, I address a few issues that do not explicitly overlap the related question or answer:
If a class has cumulative relative frequency exactly .5 (that is, cumulative frequency exactly $n/2$), then the median is at the boundary between that class and the next larger one.
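A small check of the boundary case, assuming the textbook interpolation formula and hypothetical frequencies: two classes, 0-10 and 10-20, with five observations each, so the first class has cumulative frequency exactly n/2. Interpolating from either class lands on the shared boundary, 10.

```python
def interpolated_median(lower, width, cum_below, freq, n):
    # Textbook interpolation: lower + (n/2 - cum_below) / freq * width
    return lower + (n / 2 - cum_below) / freq * width


# Hypothetical data: classes [0, 10) and [10, 20), five observations in each.
n = 10
from_first = interpolated_median(lower=0, width=10, cum_below=0, freq=5, n=n)
from_second = interpolated_median(lower=10, width=10, cum_below=5, freq=5, n=n)
print(from_first, from_second)  # 10.0 10.0
```

So in this case it does not matter which of the two classes is called the median class.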
If $n$ is large (really the only case where this method is generally successful), there is little difference between $n/2$ and $(n+1)/2$ in the formula. All references I checked use $n/2$.
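To see why the choice matters little for large n, this sketch (hypothetical numbers, textbook formula assumed) interpolates to both target positions while scaling all class frequencies up together. The two estimates differ by (1/2)/f · w, which shrinks as the median-class frequency f grows.

```python
def interpolated_median(lower, width, cum_below, freq, target):
    # Interpolate to the given target position inside the median interval.
    return lower + (target - cum_below) / freq * width


# Hypothetical grouped data: median interval [30, 40); cum_below and freq are
# scaled together with n so the class proportions stay the same.
for scale in (1, 10, 100):
    n, cum_below, freq = 20 * scale, 8 * scale, 6 * scale
    a = interpolated_median(30, 10, cum_below, freq, n / 2)
    b = interpolated_median(30, 10, cum_below, freq, (n + 1) / 2)
    print(n, round(a, 4), round(b, 4), round(b - a, 4))
```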
Before computers were widely available, large datasets were customarily reduced to categories (classes) and plotted as histograms. Then the histograms were used to approximate the mean, variance, median, and other descriptive measures. Nowadays, it is best just to use a statistical computer package to find exact values of all measures.
One remaining application is to try to recover the descriptive measures from grouped data or from a histogram published in a journal. These are cases in which the original data are no longer available.
This procedure for approximating the sample median from grouped data *assumes* that the data are distributed roughly uniformly throughout the median interval; it then uses interpolation to approximate the median. (By contrast, methods for approximating the sample mean and sample variance from grouped data assume that all observations are concentrated at their class midpoints.)
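For the variance, the midpoint assumption can be sketched as follows. This assumes the ordinary n − 1 divisor and makes no grouping correction; `grouped_variance` and the data are hypothetical.

```python
def grouped_variance(midpoints, freqs):
    """Approximate the sample variance, placing every observation at its class midpoint."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    # Each class contributes freq * (midpoint - mean)^2; divide by n - 1
    # as for the ordinary sample variance.
    return sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)


# Hypothetical classes 0-10, 10-20, 20-30 (midpoints 5, 15, 25).
print(grouped_variance([5, 15, 25], [2, 5, 3]))
```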
According to what I learned, the median class is the lowest class for which the
cumulative frequency equals or exceeds $n/2$.
Therefore, the median class would be 30-40, which gives approximately 30.833, or about 31 as you said.
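Locating the median class by that rule is easy to automate. A minimal sketch, assuming the class frequencies are given in increasing order of class; the function name and frequencies are hypothetical. Because cumulative frequencies never decrease, `bisect.bisect_left` returns the first index whose cumulative frequency is at least n/2.

```python
from bisect import bisect_left
from itertools import accumulate


def median_class_index(freqs):
    """Index of the lowest class whose cumulative frequency is >= n/2."""
    cum = list(accumulate(freqs))
    n = cum[-1]
    # bisect_left gives the first position i with cum[i] >= n / 2.
    return bisect_left(cum, n / 2)


# Hypothetical frequencies for classes 0-10, 10-20, 20-30, 30-40, 40-50.
freqs = [3, 5, 7, 12, 4]
print(median_class_index(freqs))  # 3, i.e. the 30-40 class
```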
Each row is a separate dataset (up to 150 rows in a spreadsheet), and the columns give the frequency in each group. I can find the median class and calculate the median for each row manually (albeit with some difficulty), but I would like to make the procedure more automatic.
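One way to automate this, sketched under several assumptions: the spreadsheet is exported as CSV with one dataset per row and one frequency column per class, every row shares the same class boundaries, and the textbook interpolation formula is the one wanted. `grouped_median`, `EDGES`, and the CSV contents are hypothetical placeholders.

```python
import csv
import io
from bisect import bisect_left
from itertools import accumulate


def grouped_median(freqs, edges):
    """Interpolated median for one row of class frequencies.

    `edges` holds the class boundaries, so len(edges) == len(freqs) + 1.
    Assumes observations are spread uniformly within the median class.
    """
    cum = list(accumulate(freqs))
    n = cum[-1]
    if n == 0:
        raise ValueError("row has no observations")
    k = bisect_left(cum, n / 2)            # lowest class with cum >= n/2
    cum_below = cum[k - 1] if k > 0 else 0
    lower, width = edges[k], edges[k + 1] - edges[k]
    return lower + (n / 2 - cum_below) / freqs[k] * width


# Hypothetical spreadsheet exported as CSV: one dataset per row,
# one column per class (0-10, 10-20, ..., 40-50).
EDGES = [0, 10, 20, 30, 40, 50]
CSV_TEXT = """3,5,7,12,4
0,2,9,6,1
"""

for row in csv.reader(io.StringIO(CSV_TEXT)):
    freqs = [int(x) for x in row]
    print(freqs, round(grouped_median(freqs, EDGES), 3))
```

For a real file, `io.StringIO(CSV_TEXT)` would be replaced by an opened file object; the median class is never empty, since the lowest class reaching n/2 must contain at least one observation.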
I hope the screen shot below helps.