median formula for grouped data

how to calculate the median on grouped dataset?

stackoverflow.com › questions › 18887382 › how-to-calculate-the-median-on-grouped-dataset

Since you already know the formula, it should be easy enough to create a function to do the calculation for you.

Here, I've created a basic function to get you started. The function takes four arguments:

frequencies: A vector of frequencies ("number" in your first example)
intervals: A 2-row matrix with the same number of columns as the length of frequencies, with the first row being the lower class boundary, and the second row being the upper class boundary. Alternatively, "intervals" may be a column in your data.frame, and you may specify sep (and possibly, trim) to have the function automatically create the required matrix for you.
sep: The separator character in your "intervals" column in your data.frame.
trim: A regular expression of characters that need to be removed before trying to coerce to a numeric matrix. One pattern is built into the function: trim = "cut". This sets the regular expression pattern to remove (, ), [, and ] from the input.
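To make the sep and trim arguments concrete, here is a minimal sketch of the parsing step on its own. The labels are illustrative values of the kind cut() produces; the pattern is the one that trim = "cut" stands for.

```r
# Illustrative interval labels in the style produced by cut()
labels <- c("(1.9,11.7]", "(11.7,21.5]", "(21.5,31.4]")

# trim = "cut" corresponds to this pattern: drop (, ), [ and ]
cleaned <- gsub("\\[|\\]|\\(|\\)", "", labels)

# sep = "," splits each label into a lower and an upper bound;
# sapply() binds the pairs column-wise into a 2-row numeric matrix
intervals <- sapply(strsplit(cleaned, ","), as.numeric)
intervals
```

The result has one column per class, with the lower boundaries in row 1 and the upper boundaries in row 2, which is the layout the intervals argument expects.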

Here's the function (with comments showing how I used your instructions to put it together):

GroupedMedian <- function(frequencies, intervals, sep = NULL, trim = NULL) {
  # If "sep" is specified, the function will try to create the 
  #   required "intervals" matrix. "trim" removes any unwanted 
  #   characters before attempting to convert the ranges to numeric.
  if (!is.null(sep)) {
    if (is.null(trim)) pattern <- ""
    else if (trim == "cut") pattern <- "\\[|\\]|\\(|\\)"
    else pattern <- trim
    intervals <- sapply(strsplit(gsub(pattern, "", intervals), sep), as.numeric)
  }

  cf <- cumsum(frequencies)
  n_2 <- max(cf)/2               # total observations divided by 2
  Midrow <- findInterval(n_2, cf) + 1
  L <- intervals[1, Midrow]      # lower class boundary of median class
  h <- diff(intervals[, Midrow]) # size of median class
  f <- frequencies[Midrow]       # frequency of median class
  # cumulative frequency of the class before the median class
  #   (0 when the median class is the first class)
  cf2 <- if (Midrow > 1) cf[Midrow - 1] else 0

  unname(L + (n_2 - cf2)/f * h)
}

Here's a sample data.frame to work with:

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800", 
    "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300", 
    "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L, 
    850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"), 
    class = "data.frame", row.names = c(NA, -10L))
mydf
#       salary number
# 1  1500-1600    110
# 2  1600-1700    180
# 3  1700-1800    320
# 4  1800-1900    460
# 5  1900-2000    850
# 6  2000-2100    250
# 7  2100-2200    130
# 8  2200-2300     70
# 9  2300-2400     20
# 10 2400-2500     10

Now, we can simply do:

GroupedMedian(mydf$number, mydf$salary, sep = "-")
# [1] 1915.294

Here's an example of the function in action on some made-up data:

set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
y
#           Var1 Freq
# 1   (1.9,11.7]    8
# 2  (11.7,21.5]    8
# 3  (21.5,31.4]    8
# 4  (31.4,41.2]   15
# 5    (41.2,51]   13
# 6    (51,60.8]    5
# 7  (60.8,70.6]   11
# 8  (70.6,80.5]   15
# 9  (80.5,90.3]   11
# 10  (90.3,100]    6

### Here's GroupedMedian's output on the grouped data.frame...
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
# [1] 49.49231

### ... and the output of median on the original vector
median(x)
# [1] 49.5

By the way, I think there was a mistake in one of the ranges in the sample data that you provided: all were separated by dashes except one, which was separated by a comma. Since strsplit splits on a regular expression by default, you can still use the function like this:

x<-c(110,180,320,460,850,250,130,70,20,10)
colnames<-c("numbers")
rownames<-c("[1500-1600]","(1600-1700]","(1700-1800]","(1800-1900]",
            "(1900-2000]"," (2000,2100]","(2100-2200]","(2200-2300]",
            "(2300-2400]","(2400-2500]")
y<-matrix(x,nrow=length(x),dimnames=list(rownames,colnames))
GroupedMedian(y[, "numbers"], rownames(y), sep="-|,", trim="cut")
# [1] 1915.294

Answer from A5C1D2H2I1M1N2O1R2T1 on Stack Overflow

ALLEN

allen.in › home › jee maths › median of grouped data

Median of Grouped Data with Solved Examples

May 19, 2025 - By locating the median class and applying the median formula, we can effectively determine the approximate center of the data distribution. This method is widely used in statistics to analyze large, categorized datasets. The median is the middle value that separates the higher half from the lower half of a data set. For ungrouped data, it's relatively easy to calculate. However, when the data is grouped into classes (like in frequency tables), we need to use a different approach.

CK-12 Foundation

flexbooks.ck12.org › cbook › ck-12-cbse-math-class-10 › section › 14.3 › primary › lesson › median-of-grouped-data

Median of Grouped Data - Formula, Steps and Examples

Discussions

r - how to calculate the median on grouped dataset? - Stack Overflow

My dataset is as following: salary number 1500-1600 110 1600-1700 180 1700-1800 320 1800-1900 460 1900-2000 850 2000-2100 250 2100-2200 130 2200-2300 70 2300-2400 20 2400-2500 ... More on stackoverflow.com

stackoverflow.com

statistics - Derivation of formulas for median and mode for grouped data - Mathematics Stack Exchange

I am studying in 10th grade. In the statistics chapter they gave formulas for the mean, mode, and median of grouped data. I could derive the formula for the mean easily, but I can't for the others. My textbook also didn't g... More on math.stackexchange.com

math.stackexchange.com

September 2, 2019

statistics - Derivation of formula for finding median for grouped data - Mathematics Stack Exchange

Accordingly, it is used when you have no complete data (e.g. when you only have grouped frequency tables). It is based on the assumption that data are uniformly distributed within the median class, so that you get the estimate by interpolation. Using $N/2$ in the formula gives the point where ... More on math.stackexchange.com

math.stackexchange.com

October 25, 2013

byjus.com › maths › median-of-grouped-data

Median of Grouped Data

We know that the formula to find the median of the grouped data is: $\text{Median} = l + \left(\frac{\frac{n}{2} - cf}{f}\right) \times h$ Now, substituting the values in the formula, we get · $\text{Median} = 145 + \left(\frac{25.5 - 11}{18}\right) \times 5$ ... Median = 149.03. Therefore, the median height for the given data is 149.
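The substitution above can be checked with a few lines of R. This is only a sketch of the arithmetic; the variable names are chosen here for illustration, and the values are the ones quoted in the snippet.

```r
l  <- 145    # lower boundary of the median class
n2 <- 25.5   # n/2, half the total frequency
cf <- 11     # cumulative frequency before the median class
f  <- 18     # frequency of the median class
h  <- 5      # class width

median_est <- l + (n2 - cf) / f * h
round(median_est, 2)
# [1] 149.03
```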

Published June 16, 2022

Testbook

testbook.com › home › maths › median of grouped data

How to Find the Median of Grouped Data with Step-by-Step Solved Examples

When the data is in grouped form, i.e., divided into intervals, the calculation of the median is slightly different from that of ungrouped data. The formula to calculate the median of grouped data is based on the assumption that the data is continuous and divided into intervals of equal width.

Stack Overflow

stackoverflow.com › questions › 18887382 › how-to-calculate-the-median-on-grouped-dataset

r - how to calculate the median on grouped dataset? - Stack Overflow

2 of 6

I've written it like this to clearly explain how it's being worked out. A more compact version is appended.

library(data.table)

#constructing the dataset with the salary range split into low and high
salarydata <- data.table(
  salaries_low = 100*c(15:24),
  salaries_high = 100*c(16:25),
  numbers = c(110,180,320,460,850,250,130,70,20,10)
)

#calculating cumulative number of observations
salarydata <- salarydata[,cumnumbers := cumsum(numbers)]
salarydata
   # salaries_low salaries_high numbers cumnumbers
   # 1:         1500          1600     110        110
   # 2:         1600          1700     180        290
   # 3:         1700          1800     320        610
   # 4:         1800          1900     460       1070
   # 5:         1900          2000     850       1920
   # 6:         2000          2100     250       2170
   # 7:         2100          2200     130       2300
   # 8:         2200          2300      70       2370
   # 9:         2300          2400      20       2390
   # 10:         2400          2500      10       2400

#identifying median group
mediangroup <- salarydata[
  (cumnumbers - numbers) <= (max(cumnumbers)/2) & 
  cumnumbers >= (max(cumnumbers)/2)]
mediangroup
   # salaries_low salaries_high numbers cumnumbers
   # 1:         1900          2000     850       1920

#creating the variables needed to calculate median
mediangroup[,l := salaries_low]
mediangroup[,h := salaries_high - salaries_low]
mediangroup[,f := numbers]
mediangroup[,c := cumnumbers- numbers]
n = salarydata[,sum(numbers)]

#calculating median
median <- mediangroup[,l + ((h/f)*((n/2)-c))]
median
   # [1] 1915.294

The compact version -

EDIT: Changed to a function at @AnandaMahto's suggestion. Also, using more general variable names.

library(data.table)

#Creating function

CalculateMedian <- function(
   LowerBound,
   UpperBound,
   Obs
)
{
   #calculating cumulative number of observations and n
   dataset <- data.table(UpperBound, LowerBound, Obs)

   dataset <- dataset[,cumObs := cumsum(Obs)]
   n = dataset[,max(cumObs)]

   #identifying mediangroup and dynamically calculating l,h,f,c. We already have n.
   median <- dataset[
      (cumObs - Obs) <= (max(cumObs)/2) & 
      cumObs >= (max(cumObs)/2),

      LowerBound + ((UpperBound - LowerBound)/Obs) * ((n/2) - (cumObs- Obs))
   ]

   return(median)
}


# Using function
CalculateMedian(
  LowerBound = 100*c(15:24),
  UpperBound = 100*c(16:25),
  Obs = c(110,180,320,460,850,250,130,70,20,10)
)
# [1] 1915.294

Fctemis

fctemis.org › notes › 9903_STATISTICS III.pdf pdf

TOPIC: MEAN MEDIAN AND MODE OF GROUPED DATA Mean Of Grouped Data

The median of grouped data · The median formula for grouped data is given as: $\text{Median} = L_1 + \left[\frac{\frac{n}{2} - Cf_b}{f_m}\right] C$ · Where: $L_1$ = lower class boundary of the median class · $n$ = total frequency · $Cf_b$ = cumulative frequency before the median class ·

Wikipedia

en.wikipedia.org › wiki › Median

Median - Wikipedia

1 week ago - $a \mapsto \operatorname{E}(\|X - a\|)$. The spatial median is unique when the data-set's dimension is two or more. An alternative proof uses the one-sided Chebyshev inequality; it appears in an inequality on location and scale parameters. This formula also follows directly from Cantelli's inequality. For the case of unimodal distributions, one can achieve a sharper bound on the distance between the median and the mean:

The Math Doctors

themathdoctors.org › finding-the-median-of-grouped-data

Finding the Median of Grouped Data – The Math Doctors

This is a linear interpolation ... class. One way to derive the formula is just to note that N/2 is the number of data values BELOW the median, so N/2 - F is the number of data values in the median class that are below the median....
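The interpolation view can be sketched directly in R with approx(), which interpolates linearly between known points. The salary classes and counts below are taken from the Stack Overflow example; treating the cumulative count as piecewise linear between class boundaries is the uniform-within-class assumption, and the variable names are chosen here for illustration.

```r
# Salary classes and frequencies from the Stack Overflow example
boundaries  <- seq(1500, 2500, by = 100)   # 11 class boundaries
frequencies <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)

# Cumulative count of observations below each boundary (0 below the first)
below <- c(0, cumsum(frequencies))
n <- sum(frequencies)

# Linear interpolation: the value at which the cumulative count reaches n/2
est <- approx(x = below, y = boundaries, xout = n / 2)$y
est
# [1] 1915.294
```

This agrees with the closed-form grouped median formula because that formula is the same straight-line interpolation, written out for the one class that contains the n/2-th observation.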

GeeksforGeeks

geeksforgeeks.org › mathematics › median-of-grouped-data

Median of Grouped Data: Formula, How to Find, and Solved Examples - GeeksforGeeks

July 23, 2025 - The lower limit (l) and frequency (f) of the median class are 20 and 12 respectively. And, the cumulative frequency (cf) of class preceding the median class is 12. Now, we can substitute these values in the formula to calculate value of median, ... Thus, the value of median corresponding to the given grouped data comes out to be 26.67.

Cuemath

cuemath.com › data › median-of-grouped-data

Median of Grouped Data - Formula, Class 10, How to Find?

Then the formula to calculate the median of grouped data is l + [(n/2−c)/f] × h, where: ... Cuemath is one of the world's leading math learning platforms that offers LIVE 1-to-1 online math classes for grades K-12.

Statology

statology.org › home › how to find the median of grouped data (with examples)

How to Find the Median of Grouped Data (With Examples)

February 11, 2022 - This tutorial explains how to calculate the median value of grouped data, including several examples.

Slideshare

slideshare.net › home › education › median of grouped data

Median of grouped data | PPTX

3. The median is calculated using the formula: x = L + (n2 - F2)/f2 * i, where L is the lower limit of the median class, n2 is the median class, F2 is the cumulative frequency before the median class, f2 is the frequency of the median class, ...

reddit.com › r/excel › is there a single formula for calculating median of grouped data for multiple datasets ?

r/excel on Reddit: Is there a single formula for Calculating Median of Grouped Data for multiple datasets ?

August 11, 2022 -

Each row is a separate dataset (up to 150 rows in a spreadsheet). The columns give the frequency in each group. I can manually find the median class and calculate the median for each row (albeit with some difficulty). But would like to make it a more automatic procedure.

I hope the screen shot below helps.
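For the one-dataset-per-row layout described in this post, the same formula can be applied row by row. The sketch below is in R rather than Excel, and everything in it is hypothetical: the class boundaries, the frequency table, and the helper name row_grouped_median are made up for illustration, since the post's actual data is not shown.

```r
# Hypothetical class boundaries shared by every row (not given in the post)
lower <- c(0, 10, 20, 30)
upper <- c(10, 20, 30, 40)

# Hypothetical frequency table: one dataset per row, one class per column
freqs <- rbind(
  a = c(2, 5, 8, 5),
  b = c(10, 5, 3, 2)
)

row_grouped_median <- function(f) {
  cf <- cumsum(f)
  half <- sum(f) / 2
  k <- which(cf >= half)[1]              # first class whose cumulative count reaches n/2
  before <- if (k > 1) cf[k - 1] else 0  # cumulative count before that class
  lower[k] + (half - before) / f[k] * (upper[k] - lower[k])
}

# One grouped median per row
medians <- apply(freqs, 1, row_grouped_median)
medians
```

Here row a gives 23.75 and row b gives 10. Locating the median class with which(cf >= half)[1] avoids a manual lookup for each row, which is the step the post describes as difficult.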