how to find median of grouped data

how to calculate the median on grouped dataset?

stackoverflow.com › questions › 18887382 › how-to-calculate-the-median-on-grouped-dataset

Since you already know the formula, it should be easy enough to create a function to do the calculation for you.

Here, I've created a basic function to get you started. The function takes four arguments:

frequencies: A vector of frequencies ("number" in your first example)
intervals: A 2-row matrix with the same number of columns as the length of frequencies, with the first row being the lower class boundary, and the second row being the upper class boundary. Alternatively, "intervals" may be a column in your data.frame, and you may specify sep (and possibly, trim) to have the function automatically create the required matrix for you.
sep: The separator character in your "intervals" column in your data.frame.
trim: A regular expression of characters that need to be removed before trying to coerce to a numeric matrix. One pattern is built into the function: trim = "cut". This sets the regular expression pattern to remove (, ), [, and ] from the input.

Here's the function (with comments showing how I used your instructions to put it together):

GroupedMedian <- function(frequencies, intervals, sep = NULL, trim = NULL) {
  # If "sep" is specified, the function will try to create the 
  #   required "intervals" matrix. "trim" removes any unwanted 
  #   characters before attempting to convert the ranges to numeric.
  if (!is.null(sep)) {
    if (is.null(trim)) pattern <- ""
    else if (trim == "cut") pattern <- "\\[|\\]|\\(|\\)"
    else pattern <- trim
    intervals <- sapply(strsplit(gsub(pattern, "", intervals), sep), as.numeric)
  }

  cf <- cumsum(frequencies)
  Midrow <- findInterval(max(cf)/2, cf) + 1
  L <- intervals[1, Midrow]      # lower class boundary of median class
  h <- diff(intervals[, Midrow]) # size of median class
  f <- frequencies[Midrow]       # frequency of median class
  # cumulative frequency of the class before the median class
  #   (0 when the median class is the first class)
  cf2 <- if (Midrow > 1) cf[Midrow - 1] else 0
  n_2 <- max(cf)/2               # total observations divided by 2

  unname(L + (n_2 - cf2)/f * h)
}
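
The parsing step inside the function can be looked at in isolation. The sketch below uses made-up labels and illustrative variable names (`labels`, `cleaned`, `bounds`), none of which come from the question; it only shows how `gsub`, `strsplit`, and `sapply` turn `cut`-style labels into the 2-row matrix that the rest of the function expects:

```r
# Illustrative labels in the style produced by cut(); not from the question.
labels <- c("(0,10]", "(10,20]", "(20,30]")

# Same pattern the function uses for trim = "cut": drop ( ) [ ]
cleaned <- gsub("\\[|\\]|\\(|\\)", "", labels)

# Split each label on the separator and coerce to numeric.
# sapply() binds the per-label vectors column-wise, giving a 2-row matrix:
# row 1 = lower boundaries, row 2 = upper boundaries.
bounds <- sapply(strsplit(cleaned, ","), as.numeric)
bounds
#      [,1] [,2] [,3]
# [1,]    0   10   20
# [2,]   10   20   30
```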

Here's a sample data.frame to work with:

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800", 
    "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300", 
    "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L, 
    850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"), 
    class = "data.frame", row.names = c(NA, -10L))
mydf
#       salary number
# 1  1500-1600    110
# 2  1600-1700    180
# 3  1700-1800    320
# 4  1800-1900    460
# 5  1900-2000    850
# 6  2000-2100    250
# 7  2100-2200    130
# 8  2200-2300     70
# 9  2300-2400     20
# 10 2400-2500     10

Now, we can simply do:

GroupedMedian(mydf$number, mydf$salary, sep = "-")
# [1] 1915.294
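
To see where that number comes from, the same result can be reproduced step by step in base R. This is only a sketch for checking the arithmetic: it locates the median class with `which()` instead of `findInterval()`, and the variable names (`freq`, `lower`, `upper`, `half`, `k`, `cf_before`) are illustrative.

```r
# Frequencies and class boundaries from the salary table above
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100

cf   <- cumsum(freq)          # 110 290 610 1070 1920 ...
half <- sum(freq) / 2         # 2400 / 2 = 1200
k    <- which(cf >= half)[1]  # first class whose cumulative frequency reaches n/2

# Cumulative frequency before the median class (0 if it is the first class)
cf_before <- if (k > 1) cf[k - 1] else 0

med <- lower[k] + (half - cf_before) / freq[k] * (upper[k] - lower[k])
round(med, 3)
# [1] 1915.294
```

The median class is the fifth one (1900-2000), so the calculation reduces to 1900 + (1200 - 1070)/850 * 100.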

Here's an example of the function in action on some made up data:

set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
y
#           Var1 Freq
# 1   (1.9,11.7]    8
# 2  (11.7,21.5]    8
# 3  (21.5,31.4]    8
# 4  (31.4,41.2]   15
# 5    (41.2,51]   13
# 6    (51,60.8]    5
# 7  (60.8,70.6]   11
# 8  (70.6,80.5]   15
# 9  (80.5,90.3]   11
# 10  (90.3,100]    6

### Here's GroupedMedian's output on the grouped data.frame...
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
# [1] 49.49231

### ... and the output of median on the original vector
median(x)
# [1] 49.5

By the way, I think there was a mistake in one of the ranges in the sample data you provided: all were separated by dashes except one, which was separated by a comma. Since strsplit splits on a regular expression by default, you can still use the function like this:

x<-c(110,180,320,460,850,250,130,70,20,10)
colnames<-c("numbers")
rownames<-c("[1500-1600]","(1600-1700]","(1700-1800]","(1800-1900]",
            "(1900-2000]"," (2000,2100]","(2100-2200]","(2200-2300]",
            "(2300-2400]","(2400-2500]")
y<-matrix(x,nrow=length(x),dimnames=list(rownames,colnames))
GroupedMedian(y[, "numbers"], rownames(y), sep="-|,", trim="cut")
# [1] 1915.294

Answer from A5C1D2H2I1M1N2O1R2T1 on Stack Overflow

CK-12 Foundation

flexbooks.ck12.org › cbook › ck-12-cbse-math-class-10 › section › 14.3 › primary › lesson › median-of-grouped-data

Median of Grouped Data - Formula, Steps and Examples

BYJUS

byjus.com › maths › median-of-grouped-data

Median of Grouped Data

03:17

... The formula to find the median of grouped data is: Median = l + [((n/2) – cf)/f] × h, where l = lower limit of median class, n = number of observations, h = class size, f = frequency of median class, cf = cumulative frequency of class preceding ...
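
As a quick illustration of this formula, take a small made-up table (the numbers are assumptions chosen for easy arithmetic, not from the source): classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5. The substitution then looks like this:

```latex
\begin{align*}
n &= 5 + 8 + 12 + 5 = 30, \qquad \frac{n}{2} = 15 \\
\text{cumulative frequencies} &: \ 5,\ 13,\ 25,\ 30
  \;\Rightarrow\; \text{median class } 20\text{--}30 \\
l &= 20, \quad h = 10, \quad f = 12, \quad cf = 13 \\
\text{Median} &= l + \frac{\frac{n}{2} - cf}{f} \times h
  = 20 + \frac{15 - 13}{12} \times 10 \approx 21.67
\end{align*}
```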

Published June 16, 2022

Views 34K

2 of 6

I've written it like this to clearly explain how it's being worked out. A more compact version is appended.

library(data.table)

#constructing the dataset with the salary range split into low and high
salarydata <- data.table(
  salaries_low = 100*c(15:24),
  salaries_high = 100*c(16:25),
  numbers = c(110,180,320,460,850,250,130,70,20,10)
)

#calculating cumulative number of observations
salarydata <- salarydata[,cumnumbers := cumsum(numbers)]
salarydata
#     salaries_low salaries_high numbers cumnumbers
#  1:         1500          1600     110        110
#  2:         1600          1700     180        290
#  3:         1700          1800     320        610
#  4:         1800          1900     460       1070
#  5:         1900          2000     850       1920
#  6:         2000          2100     250       2170
#  7:         2100          2200     130       2300
#  8:         2200          2300      70       2370
#  9:         2300          2400      20       2390
# 10:         2400          2500      10       2400

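#identifying median group
#(strict "<" on the lower side so that exactly one class is selected
# when n/2 coincides with a cumulative frequency)
mediangroup <- salarydata[
  (cumnumbers - numbers) < (max(cumnumbers)/2) & 
  cumnumbers >= (max(cumnumbers)/2)]
mediangroup
#    salaries_low salaries_high numbers cumnumbers
# 1:         1900          2000     850       1920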

#creating the variables needed to calculate median
mediangroup[,l := salaries_low]
mediangroup[,h := salaries_high - salaries_low]
mediangroup[,f := numbers]
mediangroup[,c := cumnumbers- numbers]
n = salarydata[,sum(numbers)]

#calculating median
median <- mediangroup[,l + ((h/f)*((n/2)-c))]
median
   # [1] 1915.294

The compact version:

EDIT: Changed to a function at @AnandaMahto's suggestion. Also, using more general variable names.

library(data.table)

#Creating function

CalculateMedian <- function(
   LowerBound,
   UpperBound,
   Obs
)
{
   #calculating cumulative number of observations and n
   dataset <- data.table(UpperBound, LowerBound, Obs)

   dataset <- dataset[,cumObs := cumsum(Obs)]
   n = dataset[,max(cumObs)]

   #identifying mediangroup and dynamically calculating l,h,f,c. We already have n.
   #Strict "<" on the lower side selects a single class even when n/2
   #coincides with a cumulative frequency.
   median <- dataset[
      (cumObs - Obs) < (max(cumObs)/2) & 
      cumObs >= (max(cumObs)/2),

      LowerBound + ((UpperBound - LowerBound)/Obs) * ((n/2) - (cumObs- Obs))
   ]

   return(median)
}


# Using function
CalculateMedian(
  LowerBound = 100*c(15:24),
  UpperBound = 100*c(16:25),
  Obs = c(110,180,320,460,850,250,130,70,20,10)
)
# [1] 1915.294

reddit.com › r/excel › is there a single formula for calculating median of grouped data for multiple datasets ?

r/excel on Reddit: Is there a single formula for Calculating Median of Grouped Data for multiple datasets ?

August 11, 2022 -

Each row is a separate dataset (up to 150 rows in a spreadsheet). The columns give the frequency in each group. I can manually find the median class and calculate the median for each row (albeit with some difficulty). But would like to make it a more automatic procedure.

I hope the screen shot below helps.

Top answer

2 of 6

With grouped data, you have the x-value and its frequency. For example, if the data are 0 0 1 2 3 3 4 4 4, you could write it as:

x  0  1  2  3  4
f  2  1  1  2  3

The median is found by locating the half-way point, i.e. the 5th point, which is x = 3. If the frequencies are fairly large (e.g. in the hundreds), you add them cumulatively until you reach the half-way point.

If, instead of simple x-values, you have, say, age ranges or income brackets, then you can only find the median class (bracket) this way. A further adjustment is needed to get an actual numerical estimate of the median.

My median brackets are different for each row, so the Excel formula for the adjustment is different for each row, and I have to change the formula by hand for each row. That's what I would like to do automatically. For example, in row 17 the formulae use columns W and X to calculate the median; in row 18 they use columns U and V.

Not sure if this is what you wanted to know about the need to automate.
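
The per-row logic described here (find each row's own median class, then apply the adjustment) can be sketched in R rather than as an Excel formula. Everything in the example is an assumption for illustration: the helper `grouped_median`, the bracket boundaries in `breaks`, and the three rows of frequencies are made up.

```r
# Grouped median for one row of frequencies; 'breaks' holds the class
# boundaries (length = number of classes + 1). Hypothetical helper.
grouped_median <- function(freq, breaks) {
  cf   <- cumsum(freq)
  half <- sum(freq) / 2
  k    <- which(cf >= half)[1]               # median class for this row
  cf_before <- if (k > 1) cf[k - 1] else 0   # 0 if it is the first class
  breaks[k] + (half - cf_before) / freq[k] * (breaks[k + 1] - breaks[k])
}

# Made-up example: 3 datasets (rows), 4 brackets (columns)
breaks <- c(0, 10, 20, 30, 40)
freqs  <- rbind(
  a = c(5, 8, 12, 5),
  b = c(20, 5, 3, 2),
  c = c(1, 2, 3, 14)
)

# apply() over rows picks the right median class for each dataset
medians <- apply(freqs, 1, grouped_median, breaks = breaks)
round(medians, 2)
# a = 21.67, b = 7.50, c = 32.86
```

Each row ends up with a different median class (the third, first, and fourth bracket respectively), which is the part that had to be changed by hand in the spreadsheet.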

Math is Fun

mathsisfun.com › data › frequency-grouped-mean-median-mode.html

Mean, Median and Mode from Grouped Frequencies

59, 65, 61, 62, 53, 55, 60, 70, ... 59 + 68 + 61 + 67, divided by 21: Mean = 61.38095... To find the Median, Alex places the numbers in value order and finds the middle number....

reddit.com › r/askmath › when calculating the median of grouped data, why do we strictly choose the nearest class with cumulative frequency higher than n/2?

r/askmath on Reddit: When calculating the median of grouped data, why do we strictly choose the nearest class with cumulative frequency higher than N/2?

October 24, 2021 -

Say we're calculating the median of grouped data and the value of N/2 is found to be 170. If the class 30-40 has a cumulative frequency of 169.5, and the class 40-50 has a cumulative frequency of 180, we choose the class 40-50 as the median class.

Why do we do that, even though class 30-40 is clearly closer to it? Why can't it be the class with the closest cumulative frequency to it?

Top answer

1 of 3

The median class is the class that actually contains the median. The cumulative frequency of a class tells you the total frequency of everything inside that class and all classes below it. Because of this, the cumulative frequency tells you the upper bound of the class's percentile range.

You might find the answer to your question much clearer if you work out the percentile range of each class: the range has two numbers, so it tells a more complete story. The cumulative frequency gives only the upper bound of this range, so it can skew your intuition.

If the cumulative frequency is >= N/2, then counting up from the bottom you will reach the (N/2)-th item before you run out of items in that class and all classes below it. So how do you find the class that contains the median? Obviously, its cumulative frequency must be >= N/2 (so that the median is in this class or below), but the cumulative frequency of any classes below it must be < N/2.
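
The "range of positions" view can be made concrete with a small R sketch. The frequencies below are made up so that N/2 = 170 and the cumulative frequencies of 30-40 and 40-50 are 169 and 180 (169 rather than the question's 169.5, since counts are whole numbers); the variable names are illustrative.

```r
# Made-up frequencies: N = 340, so N/2 = 170
lower <- c(10, 20, 30, 40, 50)
upper <- lower + 10
freq  <- c(50, 50, 69, 11, 160)

cf_upper <- cumsum(freq)        # position of the last item in each class
cf_lower <- cf_upper - freq     # number of items below each class
half     <- sum(freq) / 2       # 170

# Range of item positions covered by each class
data.frame(class = paste(lower, upper, sep = "-"),
           first_item = cf_lower + 1, last_item = cf_upper)

# The median class is the first one whose positions reach N/2
k <- which(cf_upper >= half)[1]
paste(lower[k], upper[k], sep = "-")
# [1] "40-50"
```

Class 30-40 covers items 101 to 169, so it stops one item short of position 170, however close its cumulative frequency looks; class 40-50 covers items 170 to 180.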

2 of 3

How can the cumulative frequency be a non-natural number like 169.5?

MathWorks

mathworks.com › matlabcentral › answers › 2130576-how-to-calculate-median-of-grouped-data-in-matlab

how to calculate median of grouped data in MATLAB - MATLAB Answers - MATLAB Central

Top answer

1 of 1

Hello NAFISA,

I see that you are trying to calculate the median of grouped data using MATLAB's median function. However, the 'median' function in MATLAB is designed for raw data inputs and does not directly compute the median for grouped data. You can expand your grouped data as follows:

class_intervals = [0 5; 5 10; 10 15; 15 20; 20 25; 25 30; 30 35; 35 40; 40 45; 45 50];
frequencies = [14, 8, 20, 7, 11, 10, 5, 16, 21, 9];
midpoints = (class_intervals(:, 1) + class_intervals(:, 2)) / 2;
expanded_data = [];
for i = 1:length(frequencies)
    expanded_data = [expanded_data, repmat(midpoints(i), 1, frequencies(i))];
end
median_value = median(expanded_data);
disp(['The median is: ', num2str(median_value)]);

Using this method, you will find that the output is 27.5 instead of the expected 25.25. This discrepancy occurs because expanding grouped data into individual data points assumes that all data points within a class interval are located at the midpoint of that interval. For example, if a class interval is [20, 25] with a frequency of 11, you assume there are 11 data points all exactly at 22.5. This assumption can lead to inaccuracies because the actual data points could be spread across the entire interval [20, 25].

So, as Muskan mentioned, you can use the grouped data median formula (L + ((N/2 - cf) / f) * h). If you frequently work with frequency distribution tables and find it cumbersome to use this formula manually, you can use MATLAB functions to simplify the process.

Here is an example:

class_intervals1 = [0 5; 5 10; 10 15; 15 20; 20 25; 25 30; 30 35; 35 40; 40 45; 45 50];
frequencies1 = [14, 8, 20, 7, 11, 10, 5, 16, 21, 9];
class_intervals2 = [420 430; 430 440; 440 450; 450 460; 460 470; 470 480; 480 490; 490 500];
frequencies2 = [336, 2112, 2336, 1074, 1553, 1336, 736, 85];

% Calculate the median using the custom function
median_value1 = groupedMedian(class_intervals1, frequencies1);
median_value2 = groupedMedian(class_intervals2, frequencies2);
disp(['The median of the grouped data1 is: ', num2str(median_value1)]);
disp(['The median of the grouped data2 is: ', num2str(median_value2)]);

function median_value = groupedMedian(class_intervals, frequencies)
    % Calculate cumulative frequency
    cum_frequencies = cumsum(frequencies);
    % Total number of observations
    N = sum(frequencies);
    % Find the median class (first class where cumulative frequency >= N/2)
    median_class_index = find(cum_frequencies >= N/2, 1);
    % Extract the median class boundaries and frequency
    L = class_intervals(median_class_index, 1);
    f = frequencies(median_class_index);
    % Cumulative frequency before the median class. Indexing with 0 is an
    % error in MATLAB, so the first-class case is handled explicitly.
    if median_class_index > 1
        CF = cum_frequencies(median_class_index - 1);
    else
        CF = 0;
    end
    h = class_intervals(median_class_index, 2) - L;
    % Calculate the median
    median_value = L + ((N/2 - CF) / f) * h;
end

You can also try referring to these File Exchange functions, which might help you:

https://www.mathworks.com/matlabcentral/fileexchange/38238-gmedian
https://www.mathworks.com/matlabcentral/fileexchange/38228-gprctile

I hope this helps you moving forward.
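
The 27.5 versus 25.25 discrepancy described in this answer can be cross-checked in R, the language used in the Stack Overflow answers above. This is a sketch with illustrative variable names, not a port of the MATLAB function:

```r
# Class boundaries and frequencies from the first MATLAB example
lower <- seq(0, 45, by = 5)
upper <- lower + 5
freq  <- c(14, 8, 20, 7, 11, 10, 5, 16, 21, 9)

# Method 1: pretend every observation sits at its class midpoint
midpoints   <- (lower + upper) / 2
by_midpoint <- median(rep(midpoints, times = freq))
by_midpoint
# [1] 27.5

# Method 2: linear interpolation inside the median class
cf   <- cumsum(freq)
half <- sum(freq) / 2                      # 121 / 2 = 60.5
k    <- which(cf >= half)[1]               # class 25-30
cf_before  <- if (k > 1) cf[k - 1] else 0
by_formula <- lower[k] + (half - cf_before) / freq[k] * (upper[k] - lower[k])
by_formula
# [1] 25.25
```

Both methods pick the class 25-30; the midpoint method can only ever return 27.5 for it, while interpolation places the median a tenth of the way into the class.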

SMath

smath.com › en-US › forum › topic › TY7Zc6 › how-to-calculate-median-of-grouped-data-if-group-size-is-variable

how to calculate median of grouped data if group size is variable - SMath

I learned in school that Median = L + (n/2-cf)*h/f where L = lower limit of median class n = no. of observations cf = cumulative frequency of class preceding the median class, f = frequency of median class, h = class size (assuming class size to be equal). I used to use this formula for grouped ...
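
On the variable-width question: the interpolation takes place entirely inside the median class, so one reasonable reading is that h only needs to be the width of that class, whatever the widths of the others. A minimal R sketch under that assumption, with a made-up table and illustrative variable names:

```r
# Made-up table with unequal class widths (10, 20, 10)
lower <- c(0, 10, 30)
upper <- c(10, 30, 40)
freq  <- c(10, 20, 10)

cf   <- cumsum(freq)                   # 10 30 40
half <- sum(freq) / 2                  # 20
k    <- which(cf >= half)[1]           # median class is 10-30
h    <- upper[k] - lower[k]            # width of the median class only: 20
cf_before <- if (k > 1) cf[k - 1] else 0

med <- lower[k] + (half - cf_before) / freq[k] * h
med
# [1] 20
```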

Stack Exchange

math.stackexchange.com › questions › 1617208 › calculation-of-median-of-grouped-data

statistics - calculation of median of grouped data - Mathematics Stack Exchange

Top answer

1 of 3

Because this is essentially a duplicate, I address a few issues that do not explicitly overlap the related question or answer:

If a class has cumulative frequency .5, then the median is at the boundary of that class and the next larger one.
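
A small R sketch of this boundary case, using a made-up table in which the first two classes hold exactly half of the observations (all names and numbers are illustrative):

```r
# Made-up table: classes 0-10, 10-20, 20-30
lower <- c(0, 10, 20)
upper <- c(10, 20, 30)
freq  <- c(5, 5, 10)

cf   <- cumsum(freq)       # 5 10 20
half <- sum(freq) / 2      # 10, equal to cf[2]

# Interpolating in class 2 (n/2 is reached at its upper boundary) ...
m2 <- lower[2] + (half - cf[1]) / freq[2] * (upper[2] - lower[2])
# ... or in class 3 (n/2 is reached at its lower boundary) gives the same value
m3 <- lower[3] + (half - cf[2]) / freq[3] * (upper[3] - lower[3])
c(m2, m3)
# [1] 20 20
```

Either choice of median class lands on 20, the boundary shared by the two classes.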

If $n$ is large (really the only case where this method is generally successful), there is little difference between $n/2$ and $(n+1)/2$ in the formula. All references I checked use $n/2$.

Before computers were widely available, large datasets were customarily reduced to categories (classes) and plotted as histograms. Then the histograms were used to approximate the mean, variance, median, and other descriptive measures. Nowadays, it is best just to use a statistical computer package to find exact values of all measures.

One remaining application is to try to re-claim the descriptive measures from grouped data or from a histogram published in a journal. These are cases in which the original data are no longer available.

This procedure to approximate the sample median from grouped data assumes that data are distributed in roughly a uniform fashion throughout the median interval. Then it uses interpolation to approximate the median. (By contrast, methods to approximate the sample mean and sample variance from grouped data assume that all observations are concentrated at their class midpoints.)

2 of 3

According to what I learned, the class where the median is located is the lowest class for which the cumulative frequency equals or exceeds $n/2$.

Therefore, the median class would be 30-40, which would give approximately 30.833, or 31 as you said.

BBC

bbc.co.uk › bitesize › guides › zwhgk2p › revision › 7

Averages from a grouped table - Analysing data - Edexcel - GCSE Maths Revision - Edexcel - BBC Bitesize

February 13, 2023 - If data is organised into groups, we do not know the exact value of each item of data, just which group it belongs to. This means that we cannot find the exact value for the mode (an average found by selecting the most commonly occurring value; there can be more than one mode and there can also be no mode), median (the middle value) ...

GeeksforGeeks

geeksforgeeks.org › mathematics › median-of-grouped-data

Median of Grouped Data: Formula, How to Find, and Solved Examples - GeeksforGeeks

July 23, 2025 - To find the median of ungrouped data, one can simply sort the data points in ascending order. For an odd number of observations, the middle value is the median. For an even number of observations, one takes the mean of the two middle values. A different method, discussed later in this article, is used to find the median of grouped data.

Microsoft Community

community.fabric.microsoft.com › t5 › DAX-Commands-and-Tips › How-to-calculate-median-for-grouped-data-without-making-new › m-p › 3823113

Solved: How to calculate median for grouped data without m... - Microsoft Fabric Community

April 10, 2024 - I got this original table: I need to calculate median values for selling_eur. How can I do it? I have only some ideas. Transform the table with duplicated rows for the calculation. It will not work fast. Somehow add a new column, which contains arrays with duplicated prices for the calculation. I got no id...

The Math Doctors

themathdoctors.org › finding-the-median-of-grouped-data

Finding the Median of Grouped Data – The Math Doctors

Derivation of Linear Interpolation Median Formula: Median, m = L + [ (N/2 – F) / f ]C. Where does this median formula come from? My teacher did not show or prove how this formula is derived, so I just substitute and blindly use the formula. Can you help me?
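
One standard way to obtain this formula, sketched here as an outline rather than as the page's own derivation, is to assume that the f observations in the median class are spread evenly across its width C. The running count of observations up to a point x inside the class is then linear in x, and the median m is the point where that count reaches N/2:

```latex
\begin{align*}
\text{count}(x) &= F + f \cdot \frac{x - L}{C}, \qquad L \le x \le L + C \\
\text{count}(m) &= \frac{N}{2} \\
F + f \cdot \frac{m - L}{C} &= \frac{N}{2} \\
m &= L + \frac{\frac{N}{2} - F}{f}\, C
\end{align*}
```

Here F is the cumulative frequency of the classes below the median class, matching the symbols in the question.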

Microsoft Support

support.microsoft.com › en-us › office › calculate-the-median-of-a-group-of-numbers-2e3ec1aa-5046-4b4b-bfc4-4266ecf39bf9

Calculate the median of a group of numbers - Microsoft Support

In the Formula Builder pane, type MEDIAN in the Search box, and then select Insert Function. Make sure the cell span in the Number1 box matches your data (In this case, A1:A7). For this example, the answer that appears in the cell should be 8. Tip: To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Formulas tab, in the Formula Auditing group, select the Show Formulas button.

Quora

quora.com › How-can-I-find-a-median-in-grouped-data-if-it-is-odd

How to find a median in grouped data if it is odd - Quora

Answer (1 of 2): Find the cf (cumulative frequency) first, then the (N+1)÷2th term. Then find the cumulative frequency that is exactly equal to or just greater than (N+1)÷2. Finally, the Xi value in that row is the median.
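
For a frequency table of single values (not class intervals) with an odd total, the procedure in this answer can be sketched in R. The data are a small made-up example and the variable names are illustrative:

```r
x <- c(0, 1, 2, 3, 4)     # distinct values (Xi)
f <- c(2, 1, 1, 2, 3)     # their frequencies; N = 9 is odd

cf  <- cumsum(f)                  # 2 3 4 6 9
pos <- (sum(f) + 1) / 2           # position of the middle item: 5
med <- x[which(cf >= pos)[1]]     # first value whose cf reaches that position
med
# [1] 3

# Cross-check against the expanded raw data
median(rep(x, times = f))
# [1] 3
```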

University of Massachusetts

people.umass.edu › biep540w › pdf › Grouped Data Calculation.pdf pdf

Lecture 2 – Grouped Data Calculation

– Grouped Data
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide the class that contains the median. The median class is the first class with a cumulative frequency of at least n/2.
Step 3: Find the median by using the following formula: Median ...

Slideshare

slideshare.net › home › education › median of grouped data

Median of grouped data | PPTX

This document provides steps for ... frequencies, and cumulative frequencies. 2. Find the median class by calculating N/2, where N is the total number of data points....

AtoZMath

atozmath.com › example › StatsG.aspx

Median Example for grouped data

Median Example for grouped data - Median Example for grouped data, step by step online

CalculatorSoup

calculatorsoup.com › calculators › statistics › mean-median-mode.php

Mean, Median, Mode Calculator

November 4, 2025 - For the data set 1, 1, 2, 6, 6, ... to highest value, the median $ \widetilde{x} $ is the data point separating the upper half of the data values from the lower half....

Enterprise DNA

forum.enterprisedna.co › dax › dax calculations

Calculate median on a grouped column - DAX Calculations - Enterprise DNA Forum

February 8, 2021 - Hi. I have a calculation that I need to solve and I’m not quite sure how to proceed. Here is a sample of my data. Data Sample.xlsx (8.8 KB) I need to group the ‘Identifier’ where ‘06 Mos Post-ALC ERs’ = 1 I used the following DAX to generate the start of code necessary to calculate the median of a grouped column.