Since you already know the formula, it should be easy enough to create a function to do the calculation for you.

Here, I've created a basic function to get you started. The function takes four arguments:

  • frequencies: A vector of frequencies ("number" in your first example)
  • intervals: A 2-row matrix with the same number of columns as the length of frequencies, with the first row being the lower class boundary, and the second row being the upper class boundary. Alternatively, "intervals" may be a column in your data.frame, and you may specify sep (and possibly, trim) to have the function automatically create the required matrix for you.
  • sep: The separator character in your "intervals" column in your data.frame.
  • trim: A regular expression of characters that need to be removed before trying to coerce to a numeric matrix. One pattern is built into the function: trim = "cut". This sets the regular expression pattern to remove (, ), [, and ] from the input.
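
As a rough illustration of the matrix form that "intervals" is expected to have, here is a small sketch using three made-up range strings. The name salary_ranges is hypothetical and only used for this example.

```r
# Hypothetical range strings, one per class
salary_ranges <- c("1500-1600", "1600-1700", "1700-1800")

# Split each string on the separator and convert the pieces to numbers.
# sapply() returns one column per class: row 1 = lower, row 2 = upper boundary.
intervals <- sapply(strsplit(salary_ranges, "-"), as.numeric)
intervals
```

A matrix built this way by hand can be passed as "intervals" directly, in which case sep and trim are not needed.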

Here's the function (with comments showing how I used your instructions to put it together):

GroupedMedian <- function(frequencies, intervals, sep = NULL, trim = NULL) {
  # If "sep" is specified, the function will try to create the 
  #   required "intervals" matrix. "trim" removes any unwanted 
  #   characters before attempting to convert the ranges to numeric.
  if (!is.null(sep)) {
    if (is.null(trim)) pattern <- ""
    else if (trim == "cut") pattern <- "\\[|\\]|\\(|\\)"
    else pattern <- trim
    intervals <- sapply(strsplit(gsub(pattern, "", intervals), sep), as.numeric)
  }

  Midpoints <- colMeans(intervals) # class midpoints (one per column; not used below)
  cf <- cumsum(frequencies)
  Midrow <- findInterval(max(cf)/2, cf) + 1
  L <- intervals[1, Midrow]      # lower class boundary of median class
  h <- diff(intervals[, Midrow]) # size of median class
  f <- frequencies[Midrow]       # frequency of median class
  cf2 <- if (Midrow > 1) cf[Midrow - 1] else 0 # cumulative frequency before median class (0 if it is the first class)
  n_2 <- max(cf)/2               # total observations divided by 2

  unname(L + (n_2 - cf2)/f * h)
}

Here's a sample data.frame to work with:

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800", 
    "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300", 
    "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L, 
    850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"), 
    class = "data.frame", row.names = c(NA, -10L))
mydf
#       salary number
# 1  1500-1600    110
# 2  1600-1700    180
# 3  1700-1800    320
# 4  1800-1900    460
# 5  1900-2000    850
# 6  2000-2100    250
# 7  2100-2200    130
# 8  2200-2300     70
# 9  2300-2400     20
# 10 2400-2500     10

Now, we can simply do:

GroupedMedian(mydf$number, mydf$salary, sep = "-")
# [1] 1915.294
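
To see where that number comes from, the same arithmetic can be done by hand. This is only a sketch for this particular data set: the lower boundary (1900) and class width (100) are typed in directly, and the median class is located with which() rather than findInterval().

```r
number <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)

cf  <- cumsum(number)        # cumulative frequencies
n_2 <- sum(number) / 2       # half of the total number of observations
k   <- which(cf >= n_2)[1]   # first class whose cumulative frequency reaches n/2

L <- 1900                    # lower boundary of the median class (1900-2000)
h <- 100                     # width of the median class

# L + (n/2 - cumulative frequency before the median class) / f * h
med <- L + (n_2 - cf[k - 1]) / number[k] * h
med
```

Here n/2 is 1200, the cumulative frequency before the median class is 1070, and the median class holds 850 observations, so the estimate is 1900 + 130/850 * 100.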

Here's an example of the function in action on some made-up data:

set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
y
#           Var1 Freq
# 1   (1.9,11.7]    8
# 2  (11.7,21.5]    8
# 3  (21.5,31.4]    8
# 4  (31.4,41.2]   15
# 5    (41.2,51]   13
# 6    (51,60.8]    5
# 7  (60.8,70.6]   11
# 8  (70.6,80.5]   15
# 9  (80.5,90.3]   11
# 10  (90.3,100]    6

### Here's GroupedMedian's output on the grouped data.frame...
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
# [1] 49.49231

### ... and the output of median on the original vector
median(x)
# [1] 49.5

By the way, I think there was a mistake in one of the ranges in the sample data that you provided: all were separated by dashes except one, which was separated by a comma. Since strsplit splits on a regular expression by default, you can still use the function like this:

x<-c(110,180,320,460,850,250,130,70,20,10)
colnames<-c("numbers")
rownames<-c("[1500-1600]","(1600-1700]","(1700-1800]","(1800-1900]",
            "(1900-2000]"," (2000,2100]","(2100-2200]","(2200-2300]",
            "(2300-2400]","(2400-2500]")
y<-matrix(x,nrow=length(x),dimnames=list(rownames,colnames))
GroupedMedian(y[, "numbers"], rownames(y), sep="-|,", trim="cut")
# [1] 1915.294
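
To make the trimming and splitting step a bit more concrete, here is a small sketch of what happens to the row names before the calculation. It repeats the gsub/strsplit/as.numeric steps from the function on three of the labels; the object names are only for illustration.

```r
ranges <- c("[1500-1600]", "(1900-2000]", " (2000,2100]")

# Remove the brackets (the pattern used when trim = "cut")
cleaned <- gsub("\\[|\\]|\\(|\\)", "", ranges)

# Split on either a dash or a comma, since sep is treated as a regular expression
parts <- strsplit(cleaned, "-|,")

# as.numeric() tolerates the leading space left over in " 2000"
bounds <- sapply(parts, as.numeric)
bounds
```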
Answer from A5C1D2H2I1M1N2O1R2T1 on Stack Overflow
2 of 6
4

I've written it like this to clearly explain how it's being worked out. A more compact version is appended.

library(data.table)

#constructing the dataset with the salary range split into low and high
salarydata <- data.table(
  salaries_low = 100*c(15:24),
  salaries_high = 100*c(16:25),
  numbers = c(110,180,320,460,850,250,130,70,20,10)
)

#calculating cumulative number of observations
salarydata <- salarydata[,cumnumbers := cumsum(numbers)]
salarydata
   # salaries_low salaries_high numbers cumnumbers
   # 1:         1500          1600     110        110
   # 2:         1600          1700     180        290
   # 3:         1700          1800     320        610
   # 4:         1800          1900     460       1070
   # 5:         1900          2000     850       1920
   # 6:         2000          2100     250       2170
   # 7:         2100          2200     130       2300
   # 8:         2200          2300      70       2370
   # 9:         2300          2400      20       2390
   # 10:         2400          2500      10       2400

#identifying median group
mediangroup <- salarydata[
  (cumnumbers - numbers) <= (max(cumnumbers)/2) & 
  cumnumbers >= (max(cumnumbers)/2)]
mediangroup
   # salaries_low salaries_high numbers cumnumbers
   # 1:         1900          2000     850       1920

#creating the variables needed to calculate median
mediangroup[,l := salaries_low]
mediangroup[,h := salaries_high - salaries_low]
mediangroup[,f := numbers]
mediangroup[,c := cumnumbers- numbers]
n = salarydata[,sum(numbers)]

#calculating median
median <- mediangroup[,l + ((h/f)*((n/2)-c))]
median
   # [1] 1915.294

The compact version:

EDIT: Changed to a function at @AnandaMahto's suggestion. Also, using more general variable names.

library(data.table)

#Creating function

CalculateMedian <- function(
   LowerBound,
   UpperBound,
   Obs
)
{
   #calculating cumulative number of observations and n
   dataset <- data.table(UpperBound, LowerBound, Obs)

   dataset <- dataset[,cumObs := cumsum(Obs)]
   n = dataset[,max(cumObs)]

   #identifying mediangroup and dynamically calculating l,h,f,c. We already have n.
   median <- dataset[
      (cumObs - Obs) <= (max(cumObs)/2) & 
      cumObs >= (max(cumObs)/2),

      LowerBound + ((UpperBound - LowerBound)/Obs) * ((n/2) - (cumObs- Obs))
   ]

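   # If n/2 falls exactly on a class boundary, two adjacent classes match the
   # filter and give the same value, so return only the first one.
   return(median[1])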
}


# Using function
CalculateMedian(
  LowerBound = 100*c(15:24),
  UpperBound = 100*c(16:25),
  Obs = c(110,180,320,460,850,250,130,70,20,10)
)
# [1] 1915.294
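
One edge case of the median-group filter used above is worth checking: when n/2 lands exactly on a class boundary, two adjacent classes satisfy the condition. The sketch below reproduces the same filter in base R with made-up numbers (all names and values are hypothetical) to show what comes back in that situation.

```r
lower <- c(0, 10, 20, 30)
upper <- c(10, 20, 30, 40)
obs   <- c(5, 5, 5, 5)       # n = 20, so n/2 = 10 sits on the 10-20 / 20-30 boundary

cum <- cumsum(obs)
n   <- sum(obs)

# Same condition as in the data.table code
hit <- (cum - obs) <= n / 2 & cum >= n / 2
which(hit)                   # both class 2 and class 3 match

# Same interpolation formula, applied to every matching class
est <- lower[hit] + (upper[hit] - lower[hit]) / obs[hit] *
  (n / 2 - (cum[hit] - obs[hit]))
est                          # two values, both equal to the shared boundary
```

Because the classes are contiguous, both matches give the same estimate, so taking the first element is enough if a single number is needed.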
Top answer
1 of 5
26
library(dplyr)
dat%>%
group_by(custid)%>% 
summarise(Mean=mean(value), Max=max(value), Min=min(value), Median=median(value), Std=sd(value))
#  custid     Mean Max Min Median      Std
#1      1 2.666667   5   1    2.5 1.632993
#2      2 5.500000  10   1    5.5 6.363961
#3      3 2.666667   5   1    2.0 2.081666

For bigger datasets, data.table would be faster:

library(data.table)
setDT(dat)[,list(Mean=mean(value), Max=max(value), Min=min(value), Median=as.numeric(median(value)), Std=sd(value)), by=custid]
#   custid     Mean Max Min Median      Std
#1:      1 2.666667   5   1    2.5 1.632993
#2:      2 5.500000  10   1    5.5 6.363961
#3:      3 2.666667   5   1    2.0 2.081666
2 of 5
20

To add to the alternatives, here's summaryBy from the "doBy" package, with which you can specify a list of functions to apply.

library(doBy)
summaryBy(value ~ custid, data = mydf, 
          FUN = list(mean, max, min, median, sd))
#   custid value.mean value.max value.min value.median value.sd
# 1      1   2.666667         5         1          2.5 1.632993
# 2      2   5.500000        10         1          5.5 6.363961
# 3      3   2.666667         5         1          2.0 2.081666

Of course, you can also stick with base R:

myFun <- function(x) {
  c(min = min(x), max = max(x), 
    mean = mean(x), median = median(x), 
    std = sd(x))
}

tapply(mydf$value, mydf$custid, myFun)
# $`1`
#      min      max     mean   median      std 
# 1.000000 5.000000 2.666667 2.500000 1.632993 
# 
# $`2`
#       min       max      mean    median       std 
#  1.000000 10.000000  5.500000  5.500000  6.363961 
# 
# $`3`
#      min      max     mean   median      std 
# 1.000000 5.000000 2.666667 2.000000 2.081666 

# tapply() returns the groups in sorted order, so sort the ids to match
cbind(custid = sort(unique(mydf$custid)), 
      do.call(rbind, tapply(mydf$value, mydf$custid, myFun)))
#   custid min max     mean median      std
# 1      1   1   5 2.666667    2.5 1.632993
# 2      2   1  10 5.500000    5.5 6.363961
# 3      3   1   5 2.666667    2.0 2.081666
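
The data set behind these outputs is not shown, so to try the base R approach you need a data frame with custid and value columns. Here is a minimal sketch with made-up values (not the original data, so the numbers will differ from the output above):

```r
# Hypothetical data, only for trying the code
mydf <- data.frame(custid = c(1, 1, 1, 2, 2, 3, 3, 3),
                   value  = c(1, 2, 5, 1, 10, 1, 2, 5))

# Summary statistics for one group
myFun <- function(x) {
  c(min = min(x), max = max(x),
    mean = mean(x), median = median(x),
    std = sd(x))
}

# One row per custid, one column per statistic
res <- do.call(rbind, tapply(mydf$value, mydf$custid, myFun))
res
```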
🌐
Reddit
reddit.com › r/statistics › [q] accurately calculating median of grouped data (multiple groups)?
r/statistics on Reddit: [Q] Accurately calculating median of grouped data (multiple groups)?
November 8, 2021 -

Hi all, I was wondering whether it is possible to aggregate a dataset into a smaller format:

- I have a dataset of individual people and their test scores. There are 10 test scores in total for each student.

- Each student is in one of 40 classes, each of varying size between 10 and 500.

- The classes are further grouped into states, and there are 30 states in total.

I would like to aggregate the data by class, and by state, into a NEW dataset, and calculate their medians. Is there a possible way to do this? From my limited statistical knowledge, I've stumbled upon a calculation that takes the frequency, cumulative frequency, total number of observations, and width of the median class to do so, but I'm not sure how to check whether I am getting the right numbers.

Top answer
1 of 3
2

Because this is essentially a duplicate, I address a few issues that do not explicitly overlap the related question or answer:

If a class has cumulative relative frequency exactly 0.5, then the median is at the boundary between that class and the next larger one.

If $N$ is large (really the only case where this method is generally successful), there is little difference between $N/2$ and $(N+1)/2$ in the formula. All references I checked use $N/2$.

Before computers were widely available, large datasets were customarily reduced to categories (classes) and plotted as histograms. Then the histograms were used to approximate the mean, variance, median, and other descriptive measures. Nowadays, it is best just to use a statistical computer package to find exact values of all measures.

One remaining application is to try to re-claim the descriptive measures from grouped data or from a histogram published in a journal. These are cases in which the original data are no longer available.

This procedure to approximate the sample median from grouped data assumes that the data are distributed in a roughly uniform fashion throughout the median interval. It then uses interpolation to approximate the median. (By contrast, methods to approximate the sample mean and sample variance from grouped data assume that all observations are concentrated at their class midpoints.)
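
For reference, the interpolation described above is usually written as follows. The symbols other than $N$ are assumed labels, not taken from the question: $L$ is the lower boundary of the median class, $h$ its width, $f$ its frequency, and $C$ the cumulative frequency of all classes below it.

$$\text{median} \approx L + \frac{N/2 - C}{f}\,h$$

The fraction $(N/2 - C)/f$ is the share of the median class that has to be "used up" to reach the middle observation, which is where the uniformity assumption enters.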

2 of 3
0

According to what I learned, the class where the median is located is the lowest class for which the cumulative frequency equals or exceeds $\frac N2$.

Therefore, the median class would be 30-40, which would give approximately 30.833, or 31 as you said.

🌐
Reddit
reddit.com › r/rlanguage › grouping with a variable while calculating the median of all other columns
r/Rlanguage on Reddit: grouping with a variable while calculating the median of all other columns
December 19, 2023 -

Hello, I've been stuck on this task for a while now.

I have a data set with 300+ rows and 3500+ columns:

fruits number 1 number 2 number 3 number 4

apples 1

apples 2

apples NA

bananas 23123

bananas 21

oranges 2

oranges 1

oranges 1 1 2 3

oranges 1 3 5 6

etc...

Is it possible to group by fruit (apples, bananas) while keeping all columns and calculating the median of each column (so that for each fruit I will have 1 value for each column)?

Something like this:

fruits nbr 1 (median) nbr 2 (median) nbr 3 (median)

apples 21

bananas 5

etc...

thank you!