how to find median of grouped data in r

how to calculate the median on grouped dataset?

stackoverflow.com › questions › 18887382 › how-to-calculate-the-median-on-grouped-dataset

Since you already know the formula, it should be easy enough to create a function to do the calculation for you.
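
For reference, the interpolation formula that the function below implements can be written as follows. The symbols follow the variable names used in the function's comments; cf here denotes the cumulative frequency of all classes before the median class (called cf2 in the code).

```latex
\text{Median} = L + \frac{\frac{n}{2} - cf}{f} \cdot h
```

Here L is the lower boundary of the median class, n the total number of observations, f the frequency of the median class, and h its width.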

Here, I've created a basic function to get you started. The function takes four arguments:

frequencies: A vector of frequencies ("number" in your first example)
intervals: A 2-row matrix with the same number of columns as the length of frequencies, with the first row being the lower class boundary, and the second row being the upper class boundary. Alternatively, "intervals" may be a column in your data.frame, and you may specify sep (and possibly, trim) to have the function automatically create the required matrix for you.
sep: The separator character in your "intervals" column in your data.frame.
trim: A regular expression of characters that need to be removed before trying to coerce to a numeric matrix. One pattern is built into the function: trim = "cut". This sets the regular expression pattern to remove (, ), [, and ] from the input.

Here's the function (with comments showing how I used your instructions to put it together):

GroupedMedian <- function(frequencies, intervals, sep = NULL, trim = NULL) {
  # If "sep" is specified, the function will try to create the 
  #   required "intervals" matrix. "trim" removes any unwanted 
  #   characters before attempting to convert the ranges to numeric.
  if (!is.null(sep)) {
    if (is.null(trim)) pattern <- ""
    else if (trim == "cut") pattern <- "\\[|\\]|\\(|\\)"
    else pattern <- trim
    intervals <- sapply(strsplit(gsub(pattern, "", intervals), sep), as.numeric)
  }

  cf <- cumsum(frequencies)
  Midrow <- findInterval(max(cf)/2, cf) + 1
  L <- intervals[1, Midrow]      # lower class boundary of median class
  h <- diff(intervals[, Midrow]) # size of median class
  f <- frequencies[Midrow]       # frequency of median class
  cf2 <- if (Midrow > 1) cf[Midrow - 1] else 0  # cumulative frequency before median class (0 if it is the first class)
  n_2 <- max(cf)/2               # total observations divided by 2

  unname(L + (n_2 - cf2)/f * h)
}

Here's a sample data.frame to work with:

mydf <- structure(list(salary = c("1500-1600", "1600-1700", "1700-1800", 
    "1800-1900", "1900-2000", "2000-2100", "2100-2200", "2200-2300", 
    "2300-2400", "2400-2500"), number = c(110L, 180L, 320L, 460L, 
    850L, 250L, 130L, 70L, 20L, 10L)), .Names = c("salary", "number"), 
    class = "data.frame", row.names = c(NA, -10L))
mydf
#       salary number
# 1  1500-1600    110
# 2  1600-1700    180
# 3  1700-1800    320
# 4  1800-1900    460
# 5  1900-2000    850
# 6  2000-2100    250
# 7  2100-2200    130
# 8  2200-2300     70
# 9  2300-2400     20
# 10 2400-2500     10

Now, we can simply do:

GroupedMedian(mydf$number, mydf$salary, sep = "-")
# [1] 1915.294
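
As a sanity check, the same figure can be reproduced step by step. This is a minimal sketch in base R; the names freq, lower, upper, k, and cf_before are introduced here for illustration and are not part of the answer's function. It picks the median class as the first class whose cumulative frequency reaches n/2.

```r
# Salary table from the example above, split into boundaries and counts.
freq  <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
lower <- seq(1500, 2400, by = 100)
upper <- lower + 100

cf  <- cumsum(freq)
n_2 <- sum(freq) / 2                      # 2400 / 2 = 1200
k   <- which(cf >= n_2)[1]                # median class: 1900-2000 (row 5)
cf_before <- if (k > 1) cf[k - 1] else 0  # 1070 observations below 1900

# L + (n/2 - cf) / f * h
grouped_median <- lower[k] + (n_2 - cf_before) / freq[k] * (upper[k] - lower[k])
grouped_median
# [1] 1915.294
```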

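Here's an example of the function in action on some made-up data. (The output shown comes from the original answer; R 3.6.0 changed the default algorithm used by sample(), so set.seed(1) produces different numbers in current versions.)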

set.seed(1)
x <- sample(100, 100, replace = TRUE)
y <- data.frame(table(cut(x, 10)))
y
#           Var1 Freq
# 1   (1.9,11.7]    8
# 2  (11.7,21.5]    8
# 3  (21.5,31.4]    8
# 4  (31.4,41.2]   15
# 5    (41.2,51]   13
# 6    (51,60.8]    5
# 7  (60.8,70.6]   11
# 8  (70.6,80.5]   15
# 9  (80.5,90.3]   11
# 10  (90.3,100]    6

### Here's GroupedMedian's output on the grouped data.frame...
GroupedMedian(y$Freq, y$Var1, sep = ",", trim = "cut")
# [1] 49.49231

### ... and the output of median on the original vector
median(x)
# [1] 49.5

By the way, the sample data you provided seems to contain a mistake in one of the ranges: all are separated by dashes except one, which is separated by a comma. Since strsplit treats its split argument as a regular expression by default, you can handle both separators like this:

x<-c(110,180,320,460,850,250,130,70,20,10)
colnames<-c("numbers")
rownames<-c("[1500-1600]","(1600-1700]","(1700-1800]","(1800-1900]",
            "(1900-2000]"," (2000,2100]","(2100-2200]","(2200-2300]",
            "(2300-2400]","(2400-2500]")
y<-matrix(x,nrow=length(x),dimnames=list(rownames,colnames))
GroupedMedian(y[, "numbers"], rownames(y), sep="-|,", trim="cut")
# [1] 1915.294

Answer from A5C1D2H2I1M1N2O1R2T1 on Stack Overflow

2 of 6

I've written it like this to clearly explain how it's being worked out. A more compact version is appended.

library(data.table)

#constructing the dataset with the salary range split into low and high
salarydata <- data.table(
  salaries_low = 100*c(15:24),
  salaries_high = 100*c(16:25),
  numbers = c(110,180,320,460,850,250,130,70,20,10)
)

#calculating cumulative number of observations
salarydata <- salarydata[,cumnumbers := cumsum(numbers)]
salarydata
#     salaries_low salaries_high numbers cumnumbers
#  1:         1500          1600     110        110
#  2:         1600          1700     180        290
#  3:         1700          1800     320        610
#  4:         1800          1900     460       1070
#  5:         1900          2000     850       1920
#  6:         2000          2100     250       2170
#  7:         2100          2200     130       2300
#  8:         2200          2300      70       2370
#  9:         2300          2400      20       2390
# 10:         2400          2500      10       2400

#identifying median group
mediangroup <- salarydata[
  (cumnumbers - numbers) <= (max(cumnumbers)/2) & 
  cumnumbers >= (max(cumnumbers)/2)]
mediangroup
   # salaries_low salaries_high numbers cumnumbers
   # 1:         1900          2000     850       1920

#creating the variables needed to calculate median
mediangroup[,l := salaries_low]
mediangroup[,h := salaries_high - salaries_low]
mediangroup[,f := numbers]
mediangroup[,c := cumnumbers- numbers]
n = salarydata[,sum(numbers)]

#calculating median
median <- mediangroup[,l + ((h/f)*((n/2)-c))]
median
   # [1] 1915.294

The compact version -

EDIT: Changed to a function at @AnandaMahto's suggestion. Also, using more general variable names.

library(data.table)

#Creating function

CalculateMedian <- function(
   LowerBound,
   UpperBound,
   Obs
)
{
   #calculating cumulative number of observations and n
   dataset <- data.table(UpperBound, LowerBound, Obs)

   dataset <- dataset[,cumObs := cumsum(Obs)]
   n = dataset[,max(cumObs)]

   #identifying mediangroup and dynamically calculating l,h,f,c. We already have n.
   median <- dataset[
      (cumObs - Obs) <= (max(cumObs)/2) & 
      cumObs >= (max(cumObs)/2),

      LowerBound + ((UpperBound - LowerBound)/Obs) * ((n/2) - (cumObs- Obs))
   ]

   return(median)
}


# Using function
CalculateMedian(
  LowerBound = 100*c(15:24),
  UpperBound = 100*c(16:25),
  Obs = c(110,180,320,460,850,250,130,70,20,10)
)
# [1] 1915.294

Stack Overflow

stackoverflow.com › questions › 25198442 › how-to-calculate-mean-median-per-group-in-a-dataframe-in-r

how to calculate mean/median per group in a dataframe in r - Stack Overflow

Top answer

1 of 5

library(dplyr)
dat%>%
group_by(custid)%>% 
summarise(Mean=mean(value), Max=max(value), Min=min(value), Median=median(value), Std=sd(value))
#  custid     Mean Max Min Median      Std
#1      1 2.666667   5   1    2.5 1.632993
#2      2 5.500000  10   1    5.5 6.363961
#3      3 2.666667   5   1    2.0 2.081666

For bigger datasets, data.table would be faster

setDT(dat)[,list(Mean=mean(value), Max=max(value), Min=min(value), Median=as.numeric(median(value)), Std=sd(value)), by=custid]
#   custid     Mean Max Min Median      Std
#1:      1 2.666667   5   1    2.5 1.632993
#2:      2 5.500000  10   1    5.5 6.363961
#3:      3 2.666667   5   1    2.0 2.081666

2 of 5

To add to the alternatives, here's summaryBy from the "doBy" package, with which you can specify a list of functions to apply.

library(doBy)
summaryBy(value ~ custid, data = mydf, 
          FUN = list(mean, max, min, median, sd))
#   custid value.mean value.max value.min value.median value.sd
# 1      1   2.666667         5         1          2.5 1.632993
# 2      2   5.500000        10         1          5.5 6.363961
# 3      3   2.666667         5         1          2.0 2.081666

Of course, you can also stick with base R:

myFun <- function(x) {
  c(min = min(x), max = max(x), 
    mean = mean(x), median = median(x), 
    std = sd(x))
}

tapply(mydf$value, mydf$custid, myFun)
# $`1`
#      min      max     mean   median      std 
# 1.000000 5.000000 2.666667 2.500000 1.632993 
# 
# $`2`
#       min       max      mean    median       std 
#  1.000000 10.000000  5.500000  5.500000  6.363961 
# 
# $`3`
#      min      max     mean   median      std 
# 1.000000 5.000000 2.666667 2.000000 2.081666 

cbind(custid = sort(unique(mydf$custid)),  # tapply returns groups in sorted order
      do.call(rbind, tapply(mydf$value, mydf$custid, myFun)))
#   custid min max     mean median      std
# 1      1   1   5 2.666667    2.5 1.632993
# 2      2   1  10 5.500000    5.5 6.363961
# 3      3   1   5 2.666667    2.0 2.081666

rdrr.io › github › mrdwab › SOfun › man › GroupedMedian.html

GroupedMedian: Calculate the Median of Already Grouped Data in mrdwab/SOfun: Functions From Answers to R Questions on Stack Overflow

June 20, 2020 - Calculates the median of already grouped data given the interval ranges and the frequencies of each group.

ProgrammingR

programmingr.com › home › statistical operations

How to Find the Mean and Median of Grouped Data (With Examples in R) - ProgrammingR

March 20, 2023 - To see the code in action, suppose you want to summarize the data by mean and median for each diet and return the result as a tibble. Pipe operators are a good way to chain this sequence of operations, so let's use them in the following code: # Finding mean and median using group_by and summarise
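
Since the article's own code is cut off in this excerpt, here is a minimal sketch of what such a pipeline might look like. It assumes dplyr is installed and uses the built-in ChickWeight dataset as a stand-in because it has a Diet column; the excerpt does not say which dataset the article uses, and the output column names mean_weight and median_weight are illustrative.

```r
library(dplyr)

# Assumption: ChickWeight (built into R, with columns weight and Diet)
# stands in for the article's data, which is not shown in the excerpt.
diet_summary <- ChickWeight %>%
  as.data.frame() %>%                      # drop the extra groupedData classes
  group_by(Diet) %>%                       # one group per diet
  summarise(mean_weight   = mean(weight),  # one row per group, returned as a tibble
            median_weight = median(weight))

diet_summary
```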

Stack Overflow

stackoverflow.com › questions › 26519904 › calculating-median-for-each-column-of-grouped-data

r - Calculating median for each column of grouped data - Stack Overflow

Top answer

1 of 5

I find it amazing that no one has suggested aggregate yet, seeing as it is the simple base R function included for these sorts of tasks. E.g.:

aggregate(. ~ genotype, data=dat, FUN=median)

#  genotype DIV3  DIV4
#1      HET  1.4  3.20
#2       WT 23.9 25.25

2 of 5

I found ddply to be the best for this.

library(plyr)  # provides ddply and numcolwise
medians <- ddply(a, .(genotype), numcolwise(median))

TutorialsPoint

tutorialspoint.com › how-to-find-the-group-wise-median-in-an-r-data-table-object

How to find the group-wise median in an R data.table object?

If we want to find the group-wise median and the data is stored in a data.table object, the lapply function can be used as shown in the examples below. ...

> Group<-sample(LETTERS[1:4],20,replace=TRUE)
> x1<-rnorm(20,1,0.87)
> x2<-rnorm(20,5,1.2)
> x3<-rnorm(20,500,20)
> x4<-rnorm(20,50,1.14)
...
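
The excerpt stops before the grouping step, so the following is only a sketch of the usual data.table idiom for it, not the tutorial's own code. It assumes the data.table package is installed; the seed, the object names DT and group_medians, and the choice of two columns are illustrative.

```r
library(data.table)

set.seed(123)  # arbitrary seed so the sketch is reproducible
DT <- data.table(
  Group = sample(LETTERS[1:4], 20, replace = TRUE),
  x1 = rnorm(20, 1, 0.87),
  x2 = rnorm(20, 5, 1.2)
)

# .SD holds the non-grouping columns of each group;
# lapply() applies median() to each of those columns.
group_medians <- DT[, lapply(.SD, median), by = Group]
group_medians
```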

reddit.com › r/statistics › [q] accurately calculating median of grouped data (multiple groups)?

r/statistics on Reddit: [Q] Accurately calculating median of grouped data (multiple groups)?

November 7, 2021 -

Hi all, I was wondering whether it is possible to aggregate a dataset into a smaller format:

- I have a dataset of individual people and their test scores. There are 10 test scores in total for each student.

- Each student is in one of 40 classes, each of varying size between 10 and 500.

- The classes are further grouped into states, and there are 30 states in total.

I would like to aggregate the data by class and by state into a NEW dataset and calculate the medians. Is there a way to do this? From my limited knowledge of statistics, I've stumbled upon a calculation that takes the frequency, cumulative frequency, total number of observations, and width of the median class to do so, but I'm not sure how to check whether I am getting the right numbers.

Top answer

1 of 1

Using R:

library(tidyverse)

Data %>% group_by(classes, states) %>% summarize(Median = median(score, na.rm = TRUE))  # na.rm = TRUE drops missing scores; use FALSE to keep them

RCODER

r-coder.com › home › r statistics › calculate the median in r

Calculate the MEDIAN in R [discrete and continuous variables]

January 7, 2024 - Consider a random sample of 1000 ... on both sides. ... Finally, if we have a data set classified by groups we can use the tapply function to calculate the median per group....
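
A minimal base R sketch of the tapply approach mentioned here; the vectors score and group are made-up illustration data, not taken from the article.

```r
# Hypothetical data: one numeric value per observation and its group label.
score <- c(1, 2, 3, 10, 20, 5, 7, 9, 11)
group <- c("a", "a", "a", "b", "b", "c", "c", "c", "c")

# tapply() splits score by group and applies median() to each piece.
group_median <- tapply(score, group, median)
group_median
#  a  b  c 
#  2 15  8 
```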

Spark By {Examples}

sparkbyexamples.com › home › r programming › calculate the median in r

Calculate the Median in R - Spark By {Examples}

April 10, 2024 - How to calculate the median of a DataFrame column or a Vector in R? You can use the R base median() function for computing the median of a Vector and DataFrame. This function takes the vector as a parameter and returns the median value as a numeric.

Stack Overflow

stackoverflow.com › questions › 42853599 › dplyr-medians-of-rows-based-on-grouping-variable

r - dplyr medians of rows based on grouping variable - Stack Overflow

Perhaps it is too easy, but I could not find how to do it with dplyr. Thanks for your help. ... Sinan U. ... You need summarise_each, i.e. gene_data %>% group_by(gene) %>% summarise_each(funs(medians = median(., na.rm = TRUE)))
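
Note that summarise_each() and funs() are deprecated in current dplyr releases. A sketch of the equivalent call with across(), assuming dplyr 1.0.0 or later; the gene_data frame below is hypothetical illustration data, not the asker's.

```r
library(dplyr)

# Hypothetical data: two measurement columns per gene.
gene_data <- data.frame(
  gene = c("g1", "g1", "g1", "g2", "g2"),
  s1   = c(1, 3, NA, 10, 20),
  s2   = c(2, 4, 6, 8, NA)
)

# across(everything(), ...) applies the median to every non-grouping column.
gene_medians <- gene_data %>%
  group_by(gene) %>%
  summarise(across(everything(), ~ median(.x, na.rm = TRUE)))

gene_medians
```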

Stack Overflow

stackoverflow.com › questions › 50933608 › how-to-calculate-the-median-for-groups-separately-in-r

dplyr - how to calculate the median for groups separately in R - Stack Overflow

Top answer

1 of 2

One solution uses dplyr and follows the steps below; the comments in the code explain the approach.

Note: The sample data from the OP does not seem very meaningful as such.

library(dplyr)

df %>% filter(stuff > 0) %>%  # First keep only the rows of interest, where stuff > 0
  group_by(ItemRelation, num, year) %>%
    mutate(m = median(stuff[action==1]),
           m0 = median(tail(stuff[action==0], 5))) %>%  # Calculate m and m0 for all rows
  filter(action == 1) %>%  # Now keep only rows with action == 1
  mutate(m = m-m0) %>%
  select(-Dt,-m0,-action)

# # A tibble: 4 x 5
# # Groups: ItemRelation, num, year [2]
# ItemRelation stuff   num  year     m
# <int> <int> <int> <int> <dbl>
# 1       158043   400  1459  2018  -450
# 2       158043   700  1459  2018  -450
# 3          234   400  1459  2018  -450
# 4          234   700  1459  2018  -450

2 of 2

The easiest way to do this is to use group_by and summarize from the dplyr package:

library(dplyr)

# median of groups
medians <- df %>%
    group_by(ItemRelation, num, year) %>%
    summarize(med = median(stuff, na.rm = TRUE))

# median of nonzero values in each group
medians <- df %>%
    filter(stuff>0) %>%
    group_by(ItemRelation, num, year) %>%
    summarize(med = median(stuff, na.rm = TRUE))


subtract <- function(x){return(x[1]-x[2])}
median_diffs <- medians %>%
                group_by(ItemRelation, num, year) %>%
                mutate(med_diff = subtract(med))

DataScience Made Simple

datasciencemadesimple.com › home › median function in r – median()

Median function in R - median() - DataScience Made Simple

September 20, 2020 - The median of a group can also be calculated using the median() function in R by providing it inside the aggregate function. With the median() function we can also find the row-wise median using the dplyr package and also the column-wise ...

Statistics Globe

statisticsglobe.com › home › learn r programming (tutorial & examples) | free introduction › median in r (5 examples)

Median in R (5 Programming Examples) | NA, Column, by Group & Boxplot

July 22, 2022 - We can now use the median R function to compute the median of our example vector: ... As you can see based on the RStudio console output, the median of our example vector is 5.5. Note: Our example vector has an even length, resulting in a median ...

DataCamp

datacamp.com › tutorial › r-median-function

R median() Function: Find the Middle Value | DataCamp

June 20, 2025 - Learn how to use the R median() function to find the middle value of your data. Handle missing values, summarize by group, and improve your analysis.

Codecademy

codecademy.com › learn › learn-r › modules › r-stats-mean-median-mode › cheatsheet

Learn R: Mean, Median, and Mode Cheatsheet | Codecademy

... The code above outputs 4.5, because it takes the average of the two middle values, 4 and 5. The mean, or average, of a dataset is calculated by adding all the values in the dataset and then dividing by the number of values in the set.

Stack Exchange

math.stackexchange.com › questions › 1617208 › calculation-of-median-of-grouped-data

statistics - calculation of median of grouped data - Mathematics Stack Exchange

Top answer

1 of 3

Because this is essentially a duplicate, I address a few issues that do not explicitly overlap the related question or answer:

If a class has cumulative relative frequency 0.5, then the median is at the boundary of that class and the next larger one.

If $N$ is large (really the only case where this method is generally successful), there is little difference between $N/2$ and $(N+1)/2$ in the formula. All references I checked use $N/2$.

Before computers were widely available, large datasets were customarily reduced to categories (classes) and plotted as histograms. Then the histograms were used to approximate the mean, variance, median, and other descriptive measures. Nowadays, it is best just to use a statistical computer package to find exact values of all measures.

One remaining application is to try to re-claim the descriptive measures from grouped data or from a histogram published in a journal. These are cases in which the original data are no longer available.

This procedure to approximate the sample median from grouped data assumes that the data are distributed in a roughly uniform fashion throughout the median interval. It then uses interpolation to approximate the median. (By contrast, methods to approximate the sample mean and sample variance from grouped data assume that all observations are concentrated at their class midpoints.)
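
To make the interpolation concrete, here is a sketch in base R that borrows the salary table from the Stack Overflow example earlier on this page (it is not part of this answer). Under the uniform-spread assumption the cumulative count rises linearly between class boundaries, so the median is the point where that line reaches N/2. The names freq, bounds, cum_at_bounds, and interp_median are illustrative.

```r
freq   <- c(110, 180, 320, 460, 850, 250, 130, 70, 20, 10)
bounds <- seq(1500, 2500, by = 100)   # class boundaries: 1500, 1600, ..., 2500

# Cumulative count reached at each boundary (0 at the lowest boundary).
cum_at_bounds <- c(0, cumsum(freq))

# approx() interpolates linearly between the (count, boundary) pairs,
# which is exactly the uniform-within-class assumption.
interp_median <- approx(x = cum_at_bounds, y = bounds, xout = sum(freq) / 2)$y
interp_median
# [1] 1915.294
```

This agrees with the closed-form formula L + (N/2 - cf)/f * h, which is the same linear interpolation written out for the median class only.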

2 of 3

According to what I learned, the class where the median is located is the lowest class for which the cumulative frequency equals or exceeds $\frac N2$.

Therefore, the median class would be 30-40, which gives approximately 30.833, or 31 as you said.

ResearchGate

researchgate.net › post › Can-anyone-help-me-find-the-median-of-groups-with-various-sizes-in-R

Can anyone help me find the median of groups with [various sizes] in R? | ResearchGate

Top answer

1 of 3

Output<-aggregate(values~groups,Table1,median) has the required output format. But Miguel's solution using tapply is faster, if that matters.

2 of 3

You can use the function tapply(): output<-tapply(values, INDEX=groups, FUN=median)

Stack Overflow

stackoverflow.com › questions › 74711625 › median-of-grouped-data

r - Median of grouped data - Stack Overflow

Top answer

1 of 2

If I understood your question correctly you're going to want to do something like this:

# Your gestational data:
gestational_data <- data.frame(GA_weeks = c(20:26),
                               num_infants_born = c(16,22,34,45,60,67,94))

# See the apply() documentation by running 
# ?apply

apply(gestational_data,
      1,
      function(x){
        rep(x[1],x[2])
      }) |>
  unlist()|>
  median()

2 of 2

What you want is a weighted median. You first want the weeks as numeric, which you get using gsub if not yet available

dat$GA_num <- as.numeric(gsub('\\D', '', dat$GA))

Then, use weightedMedian from the matrixStats package with the number of infants as weights.

matrixStats::weightedMedian(dat$GA_num, w=dat$num_infants_born)
# [1] 24.34646

Note that there are several definitions of the weighted median. For a comprehensive discussion, see this answer.

Data:

dat <- structure(list(GA = c("20 weeks", "21 weeks", "22 weeks", "23 weeks", 
"24 weeks", "25 weeks", "26 weeks"), num_infants_born = c(16L, 
22L, 34L, 45L, 60L, 67L, 94L)), class = "data.frame", row.names = c(NA, 
-7L))
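
To illustrate how the definitions can differ, here is a base R sketch of two non-interpolating variants on the same counts. The names weeks, births, expanded, and cum_rule are introduced here for illustration. Both variants return 24, whereas the interpolated value reported above is 24.34646.

```r
weeks  <- 20:26
births <- c(16, 22, 34, 45, 60, 67, 94)

# Variant 1: expand the counts into individual observations and take the median.
expanded <- median(rep(weeks, times = births))

# Variant 2: smallest value whose cumulative weight reaches half the total weight.
cum_rule <- weeks[which(cumsum(births) >= sum(births) / 2)[1]]

c(expanded = expanded, cum_rule = cum_rule)
# expanded cum_rule 
#       24       24 
```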

reddit.com › r/rlanguage › grouping with a variable while calculating the median of all other columns

r/Rlanguage on Reddit: grouping with a variable while calculating the median of all other columns

December 19, 2023 -

Hello, I've been stuck on this task for a while now.

I have a data set with 300+ rows and 3500+ columns:

fruits number 1 number 2 number 3 number 4

apples 1

apples 2

apples NA

bananas 23123

bananas 21

oranges 2

oranges 1

oranges 1 1 2 3

oranges 1 3 5 6

etc...

Is it possible to group by fruit (apples, bananas) while keeping all columns and calculating the median of each column (so that for each fruit I have one value per column)?

Something like this:

fruits nbr 1 (median) nbr 2 (median) nbr 3 (median)

apples 21

bananas 5

etc...

thank you!

Top answer

1 of 3

You might want to pivot your data longer (tidyr::pivot_longer) before using dplyr::summarise (with .by = fruits).
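
As a base R alternative to the pivot_longer route suggested above (a swapped-in technique, not what the reply describes), aggregate can compute one median per column per group directly. This is a sketch on hypothetical data; the column names n1 and n2 are made up, and na.action = na.pass is needed so that rows containing NA are kept and handled by na.rm = TRUE inside median().

```r
# Hypothetical miniature of the poster's table, with some missing values.
fruit_data <- data.frame(
  fruits = c("apples", "apples", "apples", "bananas", "bananas"),
  n1     = c(1, 2, NA, 4, 6),
  n2     = c(3, NA, 5, 8, 10)
)

# ". ~ fruits" means: summarise every other column, grouped by fruits.
fruit_medians <- aggregate(. ~ fruits, data = fruit_data, FUN = median,
                           na.rm = TRUE, na.action = na.pass)
fruit_medians
#    fruits  n1 n2
# 1  apples 1.5  4
# 2 bananas 5.0  9
```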

2 of 3

If someone can explain the thought process behind it it would be great! i tried using group and summarize / colMedian/ i tried to split the data into smaller data sets according to fruit/ tried creating a tbl_summary

Stack Overflow

stackoverflow.com › questions › 36994761 › r-median-of-a-frequency-distribution-grouped-by-another-variable

grouping - R - Median of a Frequency distribution, grouped by another variable - Stack Overflow

Top answer

1 of 2

We can try with dplyr

library(dplyr)    
Clean1 <- Clean[rep(1:nrow(Clean), Clean$Frequency),]
Clean1 %>%
      group_by(State) %>%
      summarise(Median = median(medicare_average_payment))

Or using data.table

library(data.table)
setDT(Clean)[, .(Median = median(rep(medicare_average_payment, Frequency))) , State]

2 of 2

You can use by to split the data frame and apply this function to each piece:

by(Clean, Clean$State, 
   FUN=function(x) median(rep(x$medicare_average_payment, x$Frequency))
)