pandas get percentile of value in column

stackoverflow.com › questions › 44824927 › calculate-percentile-of-value-in-column

To find the percentile of a value relative to an array (or in your case a dataframe column), use the scipy function stats.percentileofscore().

For example, if we have a value x (the other numerical value not in the dataframe), and a reference array, arr (the column from the dataframe), we can find the percentile of x by:

from scipy import stats
percentile = stats.percentileofscore(arr, x)

Note that there is a third parameter to the stats.percentileofscore() function that has a significant impact on the resulting value of the percentile, viz. kind. You can choose from rank, weak, strict, and mean. See the docs for more information.

For an example of the difference:

>>> df
   a
0  1
1  2
2  3
3  4
4  5

>>> stats.percentileofscore(df['a'], 4, kind='rank')
80.0

>>> stats.percentileofscore(df['a'], 4, kind='weak')
80.0

>>> stats.percentileofscore(df['a'], 4, kind='strict')
60.0

>>> stats.percentileofscore(df['a'], 4, kind='mean')
70.0

As a final note, if a value is greater than 80% of the other values in the column, it is in the 80th percentile, not the 20th (the example above shows how the kind parameter shifts the exact score). See this Wikipedia article for more information.
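To make the four kind options concrete, here is a minimal hand-rolled sketch that reproduces the numbers from the example using plain counts. The function name percentile_of_score is hypothetical and this is not SciPy's source; in particular, the formula used for 'rank' is an assumption chosen to match the documented behaviour on this example.

```python
import numpy as np

def percentile_of_score(values, score, kind="rank"):
    """Illustrative re-implementation of the four `kind` options (hypothetical helper)."""
    values = np.asarray(values)
    n = len(values)
    below = np.count_nonzero(values < score)         # strictly smaller than score
    at_or_below = np.count_nonzero(values <= score)  # smaller than or equal to score
    if kind == "strict":
        return 100.0 * below / n
    if kind == "weak":
        return 100.0 * at_or_below / n
    if kind == "mean":
        # average of the strict and weak results
        return 100.0 * (below + at_or_below) / (2 * n)
    if kind == "rank":
        # assumption: ties are averaged, with an extra half-step when the score is present
        present = 1 if at_or_below > below else 0
        return 100.0 * (below + at_or_below + present) / (2 * n)
    raise ValueError(f"unknown kind: {kind}")

arr = [1, 2, 3, 4, 5]
results = {k: percentile_of_score(arr, 4, k) for k in ("rank", "weak", "strict", "mean")}
print(results)
```

With no ties and the score present in the array, 'rank' and 'weak' coincide, which is why both give 80.0 above.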

Answer from wingr on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 44824927 › calculate-percentile-of-value-in-column

python - Calculate percentile of value in column - Stack Overflow

Top answer

1 of 5

2 of 5

Probably very late but still

df['column_name'].describe()

will give you the usual 25th, 50th and 75th percentiles along with some additional statistics. If you want other specific percentiles, pass them explicitly:

df['column_name'].describe(percentiles=[0.1, 0.2, 0.3, 0.5])

This will give you 10th, 20th, 30th and 50th percentiles. You can give as many values as you want.

The resulting object can be accessed like a dict:

desc = df['column_name'].describe(percentiles=[0.1, 0.2, 0.3, 0.5])
print(desc)
print(desc['10%'])

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.quantile.html

pandas.DataFrame.quantile — pandas 3.0.1 documentation

q: value between 0 <= q <= 1, the quantile(s) to compute. axis: {0 or ‘index’, 1 or ‘columns’}, default 0.
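As a rough illustration of the q and axis parameters described above, the following sketch calls DataFrame.quantile on a small made-up frame (the column names a and b and their values are assumptions for the example only).

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

col_q = df.quantile(0.5)            # one value per column (axis=0 is the default)
row_q = df.quantile(0.5, axis=1)    # one value per row
multi = df.quantile([0.25, 0.75])   # list of q -> DataFrame indexed by q

print(col_q)
print(row_q)
print(multi)
```

Passing a single q returns a Series, while a list of q values returns a DataFrame with one row per quantile.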

Discussions

python - Find percentile stats of a given column - Stack Overflow

You can even give multiple columns with null values and get multiple quantile values (I use 95 percentile for outlier treatment) More on stackoverflow.com

stackoverflow.com

python - Finding the percentile in pandas column - Stack Overflow

Now I want to search through for ... of column 'D' and if the particular zone is below it add the row to a datagram. ... for the first city 'abc' and date 1/1/2020 we have three zones 'AA','CC' and 'DD' which have the corresponding 'D' column as 22,32 and 44. So the 10th percentile is 24 so first row gets appended to ... More on stackoverflow.com

stackoverflow.com

python - How do I get the percentile for a row in a pandas dataframe? - Stack Overflow

Please note that this strategy only makes sense if you want to query a large enough number of values. Otherwise the sorting is too expensive. ... In order to get the percentile of a column in pandas Dataframe with respect to another categorical column More on stackoverflow.com

stackoverflow.com

Percentile range output across multiple columns in python/pandas

df.groupby("type").agg("median") will get you the 50th percentile for "Hello" and "OK". I can't remember off the top of my head, but you can pass a list of aggregations to agg (such as mean, count, min) to get multiple columns as well as use custom functions. More on reddit.com

r/learnpython

February 6, 2021
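The suggestion about passing a list of aggregations (including custom functions) to agg can be sketched as follows. The data and the helper name p90 are assumptions made for illustration, not taken from the thread.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "type": ["Hello", "Hello", "Hello", "OK", "OK"],
    "value": [1.0, 2.0, 3.0, 1.0, 2.0],
})

def p90(s):
    """Custom aggregation: 90th percentile of a group (hypothetical helper)."""
    return s.quantile(0.9)

# Built-in names and a custom function can be mixed in one list
summary = df.groupby("type")["value"].agg(["median", "mean", "count", p90])
print(summary)
```

The custom function's name becomes the column label, so each aggregation ends up in its own column.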

Saturn Cloud

saturncloud.io › blog › how-to-find-percentile-stats-of-a-given-column-using-pandas

How to Find Percentile Stats of a Given Column Using Pandas | Saturn Cloud Blog

December 26, 2023 - The quantile() function is used to find the percentile statistics of a given column in a Pandas DataFrame. We can use this function to find any percentile, such as the median (50th percentile), first quartile (25th percentile), third quartile ...

Medium

medium.com › @amit25173 › understanding-percentiles-in-pandas-369166d19e76

Understanding Percentiles in Pandas | by Amit Yadav | Medium

March 6, 2025 - In these cases, pandas interpolates (estimates) the value. But what if you want more control over how it does that? Enter the interpolation parameter. ...

# Using custom interpolation
percentile = df_with_nan['Scores'].quantile(0.75, interpolation='nearest')
print("75th Percentile (nearest interpolation):", percentile)

... Here, instead of estimating, pandas picks the nearest actual value in the dataset.
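To see how the interpolation parameter changes the answer, here is a small sketch comparing all five options on a made-up Series (the values are assumptions chosen so that the 75th percentile falls between two data points).

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40])  # hypothetical data

# The 0.75 quantile sits at position 0.75 * (4 - 1) = 2.25,
# i.e. a quarter of the way from 30 (position 2) to 40 (position 3).
methods = ["linear", "lower", "higher", "midpoint", "nearest"]
results = {m: scores.quantile(0.75, interpolation=m) for m in methods}

for name, value in results.items():
    print(f"{name:>8}: {value}")
```

Only 'linear' and 'midpoint' can return a value that is not in the data; the other three always pick an existing observation.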

Statology

statology.org › home › how to calculate percentile rank in pandas (with examples)

How to Calculate Percentile Rank in Pandas (With Examples)

August 30, 2022 - Here’s how to interpret the values in the percent_rank column: 14.3% of the points values for team A are equal to or less than 2. 35.7% of the points values for team A are equal to or less than 5. 57.1% of the points values for team A are equal to or less than 7. ...
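A per-group percent_rank column like the one being interpreted here can be produced along these lines. This is a sketch on made-up data (the team and points values are assumptions, not the article's dataset), using the default tie handling of rank, where tied values share their average rank.

```python
import pandas as pd

# Hypothetical data: two teams with a tie inside team A
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B"],
    "points": [2, 5, 5, 7, 3, 9],
})

# Rank within each team, expressed as a fraction of the group size
df["percent_rank"] = df.groupby("team")["points"].rank(pct=True)
print(df)
```

If the reading "share of values less than or equal to this one" is needed exactly, passing method='max' to rank is one option, since ties then all take the highest rank in the tie.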

GeeksforGeeks

geeksforgeeks.org › python › percentile-rank-of-a-column-in-a-pandas-dataframe

Percentile rank of a column in a Pandas DataFrame - GeeksforGeeks

July 15, 2025 - Let us see how to find the percentile rank of a column in a Pandas DataFrame. We will use the rank() function with the argument pct = True to find the percentile rank.
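A minimal sketch of rank() with pct=True on a single column, assuming a small made-up Score column:

```python
import pandas as pd

df = pd.DataFrame({"Score": [75, 82, 68, 90, 88]})  # hypothetical scores

# pct=True divides each rank by the number of valid values,
# so the result is a fraction between 0 and 1.
df["Percentile_Rank"] = df["Score"].rank(pct=True)
print(df)
```

Multiply the new column by 100 if a 0 to 100 scale is preferred.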

TutorialsPoint

tutorialspoint.com › percentile-rank-of-a-column-in-a-pandas-dataframe

Percentile Rank of a Column in a Pandas DataFrame

July 25, 2023 - In the end, assign the resulting percentiles to a new column called 'Per_Rank' and display the result using 'print()' method.

# importing packages
import pandas as pd
import numpy as np

# defining a sample DataFrame using pandas
data = {'Name': ['Ram', 'Shyam', 'Shrey', 'Mohan', 'Navya'],
        'Score': [75, 82, 68, 90, 88]}
df = pd.DataFrame(data)

# Calculating the percentile rank using numpy: the percentage of scores
# less than or equal to each score. (np.percentile(scores, q) goes the
# other way: it returns the value found at percentile q.)
scores = df['Score'].to_numpy()
df['Per_Rank'] = (scores[:, None] >= scores).mean(axis=1) * 100

# to show the result
print(df)

Statology

statology.org › home › how to calculate percentiles in python (with examples)

How to Calculate Percentiles in Python (With Examples)

November 3, 2020 -

import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

#find 95th percentile of var1 column
np.percentile(df.var1, 95)

34.1

The following code shows how to find the 95th percentile value for several columns in a pandas DataFrame:

Stack Overflow

stackoverflow.com › questions › 39581893 › find-percentile-stats-of-a-given-column

python - Find percentile stats of a given column - Stack Overflow

Top answer

1 of 6

183

You can use the pandas.DataFrame.quantile() function.
- The API for quantile() takes an interpolation argument that controls what is returned when the requested quantile falls between two positions in your data:
  - 'linear', 'lower', 'higher', 'midpoint', or 'nearest'.
  - By default, it performs linear interpolation.
  - These interpolation methods are discussed in the Wikipedia article for percentile.

import pandas as pd
import numpy as np

# sample data 
np.random.seed(2023)  # for reproducibility
data = {'Category': np.random.choice(['hot', 'cold'], size=(10,)),
        'field_A': np.random.randint(0, 100, size=(10,)),
        'field_B': np.random.randint(0, 100, size=(10,))}
df = pd.DataFrame(data)

df.field_A.mean()  # Same as df['field_A'].mean()
# 51.1

df.field_A.median() 
# 50.0

# You can call `quantile(i)` to get the i'th quantile,
# where `i` should be a fractional number.

df.field_A.quantile(0.1)  # 10th percentile
# 15.6

df.field_A.quantile(0.5)  # same as median
# 50.0

df.field_A.quantile(0.9)  # 90th percentile
# 88.8

df.groupby('Category').field_A.quantile(0.1)
#Category
#cold    28.8
#hot      8.6
#Name: field_A, dtype: float64

`df`

  Category  field_A  field_B
0     cold       96       58
1     cold       22       28
2      hot       17       81
3     cold       53       71
4     cold       47       63
5      hot       77       48
6     cold       39       32
7      hot       69       29
8      hot       88       49
9      hot        3       49

2 of 6

assume series s

s = pd.Series(np.arange(100))

Get quantiles for [.1, .2, .3, .4, .5, .6, .7, .8, .9]

s.quantile(np.linspace(.1, 1, 9, endpoint=False))

0.1     9.9
0.2    19.8
0.3    29.7
0.4    39.6
0.5    49.5
0.6    59.4
0.7    69.3
0.8    79.2
0.9    89.1
dtype: float64

s.quantile(np.linspace(.1, 1, 9, endpoint=False), interpolation='lower')

0.1     9
0.2    19
0.3    29
0.4    39
0.5    49
0.6    59
0.7    69
0.8    79
0.9    89
dtype: int32

GeeksforGeeks

geeksforgeeks.org › python › calculate-arbitrary-percentile-on-pandas-groupby

Calculate Arbitrary Percentile on Pandas GroupBy - GeeksforGeeks

July 23, 2025 - The 90th percentile tells you the value below which 90% of the data lies. In Pandas, we can easily compute percentiles using the quantile() function. It can also be calculated using the NumPy percentile() function.
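As a sketch of the two routes mentioned here, the following computes a per-group 90th percentile once with quantile() and once with NumPy's percentile() via agg. The group labels and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "group": ["x", "x", "x", "x", "x", "y", "y", "y"],
    "value": [1, 2, 3, 4, 5, 10, 20, 30],
})

# pandas takes a fraction (0.9) ...
by_quantile = df.groupby("group")["value"].quantile(0.9)

# ... while NumPy takes a percentage (90)
by_numpy = df.groupby("group")["value"].agg(lambda s: np.percentile(s, 90))

print(by_quantile)
print(by_numpy)
```

Both default to linear interpolation, so the two results agree here; the main thing to watch is the 0 to 1 versus 0 to 100 scale.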

Skytowner

skytowner.com › explore › calculating_percentiles_of_a_dataframe_in_pandas

Calculating percentiles of a DataFrame in Pandas

To calculate percentiles in Pandas, use the quantile(~) method. ... By default the quantile(~) method computes percentiles column-wise.

Enterprise DNA

blog.enterprisedna.co › pandas-percentile-calculate-percentiles-of-a-dataframe

Pandas Percentile: Calculate Percentiles of a Dataframe – Master Data Skills + AI

This will return the value below which 25% of the data in the Series falls. To find the 25th percentile (also known as the first quartile) in a Pandas DataFrame or Series, you can use the quantile() function with 0.25 as the argument. For instance, if you have a DataFrame df and you want to ...

datagy

datagy.io › home › pandas tutorials › data analysis in pandas › pandas quantile: calculate percentiles of a dataframe

Pandas Quantile: Calculate Percentiles of a Dataframe • datagy

April 16, 2023 - The q= argument accepts either a single number or an array of numbers that we want to calculate. If we wanted to calculate multiple percentiles, we simply pass in a list of values for the different percentiles we want to calculate.

IncludeHelp

includehelp.com › python › find-percentile-stats-of-a-given-column.aspx

Pandas: Find percentile stats of a given column

September 24, 2023 -

# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
    'A': [90, 72, 56, 76, 82, 34],
    'B': [50, 56, 72, 80, 53, 78]
}

# Creating a DataFrame
df = pd.DataFrame(d, index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display original DataFrame
print("Original DataFrame:\n", df, "\n")

# Get multiple statistical result
print("MEAN:\n", df['A'].mean(), "\n")
print("MEDIAN:\n", df['A'].median(), "\n")

# Calculating percentile using quantile
print("10th Percentile:\n", df['A'].quantile(0.1), "\n")
print("50th Percentile:\n", df['A'].quantile(0.5), "\n")
print("90th Percentile:\n", df['A'].quantile(0.9), "\n")

Stack Overflow

stackoverflow.com › questions › 61616428 › finding-the-percentile-in-pandas-column

python - Finding the percentile in pandas column - Stack Overflow

you can use groupby.transform with quantile that will give row-wise the 10th percentile of the group, and then use loc to get only the rows where the value in D is lower than or equal (le) to this 10th percentile value. print (df.loc[df['D'...
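A minimal sketch of that approach, assuming columns named city, date, zone and D. The 'abc' rows follow the values quoted in the question (22, 32 and 44, giving a 10th percentile of 24); the 'xyz' rows are made up so that there is a second group.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["abc", "abc", "abc", "xyz", "xyz"],
    "date": ["1/1/2020"] * 5,
    "zone": ["AA", "CC", "DD", "AA", "BB"],
    "D": [22, 32, 44, 10, 50],  # the xyz values are hypothetical
})

# transform broadcasts each group's 10th percentile back onto its rows
p10 = df.groupby(["city", "date"])["D"].transform(lambda s: s.quantile(0.1))

# keep only rows whose D is at or below their own group's 10th percentile
below = df.loc[df["D"].le(p10)]
print(below)
```

Because transform returns a Series aligned with the original index, the comparison is row-wise and no merge is needed.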

Stack Overflow

stackoverflow.com › questions › 50804120 › how-do-i-get-the-percentile-for-a-row-in-a-pandas-dataframe

python - How do I get the percentile for a row in a pandas dataframe? - Stack Overflow

Top answer

1 of 3

TL; DR

Use

sz = temp['INCOME'].size-1
temp['PCNT_LIN'] = temp['INCOME'].rank(method='max').apply(lambda x: 100.0*(x-1)/sz)

   INCOME    PCNT_LIN
0      78   44.444444
1      38   11.111111
2      42   22.222222
3      48   33.333333
4      31    0.000000
5      89   55.555556
6      94   66.666667
7     102   77.777778
8     122  100.000000
9     122  100.000000

Answer

It is actually very simple once you understand the mechanics. When you are looking for the percentile of a score, you already have the scores in each row. The only step left is understanding that you need the percentage of numbers that are less than or equal to the selected value. This is exactly what the parameters kind='weak' of scipy.stats.percentileofscore() and method='max' (with pct=True) of Series.rank() do. To invert it, run Series.quantile() with interpolation='lower'.

So, the behavior of the scipy.stats.percentileofscore(), Series.rank() and Series.quantile() is consistent, see below:

In[]:
import pandas as pd
import scipy.stats

temp = pd.DataFrame([78, 38, 42, 48, 31, 89, 94, 102, 122, 122], columns=['INCOME'])
temp['PCNT_RANK'] = temp['INCOME'].rank(method='max', pct=True)
temp['POF'] = temp['INCOME'].apply(lambda x: scipy.stats.percentileofscore(temp['INCOME'], x, kind='weak'))
temp['QUANTILE_VALUE'] = temp['PCNT_RANK'].apply(lambda x: temp['INCOME'].quantile(x, 'lower'))
temp['RANK'] = temp['INCOME'].rank(method='max')
sz = temp['RANK'].size - 1
temp['PCNT_LIN'] = temp['RANK'].apply(lambda x: (x-1)/sz)
temp['CHK'] = temp['PCNT_LIN'].apply(lambda x: temp['INCOME'].quantile(x))

temp

Out[]:
   INCOME  PCNT_RANK    POF  QUANTILE_VALUE  RANK  PCNT_LIN    CHK
0      78        0.5   50.0              78   5.0  0.444444   78.0
1      38        0.2   20.0              38   2.0  0.111111   38.0
2      42        0.3   30.0              42   3.0  0.222222   42.0
3      48        0.4   40.0              48   4.0  0.333333   48.0
4      31        0.1   10.0              31   1.0  0.000000   31.0
5      89        0.6   60.0              89   6.0  0.555556   89.0
6      94        0.7   70.0              94   7.0  0.666667   94.0
7     102        0.8   80.0             102   8.0  0.777778  102.0
8     122        1.0  100.0             122  10.0  1.000000  122.0
9     122        1.0  100.0             122  10.0  1.000000  122.0

Now the column PCNT_RANK holds the ratio of values that are less than or equal to the one in the column INCOME. If you want the "interpolated" ratio instead, it is in the column PCNT_LIN. And because the calculation uses Series.rank(), it is pretty fast and will crunch 255M numbers in seconds.

Here I will explain how you get the value from using quantile() with linear interpolation:

temp['INCOME'].quantile(0.11)
37.93

Our data temp['INCOME'] has only ten values. According to the formula from your link to Wiki the rank of 11th percentile is

rank = 11*(10-1)/100 + 1 = 1.99

The integer part of the rank is 1, which corresponds to the value 31, and the value with rank 2 (i.e. the next bin) is 38. The fraction is the fractional part of the rank, 0.99. This leads to the result:

 31 + (38-31)*(0.99) = 37.93

For the values themselves, the fractional part has to be zero, so it is easy to do the inverse calculation to get the percentile:

p = (rank - 1)*100/(10 - 1)

I hope I made it more clear.
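The arithmetic above can be checked with a few lines of plain Python. This is a sketch of linear interpolation only, not pandas' implementation; the function name linear_quantile is hypothetical, and it uses a 0-based position, which equals the 1-based rank in the formula minus 1.

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between neighbouring sorted values (illustrative)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)           # 0-based fractional position (= rank - 1)
    lo = math.floor(pos)                   # index of the value just below
    hi = min(lo + 1, len(ordered) - 1)     # index of the next value
    frac = pos - lo                        # fractional part of the position
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

income = [78, 38, 42, 48, 31, 89, 94, 102, 122, 122]

result = linear_quantile(income, 0.11)     # 31 + (38 - 31) * 0.99
print(round(result, 2))

# Inverse for a value present in the data: p = (rank - 1) * 100 / (n - 1)
rank_of_38 = sorted(income).index(38) + 1  # 1-based rank
p = (rank_of_38 - 1) * 100 / (len(income) - 1)
print(round(p, 4))
```

The inverse step works cleanly only for values that actually occur in the data, because that is when the fractional part is zero.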

2 of 3

This seems to work:

A = np.sort(temp['INCOME'].values)
np.interp(sample, A, np.linspace(0, 1, len(A)))

For example:

>>> temp.INCOME.quantile(np.interp([37.5, 38, 122, 121], A, np.linspace(0, 1, len(A))))
0.103175     37.5
0.111111     38.0
1.000000    122.0
0.883333    121.0
Name: INCOME, dtype: float64

Please note that this strategy only makes sense if you want to query a large enough number of values. Otherwise the sorting is too expensive.

Python Forum

python-forum.io › thread-37787.html

pandas column percentile

Dear All, I have a time series in a dataframe that contains values for each day for many years. How can I compute the average of values above a certain percentile for, say, each month? Thank you
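One possible way to approach this question, sketched under assumptions: the data is a Series with a DatetimeIndex, "each month" means each calendar month pooled across years, and "above" means strictly greater than the group's own percentile. The helper name mean_above and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical daily data (a handful of days standing in for many years)
idx = pd.to_datetime([
    "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
    "2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04",
])
s = pd.Series([1.0, 2.0, 3.0, 10.0, 5.0, 6.0, 7.0, 20.0], index=idx)

def mean_above(group, q=0.9):
    """Mean of the values strictly above the group's q-th quantile (hypothetical helper)."""
    threshold = group.quantile(q)
    return group[group > threshold].mean()

# Group by calendar month; use [s.index.year, s.index.month] for year-month instead
monthly = s.groupby(s.index.month).apply(mean_above, q=0.5)
print(monthly)
```

Here q=0.5 is used only to keep the toy example readable; any fraction between 0 and 1 can be passed.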

Data Science Parichay

datascienceparichay.com › home › blog › calculate percentile in python

Calculate Percentile in Python - Data Science Parichay

October 8, 2021 - Let’s now calculate the 95th percentile value for the “Day” column. Note that when using the pandas quantile() function pass the value of the nth percentile as a fractional value.

reddit.com › r/learnpython › percentile range output across multiple columns in python/pandas

r/learnpython on Reddit: Percentile range output across multiple columns in python/pandas

February 6, 2021 -

I have a dataset, df, where I would like to showcase the 60th, 70th, and 90th percentile values for given values in a column

DATA

type   value
Hello  1
Hello  2
Hello  3
Hello  5
Hello  6
Hello  8
Hello  3
OK     1
OK     2

DESIRED

type   0.6  0.7  0.9
Hello  5    5.6  8
OK     1.8  2    2

DOING

My approach is to utilize the percentile function in numpy:

import numpy as np

# percentiles of the numeric column only
print(np.percentile(df['value'], 60))
print(np.percentile(df['value'], 70))
print(np.percentile(df['value'], 90))

This works, however, the output shows these values individually and does not maintain the other columns in the dataset

Top answer

1 of 2

2 of 2

Groupby has a built-in quantile method:

import pandas as pd

df = pd.DataFrame({"type": ["Hello", "Hello", "Hello", "Hello", "Hello", "Hello", "Hello", "Hello", "Hello", "OK", "OK", "OK", "OK"],
                   "value": [1.0, 2.0, 3.0, 5.0, 5.0, 6.0, 8.0, 8.0, 3.0, 1.0, 1.0, 2.0, 2.0]})
df.groupby('type').quantile([.6, .7, .9])['value'].unstack()

       0.6  0.7  0.9
type
Hello  5.0  5.6  8.0
OK     1.8  2.0  2.0

Pandas

pandas.pydata.org › docs › reference › api › pandas.Series.quantile.html

pandas.Series.quantile — pandas 3.0.1 documentation

If q is an array, a Series will be returned where the index is q and the values are the quantiles, otherwise a float will be returned. ... Calculate the rolling quantile. ... Returns the q-th percentile(s) of the array elements.
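The return types described here, plus the rolling variant the page mentions, can be illustrated with a short sketch on made-up data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])  # hypothetical data

single = s.quantile(0.5)             # scalar q -> a single float
several = s.quantile([0.25, 0.5])    # array-like q -> Series indexed by q
rolling = s.rolling(2).quantile(0.5) # quantile over a sliding window of 2

print(single)
print(several)
print(rolling)
```

The first rolling value is NaN because the window of 2 is not yet full at that position.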