Completely vectorized numpy solution

Here is the code I use. It is not optimal (I was unable to write an optimal version with numpy), but it is still much faster and more reliable than the accepted solution.

import numpy as np


def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of
        initial array
    :param old_style: if True, will correct output to be consistent
        with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = np.array(values)
    quantiles = np.array(quantiles)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.array(sample_weight)
    assert np.all(quantiles >= 0) and np.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # Shift and rescale to [0, 1] to be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= np.sum(sample_weight)
    return np.interp(quantiles, weighted_quantiles, values)

Examples:

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])

array([ 1. , 3.2, 9. ])

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.], sample_weight=[2, 1, 2, 4, 1])

array([ 1. , 3.2, 9. ])
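
The effect of `old_style` is easiest to see on unweighted data. The sketch below is a condensed copy of the function above (the `values_sorted` option is left out, and the intermediate name `positions` is my own); it checks that `old_style=True` reproduces `np.quantile` with its default linear interpolation, while the default places each sample at the midpoint of its weight.

```python
import numpy as np


def weighted_quantile(values, quantiles, sample_weight=None, old_style=False):
    # Condensed copy of the function above; values_sorted is omitted here.
    values = np.asarray(values, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.asarray(sample_weight, dtype=float)

    sorter = np.argsort(values)
    values, sample_weight = values[sorter], sample_weight[sorter]

    # Midpoint of each sample's share of the cumulative weight.
    positions = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # Stretch so the first sample sits at 0 and the last at 1.
        positions -= positions[0]
        positions /= positions[-1]
    else:
        positions /= np.sum(sample_weight)
    return np.interp(quantiles, positions, values)


data = [1, 2, 9, 3.2, 4]
qs = [0.0, 0.25, 0.5, 0.75, 1.0]

new_style = weighted_quantile(data, qs)
old_style = weighted_quantile(data, qs, old_style=True)
reference = np.quantile(data, qs)  # default linear interpolation

print(new_style)
print(old_style)
print(reference)
```

With unit weights the two variants agree at the median and at the extremes but differ in between, so it is worth choosing one convention deliberately.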

Answer from Alleo on Stack Overflow
🌐
NumPy
numpy.org › doc › stable › reference › generated › numpy.quantile.html
numpy.quantile — NumPy v2.4 Manual
June 22, 2021 - For backward compatibility with previous versions of NumPy, quantile provides four additional discontinuous estimators. Like method='linear', all have m = 1 - q so that j = q*(n-1) // 1, but g is defined as follows. ... Weighted quantiles: More formally, the quantile at probability level \(q\) of a cumulative distribution function \(F(y)=P(Y \leq y)\) with probability measure \(P\) is defined as any number \(x\) that fulfills the coverage conditions
2 of 13
20

This now seems to be implemented in statsmodels:

import numpy as np
from statsmodels.stats.weightstats import DescrStatsW
wq = DescrStatsW(data=np.array([1, 2, 9, 3.2, 4]), weights=np.array([0.0, 0.5, 1.0, 0.3, 0.5]))
wq.quantile(probs=np.array([0.1, 0.9]), return_pandas=False)
# array([2., 9.])

The DescrStatsW object also implements other methods, such as the weighted mean. See https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.DescrStatsW.html

🌐
GitHub
github.com › nudomarinero › wquantiles
GitHub - nudomarinero/wquantiles: weighted quantiles with Python
Weighted quantiles with Python, including weighted median. This library is based on numpy, which is the only dependence.
Starred by 53 users
Forked by 13 users
Languages   Python 100.0%
🌐
NumPy
numpy.org › doc › 2.0 › reference › generated › numpy.quantile.html
numpy.quantile — NumPy v2.0 Manual
Weighted quantiles: For weighted quantiles, the above coverage conditions still hold. The empirical cumulative distribution is simply replaced by its weighted version, i.e. \(P(Y \leq t) = \frac{1}{\sum_i w_i} \sum_i w_i 1_{x_i \leq t}\).
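
The weighted empirical CDF in that snippet can be sketched directly with a cumulative sum. The helper below, `weighted_inverted_cdf`, is a hypothetical name of my own and not part of NumPy; it returns the smallest value whose weighted CDF reaches `q`. As far as I know, NumPy 2.0 and later expose the same idea through the `weights` argument of `np.quantile`, which is only supported with `method="inverted_cdf"`, so the built-in call is guarded by a version check.

```python
import numpy as np


def weighted_inverted_cdf(values, weights, q):
    """Smallest value x with weighted CDF P(Y <= x) >= q (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Weighted empirical CDF evaluated at each sorted value.
    cdf = np.cumsum(weights) / np.sum(weights)
    # First index whose CDF is >= q; clip to guard against rounding at q = 1.
    idx = np.minimum(np.searchsorted(cdf, q, side="left"), len(values) - 1)
    return values[idx]


x = np.array([1, 2, 9, 3.2, 4])
w = np.array([2, 1, 2, 4, 1])
probs = [0.25, 0.5, 0.9]

result = weighted_inverted_cdf(x, w, probs)
print(result)

# Assumption: the `weights` keyword is only available in NumPy >= 2.0.
if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
    print(np.quantile(x, probs, weights=w, method="inverted_cdf"))
```

This variant never interpolates, so every result is one of the input values.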
Top answer
1 of 3
2
import numpy as np
your_data    = [ 1.7 , 2.2 , 3.9 ]
your_weights = [ 2 , 1 , 5 ]
xw = np.repeat( your_data , your_weights )

Your xw should then be:

[ 1.7 , 1.7 , 2.2 , 3.9 , 3.9 , 3.9 , 3.9 , 3.9 ]

Unfortunately, numpy doesn't have built-in weighted functions for everything, but you can put things together in this way.
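
As a sketch of how the expanded array might then be used, the ordinary unweighted functions can be applied to `xw`. This assumes the weights are non-negative integers (`np.repeat` needs repeat counts) and that the expanded array fits in memory; the names `weighted_median` and `weighted_q25` are my own.

```python
import numpy as np

your_data = np.array([1.7, 2.2, 3.9])
your_weights = np.array([2, 1, 5])  # repeat counts: non-negative integers only

# Expand each value according to its weight.
xw = np.repeat(your_data, your_weights)

# Any unweighted statistic of xw is now a frequency-weighted statistic.
weighted_median = np.median(xw)
weighted_q25 = np.quantile(xw, 0.25)  # default linear interpolation

print(xw)
print(weighted_median, weighted_q25)
```

For large or non-integer weights, a cumulative-sum approach avoids building the expanded array.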

2 of 3
1

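For simplicity, I'll assume that interpolation isn't needed, and that it suffices to find the individual nearest to the \(q\)-th quantile point, where \(0 \leqslant q \leqslant 1.\)

Suppose that the population consists of \(N\) individuals, sorted in ascending order of the values of some attribute. Suppose that there are \(r\) different attribute values, and that \(m_i\) individuals have the \(i\)-th value of the attribute, for \(i = 1, 2, \ldots, r.\) Then \(m_1 + m_2 + \cdots + m_r = N.\)

Represent the \(k\)-th individual as the centre of a notional continuous interval of unit length, for \(k = 1, 2, \ldots, N.\) Then the entire population occupies an interval of length \(N,\) and the \(q\)-th quantile point lies a fraction \(q\) of the way along it. We simplistically replace this point with the nearest individual, rounding in a fixed direction in the ambiguous case when \(qN\) is an integer. Thus, as in the code below, we take the \(q\)-th quantile to be individual number \(\lfloor qN \rfloor + 1\) for \(q < 1,\) or number \(N\) in the special case \(q = 1.\)

Define the partial sums \(M_i = m_1 + m_2 + \cdots + m_i\) for \(i = 0, 1, \ldots, r.\) These form a strictly increasing sequence, where \(M_0 = 0\) and \(M_r = N.\) For \(k = 1, 2, \ldots, N,\) therefore, there exists a unique positive integer \(i = f(k) \leqslant r\) such that \(M_{i-1} < k \leqslant M_i.\) That means that the \(k\)-th individual in the population has the \(i\)-th attribute value.

In terms of this function \(f,\) if \([x_1, x_2, \ldots, x_r]\) is the list of attribute values sorted into ascending order, then the \(q\)-th quantile value of the attribute is (ignoring the special case \(q = 1\)):

\[ x_{f(\lfloor qN \rfloor + 1)}. \]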

Here's a toy Python 3 module that does the job. I haven't tried it on any large arrays. For all I know, the way I've coded it may use tons of resources. (You'll surely need to recode it anyway, for instance to use interpolation.)

"""Compute quantiles: see https://math.stackexchange.com/q/3721765."""

__all__ = ['weighted']

import math, operator, itertools

class weighted(object):
    """
    Structure of repeated attribute values in ascending order.
    """
    
    def __init__(self, x, w):
        """
        Create sorted data from unsorted attribute values and their "weights".
        """
        self.xs, self.ws = zip(*sorted(zip(x, w), key=operator.itemgetter(0)))
        self.subtotals = list(itertools.accumulate(self.ws))
        self.N = self.subtotals[-1]
    
    def individual(self, q):
        """
        Identify individual member of population nearest to the q'th quantile.
        """
        return math.floor(q * self.N) + 1 if q < 1 else self.N
    
    def attribute(self, k):
        """
        Compute attribute index of k'th individual member of the population.
        """
        for i, M in enumerate(self.subtotals):
            if M >= k:
                return i
    
    def quantile(self, q):
        """
        Compute q'th quantile value of the attribute.
        """
        return self.xs[self.attribute(self.individual(q))]

def main():
    print('median = {}'.format(weighted([6, 4, 2],[1, 3, 5]).quantile(.5)))

if __name__ == '__main__':
    main()

Version 0.2

This is still a toy implementation. In particular, it still might be hugely inefficient (I haven't given any thought to that question), and it still hasn't been tested on any large datasets. What is nice about it is that the new class multilist is obviously capable of being considerably elaborated. (No doubt I'll tinker with it a lot, but there isn't likely to be any good reason to post my tinkerings here.)

I'm not sure how to post code in Maths.SE, so the indentation of the code isn't quite consistent.

"""Lists of items with multiplicity, analogous to multisets."""

__all__ = ['individual', 'multilist', 'quantile']

import math, itertools

def individual(q, N):
    """
    Number (1 to N) of individual near q'th quantile of population of size N.
    """
    return math.floor(q*N) + 1 if q < 1 else N

def quantile(x, q):
    """
    Compute the q'th quantile value of the given *sorted* (N.B.!) multilist x.
    """
    return x[individual(q, len(x))]

class multilist(object):
    """
    List of elements with multiplicity: similar to a multiset, whence the name.
    
    The multiplicity of each element is a positive integer. The purpose of the
    multilist is to behave like a list in which each element occurs many times,
    without actually having to store all of those occurrences.
    """

    def __init__(self, x, w):
        """
        Create multilist from list of values and list of their multiplicities.
        """
        self.items = x
        self.times = w
        self.subtotals = list(itertools.accumulate(self.times))

    def __len__(self):
        """
        Get the number of items in a list with multiplicities.

        The syntax needed to call this function is "len(x)", where x is the
        name of the multilist.
        """
        return self.subtotals[-1]

    def __getitem__(self, k):
        """
        Find the k'th item in a list with multiplicities.

        If the multiplicities are m_1, m_2, ..., m_r (note that Python indices
        are 1 less, running from 0 to r - 1), and subtotals M_0, M_1, ..., M_r,
        where M_i = m_1 + m_2 + ... + m_i (i = 0, 1, ..., r), then we want the
        unique i (but the Python code uses i - 1) such that M_{i-1} < k <= M_i.

        The syntax needed to call this function is "x[k]", where x is the name
        of the multilist, and 1 <= k <= len(x).
        """
        for i, M in enumerate(self.subtotals):
            if M >= k:
                return self.items[i]

    def sorted(self):
        """
        Return a sorted copy of the given multilist.

        Note on the implementation: by default, 2-tuples in Python are compared
        lexicographically, i.e. by the first element, or the second in the case
        of a tie; so there is no need for parameter key=operator.itemgetter(0).
        """
        return multilist(*zip(*sorted(zip(self.items, self.times))))

def main():
    data = multilist([6, 4, 2], [1, 3, 5]).sorted()
    print('median = {}'.format(quantile(data, .5)))

if __name__ == '__main__':
    main()
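
Since the linear scan over the subtotals is the part most likely to be slow on large inputs, one possible refinement is a binary search with the standard `bisect` module. This is only a sketch under the same nearest-individual rule as the code above; the function name `nearest_rank_quantile` is hypothetical and the weights are assumed to be positive integers.

```python
import bisect
import itertools
import math


def nearest_rank_quantile(xs, ws, q):
    """q'th quantile of values xs with positive integer multiplicities ws."""
    pairs = sorted(zip(xs, ws))  # sort by value, carrying the weights along
    values = [x for x, _ in pairs]
    subtotals = list(itertools.accumulate(w for _, w in pairs))
    n = subtotals[-1]
    # Same rule as individual(): 1-based number of the chosen individual.
    k = math.floor(q * n) + 1 if q < 1 else n
    # First index i with subtotals[i] >= k, found by binary search.
    return values[bisect.bisect_left(subtotals, k)]


median = nearest_rank_quantile([6, 4, 2], [1, 3, 5], 0.5)
print('median = {}'.format(median))
```

The lookup is logarithmic in the number of distinct values, although the initial sort still dominates for a single query.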
🌐
PyPI
pypi.org › project › wquantiles
wquantiles · PyPI
Weighted quantiles, including weighted median, based on numpy
pip install wquantiles
Published   May 26, 2021
Version   0.6
🌐
NumPy
numpy.org › doc › 2.2 › reference › generated › numpy.nanquantile.html
numpy.nanquantile — NumPy v2.2 Manual
An array of weights associated with the values in a. Each value in a contributes to the quantile according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a.
🌐
SciPy
docs.scipy.org › doc › scipy › reference › generated › scipy.stats.quantile.html
quantile — SciPy v1.17.0 Manual
Frequency weights; e.g., for counting number weights, quantile(x, p, weights=weights) is equivalent to quantile(np.repeat(x, weights), p). Values other than finite counting numbers are accepted, but may not have valid statistical interpretations. Not compatible with method='harrell-davis' or those that begin with 'round_'. Returns: quantile : scalar or ndarray. The resulting quantile(s). The dtype is the result dtype of x and p. See also: numpy.quantile.
🌐
Medium
medium.com › @amit25173 › understanding-quartiles-in-numpy-step-by-step-80fb48b5587a
Understanding Quartiles in NumPy (Step-by-Step) | by Amit Yadav | Medium
February 8, 2025 - NumPy interpolates values when the dataset has an even number of elements. That’s why Q1 and Q3 might not be exact values from the dataset but calculated as a weighted average between two closest values.
🌐
NumPy
numpy.org › doc › stable › reference › generated › numpy.percentile.html
numpy.percentile — NumPy v2.4 Manual
An array of weights associated with the values in a. Each value in a contributes to the percentile according to its associated weight. The weights array can either be 1-D (in which case its length must be the size of a along the given axis) or of the same shape as a.
🌐
GitHub
github.com › numpy › numpy › issues › 8935
Weighted quantile option in nanpercentile() · Issue #8935 · numpy/numpy
April 12, 2017 - >>> out = weighted_quantile(da=ar, q=[0.25, 0.5, 0.75], dim=['x', 'y'], w_dict={'x': [1, 1]}, interpolation='nearest') >>> out <xarray.DataArray (quantile: 3, z: 2)> array([[ 8., 1.], [ 8., 3.], [ 8., 3.]]) Coordinates: * z (z) int64 8 9 * quantile (quantile) float64 0.25 0.5 0.75 >>> np.nanpercentile(da_stacked, q=[25, 50, 75], axis=-1, interpolation='nearest') array([[ 8., 1.], [ 8., 3.], [ 8., 3.]]) We wonder if it's ok to make this feature part of numpy, probably in np.nanpercentile?
🌐
GitHub
github.com › nudomarinero › wquantiles › blob › master › wquantiles.py
wquantiles/wquantiles.py at master · nudomarinero/wquantiles
Library to compute weighted quantiles, including the weighted median, of · numpy arrays. """ from __future__ import print_function · import numpy as np · · __version__ = "0.4" · · def quantile_1D(data, weights, quantile): """ Compute the weighted quantile of a 1D numpy array.
Author   nudomarinero
🌐
Xarray
docs.xarray.dev › en › v2025.04.0 › generated › xarray.computation.weighted.DatasetWeighted.quantile.html
xarray.computation.weighted.DatasetWeighted.quantile
April 29, 2025 - Apply a weighted quantile to this Dataset’s data along some dimension(s). Weights are interpreted as sampling weights (or probability weights) and describe how a sample is scaled to the whole population [1]. There are other possible interpretations for weights, precision weights describing the precision of observations, or frequency weights counting the number of identical observations, however, they are not implemented here. For compatibility with NumPy’s non-weighted quantile (which is used by DataArray.quantile and Dataset.quantile), the only interpolation method supported by this weighted version corresponds to the default “linear” option of numpy.quantile.
🌐
GitHub
github.com › numpy › numpy › issues › 6326
weighted percentile · Issue #6326 · numpy/numpy
September 17, 2015 - Support for weights in percentile would be nice to have. A quick look suggests https://github.com/nudomarinero/wquantiles; I'd be happy to make a PR out of this implementation if there's in...
Author   anntzer