Here's one approach:
Use numpy.unique to both sort the array and remove duplicate items. Pass the return_inverse argument to get the indices into the sorted array that give the values of the original array. Then, you can get all of the indices of the tied items by finding the indices of the inverse array whose values are equal to the index into the unique array for that item.
For example:
import numpy as np

foo = np.array([3, 1, 4, 0, 1, 0])
foo_unique, foo_inverse = np.unique(foo, return_inverse=True)
# Put largest items first
foo_unique = foo_unique[::-1]
foo_inverse = -foo_inverse + len(foo_unique) - 1
foo_top3 = foo_unique[:3]
# Get the indices into foo of the top item
# (nonzero() returns a tuple of arrays, one per dimension, so take element 0)
first_indices = (foo_inverse == 0).nonzero()[0]
# Choose one at random
first_random_idx = np.random.choice(first_indices)
second_indices = (foo_inverse == 1).nonzero()[0]
second_random_idx = np.random.choice(second_indices)
# And so on...
numpy.unique is implemented using argsort, so a glance at its implementation might suggest a simpler approach.
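The "and so on" steps above can be wrapped in a loop. This is only a minimal sketch: the helper name `random_top_k_indices` and its `rng` parameter are hypothetical, and it assumes a 1-D input and NumPy's `Generator` API for the random choice.

```python
import numpy as np

def random_top_k_indices(values, k, rng=None):
    """Return one index per top-k distinct value, breaking ties at random."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values).ravel()
    unique_vals, inverse = np.unique(values, return_inverse=True)
    # Relabel so that 0 refers to the largest distinct value.
    inverse = len(unique_vals) - 1 - inverse
    picks = []
    for rank in range(min(k, len(unique_vals))):
        # All positions in `values` that hold the rank-th largest value.
        candidates = np.flatnonzero(inverse == rank)
        picks.append(int(rng.choice(candidates)))
    return picks

foo = np.array([3, 1, 4, 0, 1, 0])
print(random_top_k_indices(foo, 3))  # either [2, 0, 1] or [2, 0, 4]
```

The third pick varies between runs because the value 1 appears at both index 1 and index 4.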
What about simply this?
(-foo).argsort(kind='mergesort')[:3]
Why this works:
Argsorting in descending order (which np.argsort does not do) is the same as argsorting the negated values in ascending order (which it does). You then just need to pick the first 3 sorted indices. The only remaining requirement is that the sort be stable, meaning that in case of ties the first index stays first.
NOTE: I thought the default kind=quicksort was stable, but according to the documentation only kind=mergesort is guaranteed to be stable (https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html):
The various sorting algorithms are characterized by their average speed, worst case performance, work space size, and whether they are stable. A stable sort keeps items with the same key in the same relative order. The three available algorithms have the following properties:
kind          speed   worst case    work space   stable
‘quicksort’   1       O(n^2)        0            no
‘mergesort’   2       O(n*log(n))   ~n/2         yes
‘heapsort’    3       O(n*log(n))   0            no
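As a quick check of the stable-sort approach on the example array, here is a small sketch. It assumes NumPy 1.15 or later, where `kind="stable"` is accepted as a spelling for a stable sort, and a signed or floating-point dtype, since negating an unsigned array wraps around.

```python
import numpy as np

foo = np.array([3, 1, 4, 0, 1, 0])
# Negating the values turns the ascending sort into a descending one.
# A stable sort keeps tied values in their original order, so the
# first occurrence of a tied value comes first.
top3 = np.argsort(-foo, kind="stable")[:3]
print(top3.tolist())       # [2, 0, 1]
print(foo[top3].tolist())  # [4, 3, 1]
```

The value 1 occurs at indices 1 and 4, and the stable sort reports index 1 first.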
This is an extremely hacky answer, but why don't you just argsort the reversed array? That way, among tied values, argsort picks the last index of the reversed array, which corresponds to the first index of the original.
This translates to:
>>> foo = np.array([3, 1, 4, 0, 1, 0])
>>> foo.argsort()[::-1]
array([2, 0, 4, 1, 5, 3])
>>> foo.size - 1 - foo[::-1].argsort()[::-1]
array([2, 0, 1, 4, 3, 5])
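The reversal trick depends on how argsort orders ties, which the default sort kind does not guarantee. The sketch below requests a stable sort explicitly and compares the result against a stable argsort of the negated values; the function name `top_indices_reversed` is hypothetical.

```python
import numpy as np

def top_indices_reversed(a):
    """Descending argsort that keeps the first index first among ties."""
    a = np.asarray(a)
    # Stable-sort the reversed array, flip the result into descending order,
    # then map positions in the reversed array back to original positions.
    return a.size - 1 - a[::-1].argsort(kind="stable")[::-1]

rng = np.random.default_rng(0)
sample = rng.integers(0, 5, size=50)  # a small value range forces many ties
assert np.array_equal(top_indices_reversed(sample),
                      np.argsort(-sample, kind="stable"))
```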
According to the documentation, argsort:
Returns the indices that would sort an array.
2 is the index of 0.0.
3 is the index of 0.1.
1 is the index of 1.41.
0 is the index of 1.48.
[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.
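To see this concretely, the short sketch below runs argsort on the four values mentioned above and uses the result to index the array, which yields the values in sorted order.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])
order = np.argsort(x)
print(order.tolist())     # [2, 3, 1, 0]
# Indexing with the argsort result returns the values in ascending order.
print(x[order].tolist())  # [0.0, 0.1, 1.41, 1.48]
```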
There are a number of ways to get the result you are looking for:
import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1
For example,
In [72]: x = np.array([1.48,1.41,0.0,0.1])
In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])
This checks that they all produce the same result:
x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))
These IPython %timeit benchmarks suggest that, for large arrays, using_indexed_assignment is the fastest:
In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop
In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop
In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop
In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop
For small arrays, using_argsort_twice may be faster:
In [78]: x = np.random.random(10**2)
In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop
In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop
In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop
In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop
Note also that stats.rankdata gives you more control over how to handle elements of equal value.
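As an illustration of that control, the sketch below ranks an array containing ties with each of the `method` options of `scipy.stats.rankdata`. The sample array is an assumption chosen for the example, and 1 is subtracted so the ranks start at 0 like those of the functions above.

```python
import numpy as np
from scipy import stats

x = np.array([3, 1, 4, 0, 1, 0])  # contains ties: two 0s and two 1s
for method in ("average", "min", "max", "dense", "ordinal"):
    # rankdata returns 1-based ranks; subtract 1 to get 0-based ranks.
    ranks = stats.rankdata(x, method=method) - 1
    print(f"{method:>8}: {ranks.tolist()}")
```

With ties present, the methods no longer agree: "average" assigns the two zeros the shared rank 0.5, "dense" gives 2, 1, 3, 0, 1, 0 with no gaps between ranks, and "ordinal" gives every element a distinct rank in order of occurrence (4, 2, 5, 0, 3, 1), which matches applying a stable argsort twice.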