Your approach is right. It is similar to the Schwartzian transform, or Decorate-Sort-Undecorate (DSU) idiom.
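For readers unfamiliar with the idiom, here is a minimal pure-Python sketch of decorate-sort-undecorate. The helper name sort_rows_dsu and the sample data are illustrative assumptions, not code from the question.

```python
def sort_rows_dsu(rows, key_func):
    """Sort rows by key_func using decorate-sort-undecorate (hypothetical helper)."""
    # Decorate: pair each row with its key; the index breaks ties,
    # so two rows are never compared with each other directly.
    decorated = [(key_func(row), i, row) for i, row in enumerate(rows)]
    # Sort: tuples compare by key first, then by original index.
    decorated.sort()
    # Undecorate: strip the key and index, keeping only the rows.
    return [row for _, _, row in decorated]


c1, c2 = 4, 7
rows = [[0, 1], [2, 3], [4, -5]]
result = sort_rows_dsu(rows, lambda row: c1 / c2 * row[1] + row[0])
print(result)  # [[0, 1], [4, -5], [2, 3]]
```

In modern Python, sorted(rows, key=...) does the decoration internally, so the explicit form is mostly useful for understanding what an argsort-based approach is doing.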
As I said, you can use the NumPy function np.argsort, which does the work of your order_to_index.
For a more explicit answer, suppose we have an array x and want to sort its rows according to some function func that takes a row of x and returns a scalar:
x[np.apply_along_axis(func, axis=1, arr=x).argsort()]
For this example
c1, c2 = 4, 7
x = np.array([
    [0, 1],
    [2, 3],
    [4, -5]
])
x[np.apply_along_axis(lambda row: c1 / c2 * row[1] + row[0], 1, x).argsort()]
Out:
array([[ 0,  1],
       [ 4, -5],
       [ 2,  3]])
In this case, np.apply_along_axis isn't even necessary, because the sort key can be computed directly from the columns:
x[(c1 / c2 * x[:,1] + x[:,0]).argsort()]
Out:
array([[ 0,  1],
       [ 4, -5],
       [ 2,  3]])
According to the documentation, np.argsort:
Returns the indices that would sort an array.
2 is the index of 0.0.
3 is the index of 0.1.
1 is the index of 1.41.
0 is the index of 1.48.
[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.
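The description above can be checked directly. This is a small sketch using the same four values; the variable names order and sorted_x are illustrative.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

# argsort returns, for each position in the sorted output,
# the index in x where that element currently lives.
order = np.argsort(x)
print(order)  # [2 3 1 0]

# Indexing x with those indices yields the sorted array.
sorted_x = x[order]
print(sorted_x)  # [0.   0.1  1.41 1.48]
```

Note that order is not the rank of each element: it says where each sorted value comes from, not where each original value ends up.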
There are a number of ways to get the result you are looking for:
import numpy as np
import scipy.stats as stats
def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x) - 1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1
For example,
In [72]: x = np.array([1.48,1.41,0.0,0.1])
In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])
This checks that they all produce the same result:
x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))
These IPython %timeit benchmarks suggest that, for large arrays, using_indexed_assignment is the fastest:
In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop
In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop
In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop
In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop
For small arrays, using_argsort_twice may be faster:
In [78]: x = np.random.random(10**2)
In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop
In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop
In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop
In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop
Note also that stats.rankdata gives you more control over how to handle elements of equal value.
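To illustrate the tie-handling point, here is a short sketch of the method argument of scipy.stats.rankdata on an array with one repeated value. The sample data is an assumption chosen for illustration; ranks are 1-based, which is why using_rankdata above subtracts 1.

```python
import numpy as np
from scipy import stats

x = np.array([10, 20, 20, 30])  # the two 20s are tied

# Each method resolves the tie between the two 20s differently.
ranks = {
    method: stats.rankdata(x, method=method).tolist()
    for method in ("average", "min", "max", "dense", "ordinal")
}

for method, r in ranks.items():
    print(method, r)
# average [1.0, 2.5, 2.5, 4.0]
# min     [1, 2, 2, 4]
# max     [1, 3, 3, 4]
# dense   [1, 2, 2, 3]
# ordinal [1, 2, 3, 4]
```

The default is "average", so on data with ties using_rankdata can return fractional ranks, unlike the argsort-based functions.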
There is no built-in function, but it's easy to assemble one out of the terrific tools Python makes available:
def argsort(seq):
    # http://stackoverflow.com/questions/3071415/efficient-method-to-calculate-the-rank-vector-of-a-list-in-python
    return sorted(range(len(seq)), key=seq.__getitem__)
x = [5,2,1,10]
print(argsort(x))
# [2, 1, 0, 3]
It works on Python array.array objects the same way:
import array
x = array.array('d', [5, 2, 1, 10])
print(argsort(x))
# [2, 1, 0, 3]
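As a usage sketch, the indices returned by this pure-Python argsort can reorder the original sequence and any parallel list. The names list below is hypothetical sample data, not part of the original answer.

```python
def argsort(seq):
    # Indices of seq in the order that would sort it.
    return sorted(range(len(seq)), key=seq.__getitem__)


scores = [5, 2, 1, 10]
names = ["e", "b", "a", "j"]  # hypothetical labels parallel to scores

order = argsort(scores)

# Apply the same permutation to both lists.
sorted_scores = [scores[i] for i in order]
sorted_names = [names[i] for i in order]

print(order)          # [2, 1, 0, 3]
print(sorted_scores)  # [1, 2, 5, 10]
print(sorted_names)   # ['a', 'b', 'e', 'j']
```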
I timed the suggestions above and here are my results.
import timeit
import random
import numpy as np
def f(seq):
    # http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
    # non-lambda version by Tony Veijalainen
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]

def g(seq):
    # http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
    # lambda version by Tony Veijalainen
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]

def h(seq):
    # http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3382369#3382369
    # by unutbu
    return sorted(range(len(seq)), key=seq.__getitem__)
seq = list(range(10000))
random.shuffle(seq)
n_trials = 100
for cmd in [
    'f(seq)', 'g(seq)', 'h(seq)', 'np.argsort(seq)',
    'np.argsort(seq).tolist()'
]:
    t = timeit.Timer(cmd, globals={**globals(), **locals()})
    print('time for {:d}x {:}: {:.6f}'.format(n_trials, cmd, t.timeit(n_trials)))
Output:
time for 100x f(seq): 0.323915
time for 100x g(seq): 0.235183
time for 100x h(seq): 0.132787
time for 100x np.argsort(seq): 0.091086
time for 100x np.argsort(seq).tolist(): 0.104226
A problem size dependent analysis is given here.