How do they achieve internally that you are able to pass something like x > 5 into a method?
The short answer is that they don't.
Any sort of logical operation on a numpy array returns a boolean array. (i.e. __gt__, __lt__, etc all return boolean arrays where the given condition is true).
E.g.
x = np.arange(9).reshape(3,3)
print x > 5
yields:
array([[False, False, False],
[False, False, False],
[ True, True, True]], dtype=bool)
This is the same reason why something like if x > 5: raises a ValueError if x is a numpy array. It's an array of True/False values, not a single value.
Furthermore, numpy arrays can be indexed by boolean arrays. E.g. x[x>5] yields [6 7 8], in this case.
Honestly, it's fairly rare that you actually need numpy.where but it just returns the indicies where a boolean array is True. Usually you can do what you need with simple boolean indexing.
How does python numpy.where() work? - Stack Overflow
`where` clauses in List comprehension - Ideas - Discussions on Python.org
How does np.where() work?
How can I find where Python is installed on Windows? - Stack Overflow
Videos
How do they achieve internally that you are able to pass something like x > 5 into a method?
The short answer is that they don't.
Any sort of logical operation on a numpy array returns a boolean array. (i.e. __gt__, __lt__, etc all return boolean arrays where the given condition is true).
E.g.
x = np.arange(9).reshape(3,3)
print x > 5
yields:
array([[False, False, False],
[False, False, False],
[ True, True, True]], dtype=bool)
This is the same reason why something like if x > 5: raises a ValueError if x is a numpy array. It's an array of True/False values, not a single value.
Furthermore, numpy arrays can be indexed by boolean arrays. E.g. x[x>5] yields [6 7 8], in this case.
Honestly, it's fairly rare that you actually need numpy.where but it just returns the indicies where a boolean array is True. Usually you can do what you need with simple boolean indexing.
Old Answer it is kind of confusing. It gives you the LOCATIONS (all of them) of where your statment is true.
so:
>>> a = np.arange(100)
>>> np.where(a > 30)
(array([31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
99]),)
>>> np.where(a == 90)
(array([90]),)
a = a*40
>>> np.where(a > 1000)
(array([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99]),)
>>> a[25]
1000
>>> a[26]
1040
I use it as an alternative to list.index(), but it has many other uses as well. I have never used it with 2D arrays.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
New Answer It seems that the person was asking something more fundamental.
The question was how could YOU implement something that allows a function (such as where) to know what was requested.
First note that calling any of the comparison operators do an interesting thing.
a > 1000
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True`, True, True, True, True, True, True, True, True, True], dtype=bool)`
This is done by overloading the "__gt__" method. For instance:
>>> class demo(object):
def __gt__(self, item):
print item
>>> a = demo()
>>> a > 4
4
As you can see, "a > 4" was valid code.
You can get a full list and documentation of all overloaded functions here: http://docs.python.org/reference/datamodel.html
Something that is incredible is how simple it is to do this. ALL operations in python are done in such a way. Saying a > b is equivalent to a.gt(b)!
My understanding of np.where() was so far as sort of an arg container for .loc. However in the example below, .loc works, but not when np.where is used as arg. Why is that happening?
df=pd.DataFrame({'Name':['Tom', 'Mia', 'Sam'], 'Age':[15, 26, 32]}, index=['A','B','C'])
print(df.loc[(df['Age']>28)&(df['Name'].str.startswith('S')])
k=np.where((df['Age']>28)&(df['Name'].str.startswith('S'))
print(df.loc[k])
In your Python interpreter, type the following commands:
>>> import os, sys
>>> os.path.dirname(sys.executable)
'C:\\Python25'
Also, you can club all these and use a single line command. Open cmd and enter following command
python -c "import os, sys; print(os.path.dirname(sys.executable))"
If you have Python in your environment variable then you can use the following command in cmd or powershell:
where python
or for Unix enviroment
which python
command line image :
