You are actually comparing two vectorized implementations, and the latter is more expensive due to memory hierarchy overheads (as pointed out by @hpaulj in the comments). The overhead of CPython loops is small in both cases.
More specifically, the former code operates on arrays of 240*320 = 76800 items, which fit in the L2 cache of all mainstream CPUs, while the latter operates on arrays of 3_840_000 items, which do not fit in the L2 cache: they fit in the L3 cache on some CPUs and only in RAM on others. The L3 cache is slower than the L2, and the RAM is much slower than the L3 cache. CPU caches closer to the cores are much faster but also much smaller. Because of that, doing the computation chunk by chunk, like you do in the first code, is a good practice to improve temporal memory locality and thus performance.
One reason why temporal memory locality is improved here is that the internal allocator generally tends to recycle arrays (that is, to hand back the address of a recently deleted array of a similar size). This often happens with temporary arrays created by Numpy. Another reason is that temporary arrays are written and often immediately read again, and reading an array back is faster if it is still in a fast CPU cache. For example, FRAMES == 3 creates and fills a boolean array which is immediately read back by the astype function.
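To make the two access patterns concrete, here is a minimal sketch. It assumes a hypothetical stack of 50 frames of 240x320 bytes (50 * 76800 = 3_840_000 items); the names frames, per_frame and whole_stack are illustrative and not taken from the question. It only checks that both variants compute the same thing, not how fast they are.

```python
import numpy as np

# Hypothetical data: 50 frames of 240x320 bytes = 3_840_000 items in total.
rng = np.random.default_rng(0)
frames = rng.integers(0, 4, size=(50, 240, 320), dtype=np.uint8)

def per_frame(frames):
    # Each temporary boolean array holds only 76_800 items (one frame).
    out = np.empty_like(frames)
    for i, frame in enumerate(frames):
        out[i] = (frame == 3).astype(np.uint8)
    return out

def whole_stack(frames):
    # One temporary boolean array of 3_840_000 items is created,
    # then read back in full by astype.
    return (frames == 3).astype(np.uint8)

same = np.array_equal(per_frame(frames), whole_stack(frames))
print(same)
```

Timing the two functions (for instance with timeit) is left out on purpose, since the result depends on the cache sizes of the machine.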
On my machine with an i5-9600KF CPU, the L3 is large enough (9 MiB) to (mostly) hold the arrays, so the difference in performance is not so huge. I expect this not to be the case on your machine.
However, note that .astype(np.uint8) is rather expensive here and not even needed in this case. Indeed, you can use .view(np.uint8). The former creates a new array and converts the items, while the latter just reinterprets the items in memory without any copy. The latter is only safe if you know exactly what you are doing. Here it is safe since the input (np.bool_) and output (np.uint8) types are of the same size (1 byte), both are unsigned and of a compatible kind (boolean versus integer). Once this modification is done, the code is much faster.
Here are performance results on my machine (with CPython 3.8.1, Numpy 1.24.4 on Windows):
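A small sketch of the difference between the two calls, on a made-up 4-item array (the variable names are mine):

```python
import numpy as np

mask = np.array([3, 1, 3, 0]) == 3       # boolean array, 1 byte per item

converted = mask.astype(np.uint8)        # allocates a new array and converts
reinterpreted = mask.view(np.uint8)      # same buffer, just seen as uint8

print(converted)                               # [1 0 1 0]
print(reinterpreted)                           # [1 0 1 0]
print(np.shares_memory(mask, converted))       # False
print(np.shares_memory(mask, reinterpreted))   # True
```

Since the view shares its buffer with mask, writing to one changes the other, which is part of "knowing exactly what you are doing".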
Initial implementation:
- Native Python: 5.990550100000291
- Vectorized: 9.843130900000233
Optimized implementation (using .view(np.uint8)):
- Native Python: 0.9843445999999858
- Vectorized: 2.8540657999997165
In the optimized version of the "Native Python" implementation, the big input array completely fits in the L3 cache and the small temporary arrays do not disturb it much, so data is not evicted. In the second implementation, data is evicted from the L3 cache more often because the temporary array is relatively big. The minimum space required is 3840000*2=7680000 bytes, assuming an optimal allocator and cache. In practice they are not optimal, and my L3 cache does not have enough space for 3 times the array size, that is 3840000*3=11520000 bytes, hence a significantly higher execution time. On top of that, the "Native Python" version mostly operates in the L1 cache for temporary arrays.
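The arithmetic above can be checked in a few lines. This only compares the byte counts against the 9 MiB L3 size quoted earlier; it says nothing about how the cache or the allocator actually behave:

```python
n_items = 3_840_000          # items in the big array, 1 byte each
l3_bytes = 9 * 1024**2       # 9 MiB L3 cache = 9_437_184 bytes

two_arrays = 2 * n_items     # input + one temporary
three_arrays = 3 * n_items   # input + two temporaries

print(two_arrays, two_arrays <= l3_bytes)      # 7680000 True
print(three_arrays, three_arrays <= l3_bytes)  # 11520000 False
```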
You seem a bit confused as to how numpy arrays work behind the scenes. Each item in an array must be the same size.
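The string representation of a float doesn't work this way. For example, repr(1.3) yields '1.3', but repr(1.33) yields '1.3300000000000001' on older Python versions (since Python 2.7 and 3.1, repr returns the shortest string that round-trips, here '1.33', which still varies in length from number to number).
An accurate string representation of a floating point number produces a variable length string.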
Because numpy arrays consist of elements that are all the same size, numpy requires you to specify the length of the strings within the array when you're using string arrays.
With the NumPy version this answer was written against, x.astype('str') always converts things to an array of strings of length 1 (recent NumPy versions instead pick a width large enough for the converted values).
For example, using x = np.array(1.344566), x.astype('str') yields '1' on such a version!
You need to be more explicit and use the '|Sx' dtype syntax, where x is the length of the string for each element of the array.
For example, use x.astype('|S10') to convert the array to strings of length 10.
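A short sketch of what a fixed-width string dtype looks like in practice. Note that on Python 3 the 'S' dtypes hold byte strings; 'U10' is the equivalent for str (that detail is my addition, not part of the original explanation):

```python
import numpy as np

x = np.array([1.344566, 2.5])

s10 = x.astype('|S10')   # byte strings, 10 bytes reserved per element
u10 = x.astype('U10')    # unicode strings, 10 characters per element

print(s10.dtype, s10.dtype.itemsize)
print(u10.dtype, u10.dtype.itemsize)
print(s10)
print(u10)
```

Every element occupies the same number of bytes regardless of how many characters it actually needs, which is exactly the fixed-size constraint described above.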
Even better, just avoid using numpy arrays of strings altogether. It's usually a bad idea, and there's no reason I can see from your description of your problem to use them in the first place...
If you have an array of numbers and you want an array of strings, you can write:
strings = ["%.2f" % number for number in numbers]
If your numbers are floats, the result is a list containing the same numbers as strings with two decimals.
>>> a = [1,2,3,4,5]
>>> min_a, max_a = min(a), max(a)
>>> a_normalized = [float(x-min_a)/(max_a-min_a) for x in a]
>>> a_normalized
[0.0, 0.25, 0.5, 0.75, 1.0]
>>> a_strings = ["%.2f" % x for x in a_normalized]
>>> a_strings
['0.00', '0.25', '0.50', '0.75', '1.00']
Notice that it also works with numpy arrays:
>>> a = numpy.array([0.0, 0.25, 0.5, 0.75, 1.0])
>>> print(["%.2f" % x for x in a])
['0.00', '0.25', '0.50', '0.75', '1.00']
A similar methodology can be used if you have a multi-dimensional array:
new_array = numpy.array(["%.2f" % x for x in old_array.reshape(old_array.size)])
new_array = new_array.reshape(old_array.shape)
Example:
>>> x = numpy.array([[0,0.1,0.2],[0.3,0.4,0.5],[0.6, 0.7, 0.8]])
>>> y = numpy.array(["%.2f" % w for w in x.reshape(x.size)])
>>> y = y.reshape(x.shape)
>>> print(y)
[['0.00' '0.10' '0.20']
 ['0.30' '0.40' '0.50']
 ['0.60' '0.70' '0.80']]
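As an alternative to the reshape round trip, np.char.mod applies the % formatting element-wise and keeps the shape. This is a different route from the list comprehension above, shown here as a sketch:

```python
import numpy as np

x = np.array([[0, 0.1, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]])

# Element-wise "%.2f" % value; the result has the same shape as x.
y = np.char.mod("%.2f", x)

print(y.shape)
print(y[1, 2])
```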
If you check the Matplotlib example for the function you are using, you will notice it uses a similar methodology: build an empty array and fill it with strings, one element at a time. The relevant part of the referenced code is:
colortuple = ('y', 'b')
colors = np.empty(X.shape, dtype=str)
for y in range(ylen):
    for x in range(xlen):
        colors[x, y] = colortuple[(x + y) % len(colortuple)]
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, facecolors=colors,
                       linewidth=0, antialiased=False)
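Here is a self-contained version of the color-filling part without the plotting, assuming a hypothetical 4x3 grid (xlen and ylen are made up here). It also shows that dtype=str gives one-character strings, which is enough for 'y' and 'b', and a vectorized way to build the same checkerboard:

```python
import numpy as np

colortuple = ('y', 'b')
xlen, ylen = 4, 3                      # hypothetical grid size

colors = np.empty((xlen, ylen), dtype=str)
print(colors.dtype)                    # one-character unicode strings

# Loop version, as in the Matplotlib example.
for y in range(ylen):
    for x in range(xlen):
        colors[x, y] = colortuple[(x + y) % len(colortuple)]

# Vectorized equivalent: index the palette with (x + y) % 2.
xs, ys = np.indices((xlen, ylen))
vectorized = np.array(colortuple)[(xs + ys) % len(colortuple)]

print(np.array_equal(colors, vectorized))
```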
Let's see if I can address some of the confusion I'm seeing in the comments.
Make an array:
In [609]: x=np.arange(5)
In [610]: x
Out[610]: array([0, 1, 2, 3, 4])
In [611]: x.dtype
Out[611]: dtype('int32')
The default for arange is to make an array of the platform's default integer type: int32 on this machine, int64 on many others.
astype is an array method; it can be used on any array:
In [612]: x.astype(np.float32)
Out[612]: array([ 0., 1., 2., 3., 4.], dtype=float32)
arange also takes a dtype parameter
In [614]: np.arange(5, dtype=np.float32)
Out[614]: array([ 0., 1., 2., 3., 4.], dtype=float32)
Whether it creates the int array first and converts it, or makes the float32 directly, isn't any concern to me. This is a basic operation, done in compiled code.
I can also give it a float stop value, in which case it will give me a float array - the default float type.
In [615]: np.arange(5.0)
Out[615]: array([ 0., 1., 2., 3., 4.])
In [616]: _.dtype
Out[616]: dtype('float64')
zeros is similar; the default dtype is float64, but with a parameter I can change that. Since its primary task is to allocate memory, and it doesn't have to do any calculation, I'm sure it creates the desired dtype right away, without further conversion. But again, this is compiled code, and I shouldn't have to worry about what it is doing under the covers.
In [618]: np.zeros(5)
Out[618]: array([ 0., 0., 0., 0., 0.])
In [619]: _.dtype
Out[619]: dtype('float64')
In [620]: np.zeros(5,dtype=np.float32)
Out[620]: array([ 0., 0., 0., 0., 0.], dtype=float32)
randn involves a lot of calculation, and evidently it is compiled to work with the default float type. It does not take a dtype. But since the result is an array, it can be cast with astype.
In [623]: np.random.randn(3)
Out[623]: array([-0.64520949, 0.21554705, 2.16722514])
In [624]: _.dtype
Out[624]: dtype('float64')
In [625]: __.astype(np.float32)
Out[625]: array([-0.64520949, 0.21554704, 2.16722512], dtype=float32)
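As a side note, randn itself has no dtype argument, but to my knowledge the newer Generator API (np.random.default_rng, available since NumPy 1.17) accepts one in standard_normal. A sketch comparing the two routes; the values differ because the generators are different, only the dtypes are of interest:

```python
import numpy as np

rng = np.random.default_rng(0)

direct = rng.standard_normal(3, dtype=np.float32)   # float32 from the start
cast = np.random.randn(3).astype(np.float32)        # float64, then converted

print(direct.dtype, cast.dtype)
```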
Let me stress that astype is a method of an array. It takes the values of the array and produces a new array with the desired dtype. It does not act retroactively (or in-place) on the array itself, or on the function that created that array.
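A quick check of that point, including the copy=False option, which lets astype hand back the input array itself when no conversion is needed:

```python
import numpy as np

x = np.arange(5)
y = x.astype(np.float32)

print(x.dtype.kind)              # 'i': x is still an integer array
print(y.dtype)                   # float32
print(np.shares_memory(x, y))    # False: y is a separate array

# With a matching dtype, the default still copies; copy=False does not.
print(x.astype(x.dtype) is x)              # False
print(x.astype(x.dtype, copy=False) is x)  # True
```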
The effect of astype is often (always?) the same as a dtype parameter, but the sequence of actions is different.
In https://stackoverflow.com/a/39625960/901925 I describe a sparse matrix creator that takes a dtype parameter, and implements it with an astype method call at the end.
When you do calculations such as dot or *, NumPy tries to match the output dtype to the inputs. With mixed types it goes with the higher precision alternative.
In [642]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float64)
Out[642]: array([ 0., 1., 4., 9., 16.])
In [643]: _.dtype
Out[643]: dtype('float64')
In [644]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float32)
Out[644]: array([ 0., 1., 4., 9., 16.], dtype=float32)
There are casting rules. One way to look those up is with the can_cast function:
In [649]: np.can_cast(np.float64,np.float32)
Out[649]: False
In [650]: np.can_cast(np.float32,np.float64)
Out[650]: True
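np.result_type answers the related question "what dtype would a mixed operation produce?", and can_cast takes a casting argument for looser rules than the default 'safe'. A small sketch:

```python
import numpy as np

mixed = np.result_type(np.float32, np.float64)
same = np.result_type(np.float32, np.float32)
int_float = np.result_type(np.int32, np.float32)

print(mixed)       # float64
print(same)        # float32
print(int_float)   # float64: float32 cannot hold every int32 exactly

# Default is casting='safe'; 'same_kind' allows float64 -> float32.
print(np.can_cast(np.float64, np.float32))                        # False
print(np.can_cast(np.float64, np.float32, casting='same_kind'))   # True
```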
It is possible in some calculations that it will cast the 32 to 64, do the calculation, and then cast back to 32. The purpose would be to avoid rounding errors. But I don't know how you find that out from the documentation or tests.
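import numpy as np

arr1 = np.array([25, 56, 12, 85, 34, 75])
arr2 = np.array([42, 3, 86, 32, 856, 46])
# np.complex was removed in NumPy 1.24; the builtin complex maps to complex128
arr1.astype(complex)  # result is discarded, so arr1 itself is unchanged
print(arr1)
print(type(arr1[0]))
print(arr1.astype(complex))
arr2 = np.array(arr2, dtype='complex')  # new array assigned back to arr2
print(arr2)
print(type(arr2[0]))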
OUTPUT for above
[25 56 12 85 34 75]
<class 'numpy.int64'>
[25.+0.j 56.+0.j 12.+0.j 85.+0.j 34.+0.j 75.+0.j]
[ 42.+0.j 3.+0.j 86.+0.j 32.+0.j 856.+0.j 46.+0.j]
<class 'numpy.complex128'>
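It can be seen that astype returns a new array and leaves arr1 unchanged, so the conversion is lost unless the result is assigned to a name. arr2 looks "permanently" converted only because the result of np.array(arr2, dtype='complex') was assigned back to arr2; writing arr1 = arr1.astype(complex) has the same effect.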
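To see that the two calls behave the same way, here is a sketch on a shorter, made-up array. Neither call touches the original; only the assignment does:

```python
import numpy as np

arr = np.array([25, 56, 12])

a = arr.astype(complex)             # new array, arr untouched
b = np.array(arr, dtype=complex)    # also a new array, arr untouched

print(arr.dtype.kind)               # 'i': still integers
print(a.dtype, b.dtype)             # complex128 complex128
print(np.array_equal(a, b))         # True

arr = arr.astype(complex)           # rebinding the name keeps the conversion
print(arr.dtype)                    # complex128
```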