Shape relates to the size of the dimensions of an N-dimensional array.
Size regarding arrays, relates to the amount (or count) of elements that are contained in the array (or sometimes, at the top dimension of the array - when used as length).
For example, let a be a matrix
1 2 3 4
5 6 7 8
9 10 11 12
the shape of a is (3, 4), the size of a is 12 and the size of a[1] is 4.
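In NumPy terms, that example looks like this (a quick check you can run yourself):

```python
import numpy as np

# The matrix a from the example above.
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(a.shape)    # (3, 4) -- 3 rows, 4 columns
print(a.size)     # 12    -- total number of elements
print(a[1].size)  # 4     -- a[1] is the row [5 6 7 8]
```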
Hey, I'm doing an AI thing in school and my code didn't work as expected; after 5 hours I found out I had reshaped an array from (206,) to (206, 1), and that made the results wrong. From what I understand, the shape means the length of each dimension, and length is not 0-indexed, so a size of 1 would be equal to just 1D, no?
I read that in python a tuple that ends with a comma is different than a tuple that ends with a number. So if a numpy array has shape (100,) is that different than (100,1)? Or is it arbitrary?
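They are different: (100,) is the shape of a 1-D array, while (100, 1) is a 2-D array with 100 rows and one column. The trailing comma is just Python's syntax for a one-element tuple. A small demonstration:

```python
import numpy as np

flat = np.zeros(100)       # shape (100,): a 1-D array
col = np.zeros((100, 1))   # shape (100, 1): a 2-D column

print(flat.shape, flat.ndim)  # (100,) 1
print(col.shape, col.ndim)    # (100, 1) 2

# Same 100 values, but they behave differently, e.g. under broadcasting:
print((flat + col).shape)     # (100, 100) -- a common source of silent bugs
```

That broadcasting surprise is very likely the (206,) vs (206, 1) bug described above.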
Because you are working with a NumPy array, which can be seen as a C array under the hood, size refers to how big your array will be. You can pass np.zeros(10) or np.zeros((10,)); a size given this way creates a 1D array. You can give size=(n1, n2, ..., nn), which will create an nD array.
However, because Python users want multi-dimensional arrays, array.reshape allows you to get from a 1D to an nD array. So when you call shape, you get the N-dimensional shape of the array, and you can see exactly what your array looks like.
In essence, size is equal to the product of the elements of shape.
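That relationship is easy to verify:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
print(a.shape)                 # (2, 3, 4)
print(a.size)                  # 24
print(int(np.prod(a.shape)))   # 24 -- size == product of the shape entries
```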
EDIT: The difference in name can be attributed to two things. Firstly, you can initialise your array with a size, but at that point you do not yet know its shape, so size only ever means the total number of elements. Secondly, because of how NumPy was developed, different people worked on different parts of the code, giving different names to roughly the same concept depending on their personal vision for the code.
Is there a reasoning (besides legacy) that NumPy is inconsistent with "size" and "shape"? Or, am I missing some underlying logic.
For example, If I have a multi-dimensional array X and I want to make a random array the same dimensions, I would do:
np.random.uniform(size=X.shape)
But conversely, if you look at the signature of np.ones, it calls the first argument shape. I do not know about you, but I often get confused about whether the keyword is size or shape for any given function.
And then there is the easy-to-mess-up ndarray.resize (in-place) and ndarray.reshape (returns a new array). And there is the non-object-oriented version np.reshape, which again returns an array, but there is no np.resize that modifies its argument in place.
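A quick sketch of those three names side by side (plain NumPy, nothing hypothetical):

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # ndarray.reshape returns a new array (often a view)
print(a.shape)        # (6,) -- a itself is unchanged
print(b.shape)        # (2, 3)

c = np.arange(6)
c.resize((2, 3), refcheck=False)  # ndarray.resize modifies c in place
print(c.shape)                    # (2, 3)

# np.resize does exist, but it returns a new array (repeating data to fill
# the requested size) and never modifies its argument in place:
d = np.resize(np.arange(4), (2, 3))
print(d)  # [[0 1 2]
          #  [3 0 1]]
```

(refcheck=False just sidesteps the reference check that can trip ndarray.resize in interactive sessions.)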
(Personally, I do not like when functions modify an argument in place such as random.shuffle. A method should be the main way. But that is a different story)
Is there an underlying logic? Way to keep it straight? Reasoning?
Thanks!
Personally, I do not like when functions modify an argument in place such as random.shuffle. A method should be the main way.
...which would necessarily require that a run-of-the-mill list, or any other mutable ordered container, acknowledge the existence of random and pull it in as a dependency. And then what happens when somebody writes a betterrandom package with shuffling, and then an evenmoreawesomerandom package with shuffling?
Probably just legacy, as you surmise.
And then there is the easy-to-mess-up ndarray.resize (in-place) and ndarray.reshape (returns a new array). And there is the non-object-oriented version np.reshape, which again returns an array, but there is no np.resize that modifies its argument in place.
I don't know why you are saying np.reshape is non-object oriented. That doesn't even have a clear meaning.
The ndarray methods are array methods, so they understand how arrays are built. np.reshape, on the other hand, works on "array-like" objects that, roughly speaking, just need to be a certain flavour of iterable. Obviously ndarrays themselves are array-like, but so are Python buffers, lists, etc., including, potentially, user-made objects that NumPy has never seen before: anything that conforms to the array-like interface.

Now, if you have such an iterable and you know what an array is, you can easily take the contents of the iterable and put them into a new array in any shape that fits, no? But if you are asked to modify the iterable itself, where do you start? Is the iterable contiguous in memory? Or is it some kind of linked list? Or what? In fact you do not have a single clue where even one element is in memory. So how do you propose to modify this iterable in place?
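For instance, np.reshape happily takes a plain list and builds a fresh array from it, leaving the list alone:

```python
import numpy as np

nested = [1, 2, 3, 4, 5, 6]     # a plain Python list, not an ndarray
m = np.reshape(nested, (2, 3))  # contents are copied into a new 2x3 array
print(m.shape)                  # (2, 3)
print(nested)                   # [1, 2, 3, 4, 5, 6] -- untouched
```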
(Personally, I do not like when functions modify an argument in place such as random.shuffle. A method should be the main way. But that is a different story)
Yes, this is a useful convention... but doing shuffling in place is also a convention. Sometimes there is no single "correct" way to do things, just trade-offs.
And if so, which is better to use?
Whenever I perform an operation that produces a 1D matrix (summing the columns of a matrix, for instance), NumPy sets its dimensions to (N,). If I then chain further operations, a multiplication by a (1, M) matrix for example, I get the error "matrices are not aligned". What's the point of creating that type of "array" if most of the time you'll need to use reshape or expand_dims? To me it just seems like an unnecessary extra task that I would not need to do in Matlab.
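For what it's worth, the usual remedies are reshape/expand_dims, or asking the reduction to keep the axis in the first place:

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
s = X.sum(axis=0)
print(s.shape)   # (4,) -- the summed axis is dropped entirely

# keepdims retains the axis, giving the (1, 4) row matrix Matlab would:
s2 = X.sum(axis=0, keepdims=True)
print(s2.shape)  # (1, 4)
```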
Why is there no standard for typing array dimensions? In data science it is really useful to indicate whether something is a vector or a matrix (or a tensor with more dimensions). One step up in complexity, it is useful to indicate whether a function returns something with the same size or not.
Unless I am missing something, a standard for this is lacking. Of course I understand that typing is not enforced in Python, and I am not asking for it to be; I just want to write more readable functions. I think NumPy and SciPy 'solve' this by using the docstring. But would it make sense to specify array dimensions & sizes in the function signature?
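There is indeed no enforced standard. One common compromise: numpy.typing puts the dtype in the signature, while the shape still has to live in the docstring. The normalize function below is just a made-up illustration of that pattern:

```python
import numpy as np
import numpy.typing as npt

def normalize(v: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    """Scale a vector to unit length.

    Parameters
    ----------
    v : ndarray of shape (n,)

    Returns
    -------
    ndarray of shape (n,), same shape as the input.
    """
    return v / np.linalg.norm(v)

print(normalize(np.array([3.0, 4.0])))  # [0.6 0.8]
```

Third-party packages go further and encode shapes in the annotations themselves, but none of them is a standard.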
I wouldn't worry about performance here - any differences should only be very marginal.
I'd say the more pythonic alternative is probably the one which matches your needs more closely:
a.shape may contain more information than len(a) since it contains the size along all axes whereas len only returns the size along the first axis:
>>> a = np.array([[1,2,3,4], [1,2,3,4]])
>>> len(a)
2
>>> a.shape
(2, 4)
If you actually happen to work with one-dimensional arrays only, then I'd personally favour using len(a) in case you explicitly need the array's length.
From the source code, it looks like shape basically uses len():
https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
@property
def shape(self) -> Tuple[int, int]:
    return len(self.index), len(self.columns)

def __len__(self) -> int:
    return len(self.index)
Calling shape will run both dimension calculations, so df.shape[0] + df.shape[1] may be slower than len(df.index) + len(df.columns). Still, performance-wise, the difference should be negligible except for a giant 2D dataframe.
So, in line with the previous answers, df.shape is good if you need both dimensions; for a single dimension, len() seems more appropriate conceptually.
Looking at the property-vs-method answers, it all points to usability and readability of code. So again, in your case: if you want information about the whole dataframe, or for example want to pass the shape tuple to a function, use shape. For the row count alone (i.e. the length of the index), use len().
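To make that concrete:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(df.shape)         # (3, 2) -- both dimensions at once
print(len(df))          # 3 -- number of rows only
print(len(df.columns))  # 2 -- number of columns only
```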
I'm learning numpy and reading the documentation for np.reshape() the order parameter:
Read the elements of a using this index order, and place the elements into the reshaped array using this index order. 'C' means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. 'F' means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the 'C' and 'F' options take no account of the memory layout of the underlying array, and only refer to the order of indexing. 'A' means to read / write the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise.
Frankly this reads like a foreign language to me, and I actually come from C as a programmer :/
I was hoping it would be like: if you have a 1d array C will put them in row-wise, and F will put them in column-wise. Obviously it can't be that simple as a general case, but is it that simple if I am reshaping a 1d array into a 2d array?
Anyone have a discussion that is simple at least for the simpler cases like that (I don't need a fully general discussion for nd to nd cases).
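For the 1-D-to-2-D case it really is that simple: 'C' fills the result row by row, 'F' fills it column by column.

```python
import numpy as np

v = np.arange(6)  # [0 1 2 3 4 5]

print(np.reshape(v, (2, 3), order='C'))
# [[0 1 2]
#  [3 4 5]]  row-wise: the last axis (columns) varies fastest

print(np.reshape(v, (2, 3), order='F'))
# [[0 2 4]
#  [1 3 5]]  column-wise: the first axis (rows) varies fastest
```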
I have a 64x64 image with an alpha layer, which, when converted to a NumPy array, results in a (64, 64, 4) shape. I am quite new to NumPy and have been having a hard time visualizing it. Below is the image of how I am visualizing it currently. Is this representation correct? If not, can you please help me visualize it?
Image link
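The usual way to picture a (64, 64, 4) array is a 64x64 grid of pixels where each pixel carries 4 channel values (R, G, B, alpha), not 4 stacked 64x64 images, although slicing can give you either view. The array below is just a blank stand-in for the real image:

```python
import numpy as np

img = np.zeros((64, 64, 4), dtype=np.uint8)  # stand-in for the real image

pixel = img[10, 20]   # one pixel: its 4 channel values
print(pixel.shape)    # (4,)

alpha = img[:, :, 3]  # the alpha channel as a full 64x64 plane
print(alpha.shape)    # (64, 64)
```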
When do you choose to use Numpy instead of plain for loops and normal Python lists?
I come from HPC (C++/C/Fortran, etc.) and can read and use other languages without issue. I like many parts of Python, and a lot of the time it's nice not to have to program everything from the ground up. I am not the biggest fan of weakly typed languages and Python is slow, but it definitely has a place and does what it's designed to do pretty well.
However, the way NumPy organises its array syntax, and the general workings of its arrays, is probably the most confusing, poorly designed thing in the entire language, perhaps in the world of programming languages. I spend 80% of my programming time trying to get things into the right shape, or working out what shape they are, and then finding the correct functions to do extremely trivial appends/inserts/deletes etc. It is vague, and honestly it should be rewritten. There is honestly no defence of this poor syntax. There is a reason pretty much every other language deals with arrays the same way... it makes sense.
I know the hardcore Python fanboys will tell me I am not doing things correctly, or, lol, that I must think 'pythonically'. BS; a good language does not need some gibberish word to defend poor design.