Shape is the size of each dimension of an N-dimensional array.
Size, for arrays, is the number of elements the array contains (or sometimes the number of elements along the first dimension, when it is used in the sense of length).
For example, let a be the matrix
1 2 3 4
5 6 7 8
9 10 11 12
The shape of a is (3, 4), the size of a is 12, and the size of a[1] is 4.
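The same matrix can be checked directly in NumPy. This is a small sketch of the example above; the variable names other than a are only illustrative.

```python
import numpy as np

# The 3x4 matrix from the example above.
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

shape = a.shape        # length of each dimension: (3, 4)
total = a.size         # total number of elements: 12
row_size = a[1].size   # elements in the second row: 4
length = len(a)        # length along the first dimension only: 3

print(shape, total, row_size, length)
```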
Because you are working with a numpy array, which is modelled on a C-style array, size refers to how big your array will be. You can pass np.zeros(10) or np.zeros((10)); the two calls are identical, because (10) is just the integer 10, and both create a 1D array. Passing a tuple (n1, n2, ..., nn) instead creates an nD array.
However, because Python users want multi-dimensional arrays, array.reshape lets you go from a 1D array to an nD one. When you read shape, you get the length of every dimension, so you can see exactly what your array looks like.
In essence, size is equal to the product of the elements of shape.
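A minimal sketch of that relationship, assuming Python 3.8+ for math.prod; the array names flat, same, grid, and cube are made up for illustration.

```python
import math
import numpy as np

flat = np.zeros(12)          # 1D array, shape (12,)
same = np.zeros((12))        # (12) is just the int 12, so this is also shape (12,)
grid = flat.reshape(3, 4)    # same 12 elements viewed as 3 rows of 4
cube = np.zeros((2, 3, 4))   # a tuple gives an nD array directly

# size is the product of the entries of shape, whatever the number of dimensions.
for arr in (flat, same, grid, cube):
    assert arr.size == math.prod(arr.shape)

print(flat.shape, grid.shape, cube.shape, cube.size)
```

Note that reshape only works when the requested shape has the same product as the current size, which is the same rule seen from the other side.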
EDIT: The difference in name can be attributed to two things. First, you can initialise your array with a size, but that alone does not tell you its shape, so size is only the total number of elements. Second, because of how numpy was developed, different people worked on different parts of the code and gave different names to roughly the same thing, depending on their personal vision for the code.
Hey, I'm doing an AI thing in school and my code didn't work as expected. After 5 hours I found out I had reshaped an array from (206,) to (206,1), and that made the results wrong. From what I understand, the shape means the length of each dimension, and length is not 0-indexed, so a size of 1 would be equal to just 1D, no?
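On the (206,) versus (206,1) question, a short sketch may help. A dimension of length 1 is still a dimension, so the second array is 2D. The broadcasting step below is only an assumption about what went wrong, since the original code is not shown, and the names v and col are made up.

```python
import numpy as np

v = np.arange(206)          # shape (206,): one dimension
col = v.reshape(206, 1)     # shape (206, 1): two dimensions, the second of length 1

print(v.ndim, col.ndim)     # number of dimensions of each array
print(v.size, col.size)     # same number of elements in both

# Mixing the two in arithmetic broadcasts instead of going element by element.
diff_1d = v - v             # stays shape (206,)
diff_mixed = col - v        # broadcasts to shape (206, 206)
print(diff_1d.shape, diff_mixed.shape)
```

So the two arrays hold the same 206 values, but any code that combines a (206,1) array with a (206,) array can silently produce a 206 by 206 result.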
I wouldn't worry about performance here - any differences should only be very marginal.
I'd say the more pythonic alternative is probably the one which matches your needs more closely:
a.shape may contain more information than len(a) since it contains the size along all axes whereas len only returns the size along the first axis:
>>> import numpy as np
>>> a = np.array([[1,2,3,4], [1,2,3,4]])
>>> len(a)
2
>>> a.shape
(2, 4)
If you actually happen to work with one-dimensional arrays only, then I'd personally favour using len(a) in case you explicitly need the array's size.
From the source code, it looks like shape basically uses len():
https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py
@property
def shape(self) -> Tuple[int, int]:
    return len(self.index), len(self.columns)

def __len__(self) -> int:
    return len(self.index)
Calling shape computes both dimensions, so df.shape[0] + df.shape[1] may be slower than len(df.index) + len(df.columns). Still, performance-wise, the difference should be negligible except perhaps for an extremely large 2D dataframe.
So, in line with the previous answers, df.shape is good if you need both dimensions; for a single dimension, len() seems more appropriate conceptually.
Looking at property vs method answers, it all points to usability and readability of code. So again, in your case, I would say if you want information about the whole dataframe just to check or for example to pass the shape tuple to a function, use shape. For a single column, including index (i.e. the rows of a df), use len().
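To make the equivalence concrete, here is a minimal sketch with a made-up dataframe; the column names x and y and the values are purely illustrative.

```python
import pandas as pd

# A tiny illustrative dataframe: 3 rows, 2 columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

n_rows, n_cols = df.shape     # both dimensions at once
rows_via_len = len(df)        # rows only, same as len(df.index)
cols_via_len = len(df.columns)

print(df.shape, rows_via_len, cols_via_len)
```

Both routes give the same numbers, so the choice comes down to whether the surrounding code reads better with the tuple or with a single length.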