Get into an interactive Python session with numpy and pandas, and experiment.
Make a dataframe:
In [394]: df=pd.DataFrame(np.eye(3))
In [395]: df
Out[395]:
     0    1    2
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
Check its shape. That's a tuple (a basic Python object):
In [396]: df.shape
Out[396]: (3, 3)
In [397]: df.shape[0] # first element of the tuple
Out[397]: 3
Passing df.shape[0] as the repeat count is just like using the number 3:
In [398]: np.repeat('red', df.shape[0])
Out[398]: array(['red', 'red', 'red'], dtype='<U3')
Pandas and numpy run inside Python, so the regular Python evaluation order applies: df.shape[0] is evaluated to the integer 3 first, and only that integer is passed to np.repeat.
Answer from hpaulj on Stack Overflow
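A minimal script version of the session above (plain Python, no IPython prompts) makes the evaluation order explicit: df.shape[0] is reduced to an ordinary int before np.repeat ever sees it, so the two calls below receive exactly the same argument. Nothing here goes beyond the calls already used in the answer.

```python
import numpy as np
import pandas as pd

# The same 3x3 identity frame as in the session above
df = pd.DataFrame(np.eye(3))

n_rows = df.shape[0]  # evaluated first: a plain Python int (3)

colors_from_shape = np.repeat('red', n_rows)
colors_from_literal = np.repeat('red', 3)

# Both calls got the same integer, so the arrays are identical
print(type(n_rows), n_rows)
print(np.array_equal(colors_from_shape, colors_from_literal))
```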
This part (red_df.shape[0]) just returns an integer: the total number of rows in red_df. It is used so that the new 'color' column has exactly as many rows as red_df, so that when white_df is appended to it later, the white_df rows are not shifted down and no empty rows are created in the other columns.
You can simply drop that expression and hard-code the row count instead, so this:
color_red = np.repeat('red', red_df.shape[0])
becomes this (1599 being the number of rows in winequality-red.csv):
color_red = np.repeat('red', 1599)
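The catch with hard-coding is that the literal only works while it equals the frame's row count. The sketch below uses a small, made-up stand-in for red_df (4 rows, hypothetical values, not the real wine data) to show that the shape[0] version always fits, while a mismatched literal such as 1599 makes pandas refuse the column assignment with a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for red_df: 4 rows instead of the real file's 1599
red_df = pd.DataFrame({'alcohol': [9.4, 9.8, 9.8, 9.4],
                       'quality': [5, 5, 5, 6]})

# Deriving the count from the frame always matches its length
red_df['color'] = np.repeat('red', red_df.shape[0])

# A hard-coded count only works if it happens to equal the row count
mismatch_error = None
try:
    red_df['color'] = np.repeat('red', 1599)
except ValueError as err:
    mismatch_error = str(err)
    print('length mismatch:', err)
```

For the real winequality-red.csv the two forms are interchangeable; shape[0] simply keeps working if the file changes.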
The full program would then be:
import pandas as pd
import numpy as np
df_red = pd.read_csv('winequality-red.csv', sep=';')
df_white = pd.read_csv('winequality-white.csv', sep=';')
df_red.info()  # info() prints its own summary and returns None, so no print() needed
print(df_red.shape[0])
# shape[0] is the number of rows (1599 for this file); shape[1] is the number of columns (12)
# create color array for red dataframe (hard-coded count, only valid for this file)
color_red = np.repeat('red', 1599)
# create color array for white dataframe
color_white = np.repeat('white', df_white.shape[0])
df_red['color'] = color_red
df_white['color'] = color_white
# combine the two data frames into one data frame called wine_df
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
wine_df = pd.concat([df_red, df_white])
print(wine_df.head())
wine_df.to_csv('winequality_edited.csv', index=False)
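To try the labelling and combining steps without the two CSV files, here is a self-contained sketch. The tiny df_red and df_white frames are hypothetical stand-ins (same column names in both, invented values); the technique is the one from the program: size each color array with shape[0], then stack the rows with pd.concat.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-ins for the two CSV files (same columns)
df_red = pd.DataFrame({'alcohol': [9.4, 9.8], 'quality': [5, 5]})
df_white = pd.DataFrame({'alcohol': [8.8, 9.5, 10.1], 'quality': [6, 6, 6]})

# One label per row, sized from each frame's own row count
df_red['color'] = np.repeat('red', df_red.shape[0])
df_white['color'] = np.repeat('white', df_white.shape[0])

# Stack the rows; ignore_index gives the result a fresh 0..n-1 index
wine_df = pd.concat([df_red, df_white], ignore_index=True)
print(wine_df)
print(wine_df.shape)
```

Because every column exists in both frames and each color array matches its frame's length, the combined frame has no missing values. ignore_index=True is optional here; it only avoids the duplicated 0, 1, ... index labels that plain concatenation keeps.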
df.shape gives you the number of rows and columns in the DataFrame as a tuple,
where df.shape[0] is the total number of rows in the DataFrame
and df.shape[1] is the number of columns in the DataFrame.
Example:
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011'],
                   'Phrases':['I have a cool family', 'I like avocados', 'I would like to go to school']})
df
Out[26]:
        Date                       Phrases
0  10/2/2011          I have a cool family
1  11/2/2011               I like avocados
2  12/2/2011  I would like to go to school
df.shape
Out[27]: (3, 2)
df.shape[0] #number of rows
Out[28]: 3
df.shape[1] #number of columns
Out[29]: 2
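Since shape is an ordinary tuple, it can be unpacked in one step, and both numbers can also be obtained with len(). This sketch reuses the example frame above.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011'],
                   'Phrases': ['I have a cool family', 'I like avocados',
                               'I would like to go to school']})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 3 2

# Equivalent ways to get the same two numbers
print(len(df) == n_rows)          # len() of a DataFrame counts its rows
print(len(df.columns) == n_cols)  # len() of the column Index counts columns
```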
.shape returns a tuple (number of rows, number of columns). Therefore dataset.shape[1] is the number of columns, and for i in range(dataset.shape[1]) simply iterates i from 0 up to, but not including, the number of columns, i.e. once per column position.
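A minimal sketch of that loop, assuming a small hypothetical dataset with columns 'a', 'b' and 'c'. Inside the loop, iloc[:, i] selects the column at position i, which is the usual reason to iterate over range(dataset.shape[1]).

```python
import pandas as pd

# Hypothetical small dataset with three columns
dataset = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

visited = []
for i in range(dataset.shape[1]):  # i takes 0, 1, 2 (stops before 3)
    column = dataset.iloc[:, i]    # select the column at position i
    visited.append((i, column.name, column.sum()))
    print(i, column.name, column.sum())
```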