df.shape gives you the number of rows and columns in the DataFrame as a tuple,
where df.shape[0] is the total number of rows
and df.shape[1] is the number of columns.
Example:
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011'],
'Phrases':['I have a cool family', 'I like avocados', 'I would like to go to school']})
df
Out[26]:
Date Phrases
0 10/2/2011 I have a cool family
1 11/2/2011 I like avocados
2 12/2/2011 I would like to go to school
df.shape
Out[27]: (3, 2)
df.shape[0] #number of rows
Out[28]: 3
df.shape[1] #number of columns
Out[29]: 2
Answer from Suhas Mucherla on Stack Overflow
.shape returns a tuple (number of rows, number of columns). Therefore dataset.shape[1] is the number of columns, and i in range(dataset.shape[1]) simply iterates over the column positions, from 0 up to (but not including) the number of columns.
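As a minimal sketch of that loop (the frame and its column names are made up for illustration and are not from the original question):

```python
import pandas as pd

# Hypothetical data, only for illustration
dataset = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [0, 0, 0]})

# .shape is a plain tuple, so it can be unpacked directly
n_rows, n_cols = dataset.shape

# range(dataset.shape[1]) yields the column positions 0, 1, ..., n_cols - 1
visited = []
for i in range(dataset.shape[1]):
    visited.append(dataset.columns[i])

print(n_rows, n_cols)
print(visited)
```

The loop never reaches i == n_cols, which is why indexing dataset.columns[i] inside it is always valid.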
Get into an interactive Python session with numpy and pandas, and experiment
Make a dataframe:
In [394]: df=pd.DataFrame(np.eye(3))
In [395]: df
Out[395]:
0 1 2
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
Check its shape. That's a tuple (basic Python object):
In [396]: df.shape
Out[396]: (3, 3)
In [397]: df.shape[0] # first element of the tuple
Out[397]: 3
Calling repeat with df.shape[0] is just like calling it with the number 3:
In [398]: np.repeat('red', df.shape[0])
Out[398]: array(['red', 'red', 'red'], dtype='<U3')
Pandas and numpy are running in Python. So the regular evaluation order of Python applies.
The expression red_df.shape[0] just returns an integer: the total number of rows in red_df. It is used to build the new 'Color' column with exactly as many rows as red_df, so that when red_df is later appended to white_df the rows line up and no empty cells are created in the other columns.
You could drop that expression and hard-code the row count instead; for the red-wine data (1599 rows) these two lines build the same array:
color_red = np.repeat('red', red_df.shape[0])
color_red = np.repeat('red', 1599)
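The hard-coded form only works while the number matches the frame. A minimal sketch of what happens otherwise, assuming a made-up three-row stand-in for red_df (the 'quality' values and the 'color_bad' column name are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for red_df with 3 rows instead of 1599
red_df = pd.DataFrame({"quality": [5, 6, 7]})

# Length derived from the frame: always matches
red_df["color"] = np.repeat("red", red_df.shape[0])

# Hard-coded length that is wrong for this frame: pandas rejects the assignment
try:
    red_df["color_bad"] = np.repeat("red", 2)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(red_df["color"].tolist())
print(type(mismatch_error).__name__)
```

Deriving the length from shape[0] keeps the code correct if the CSV file ever gains or loses rows.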
The full program would be:
import pandas as pd
import numpy as np
df_red = pd.read_csv('winequality-red.csv',sep=';')
df_white = pd.read_csv('winequality-white.csv',sep=';')
print(df_red.info())
print(df_red.shape[0])
# shape[0] is the number of rows (1599); shape[1] is the number of columns (12)
# create color array for red dataframe
color_red = np.repeat('red', 1599)
# create color array for white dataframe
color_white = np.repeat('white', df_white.shape[0])
df_red['color'] = color_red
df_white['color'] = color_white
# combine the two data frames into one called wine_df
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
wine_df = pd.concat([df_red, df_white])
print(wine_df.head())
wine_df.to_csv('winequality_edited.csv', index=False)
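To try the combine step without the CSV files, here is a small in-memory sketch. The two miniature frames and their values are made up for illustration. It uses pd.concat, since DataFrame.append is no longer available in pandas 2.0 and later, and passes ignore_index=True (an optional choice) so the combined frame gets a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of the two wine tables
df_red = pd.DataFrame({"pH": [3.5, 3.2], "quality": [5, 6]})
df_white = pd.DataFrame({"pH": [3.0, 3.3, 3.1], "quality": [6, 6, 7]})

# One label per row, sized from each frame's own row count
df_red["color"] = np.repeat("red", df_red.shape[0])
df_white["color"] = np.repeat("white", df_white.shape[0])

# Stack the rows of both frames; the columns line up by name
wine_df = pd.concat([df_red, df_white], ignore_index=True)

print(wine_df.shape)
print(wine_df["color"].value_counts().to_dict())
```

Because each color array has exactly as many entries as its frame has rows, the combined frame has no missing values in any column.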