I would first modify how you import the data to:
df = DataFrame(olympic_medal_counts).set_index('country_name')
I would then add a new column containing the row sums, i.e. the total number of medals per country.
df['medal total'] = df.sum(axis=1)
Results:
                bronze  gold  silver  medal total
country_name
Russian Fed.         9    13      11           33
Norway              10    11       5           26
Canada               5    10      10           25
United States       12     9       7           28
Netherlands          9     8       7           24
Germany              5     8       6           19
Switzerland          2     6       3           11
Belarus              1     5       0            6
Austria              5     4       8           17
France               7     4       4           15
Poland               1     4       1            6
China                2     3       4            9
Korea                2     3       3            8
Sweden               6     2       7           15
Czech Republic       2     2       4            8
Slovenia             4     2       2            8
Japan                3     1       4            8
Finland              1     1       3            5
Great Britain        2     1       1            4
Ukraine              1     1       0            2
Slovakia             0     1       0            1
Italy                6     0       2            8
Latvia               2     0       2            4
Australia            1     0       2            3
Croatia              0     0       1            1
Kazakhstan           1     0       0            1
Finally, subset the DataFrame to rows with a medal total greater than or equal to 1 and take the average of each column.
df[df['medal total'] >= 1].apply(np.mean)
Results:
bronze 3.807692
gold 3.807692
silver 3.730769
medal total 11.346154
This could also be accomplished in one line using:
df[ df.sum(axis=1) >= 1 ].apply(np.mean)
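For anyone who wants to run the steps end to end, here is a minimal sketch. It assumes olympic_medal_counts is a dict of equal-length lists, which the answer does not show, so the dict below is illustrative: four rows are copied from the table and one zero-medal row ("No Medals") is hypothetical, added only to show the filter doing something. It also calls DataFrame.mean() directly rather than apply(np.mean); the two should give the same column averages.

```python
import pandas as pd

# Illustrative data: four rows copied from the table above, plus one
# hypothetical zero-medal entry ("No Medals") to show the filter at work.
olympic_medal_counts = {
    "country_name": ["Russian Fed.", "Norway", "Slovakia", "Croatia", "No Medals"],
    "gold": [13, 11, 1, 0, 0],
    "silver": [11, 5, 0, 1, 0],
    "bronze": [9, 10, 0, 0, 0],
}

df = pd.DataFrame(olympic_medal_counts).set_index("country_name")

# Row-wise sum gives each country's total medal count.
df["medal total"] = df.sum(axis=1)

# Keep countries with at least one medal, then average each column.
medalists = df[df["medal total"] >= 1]
avg_medals = medalists.mean()
print(avg_medals)
```

The total column has to be added before any other derived columns, otherwise df.sum(axis=1) would count them too.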
Answer from Andrew on Stack Overflow
I solved it by borrowing a subsetting idiom from R, and it works in pandas. Try this code under # your code here:
sub_df = df[(df.gold >= 1) | (df.silver >= 1) | (df.bronze >= 1)] ### subsetting the data frame
avg_count = sub_df.mean(axis=0) ### axis 0 for column wise mean
return avg_count
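In a Python 3 IDE (like PyCharm), print the result before returning it, because return print(avg_count) would return None:
print(avg_count)
return avg_count
Then call the function at the top level, outside the function's indentation, to see the answer:
avg_medal_count()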
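Put together, the mask-based approach might look like the sketch below. The medal_counts parameter and the sample dict are assumptions for illustration (the original function appears to take no arguments and builds its data internally); three rows are copied from the medal table and the "No Medals" row is hypothetical. The medal columns are selected explicitly before averaging, since recent pandas versions may raise an error when asked for the mean of a text column such as country_name.

```python
import pandas as pd


def avg_medal_count(medal_counts):
    """Average bronze/gold/silver counts over countries with at least one medal."""
    df = pd.DataFrame(medal_counts)
    # Boolean mask: True where a country won any medal at all.
    has_medal = (df["gold"] >= 1) | (df["silver"] >= 1) | (df["bronze"] >= 1)
    # Average only the numeric medal columns, column-wise (axis=0).
    return df.loc[has_medal, ["bronze", "gold", "silver"]].mean(axis=0)


# Hypothetical sample: three rows from the medal table plus a made-up zero row.
sample = {
    "country_name": ["Canada", "Italy", "Kazakhstan", "No Medals"],
    "gold": [10, 0, 0, 0],
    "silver": [10, 2, 0, 0],
    "bronze": [5, 6, 1, 0],
}

result = avg_medal_count(sample)
print(result)
```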
If you only want the mean of the weight column, select the column (which is a Series) and call .mean():
In [479]: df
Out[479]:
ID birthyear weight
0 619040 1962 0.123123
1 600161 1963 0.981742
2 25602033 1963 1.312312
3 624870 1987 0.942120
In [480]: df.loc[:, 'weight'].mean()
Out[480]: 0.83982437500000007
Try df.mean(axis=0). The axis=0 argument calculates the column-wise mean of the DataFrame, so you get one value per column. axis=1 is the row-wise mean, which is why you are getting multiple values (one per row).
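A quick way to see the difference is to compare the shapes of the two results. This sketch rebuilds the small frame from the earlier answer using the values as displayed there (they are rounded, so the weight mean differs slightly from the printed Out[480] in the last digits).

```python
import pandas as pd

# Values copied from the example frame shown in the earlier answer.
df = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

col_means = df.mean(axis=0)        # one value per column
row_means = df.mean(axis=1)        # one value per row
weight_mean = df["weight"].mean()  # a single scalar for one column

print(len(col_means), len(row_means))
print(weight_mean)
```

Here col_means has three entries (ID, birthyear, weight) while row_means has four, one for each row, which is rarely what you want when the columns measure different things.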