Use value_counts with normalize=True:
df['gender'].value_counts(normalize=True) * 100
The result is a fraction in range (0, 1]. We multiply by 100 here in order to get the %.
Answer from coldspeed95 on Stack OverflowUse value_counts with normalize=True:
df['gender'].value_counts(normalize=True) * 100
The result is a fraction in range (0, 1]. We multiply by 100 here in order to get the %.
If you do not need to look M and F values other than gender column then, may be you can try using value_counts() and count() as following:
df = pd.DataFrame({'gender':['M','M','F', 'F', 'F']})
# Percentage calculation
(df['gender'].value_counts()/df['gender'].count())*100
Result:
F 60.0
M 40.0
Name: gender, dtype: float64
Or, using groupby:
(df.groupby('gender').size()/df['gender'].count())*100
python - Pandas percentage of total with groupby - Stack Overflow
python - Compute percentage for each row in pandas dataframe - Stack Overflow
python - How to calculate percentage with Pandas' DataFrame - Stack Overflow
Pandas, loop through each row of a column and append string to next column previous row
Can't test right now but something like this should work:
df = pd.read_excel('yourfile.xlsx')
for index, element in enumerate(df['PartID']):
try:
int(element)
except ValueError:
df['PartID'][index] = index + 1
df['Notes'][index - 1] += element
Here we iterate over the elements of the first column while also yielding their index in the column. The try block tests wether a given element is a number; if it's not, we change the element in that position to the index + 1 (since index starts from 0 but your column starts from 1), and add the element to the previous row (index - 1) of the Notes column.
More on reddit.comVideos
Update 2022-03
This answer by caner using transform looks much better than my original answer!
df['sales'] / df.groupby('state')['sales'].transform('sum')
Thanks to this comment by Paul Rougieux for surfacing it.
Original Answer (2014)
Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way -- just groupby the state_office and divide the sales column by its sum. Copying the beginning of Paul H's answer:
# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
'office_id': list(range(1, 7)) * 2,
'sales': [np.random.randint(100000, 999999)
for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
state_pcts = state_office.groupby(level=0).apply(lambda x:
100 * x / float(x.sum()))
Returns:
sales
state office_id
AZ 2 16.981365
4 19.250033
6 63.768601
CA 1 19.331879
3 33.858747
5 46.809373
CO 1 36.851857
3 19.874290
5 43.273852
WA 2 34.707233
4 35.511259
6 29.781508
(This solution is inspired from this article https://pbpython.com/pandas_transform.html)
I find the following solution to be the simplest(and probably the fastest) using transformation:
Transformation: While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine. For such a transformation, the output is the same shape as the input.
So using transformation, the solution is 1-liner:
df['%'] = 100 * df['sales'] / df.groupby('state')['sales'].transform('sum')
And if you print:
print(df.sort_values(['state', 'office_id']).reset_index(drop=True))
state office_id sales %
0 AZ 2 195197 9.844309
1 AZ 4 877890 44.274352
2 AZ 6 909754 45.881339
3 CA 1 614752 50.415708
4 CA 3 395340 32.421767
5 CA 5 209274 17.162525
6 CO 1 549430 42.659629
7 CO 3 457514 35.522956
8 CO 5 280995 21.817415
9 WA 2 828238 35.696929
10 WA 4 719366 31.004563
11 WA 6 772590 33.298509
You can get the percentages of each column using a lambda function as follows:
>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
y191 y192 y193 y194 y195
0 0.527231 0.508411 0.490517 0.500544 0.480236
1 0.013305 0.014088 0.013463 0.013631 0.013713
2 0.316116 0.324405 0.341373 0.319164 0.323259
3 0.002006 0.002263 0.002678 0.003206 0.002872
4 0.141342 0.150833 0.151969 0.163455 0.179920
Your example does not have any duplicate values for val_code, so I'm unsure how you want your data to appear (i.e. show percent of total in column vs. total for each vval_code group.)
Ge the total for all the columns of interest and then add the percentage column:
In [35]:
total = np.sum(df.ix[:,'y191':].values)
df['percent'] = df.ix[:,'y191':].sum(axis=1)/total * 100
df
Out[35]:
country_name country_code val_code y191 y192 \
0 United States of America 231 1 47052179 43361966
1 United States of America 231 1 1187385 1201557
2 United States of America 231 1 28211467 27668273
3 United States of America 231 1 179000 193000
4 United States of America 231 1 12613922 12864425
y193 y194 y195 percent
0 42736682 43196916 41751928 50.149471
1 1172941 1176366 1192173 1.363631
2 29742374 27543836 28104317 32.483447
3 233338 276639 249688 0.260213
4 13240395 14106139 15642337 15.743237
So np.sum will sum all the values:
In [32]:
total = np.sum(df.ix[:,'y191':].values)
total
Out[32]:
434899243
We then call .sum(axis=1)/total * 100 on the cols of interest to sum row-wise, divide by the total and multiply by 100 to get a percentage.
If indeed percentage of 10 is what you want, the simplest way is to adjust your intake of the data slightly:
>>> p = pd.DataFrame(a.items(), columns=['item', 'score'])
>>> p['perc'] = p['score']/10
>>> p
Out[370]:
item score perc
0 Test 2 1 0.1
1 Test 3 1 0.1
2 Test 1 4 0.4
3 Test 4 9 0.9
For real percentages, instead:
>>> p['perc']= p['score']/p['score'].sum()
>>> p
Out[427]:
item score perc
0 Test 2 1 0.066667
1 Test 3 1 0.066667
2 Test 1 4 0.266667
3 Test 4 9 0.600000
First, make the keys of your dictionary the index of you dataframe:
import pandas as pd
a = {'Test 1': 4, 'Test 2': 1, 'Test 3': 1, 'Test 4': 9}
p = pd.DataFrame([a])
p = p.T # transform
p.columns = ['score']
Then, compute the percentage and assign to a new column.
def compute_percentage(x):
pct = float(x/p['score'].sum()) * 100
return round(pct, 2)
p['percentage'] = p.apply(compute_percentage, axis=1)
This gives you:
score percentage
Test 1 4 26.67
Test 2 1 6.67
Test 3 1 6.67
Test 4 9 60.00
[4 rows x 2 columns]