Create all pairwise combinations of the column names, then loop over them and divide to create the new columns:
from itertools import combinations
for a, b in combinations(df.columns, 2):
    df[f'{a}/{b}'] = df[a].div(df[b])
Or use a list comprehension, join the resulting Series together with concat, and add them to the original columns with join:
df = df.join(pd.concat([df[a].div(df[b]).rename(f'{a}/{b}')
                        for a, b in combinations(df.columns, 2)], axis=1))
print(df)
   x1  x2  x3  x4     x1/x2     x1/x3     x1/x4     x2/x3     x2/x4     x3/x4
0   4   7   1   5  0.571429  4.000000  0.800000  7.000000  1.400000  0.200000
1   5   8   3   3  0.625000  1.666667  1.666667  2.666667  2.666667  1.000000
2   4   9   5   6  0.444444  0.800000  0.666667  1.800000  1.500000  0.833333
3   5   4   7   9  1.250000  0.714286  0.555556  0.571429  0.444444  0.777778
4   5   2   1   2  2.500000  5.000000  2.500000  2.000000  1.000000  0.500000
5   4   3   0   4  1.333333       inf  1.000000       inf  0.750000  0.000000
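The inf entries in the last row come from dividing by the 0 in x3. If those should be treated as missing values instead, one option is to replace them with NaN after computing the ratios. This is only a sketch: the small two-column frame is made-up illustration data, not the data from the answer.

```python
import numpy as np
import pandas as pd
from itertools import combinations

# Hypothetical sample data; x2 contains a zero to trigger division by zero.
df = pd.DataFrame({"x1": [4, 5], "x2": [2, 0]})

# Same pairwise-ratio loop as in the answer.
for a, b in combinations(list(df.columns), 2):
    df[f"{a}/{b}"] = df[a].div(df[b])

# Replace +inf/-inf with NaN so later aggregations can skip them.
ratios = df.replace([np.inf, -np.inf], np.nan)
print(ratios)
```

Whether NaN is the right replacement depends on how the ratios are used afterwards.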
Answer from jezrael on Stack Overflow
Hi guys,
Fairly new to Python, so I'll try to be as clear as I can.
I have a dummy DF with some columns of healthcare appointments: Surgeon, Conversion (1/0 or Yes/No).
I want to create a new column, "Conversion Rate", i.e. what % of the Surgeon's appointments resulted in a Conversion (1/True).
In plain English that looks like: if the surgeon matches the one highlighted, return sum(Conversion) / count(Surgeon). But I'm struggling to get that delivered and applied via a function.
Here's my rough stab below, but any thoughts/advice appreciated!
I am using pandas and I do have other columns, so I need to add only this new column rather than apply any overarching DF operations. So if a surgeon's name appears 17 times, I want the conversion rate column to show the same rate for that surgeon each time.
def convrate(surgeon):
    for surgeon in df['Surgeon']:
        ConvPts = 0
        Non-Conv== 0
        if df['Preop'] == 1:
            if df['Conversion'] == 1:
                ConvPts += 1
            else:
                Non-Conv += 1
    return ConvPts / (ConvPts + Non-Conv)
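One possible way to get a per-surgeon rate repeated on every row, without an explicit loop, is groupby plus transform. The sketch below assumes the columns are named Surgeon and Conversion as in the question and that Conversion is already coded as 1/0; the sample rows are invented for illustration, and the Preop filter from the attempt above is not applied.

```python
import pandas as pd

# Hypothetical sample data; Conversion is assumed to be coded as 1/0.
df = pd.DataFrame({
    "Surgeon": ["A", "A", "B", "B", "B"],
    "Conversion": [1, 0, 1, 1, 0],
})

# The mean of a 1/0 column is the share of 1s, i.e. sum(Conversion) / count.
# transform() returns one value per original row, so every row of a surgeon
# receives that surgeon's rate and the other columns are left untouched.
df["Conversion Rate"] = df.groupby("Surgeon")["Conversion"].transform("mean")
print(df)
```

If Conversion holds "Yes"/"No" strings instead, it would first need to be mapped to 1/0, for example with df["Conversion"].eq("Yes").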
You can try:
import pandas as pd
df = pd.DataFrame({'x1': [1, 2, 3, 4, 5], 'x2': [10, 10, 10, 10, 10], 'x3': [100, 100, 100, 100, 100], 'x4': [10, 10, 10, 10, 10]})
columns = df.columns
# Yield every pair (v1, v2) where v2 comes after v1 in the column order
def pattern(c=columns):
    yield from ((v1, v2) for i, v1 in enumerate(c) for v2 in c[i + 1:])
for name1, name2 in pattern():
    df[f'{name1}/{name2}'] = df[name1].div(df[name2])
Also, you can concatenate all the desired columns:
pd.concat([df[n1].div(df[n2]).rename(f'{n1}/{n2}') for n1, n2 in pattern()], axis=1)
We can take advantage of the way that Boolean values are handled mathematically (True being 1 and False being 0) and use 3 aggregation functions sum, count and mean per group (groupby aggregate). We can also take advantage of Named Aggregation to both create and rename the columns in one step:
df = (
    df.groupby('playerId', as_index=False)
      .agg(wins=('winner', 'sum'),
           totalCount=('winner', 'count'),
           winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100
df:
   playerId  wins  totalCount  winPct
0      1848     1           2    50.0
1      1988     0           2     0.0
2      3543     1           1   100.0
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})
In your case, taking the mean is enough to yield the percentage (as a fraction):
out = df.groupby('playerId')['winner'].agg(['sum','count','mean'])
Out[22]:
          sum  count  mean
playerId
1848        1      2   0.5
1988        0      2   0.0
3543        1      1   1.0
I think you need to convert the string columns to float or int, because their type is string (even though the values look like numbers):
Apple_farm['Good_apples'] = Apple_farm['Good_apples'].astype(float)
Apple_farm['Total_apples'] = Apple_farm['Total_apples'].astype(float)
Or:
Apple_farm['Good_apples'] = Apple_farm['Good_apples'].astype(int)
Apple_farm['Total_apples'] = Apple_farm['Total_apples'].astype(int)
Sample:
import pandas as pd
Good_apples = ["10", "20", "3", "7", "9"]
Total_apples = ["20", "80", "30", "70", "90"]
d = {"Good_apples": Good_apples, "Total_apples": Total_apples}
Apple_farm = pd.DataFrame(d)
print(Apple_farm)
  Good_apples Total_apples
0          10           20
1          20           80
2           3           30
3           7           70
4           9           90
print(Apple_farm.dtypes)
Good_apples     object
Total_apples    object
dtype: object
print(Apple_farm.at[0,'Good_apples'])
10
print(type(Apple_farm.at[0,'Good_apples']))
<class 'str'>
Apple_farm['Good_apples'] = Apple_farm['Good_apples'].astype(int)
Apple_farm['Total_apples'] = Apple_farm['Total_apples'].astype(int)
print(Apple_farm.dtypes)
Good_apples     int32
Total_apples    int32
dtype: object
print(Apple_farm.at[0,'Good_apples'])
10
print(type(Apple_farm.at[0,'Good_apples']))
<class 'numpy.int32'>
Apple_farm['Perc_Good'] = (Apple_farm['Good_apples'] / Apple_farm['Total_apples']) * 100
print(Apple_farm)
   Good_apples  Total_apples  Perc_Good
0           10            20       50.0
1           20            80       25.0
2            3            30       10.0
3            7            70       10.0
4            9            90       10.0
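astype raises an error if any string cannot be parsed as a number. Where the columns might contain such values, pd.to_numeric with errors="coerce" is a possible alternative that turns them into NaN instead. The sketch below uses made-up data with one deliberately bad entry ("n/a"); the lowercase apple_farm name is just for this example.

```python
import pandas as pd

# Hypothetical data: one value cannot be parsed as a number.
apple_farm = pd.DataFrame({
    "Good_apples": ["10", "20", "n/a"],
    "Total_apples": ["20", "80", "30"],
})

# Convert each string column; unparseable entries become NaN.
for col in ["Good_apples", "Total_apples"]:
    apple_farm[col] = pd.to_numeric(apple_farm[col], errors="coerce")

# The ratio is NaN wherever either operand is NaN.
apple_farm["Perc_Good"] = apple_farm["Good_apples"] / apple_farm["Total_apples"] * 100
print(apple_farm)
```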
This code may help you:
revenue_per_countries = df.groupby(["Country"])["Amount"].sum().sort_values()
revenue_per_countries = pd.DataFrame(revenue_per_countries)
revenue_per_countries['percent'] = revenue_per_countries['Amount']/revenue_per_countries['Amount'].sum()*100
revenue_per_countries = revenue_per_countries.sort_values(by=['percent'], ascending=False)
revenue_per_countries = revenue_per_countries.head(15)
revenue_per_countries.head(15)
Looks like you want the ratio of y2 to the total instead. Use groupby + value_counts:
v = df.groupby('AAA').BBB.value_counts().unstack()
df['RATIO'] = df.AAA.map(v.y2 / (v.y2 + v.y1))
  AAA BBB CCC  DDD     RATIO
0  x1  y1  t1   10  0.333333
1  x1  y1  t2   11  0.333333
2  x1  y2  t3   18  0.333333
3  x2  y2  t1   17  0.666667
4  x2  y2  t1   21  0.666667
5  x2  y1  t1   30  0.666667
To generalise for many groups, you may use
df['RATIO'] = df.AAA.map(v.y2 / v.sum(axis=1))
Using groupby + transform with a custom function:
def ratio(x):
    counts = x.value_counts()
    return counts['y2'] / counts.sum()
df['Ratio of BBB'] = df.groupby('AAA')['BBB'].transform(ratio)
print(df)
  AAA BBB CCC  DDD  Ratio of BBB
0  x1  y1  t1   10      0.333333
1  x1  y1  t2   11      0.333333
2  x1  y2  t3   18      0.333333
3  x2  y2  t1   17      0.666667
4  x2  y2  t1   21      0.666667
5  x2  y1  t1   30      0.666667
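A variation that might be worth considering: compare BBB to 'y2' first, then take the group-wise mean of the resulting Boolean Series. This is a sketch rather than part of the answers above. It rebuilds only the AAA and BBB columns from the printed output, and it should also cope with a group that contains no y2 at all (giving 0.0), a case where counts['y2'] in the custom function would raise a KeyError.

```python
import pandas as pd

# AAA and BBB values copied from the printed output above.
df = pd.DataFrame({
    "AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
    "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
})

# eq("y2") gives True/False per row; the mean of Booleans within each AAA
# group is the share of y2 rows, broadcast back to every row by transform.
df["RATIO"] = df["BBB"].eq("y2").groupby(df["AAA"]).transform("mean")
print(df)
```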