If I understand you correctly, you are describing the following:
Setup
import pandas as pd

d = {'id1': ['x', 'x'], 'id2': ['z', 'w'], 'metric': [100, 10]}
df = pd.DataFrame(data=d)
df
Solution
# Manually choose the value by which to scale the column 'metric'
scaler = df.loc[(df['id1'] == 'x') & (df['id2'] == 'z'), 'metric'].values
# Divide all 'metric' values by the above scaler value
df['result'] = df['metric'] / scaler
df
id1 id2 metric result
0 x z 100 1.0
1 x w 10 0.1
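Note that `scaler` above is a one-element NumPy array, since `.values` was used. If you would rather have a plain scalar - for instance so that an accidental multi-row match raises an error instead of silently broadcasting - one option is `Series.item()`. A minimal sketch with the same data:

```python
import pandas as pd

d = {'id1': ['x', 'x'], 'id2': ['z', 'w'], 'metric': [100, 10]}
df = pd.DataFrame(data=d)

# .item() returns the single matched value as a Python scalar,
# and raises ValueError if the filter matches zero or multiple rows
scaler = df.loc[(df['id1'] == 'x') & (df['id2'] == 'z'), 'metric'].item()
df['result'] = df['metric'] / scaler
```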
Answer from Peter Leimbigler on Stack Overflow
Hi guys,
Fairly new to Python - I'll try to be as clear as I can.
I have a dummy DF with some columns of appointments in healthcare - Surgeon, Conversion (1/0 or Yes/No).
I want to create a new column with "Conversion Rate", i.e. what % of the surgeon's appointments resulted in a Conversion (1/True).
In plain English that looks like: if the surgeon matches the one highlighted, return sum(Conversion) / count(Surgeon) - but I'm struggling to get that delivered and applied via a function.
Here's my rough stab below, but any thoughts/advice appreciated!
I am using pandas to do this, and I do have other columns, so I need to create a new column with just this ratio rather than any over-arching DF operations. So if a surgeon's name appears 17 times, I want the conversion rate column to show the same rate for that surgeon each time.
def convrate(surgeon):
    conv_pts = 0
    non_conv = 0
    for _, row in df.iterrows():
        if row['Surgeon'] == surgeon and row['Preop'] == 1:
            if row['Conversion'] == 1:
                conv_pts += 1
            else:
                non_conv += 1
    return conv_pts / (conv_pts + non_conv)
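A sketch of one common way to get what the question asks for: `groupby().transform('mean')` computes each surgeon's rate and broadcasts it back onto every row for that surgeon, so repeated appearances all show the same value. The column names mirror the question; the sample data here is made up:

```python
import pandas as pd

# Hypothetical appointment data in the shape described above
df = pd.DataFrame({
    'Surgeon': ['Smith', 'Smith', 'Jones', 'Smith', 'Jones'],
    'Conversion': [1, 0, 1, 1, 0],
})

# transform('mean') is sum(Conversion) / count(Surgeon) per surgeon,
# aligned back to the original rows
df['Conversion Rate'] = df.groupby('Surgeon')['Conversion'].transform('mean')
```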
Create all pair combinations of column names, then loop and divide into new columns:
from itertools import combinations
for a, b in combinations(df.columns, 2):
df[f'{a}/{b}'] = df[a].div(df[b])
Or use a list comprehension, join the ratios together with concat, and add the original columns back with join:
df = df.join(pd.concat([df[a].div(df[b]).rename(f'{a}/{b}')
                        for a, b in combinations(df.columns, 2)], axis=1))
print(df)
x1 x2 x3 x4 x1/x2 x1/x3 x1/x4 x2/x3 x2/x4 x3/x4
0 4 7 1 5 0.571429 4.000000 0.800000 7.000000 1.400000 0.200000
1 5 8 3 3 0.625000 1.666667 1.666667 2.666667 2.666667 1.000000
2 4 9 5 6 0.444444 0.800000 0.666667 1.800000 1.500000 0.833333
3 5 4 7 9 1.250000 0.714286 0.555556 0.571429 0.444444 0.777778
4 5 2 1 2 2.500000 5.000000 2.500000 2.000000 1.000000 0.500000
5 4 3 0 4 1.333333 inf 1.000000 inf 0.750000 0.000000
You can try:
df = pd.DataFrame({'x1': [1, 2, 3, 4, 5], 'x2': [10, 10, 10, 10, 10], 'x3': [100, 100, 100, 100, 100], 'x4': [10, 10, 10, 10, 10]})
columns = df.columns
def pattern(c = columns):
yield from ((v1, v2) for i, v1 in enumerate(c) for v2 in c[i + 1:])
for name1, name2 in pattern():
df[f'{name1}/{name2}'] = df[name1].div(df[name2])
Also, you can concatenate all the desired columns:
pd.concat([df[n1].div(df[n2]).rename(f'{n1}/{n2}') for n1, n2 in pattern()], axis=1)
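For what it's worth, the `pattern()` generator above yields exactly the same pairs as `itertools.combinations(columns, 2)` from the earlier approach; a quick check:

```python
from itertools import combinations

columns = ['x1', 'x2', 'x3', 'x4']

def pattern(c=columns):
    # all pairs (v1, v2) with v1 appearing earlier in the list than v2
    yield from ((v1, v2) for i, v1 in enumerate(c) for v2 in c[i + 1:])

pairs = list(pattern())
```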
We can take advantage of the way Boolean values are handled mathematically (True being 1 and False being 0) and use three aggregation functions - sum, count, and mean - per group (groupby aggregate). We can also take advantage of Named Aggregation to both create and rename the columns in one step:
df = (
df.groupby('playerId', as_index=False)
.agg(wins=('winner', 'sum'),
totalCount=('winner', 'count'),
winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100
df:
playerId wins totalCount winPct
0 1848 1 2 50.0
1 1988 0 2 0.0
2 3543 1 1 100.0
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'playerId': [1848, 1988, 3543, 1848, 1988],
'winner': [True, False, True, False, False]
})
In your case, just taking the mean yields the percentage:
out = df.groupby('playerId')['winner'].agg(['sum','count','mean'])
Out[22]:
sum count mean
playerId
1848 1 2 0.5
1988 0 2 0.0
3543 1 1 1.0
How about:
user_count=df3.groupby('user_state')['user_count'].mean()
#(or however you think a value for each state should be calculated)
engaged_unique=df3.groupby('user_state')['engaged_count'].nunique()
engaged_pct=engaged_unique/user_count
(you could also do this in one line in a bunch of different ways)
Your original solution was almost fine except that you were dividing a value by the entire user count series. So you were getting a Series instead of a value. You could try this slight variation:
def f(x):
engaged_percent = x['engaged_count'].nunique()/x['user_count'].mean()
return engaged_percent
by = df3.groupby(['user_state']).apply(f)
by
I would just use groupby and apply directly:
df3['engaged_percent'] = (df3.groupby('user_state')
                             .apply(lambda s: s.engaged_count.nunique() / s.user_count).values)
Demo
>>> df3
engaged_count user_count user_state
0 3 21 California
1 3 21 California
2 3 21 California
...
19 4 7 Florida
20 4 7 Florida
21 4 7 Florida
>>> df3['engaged_percent'] = df3.groupby('user_state').apply(lambda s: s.engaged_count.nunique()/s.user_count).values
>>> df3
engaged_count user_count user_state engaged_percent
0 3 21 California 0.095238
1 3 21 California 0.095238
2 3 21 California 0.095238
...
19 4 7 Florida 0.285714
20 4 7 Florida 0.285714
21 4 7 Florida 0.285714
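A cautionary sketch: assigning `.values` from a grouped `apply` relies on the frame being sorted by `user_state` so that positions line up. `groupby().transform` aligns on the index instead, which avoids that assumption. Assuming the same `df3` columns (the data below is made up to match the demo's rates):

```python
import pandas as pd

# Small hypothetical frame in the shape of df3 above
df3 = pd.DataFrame({
    'user_state': ['California', 'California', 'California', 'Florida', 'Florida'],
    'engaged_count': [3, 5, 3, 4, 6],
    'user_count': [21, 21, 21, 7, 7],
})

# per-state count of distinct engaged_count values, aligned to each row
nunique = df3.groupby('user_state')['engaged_count'].transform('nunique')
df3['engaged_percent'] = nunique / df3['user_count']
```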
After performing your groupby, use pct_change:
# Sort the DataFrame, if necessary.
df = df.sort_values(['name', 'order'])
# Use groupby and pct_change on the 'quantity' column.
df['quantity'] = df.groupby('name')['quantity'].pct_change()
The resulting output:
name order quantity
0 A 1 NaN
1 A 2 0.500000
2 A 3 -0.666667
3 B 1 NaN
4 B 2 2.000000
You could take your result and divide it by the shifted 'quantity' column in df:
diff_df.quantity = diff_df.quantity / df.quantity.shift(1)
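Note the relationship between the two approaches: within each group, `pct_change` is the diff divided by the shifted value, i.e. `quantity / shift(quantity) - 1`. A quick check on made-up data matching the output shown above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'A', 'A', 'B', 'B'],
                   'order': [1, 2, 3, 1, 2],
                   'quantity': [10, 15, 5, 1, 3]})

# built-in per-group percentage change
pct = df.groupby('name')['quantity'].pct_change()

# manual equivalent via a per-group shift
shifted = df.groupby('name')['quantity'].shift(1)
manual = df['quantity'] / shifted - 1
```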