pandas groupby transform multiple columns

Pass Multiple Columns to groupby.transform

stackoverflow.com › questions › 19619082 › pass-multiple-columns-to-groupby-transform

You can use apply() method:

import numpy as np
import pandas as pd
np.random.seed(0)

people2 = pd.DataFrame(np.random.randn(5, 5), 
                      columns=['a', 'b', 'c', 'd', 'e'], 
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

# group_keys=False keeps the original row labels on the result of apply()
Grouped = people2.groupby(key, group_keys=False)

def f(df):
    # Return a new frame with the extra column rather than mutating the group
    return df.assign(f=(df.a.mean() - df.b.mean()) * df.c)

people2 = Grouped.apply(f)
print(people2)

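If you want a more general method:

# group_keys=False so the result aligns with people2's index on assignment
Grouped = people2.groupby(key, group_keys=False)

def f(a, b, c, **kw):
    # Columns arrive as keyword arguments; unused ones are collected in kw
    return (a.mean() - b.mean())*c

people2["f"] = Grouped.apply(lambda df: f(**df))
print(people2)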

Answer from HYRY on Stack Overflow
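For this particular formula, the same column can probably also be built without apply(), because each group mean can be broadcast back to the rows with a per-column transform. A minimal sketch, reusing the frame and key from the answer above; the names g, via_transform and expected are illustrative only:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
people2 = pd.DataFrame(np.random.randn(5, 5),
                       columns=['a', 'b', 'c', 'd', 'e'],
                       index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

g = people2.groupby(key)

# transform('mean') returns a Series aligned with the original rows,
# holding each row's group mean
via_transform = (g['a'].transform('mean') - g['b'].transform('mean')) * people2['c']

# Cross-check against an explicit per-group computation
expected = pd.concat(
    [(grp['a'].mean() - grp['b'].mean()) * grp['c'] for _, grp in g]
).reindex(people2.index)

print(np.allclose(via_transform, expected))
```

This only works because the formula splits into independent per-column group statistics; a function that needs whole rows of several columns at once still calls for apply().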

2 of 2

This is based upon the answer provided by HYRY (thanks), who made me see how this could be achieved. My version simply generalises the function so that the column names are passed as arguments when it is called. I think, though, that the function has to be called through a lambda:

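import numpy as np
import pandas as pd

people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
# group_keys=False keeps the original row labels on the result
Grouped = people.groupby(key, group_keys=False)

def FUNC(df, col1, col2, col3, col4):
    # Work on a copy so the group handed in by pandas is not mutated
    df = df.copy()
    df[col1] = (df[col2].mean() - df[col3].mean()) * df[col4]
    return df

# Recent pandas versions call a transform() function one column at a time,
# so apply() is used here to receive each group as a whole DataFrame
people2 = Grouped.apply(lambda x: FUNC(x, 'f', 'a', 'b', 'c'))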

This is the best way I have seen of doing this. Each group's DataFrame is passed to the function as x, and the column names are then supplied as arguments.
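The lambda looks like a convenience rather than a requirement, since GroupBy.apply() forwards extra positional arguments to the function it calls. A small sketch of that, using apply() and a hypothetical helper named add_scaled_col (not part of the original answer):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
people = pd.DataFrame(np.random.randn(5, 5),
                      columns=['a', 'b', 'c', 'd', 'e'],
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']

def add_scaled_col(df, new_col, col_x, col_y, col_z):
    # Hypothetical helper: (group mean of col_x - group mean of col_y) * col_z
    out = df.copy()
    out[new_col] = (df[col_x].mean() - df[col_y].mean()) * df[col_z]
    return out

# group_keys=False keeps the original row labels instead of adding a key level
grouped = people.groupby(key, group_keys=False)

with_lambda = grouped.apply(lambda x: add_scaled_col(x, 'f', 'a', 'b', 'c'))
without_lambda = grouped.apply(add_scaled_col, 'f', 'a', 'b', 'c')

print(with_lambda.equals(without_lambda))
```

Either spelling hands the whole group to the helper; the lambda form is mainly useful when the arguments need reordering or partial application.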

Google Groups

groups.google.com › g › pydata › c › CFrBmyo81CU

data.table like group by transform with multiple columns

Is this something that could be added to the pandas API easily, and if so, how? ... I don't quite get the interest in one-lining everything -- sometimes that makes things clearer, and often it doesn't -- but in your particular case something like

In [47]: test["new"] = test.groupby((test.x + test.y) % 2)[["x","y"]].transform(sum).sum(axis=1)
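The one-liner from the thread can be tried on a small frame. A sketch with made-up data (the values in test are assumptions, since the thread's frame is not shown), passing the string "sum" in place of the builtin:

```python
import pandas as pd

# Illustrative data only; the original thread's frame is not shown
test = pd.DataFrame({"x": [1, 2, 3, 4], "y": [0, 0, 0, 1]})

# Group rows by the parity of x + y, sum x and y within each group,
# broadcast those sums back to every row, then add them across columns
parity = (test.x + test.y) % 2
test["new"] = test.groupby(parity)[["x", "y"]].transform("sum").sum(axis=1)

print(test)
```

The grouping key here is a computed Series rather than a column name, which groupby() accepts as long as it lines up with the frame's index.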


Pandas

pandas.pydata.org › docs › user_guide › groupby.html

Group by: split-apply-combine — pandas 3.0.1 documentation

By using DataFrameGroupBy.ngroup(), we can extract information about the groups in a way similar to factorize() (as described further in the reshaping API) but which applies naturally to multiple columns of mixed type and different sources. This can be useful as an intermediate categorical-like step in processing, when the relationships between the group rows are more important than their content, or as input to an algorithm which only accepts the integer encoding. (For more information about support in pandas for full categorical data, see the Categorical introduction and the API documentation.)

Julia Programming Language

discourse.julialang.org › new to julia

Way to return multiple columns after applying combine-groupby transformation - New to Julia - Julia Programming Language

Top answer

1 of 1

My preferred approach:

using Statistics
df |> x -> groupby(x, :y) |> x -> subset(x, :x => (x -> x .== maximum(x)))

2×4 DataFrame
 Row │ y      x      a      b
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      2
   2 │     2      6  …

GeeksforGeeks

geeksforgeeks.org › pandas › pandas-group-by-multiple-columns

Pandas Group by Multiple Columns - GeeksforGeeks

July 23, 2025 - To group by multiple columns, you simply pass a list of column names to the groupby() function. ... Consider the following dataset. We will group by Category and Subcategory, and then calculate the sum of the Sales column.
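As a rough illustration of that description, the sketch below groups a small invented sales table by Category and Subcategory and sums Sales. The data values are assumptions; only the column names come from the snippet.

```python
import pandas as pd

# Hypothetical sales data; column names follow the description above
sales = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Furniture", "Furniture", "Electronics"],
    "Subcategory": ["Phones", "Laptops", "Chairs", "Chairs", "Phones"],
    "Sales": [100, 250, 80, 120, 50],
})

# Passing a list of column names groups by each distinct combination of values
totals = sales.groupby(["Category", "Subcategory"])["Sales"].sum()
print(totals)
```

The result is a Series with a two-level index, one level per grouping column.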

Spark By {Examples}

sparkbyexamples.com › home › pandas › pandas groupby transform

Pandas Groupby Transform - Spark By {Examples}

June 18, 2025 - Pandas groupby transform() works on one Series (column) at a time, whereas groupby apply() receives the entire group DataFrame at once. This means a function passed to transform() cannot access multiple columns.
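One workaround that is sometimes used is to let the transform function look up a second column in the outer frame through the group's index. This is only a sketch: the frame, the column names and the ab_sum result are invented, and it assumes the index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "g": ["x", "x", "y"],
    "a": [1, 2, 3],
    "b": [10, 20, 30],
})

# The function passed to transform sees only column "a" for each group, so the
# matching rows of "b" are looked up in the outer frame through the group's
# index. This assumes the index labels are unique.
df["ab_sum"] = df.groupby("g")["a"].transform(
    lambda s: (s * df.loc[s.index, "b"]).sum()
)
print(df)
```

The scalar returned for each group is broadcast back to every row of that group, so the new column has the same length as the frame.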

Medium

medium.com › @heyamit10 › mastering-pandas-groupby-multiple-columns-92411999779f

Mastering pandas groupby Multiple Columns | by Hey Amit | Medium

April 12, 2025 - You might be wondering what makes the groupby function in pandas so special. Simple enough, it lets you group data based on one or more columns, aggregating similar values into summarized forms. This is incredibly useful when working with complex datasets where you need insights derived from grouped data points. ... You might be asking yourself, “Why not just group by one column?” While grouping by a single column is perfectly fine, juggling multiple columns can give you a finer level of detail and richer insights.

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.groupby.html

pandas.DataFrame.groupby — pandas 3.0.2 documentation

When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.
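The effect of group_keys is easiest to see by passing it explicitly. A minimal sketch, assuming pandas 2.x or later and an invented frame; center is a hypothetical like-indexed function:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=["r0", "r1", "r2", "r3"],
)
key = ["one", "two", "one", "two"]

def center(group):
    # Like-indexed result: same rows and columns as the group that came in
    return group - group.mean()

with_keys = df.groupby(key, group_keys=True).apply(center)
without_keys = df.groupby(key, group_keys=False).apply(center)

# Number of index levels in each result
print(with_keys.index.nlevels)
print(without_keys.index.nlevels)
```

With group_keys=True the group label should appear as an extra outer index level, while group_keys=False leaves the original row labels, which is usually what is wanted when assigning the result back to the frame.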


Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.transform.html

pandas.DataFrame.transform — pandas 3.0.2 documentation

>>> df = pd.DataFrame(
...     {
...         "c": [1, 1, 1, 2, 2, 2, 2],
...         "type": ["m", "n", "o", "m", "m", "n", "n"],
...     }
... )
>>> df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
>>> df["size"] = df.groupby("c")["type"].transform(len)
>>> df
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4

GitHub

github.com › pandas-dev › pandas › issues › 24267

Feature-Request: Allow lambda function with different columns in transform · Issue #24267 · pandas-dev/pandas

December 13, 2018 - Currently, if you want to create ... custom function, you have to create intermediate dataframes since transform() cannot work with multiple columns at once. import pandas as pd df = pd.DataFrame({'a':[1,2,3,4,5,6], ...

Author harlecin

Spark By {Examples}

sparkbyexamples.com › home › pandas › pandas groupby multiple columns explained

Pandas GroupBy Multiple Columns Explained - Spark By {Examples}

June 27, 2025 - How do you group a pandas DataFrame by multiple columns and compute multiple aggregations? groupby() accepts a list of column names, which groups by several columns at once.

Stack Overflow

stackoverflow.com › questions › 53212490 › pandas-groupby-transform-and-multiple-columns › 53212836

python - Pandas groupby + transform and multiple columns - Stack Overflow

Top answer

1 of 2

For this particular case you could do:

g = df.groupby(['c', 'd'])

df['e'] = g.a.transform('sum') + g.b.transform('sum')

df
# outputs

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22

If you can construct the final result as a linear combination of independent transforms on the same groupby, this method works.

Otherwise, use a groupby-apply and then merge the result back into the original df.

Example:

_ = df.groupby(['c','d']).apply(lambda x: sum(x.a+x.b)).rename('e').reset_index()
df.merge(_, on=['c','d'])
# same output as above.
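A case that does not reduce to a linear combination of per-column transforms is a weighted mean, which needs both columns inside each group. The sketch below follows the apply-then-merge route described above; the frame is reconstructed from the printed output, and weighted_mean and a_wmean are made-up names.

```python
import pandas as pd

# Frame reconstructed from the printed output above
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 3, 4, 5, 6],
    "c": ["q", "q", "q", "q", "w", "w"],
    "d": ["z", "z", "z", "o", "o", "o"],
})

def weighted_mean(group):
    # Mean of "a" weighted by "b"; not a linear combination of per-column sums
    return (group["a"] * group["b"]).sum() / group["b"].sum()

per_group = (
    df.groupby(["c", "d"])[["a", "b"]]
      .apply(weighted_mean)
      .rename("a_wmean")
      .reset_index()
)

# A left merge keeps every original row and attaches its group's value
result = df.merge(per_group, on=["c", "d"], how="left")
print(result)
```

Selecting [["a", "b"]] before apply() limits the group frame to the columns the function actually reads.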

2 of 2

You can use GroupBy + transform with sum twice:

df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(1)

print(df)

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22

Stack Overflow

stackoverflow.com › questions › 54389006 › pandas-how-to-transform-all-numeric-columns-of-a-data-frame-into-logarithms

python - pandas: How to transform all numeric columns of a data frame into logarithms - Stack Overflow

Top answer

1 of 4

Suppose you have a dataframe named df

You can first make a list of possible numeric types, then just do a loop

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
for c in [c for c in df.columns if df[c].dtype in numerics]:
    df[c] = np.log10(df[c])

Or, a one-liner solution with a lambda and np.issubdtype

numeric_df = df.apply(lambda x: np.log10(x) if np.issubdtype(x.dtype, np.number) else x)

2 of 4

If most columns are numeric it might make sense to just try it and skip the column if it does not work:

for column in df.columns:
    try:
        df[column] = np.log10(df[column])
    # recent NumPy versions raise TypeError for non-numeric (object) columns
    except (ValueError, AttributeError, TypeError):
        pass

If you want to you could wrap it in a function, of course.

If all columns are numeric, you can even simply do

df_log10 = np.log10(df)
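Another option, not taken from the answers above, is to let pandas pick the numeric columns with select_dtypes() and transform only those. A sketch on an invented frame:

```python
import numpy as np
import pandas as pd

# Made-up frame with one non-numeric column
df = pd.DataFrame({
    "name": ["u", "v", "w"],
    "x": [1, 10, 100],
    "y": [1000.0, 10.0, 0.1],
})

# select_dtypes(include="number") picks out the numeric columns by dtype
numeric_cols = df.select_dtypes(include="number").columns

df_log10 = df.copy()
df_log10[numeric_cols] = np.log10(df[numeric_cols])
print(df_log10)
```

This avoids maintaining a hand-written list of dtype names and leaves the non-numeric columns untouched.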

Datasciencebyexample

datasciencebyexample.com › 2023 › 02 › 14 › sklearn-columntransformer-one-columns-to-many-columns

Transforming One or More Columns of a Pandas DataFrame using ColumnTransformer | DataScienceTribe

February 14, 2023 - In this blog post, we’ll demonstrate how to use a custom transformer with scikit-learn’s ColumnTransformer to transform one or more columns of a Pandas DataFrame.

scikit-learn

scikit-learn.org › stable › modules › generated › sklearn.compose.ColumnTransformer.html

ColumnTransformer — scikit-learn 1.8.0 documentation

Applies transformers to columns of an array or pandas DataFrame.

Saturn Cloud

saturncloud.io › blog › how-to-simultaneously-melt-multiple-columns-in-python-pandas

How to Simultaneously Melt Multiple Columns in Python Pandas | Saturn Cloud Blog

August 25, 2023 - In this blog post, we will learn how to simultaneously melt multiple columns in Python Pandas. Melting data is the process of transforming wide data into long data by unpivoting columns into rows.
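As a small illustration of wide-to-long melting, the sketch below unpivots two score columns into rows. The frame and the names subject and score are invented for the example.

```python
import pandas as pd

# Wide layout: one column per subject
wide = pd.DataFrame({
    "id": [1, 2],
    "math": [90, 80],
    "physics": [85, 75],
})

# Unpivot the subject columns into (subject, score) rows, keeping "id" fixed
long = wide.melt(id_vars="id", value_vars=["math", "physics"],
                 var_name="subject", value_name="score")
print(long)
```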

GeeksforGeeks

geeksforgeeks.org › add-multiple-columns-to-dataframe-in-pandas

Add multiple columns to dataframe in Pandas - GeeksforGeeks

October 3, 2022 - Although insert() takes a single column name and value as input, we can call it repeatedly to add multiple columns to the DataFrame. ...

# importing pandas library
import pandas as pd

# creating and initializing a nested list
students = [['jackma', 34, 'Sydeny', 'Australia'],
            ['Ritika', 30, 'Delhi', 'India'],
            ['Vansh', 31, 'Delhi', 'India'],
            ['Nany', 32, 'Tokyo', 'Japan'],
            ['May', 16, 'New York', 'US'],
            ['Michael', 17, 'las vegas', 'US']]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# creating columns 'Marks' and 'ID' at
# column positions 2 and 3 (0-based) using
# dataframe.insert() function
df.insert(2, "Marks", [90, 70, 45, 33, 88, 77], True)
df.insert(3, "ID", [101, 201, 401, 303, 202, 111], True)

# Displaying the Data frame
df

Statology

statology.org › home › how to convert pandas dataframe columns to int

How to Convert Pandas DataFrame Columns to int

November 28, 2022 - This tutorial explains how to convert pandas DataFrame columns to integer type, including several examples.

GitHub

github.com › pandas-dev › pandas › issues › 342

Enable easier transformations of multiple columns in DataFrame · Issue #342 · pandas-dev/pandas

November 7, 2011 - Things like df[cols] = transform(df[cols]) should be possible in a mixed-type DataFrame, per the mailing list discussion

Author wesm

R-bloggers

r-bloggers.com › r bloggers › convert multiple columns to numeric in r

Convert Multiple Columns to Numeric in R | R-bloggers

July 9, 2022 - We may use the following code to convert only the assists and rebounds columns to numeric.