On checking, I found my Polars version:
pl.__version__
0.17.3
https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.groupby.html
I need to do:
df.groupby("a").agg(pl.col("b").sum()) # there is no underscore in groupby
#output
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ "a" ┆ 2   │
│ "c" ┆ 3   │
│ "b" ┆ 5   │
└─────┴─────┘
and the documentation says:
Deprecated since version 0.19.0: This method has been renamed to
DataFrame.group_by().
This is the new documentation for Polars version 0.19:
https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.group_by.html#polars-dataframe-group-by
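If your code has to run on both sides of the rename, a small compatibility shim is one option. This is a minimal sketch, not an official Polars API; the getattr fallback is my own workaround:

import polars as pl

df = pl.DataFrame({"a": ["a", "c", "b", "b"], "b": [2, 3, 1, 4]})

# Use group_by on Polars >= 0.19, fall back to groupby on older versions.
group_by = getattr(df, "group_by", None) or df.groupby
print(group_by("a").agg(pl.col("b").sum()))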
Answer from Talha Tayyab on Stack Overflow.

Original question: I keep receiving this error: AttributeError: 'DataFrame' object has no attribute 'groupby'. When I ask ChatGPT for guidance, the only response I get is that Polars is out of date, which it definitely isn't. I need to perform these operations for a university assignment, so any guidance would be appreciated, thanks!
Polars has the pl.corr() function, which supports method="spearman".
If you want to use a custom function you could do it like this:
Custom function on multiple columns/expressions
import polars as pl
from typing import List
from scipy import stats

df = pl.DataFrame({
    "g": [1, 1, 1, 2, 2, 2, 5],
    "a": [2, 4, 5, 190, 1, 4, 1],
    "b": [1, 3, 2, 1, 43, 3, 1]
})

def get_score(args: List[pl.Series]) -> pl.Series:
    return pl.Series([stats.spearmanr(args[0], args[1]).correlation], dtype=pl.Float64)

(df.group_by("g", maintain_order=True)
   .agg(
       pl.map_groups(
           exprs=["a", "b"],
           function=get_score
       ).alias("corr")
   ))
Polars-provided function
(df.group_by("g", maintain_order=True)
   .agg(
       pl.corr("a", "b", method="spearman").alias("corr")
   ))
Both output:
shape: (3, 2)
┌─────┬──────┐
│ g   ┆ corr │
│ --- ┆ ---  │
│ i64 ┆ f64  │
╞═════╪══════╡
│ 1   ┆ 0.5  │
│ 2   ┆ -1.0 │
│ 5   ┆ NaN  │
└─────┴──────┘
Custom function on a single column/expression
We can also apply custom functions on single expressions, via .map_elements.
Below is an example of how we can square a column with a custom function and with normal Polars expressions. The expression syntax should always be preferred, as it's a lot faster.
(df.group_by("g")
   .agg(
       pl.col("a").map_elements(lambda group: group**2).alias("squared1"),
       (pl.col("a")**2).alias("squared2")
   ))
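To get a feel for the difference yourself, here is a rough timing sketch (the frame size is an arbitrary assumption; numbers vary by machine and Polars version):

import timeit
import polars as pl

big = pl.DataFrame({"g": [1, 2, 3, 4] * 250_000, "a": range(1_000_000)})

# Custom Python function per group vs. the native expression.
t_udf = timeit.timeit(
    lambda: big.group_by("g").agg(pl.col("a").map_elements(lambda s: s**2)),
    number=3,
)
t_expr = timeit.timeit(
    lambda: big.group_by("g").agg((pl.col("a") ** 2).alias("squared")),
    number=3,
)
print(f"map_elements: {t_udf:.3f}s  expression: {t_expr:.3f}s")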
This seems to be a gap in the Polars API relative to pandas. While pandas can run grouped operations with arbitrary functions and return the result as a DataFrame that includes the group keys, .map_groups() receives no information about the groups, so the keys are lost.
Here's an approach using a pl.DataFrame namespace:
import polars as pl
from collections.abc import Callable
from scipy.stats import spearmanr

df = pl.DataFrame({
    "era": [1, 1, 1, 2, 2, 2, 5],
    "prediction": [2, 4, 5, 190, 1, 4, 1],
    "target": [1, 3, 2, 1, 43, 3, 1]
})

def with_group_keys(fun: Callable[[pl.DataFrame], pl.DataFrame], by: list[str]):
    def wrapped(g: pl.DataFrame) -> pl.DataFrame:
        keys = g.select(by).row(0, named=True)
        res = fun(g)
        if not isinstance(res, pl.DataFrame):
            raise TypeError("fun(g) must return a Polars DataFrame")
        if res.height != 1:
            raise ValueError("fun(g) must return exactly one row per group")
        return pl.DataFrame({k: [keys[k]] for k in by}).hstack(res)
    return wrapped

@pl.api.register_dataframe_namespace("groups")
class EraPLNamespace:
    def __init__(self, df: pl.DataFrame):
        self._df = df

    def map(self, by: list[str], fun: Callable[[pl.DataFrame], pl.DataFrame]) -> pl.DataFrame:
        return self._df.group_by(*by).map_groups(with_group_keys(fun, by))

def get_score(g: pl.DataFrame) -> pl.DataFrame:
    return pl.DataFrame({"corr": [spearmanr(g["prediction"], g["target"]).correlation]})

# usage
out = df.groups.map(["era"], get_score)
out
| era | corr |
|---|---|
| 2 | -1.0 |
| 1 | 0.5 |
| 5 | NaN |
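Note that group_by does not guarantee group order, which is why the eras come out as 2, 1, 5 here. If you need a stable order, sorting afterwards is the simplest fix (or the namespace could pass maintain_order=True to group_by):

out = df.groups.map(["era"], get_score).sort("era")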
Of course, a more direct answer to the specific question would be the following, but I assume OP might have been interested in the answer to a more general question.
correlations = df.group_by("era").agg(
pl.corr("prediction", "target", method="spearman").alias("corr")
)
Let's start with some dummy data:
n = 100
seed = 0
df = pl.DataFrame({
    "groups": (pl.int_range(n, eager=True) % 5).shuffle(seed=seed),
    "values": pl.int_range(n, eager=True).shuffle(seed=seed)
})
shape: (100, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 0      ┆ 55     │
│ 0      ┆ 40     │
│ 2      ┆ 57     │
│ 4      ┆ 99     │
│ 4      ┆ 4      │
│ …      ┆ …      │
│ 0      ┆ 90     │
│ 2      ┆ 87     │
│ 1      ┆ 96     │
│ 3      ┆ 43     │
│ 4      ┆ 44     │
└────────┴────────┘
This gives us 100 / 5 = 5 groups of 20 elements each. Let's verify that:
df.group_by("groups").agg(pl.len())
shape: (5, 2)
┌────────┬─────┐
│ groups ┆ len │
│ ---    ┆ --- │
│ i64    ┆ u32 │
╞════════╪═════╡
│ 0      ┆ 20  │
│ 4      ┆ 20  │
│ 2      ┆ 20  │
│ 3      ┆ 20  │
│ 1      ┆ 20  │
└────────┴─────┘
Sample our data
Now we are going to use a window function to take a sample of our data.
df.filter(
    pl.int_range(pl.len()).shuffle().over("groups") < 10
)
shape: (50, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 0      ┆ 55     │
│ 2      ┆ 57     │
│ 4      ┆ 99     │
│ 4      ┆ 4      │
│ 1      ┆ 81     │
│ …      ┆ …      │
│ 2      ┆ 22     │
│ 1      ┆ 76     │
│ 3      ┆ 98     │
│ 0      ┆ 90     │
│ 4      ┆ 44     │
└────────┴────────┘
For every group in over("groups"), the pl.int_range(pl.len()) expression creates a row index. We then shuffle that range so that we take a sample and not a slice. Finally, we keep only the index values lower than 10. This creates a boolean mask that we can pass to the filter method.
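Note that shuffle() is unseeded here, so each run draws a different sample; passing a seed (the same seed parameter used on shuffle earlier) makes it reproducible:

# Reproducible variant: fix the shuffle seed.
df.filter(
    pl.int_range(pl.len()).shuffle(seed=0).over("groups") < 10
)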
This worked better for me:
sampled_df = pl.concat(
    df.sample(fraction=0.001)
    for df in df.partition_by(["column"], include_key=True)
)
The problem with .agg(pl.col("column").sample(2)) was that it seemed to select different values for each column. What I needed was randomly selected rows.
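If you want whole rows at a fraction per group but without materializing partitions, the window trick from the previous answer can be adapted. A sketch, assuming the same "column" group key and the fraction from the example above:

sampled_df = df.filter(
    pl.int_range(pl.len()).shuffle().over("column")
    < pl.len().over("column") * 0.001
)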
There is a dedicated .rolling() method to perform the group_by/rolling operation.
You can then perform your calculations inside the .agg() context.
lookback_period = 5

window = dict(
    by = ("ticker", "timeframe"),               # group_by these columns
    index_column = pl.int_range(0, pl.count()), # a "row count" to use as the index
    period = f"{lookback_period}i"              # window "size"
)

df.rolling(**window).agg(
    pl.when(pl.count() == lookback_period)
    .then(
        (pl.col("close-LDPM")
         / (pl.col("close-LDPM").cum_count().reverse() + 1)).sum()
    )
)
shape: (21, 4)
┌────────┬───────────┬─────┬────────────┐
│ ticker ┆ timeframe ┆ int ┆ close-LDPM │
│ ---    ┆ ---       ┆ --- ┆ ---        │
│ str    ┆ str       ┆ i64 ┆ f64        │
╞════════╪═══════════╪═════╪════════════╡
│ ERIC   ┆ 1 W       ┆ 0   ┆ null       │
│ ERIC   ┆ 1 W       ┆ 1   ┆ null       │
│ ERIC   ┆ 1 W       ┆ 2   ┆ null       │
│ ERIC   ┆ 1 W       ┆ 3   ┆ null       │
│ ERIC   ┆ 1 W       ┆ 4   ┆ 26.295667  │
│ ERIC   ┆ 1 W       ┆ 5   ┆ 27.193     │
│ ERIC   ┆ 1 W       ┆ 6   ┆ 27.647833  │
│ ERIC   ┆ 1 W       ┆ 7   ┆ 25.616167  │
│ ERIC   ┆ 1 W       ┆ 8   ┆ 24.800667  │
│ ERIC   ┆ 1 W       ┆ 9   ┆ 22.096333  │
│ ERIC   ┆ 1 W       ┆ 10  ┆ 20.864333  │
│ ERIC   ┆ 1 W       ┆ 11  ┆ 20.517     │
│ ERIC   ┆ 1 W       ┆ 12  ┆ 20.660667  │
│ ERIC   ┆ 1 W       ┆ 13  ┆ 20.894167  │
│ ERIC   ┆ 1 W       ┆ 14  ┆ 21.4575    │
│ ERIC   ┆ 1 W       ┆ 15  ┆ 20.6175    │
│ ERIC   ┆ 1 W       ┆ 16  ┆ 20.2265    │
│ ERIC   ┆ 1 W       ┆ 17  ┆ 19.372     │
│ ERIC   ┆ 1 W       ┆ 18  ┆ 18.587833  │
│ ERIC   ┆ 1 W       ┆ 19  ┆ 17.988833  │
│ ERIC   ┆ 1 W       ┆ 20  ┆ 17.861     │
└────────┴───────────┴─────┴────────────┘
Notes
The when/then condition is used to null out the smaller windows.
The reverse cum_count is one way to emulate the range() behaviour in your example.
df.rolling(**window).agg(
    value = pl.col("close-LDPM"),
    weight = pl.col("close-LDPM").cum_count().reverse() + 1
)
shape: (21, 5)
┌────────┬───────────┬─────┬─────────────────────────────────────┬─────────────────┐
│ ticker ┆ timeframe ┆ int ┆ value                               ┆ weight          │
│ ---    ┆ ---       ┆ --- ┆ ---                                 ┆ ---             │
│ str    ┆ str       ┆ i64 ┆ list[f64]                           ┆ list[u32]       │
╞════════╪═══════════╪═════╪═════════════════════════════════════╪═════════════════╡
│ ERIC   ┆ 1 W       ┆ 0   ┆ [10.87]                             ┆ [1]             │
│ ERIC   ┆ 1 W       ┆ 1   ┆ [10.87, 11.04]                      ┆ [2, 1]          │
│ ERIC   ┆ 1 W       ┆ 2   ┆ [10.87, 11.04, 11.36]               ┆ [3, 2, 1]       │
│ ERIC   ┆ 1 W       ┆ 3   ┆ [10.87, 11.04, 11.36, 11.01]        ┆ [4, 3, 2, 1]    │
│ ERIC   ┆ 1 W       ┆ 4   ┆ [10.87, 11.04, 11.36, 11.01, 12.07] ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 5   ┆ [11.04, 11.36, 11.01, 12.07, 12.44] ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 6   ┆ [11.36, 11.01, 12.07, 12.44, 12.38] ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 7   ┆ [11.01, 12.07, 12.44, 12.38, 10.06] ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 8   ┆ [12.07, 12.44, 12.38, 10.06, 10.12] ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 9   ┆ [12.44, 12.38, 10.06, 10.12, 8.1]   ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 10  ┆ [12.38, 10.06, 10.12, 8.1, 8.45]    ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 11  ┆ [10.06, 10.12, 8.1, 8.45, 9.05]     ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 12  ┆ [10.12, 8.1, 8.45, 9.05, 9.27]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 13  ┆ [8.1, 8.45, 9.05, 9.27, 9.51]       ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 14  ┆ [8.45, 9.05, 9.27, 9.51, 9.66]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 15  ┆ [9.05, 9.27, 9.51, 9.66, 8.49]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 16  ┆ [9.27, 9.51, 9.66, 8.49, 8.53]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 17  ┆ [9.51, 9.66, 8.49, 8.53, 7.96]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 18  ┆ [9.66, 8.49, 8.53, 7.96, 7.71]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 19  ┆ [8.49, 8.53, 7.96, 7.71, 7.65]      ┆ [5, 4, 3, 2, 1] │
│ ERIC   ┆ 1 W       ┆ 20  ┆ [8.53, 7.96, 7.71, 7.65, 7.77]      ┆ [5, 4, 3, 2, 1] │
└────────┴───────────┴─────┴─────────────────────────────────────┴─────────────────┘
Multiple columns
Assuming all columns follow a similar naming pattern, we can:
- select all close- columns by regex to process them all together.
- use .name.map to extract the final part of the column name and add the _w suffix.
- use regex again to select the newly created _w columns.
weighted_sums = (
    df.with_columns(pl.col("close-LDPM").reverse().alias("close-ABCD"))  # add dummy column
      .rolling(**window).agg(
          pl.when(pl.count() == lookback_period)
          .then(
              (pl.col("^close-.+$")  # select all `close-` columns
               / (pl.col("^close-.+$").cum_count().reverse() + 1)).sum()
          )
          .name.map(lambda col: col.rsplit('-', 1)[1] + "_w")  # extract everything after last `-` and add `_w` suffix
      )
      .select("^.+_w$")  # select all `_w` columns
)
shape: (21, 2)
┌───────────┬───────────┐
│ LDPM_w    ┆ ABCD_w    │
│ ---       ┆ ---       │
│ f64       ┆ f64       │
╞═══════════╪═══════════╡
│ null      ┆ null      │
│ null      ┆ null      │
│ null      ┆ null      │
│ null      ┆ null      │
│ 26.295667 ┆ 18.5465   │
│ 27.193    ┆ 18.865833 │
│ 27.647833 ┆ 20.280333 │
│ 25.616167 ┆ 20.8945   │
│ 24.800667 ┆ 21.0735   │
│ 22.096333 ┆ 20.968    │
│ 20.864333 ┆ 20.3745   │
│ 20.517    ┆ 19.561167 │
│ 20.660667 ┆ 21.103167 │
│ 20.894167 ┆ 21.7425   │
│ 21.4575   ┆ 24.498333 │
│ 20.6175   ┆ 26.133333 │
│ 20.2265   ┆ 26.955667 │
│ 19.372    ┆ 26.298667 │
│ 18.587833 ┆ 26.474333 │
│ 17.988833 ┆ 25.8955   │
│ 17.861    ┆ 25.343167 │
└───────────┴───────────┘
Add result back to original dataframe
In this case (with a "row count" index) the order is guaranteed, so we can simply use .with_columns to add the result.
df = df.with_columns(weighted_sums)
Otherwise, you would need a join: https://stackoverflow.com/a/77489932
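A minimal sketch of that join, assuming weighted_sums still carries the group keys and the "int" index column (i.e. the final .select("^.+_w$") step is skipped):

# Recreate the "int" index on the original frame, then join on keys + index.
df = (
    df.with_columns(pl.int_range(0, pl.count()).alias("int"))
      .join(weighted_sums, on=["ticker", "timeframe", "int"], how="left")
      .drop("int")
)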
Final code, with the invaluable help from @jqurious:
df_data = df_data.with_columns([pl.col("close-LDPM").alias("LDPS_w")])

lookback_period = 5  # as defined earlier

window = dict(
    by = ("ticker", "timeframe"),               # group_by these columns
    index_column = pl.int_range(0, pl.count()), # a "row count" to use as the index
    period = f"{lookback_period}i"              # window "size"
)

weighted_sums = df_data.rolling(**window).agg(
    pl.when(pl.count() == lookback_period)
    .then(
        (pl.col("LDPS_w")
         / (pl.col("LDPS_w").cum_count().reverse() + 1)).sum()
    )
)

df_data = df_data.with_columns(weighted_sums)
df_data = df_data.drop(["int"])
I duplicated the column "close-LDPM" that I wanted to run the weighted average on, so I got to keep the original column and the new one.
Thanks again @jqurious
pl.Expr.apply was deprecated in favour of pl.Expr.map_elements in Polars release 0.19.0, and was subsequently removed in the Polars 1.0.0 release.
You can adapt your code to the new version as follows.
df.with_columns(
    pl.col("AH_PROC_REALIZADO")
    .map_elements(get_procedure_description, return_dtype=pl.String)
    .alias("proced_descr")
)
If you really want to apply a Python function then you can use map_elements(). However, using native Polars expressions is always preferable.
In your case I'd suggest looking at replace() or replace_strict().
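The proceds mapping itself comes from the question and is not shown here; a hypothetical stand-in consistent with the outputs below would be:

# Hypothetical stand-in, reconstructed from the expected output.
proceds = {
    "30408": "QUIMIOTERAPIA",
    "410010065": "MASTECTOMIA SIMPLES",
    "410010111": "SETORECTOMIA / QUADRANTECTOMIA",
}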
If you just want to search by the AH_PROC_REALIZADO column, you could use a simple replace_strict():
df = pl.DataFrame({
    "AH_PROC_REALIZADO": ["30408", "410010065", "410010111", "XXXX"]
})
┌───────────────────┐
│ AH_PROC_REALIZADO │
│ ---               │
│ str               │
╞═══════════════════╡
│ 30408             │
│ 410010065         │
│ 410010111         │
│ XXXX              │
└───────────────────┘
df.with_columns(
    pl.col("AH_PROC_REALIZADO")
    .replace_strict(proceds, default=None)
    .alias("proced_descr")
)
┌───────────────────┬────────────────────────────────┐
│ AH_PROC_REALIZADO ┆ proced_descr                   │
│ ---               ┆ ---                            │
│ str               ┆ str                            │
╞═══════════════════╪════════════════════════════════╡
│ 30408             ┆ QUIMIOTERAPIA                  │
│ 410010065         ┆ MASTECTOMIA SIMPLES            │
│ 410010111         ┆ SETORECTOMIA / QUADRANTECTOMIA │
│ XXXX              ┆ null                           │
└───────────────────┴────────────────────────────────┘
The problem with your use case is that, as far as I understand, you want to search by prefixes of the strings in the AH_PROC_REALIZADO column. In that case you could adjust the solution to use:
- itertools.groupby() to transform the proceds dictionary into a dictionary of dictionaries whose top-level keys are the key lengths.
- replace_strict() to search for the procedure description.
- coalesce() to combine the results into the final column.
from itertools import groupby

# itertools.groupby only groups consecutive items, so sort by key length first
items = sorted(proceds.items(), key=lambda x: len(x[0]))
mappings = {k: dict(g) for k, g in groupby(items, lambda x: len(x[0]))}
df = pl.DataFrame({
    "AH_PROC_REALIZADO": ["30408_____", "410010065_____", "410010111____", "XXXX"]
})
┌───────────────────┐
│ AH_PROC_REALIZADO │
│ ---               │
│ str               │
╞═══════════════════╡
│ 30408_____        │
│ 410010065_____    │
│ 410010111____     │
│ XXXX              │
└───────────────────┘
df.with_columns(
    pl.coalesce(
        pl.col("AH_PROC_REALIZADO").str.head(k).replace_strict(m, default=None)
        for k, m in mappings.items()
    )
    .alias("proced_descr")
)
┌───────────────────┬────────────────────────────────┐
│ AH_PROC_REALIZADO ┆ proced_descr                   │
│ ---               ┆ ---                            │
│ str               ┆ str                            │
╞═══════════════════╪════════════════════════════════╡
│ 30408_____        ┆ QUIMIOTERAPIA                  │
│ 410010065_____    ┆ MASTECTOMIA SIMPLES            │
│ 410010111____     ┆ SETORECTOMIA / QUADRANTECTOMIA │
│ XXXX              ┆ null                           │
└───────────────────┴────────────────────────────────┘