The first step would be to check if your task can be solved natively using Polars Expressions.
If a custom function is necessary, .map_elements() can be used to apply it on a row-by-row basis.
To pass in values from multiple columns, you can utilize the Struct data type.
e.g. with pl.struct()
>>> df.select(pl.struct(pl.all())) # all columns
shape: (3, 1)
┌───────────┐
│ foo       │
│ ---       │
│ struct[3] │
╞═══════════╡
│ {1,4,7}   │
│ {2,5,8}   │
│ {3,6,9}   │
└───────────┘
Using pl.struct(...).map_elements will pass the values to the custom function as a dict argument.
import polars as pl

def my_complicated_function(row: dict) -> int:
    """
    A function that cannot utilize polars expressions.
    This should be avoided.
    """
    # a dict with column names as keys
    print(f"[DEBUG]: {row=}")
    # do some work
    return row["foo"] + row["bar"] + row["baz"]

df = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": [4, 5, 6],
    "baz": [7, 8, 9]
})

df = df.with_columns(
    pl.struct(pl.all())
      .map_elements(my_complicated_function, return_dtype=pl.Int64)
      .alias("foo + bar + baz")
)
# [DEBUG]: row={'foo': 1, 'bar': 4, 'baz': 7}
# [DEBUG]: row={'foo': 2, 'bar': 5, 'baz': 8}
# [DEBUG]: row={'foo': 3, 'bar': 6, 'baz': 9}
shape: (3, 4)
┌─────┬─────┬─────┬─────────────────┐
│ foo ┆ bar ┆ baz ┆ foo + bar + baz │
│ --- ┆ --- ┆ --- ┆ ---             │
│ i64 ┆ i64 ┆ i64 ┆ i64             │
╞═════╪═════╪═════╪═════════════════╡
│ 1   ┆ 4   ┆ 7   ┆ 12              │
│ 2   ┆ 5   ┆ 8   ┆ 15              │
│ 3   ┆ 6   ┆ 9   ┆ 18              │
└─────┴─────┴─────┴─────────────────┘
Answer from ritchie46 on Stack Overflow
python - Apply function to all columns of a Polars-DataFrame - Stack Overflow
Make it easy to apply a user-defined function to a DataFrame in Rust
Is there a "correct" way to convert a Pandas Function to Polars?
In polars, how to apply a custom function to a column of strings?
Finally broke down and decided I should learn Polars. Been a few hiccups, but going well. One I ran into was this example:
In Pandas I have a function which splits a string every 8 characters, adds ".TIF" and returns it as a string that I can later convert to a list with ast:
def split_and_add_tif(text):
    return [','.join([text[i:i + 8] + '.TIF' for i in range(0, len(text), 8)])]
To apply this with Pandas I'd simply do:
df['FileList'] = df['FileList'].apply(split_and_add_tif)
However, in Polars it appears I need to do this:
df = df.with_columns(
    df['FileList'].map_elements(split_and_add_tif, return_dtype = list).alias('FileList')
)
Is this the correct way to do this? Does anyone have a good resource for going from Pandas Functions to Polars? Still a bit confused on the difference between map_rows and map_elements. Any advice/pointers are welcomed.