polars apply lambda function

In a selection context, the function is applied by row. >>> df.with_columns( ... pl.col("a").apply(lambda x: x * 2).alias("a_times_2"), ...

TypeThePipe

typethepipe.com › vizs-and-tips › python-polars-suggest-efficient-expressions-lambda-function

Polars new feature. Suggest more efficient Polars method for apply lambda functions | TypeThePipe

July 20, 2023 -

import polars as pl

df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'offensive_skill': [5, 30, 85],
    'defensive_skill': [92, 30, 10]
})

df.with_columns(
    pl.col("defensive_skill").apply(lambda x: x/3)
)

Discussions

python - creating a new column in polars applying a function to a column - Stack Overflow

stackoverflow.com

python - polars apply a lambda with list comprehension like pandas: Any other better way? - Stack Overflow

stackoverflow.com

December 21, 2022

python - Improving polars statement that adds a column applying a lambda function on each row - Stack Overflow

The idea is to use Polars Expressions instead of applying custom Python functions/lambdas. More on stackoverflow.com

stackoverflow.com

python - Apply function to all columns of a Polars-DataFrame - Stack Overflow

I know how to apply a function to all columns present in a Pandas-DataFrame. However, I have not figured out yet how to achieve this when using a Polars-DataFrame. I checked the section from the Po... More on stackoverflow.com

stackoverflow.com

Towards Data Science

towardsdatascience.com › home › latest › manipulating values in polars dataframes

Manipulating Values in Polars DataFrames | Towards Data Science

January 29, 2025 - This means that the apply() function, when applied to a dataframe, sends the values of each row as a tuple to the receiving function. This is useful for some use cases. For example, say you need to perform an integer division of all the numbers ...

Polars

docs.pola.rs › api › python › version › 0.18 › reference › dataframe › api › polars.DataFrame.apply.html

polars.DataFrame.apply — Polars documentation

If your function is expensive and you don’t want it to be called more than once for a given input, consider applying an @lru_cache decorator to it. With suitable data you may achieve order-of-magnitude speedups (or more). ...

>>> df.apply(lambda t: (t[0] * 2, t[1] * 3))
shape: (3, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 2        ┆ -3       │
│ 4        ┆ 15       │
│ 6        ┆ 24       │
└──────────┴──────────┘

Polars

docs.pola.rs › api › python › dev › reference › expressions › api › polars.Expr.map_elements.html

polars.Expr.map_elements — Polars documentation

Polars may call the function with arbitrary input data. Examples ·

>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["a", "b", "c", "c"],
...     }
... )

The function is applied to each element of column 'a':

>>> df.with_columns(
...     pl.col("a")
...     .map_elements(lambda x: x * 2, return_dtype=pl.self_dtype())
...

Rho Signal

rhosignal.com › posts › polars-aws-lambda

AWS Lambda with Polars | Rho Signal

November 14, 2024 - Then you can create a lambda function that uses your image as a container. See this AWS tutorial for more details on these steps. There’s a lot more to say about optimising Polars and AWS Lambda. For example, you can use Polars to read and write from S3 in lazy mode and this allows Polars to apply query optimisations.

Polars

docs.pola.rs › api › python › version › 0.18 › reference › expressions › api › polars.apply.html

polars.apply — Polars documentation

>>> df.with_columns( ... pl.col("a").apply(lambda x: x * x).alias("product_a") ...

Stack Overflow

stackoverflow.com › questions › 76220250 › creating-a-new-column-in-polars-applying-a-function-to-a-column

python - creating a new column in polars applying a function to a column - Stack Overflow

Top answer

1 of 2

Your first example works because a Series has a multiplication method.

For example if you do

func(pl.Series([1,2,3,4,5]))

then you get back a series of the original multiplied by 2.

Your func2 is just an anonymous function. To use map_batches, your function needs to operate on the entire column and return something like a Series.

For instance:

import polars as pl
from lxml import etree as ET
def func2_series(xml_strings):
    ret_List=[]
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string, ET.XMLParser(recover=True))
        text_list = []
        for elem in root.iter():
            text = elem.text.strip() if elem.text else ''
            text_list.append(text)
        ret_List.append(text_list)
    return pl.Series(ret_List)

followed by

df.with_columns(pl.col("B").map_batches(func2_series).alias('new_col2'))

will work.

Alternatively if you have

def func2(xml_string):
    root = ET.fromstring(xml_string, ET.XMLParser(recover=True))
    text_list = []
    for elem in root.iter():
        text = elem.text.strip() if elem.text else ''
        text_list.append(text)
    return text_list

then you can use map_elements and polars will do the looping for you.

dfpl.with_columns(pl.col("B").map_elements(func2))

btw, you don't need to use a lambda if the function you're passing accepts the exact x that you have. In other words where you have .map_batches(lambda x: func2(x)) you can just do .map_batches(func2). The lambda comes into play if you need to transform the parameters.

2 of 2

As stated in the comments, use .map_elements instead of .map_batches. Also, if you want only list of strings I recommend to use beautifulsoups method .stripped_strings:

import polars as pl
from bs4 import BeautifulSoup

# create a sample dataframe
df = pl.DataFrame({
    'A': [1, 2, 3],
    'B': ['<p>some text</p><p>bla</p>', '<p>some text<p><p>foo</p>', '<p>some text<p>']
})

def func(mystring):
    return mystring*2

def func2(xml_string):
    soup = BeautifulSoup(xml_string, 'html.parser')
    return list(soup.stripped_strings)

# create a sample series to add as a new column
df = df.with_columns((pl.col("A").map_elements(lambda x: func(x)).alias('new_col')))
df = df.with_columns((pl.col("B").map_elements(lambda x: func2(x)).alias('new_col2')))

print(df)

Prints:

shape: (3, 4)
┌─────┬────────────────────────────┬─────────┬──────────────────────┐
│ A   ┆ B                          ┆ new_col ┆ new_col2             │
│ --- ┆ ---                        ┆ ---     ┆ ---                  │
│ i64 ┆ str                        ┆ i64     ┆ list[str]            │
╞═════╪════════════════════════════╪═════════╪══════════════════════╡
│ 1   ┆ <p>some text</p><p>bla</p> ┆ 2       ┆ ["some text", "bla"] │
│ 2   ┆ <p>some text<p><p>foo</p>  ┆ 4       ┆ ["some text", "foo"] │
│ 3   ┆ <p>some text<p>            ┆ 6       ┆ ["some text"]        │
└─────┴────────────────────────────┴─────────┴──────────────────────┘

Stack Overflow

stackoverflow.com › questions › 74874134 › polars-apply-a-lambda-with-list-comprehension-like-pandas-any-other-better-way

python - polars apply a lambda with list comprehension like pandas: Any other better way? - Stack Overflow

Top answer

1 of 1

The functionality is available natively in Polars via the .str namespace.

.str.split() doesn't support regex.

But similar behaviour can be achieved with .extract_all() and .replace_all()

df = pl.DataFrame({"content": ["o neHItw oHIIIIIth ree", "fo urHIIfi veHIIIIs ix"]})

pattern2 = r"HI+"
pattern3 = r"\s"

replacement = ""

df.with_columns(
   pl.col("content").str.extract_all(rf".*?({pattern2}|$)")
     .alias("sentences")
)

shape: (2, 2)
┌────────────────────────┬────────────────────────────────────┐
│ content                ┆ sentences                          │
│ ---                    ┆ ---                                │
│ str                    ┆ list[str]                          │
╞════════════════════════╪════════════════════════════════════╡
│ o neHItw oHIIIIIth ree ┆ ["o neHI", "tw oHIIIII", "th ree"] │
│ fo urHIIfi veHIIIIs ix ┆ ["fo urHII", "fi veHIIII", "s ix"] │
└────────────────────────┴────────────────────────────────────┘

list.eval() could then be used to process the list and "extract" the desired result.

df.with_columns(
   pl.col("content").str.extract_all(rf".*?({pattern2}|$)")
     .list.eval(
        pl.element().str.replace_all(pattern2, "")
                    .str.replace_all(pattern3, replacement)
     )
     .alias("normal_text")
)

shape: (2, 2)
┌────────────────────────┬─────────────────────────┐
│ content                ┆ normal_text             │
│ ---                    ┆ ---                     │
│ str                    ┆ list[str]               │
╞════════════════════════╪═════════════════════════╡
│ o neHItw oHIIIIIth ree ┆ ["one", "two", "three"] │
│ fo urHIIfi veHIIIIs ix ┆ ["four", "five", "six"] │
└────────────────────────┴─────────────────────────┘

Performance

A basic comparison of both approaches.

N = 2000
df = pl.DataFrame({
   "content": [
      "o neHItw oHIIIIIth ree" * N, 
      "fo urHIIfi veHIIIIs ix" * N] * N
})

Name	Time
.str + .list.eval()	8.28s
.map_elements()	29.9s

Polars

docs.pola.rs › api › python › version › 0.19 › reference › series › api › polars.Series.apply.html

polars.Series.apply — Polars documentation

return_dtype: PolarsDataType | None = None, *, skip_nulls: bool = True, ) → Self[source]# Apply a custom/user-defined function (UDF) over elements in this Series. Deprecated since version 0.19.0: This method has been renamed to Series.map_elements(). Parameters: function · Custom function or lambda.

Polars

docs.pola.rs › polars-cloud › integrations › lambda

AWS Lambda - Polars user guide

The code for the lambda function can be boiled down to the following (pseudo-code):

import boto3
import polars as pl
import polars_cloud as pc

client = boto3.client("secretsmanager")

# authenticate to polars cloud with the secrets created above
pc.authenticate(
    client_id=client.get_secret_value(SecretId="<SECRET>").get("SecretString"),
    client_secret=client.get_secret_value(SecretId="<SECRET>").get("SecretString"),
)

# define the compute context
cc = pc.ComputeContext(cpus=2, memory=4)

# submit the query
pl.scan_csv(...).remote(cc).sink_parquet(...)

Polars

docs.pola.rs › user-guide › expressions › user-defined-python-functions

User-defined Python functions - Polars user guide

Polars expressions are quite powerful and flexible, so there is much less need for custom Python functions compared to other libraries. Still, you may need to pass an expression's state to a third party library or apply your black box function to data in Polars.

Stack Overflow

stackoverflow.com › questions › 76507474 › improving-polars-statement-that-adds-a-column-applying-a-lambda-function-on-each

python - Improving polars statement that adds a column applying a lambda function on each row - Stack Overflow

Top answer

1 of 1

The idea is to use Polars Expressions instead of applying custom Python functions/lambdas.

It looks like you're trying to count when ref and another column have the same value?

df.select(pl.exclude("ref") == pl.col("ref"))

shape: (3, 2)
┌───────┬───────┐
│ v1    ┆ v2    │
│ ---   ┆ ---   │
│ bool  ┆ bool  │
╞═══════╪═══════╡
│ true  ┆ true  │
│ false ┆ false │
│ false ┆ true  │
└───────┴───────┘

.sum_horizontal() can be used to get a "count" of the true values on each row.

df.with_columns(count = pl.sum_horizontal(pl.exclude("ref") == pl.col("ref")))

shape: (3, 4)
┌─────┬─────┬─────┬───────┐
│ ref ┆ v1  ┆ v2  ┆ count │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ i64 ┆ u32   │
╞═════╪═════╪═════╪═══════╡
│ -1  ┆ -1  ┆ -1  ┆ 2     │
│ 2   ┆ 5   ┆ 5   ┆ 0     │
│ 8   ┆ 0   ┆ 8   ┆ 1     │
└─────┴─────┴─────┴───────┘

Stack Overflow

stackoverflow.com › questions › 67834912 › apply-function-to-all-columns-of-a-polars-dataframe

python - Apply function to all columns of a Polars-DataFrame - Stack Overflow

Top answer

1 of 1

You can use the expression syntax to select all columns with pl.all() and then map_batches the numpy np.log2(..) function over the columns.

df.select(
    pl.all().map_batches(np.log2)
)

Note that we choose map_batches here as map_elements would call the function upon each value.

map_elements = pl.Series(np.log2(value) for value in pl.Series([1, 2, 3]))

But np.log2 can be called once with multiple values, which would be faster.

map_batches = np.log2(pl.Series([1, 2, 3]))

See the User guide for more.

map_elements: Call a function separately on each value in the Series.
map_batches: Always passes the full Series to the function.

Numpy

Polars expressions also support numpy universal functions.

That means you can pass a polars expression to a numpy ufunc:

df.select(
    np.log2(pl.all())
)

Polars

docs.pola.rs › py-polars › html › reference › expressions › index.html

Expressions — Polars documentation

where \(\lambda\) equals \(\ln(2) / \text{half_life}\).

Medium

medium.com › @arkimetrix.analytics › part-5-pythons-polars-streamlining-data-processing-fluent-interfaces-in-action-ab8cd31e83d2

Mastering Efficient and Readable Code with Polars: Unleashing the Power of Fluent Interface Design | Medium

June 12, 2023 - Here, each lambda function takes a DataFrame as an argument and applies a transformation before returning it.

Confessions of a Data Guy

confessionsofadataguy.com › home › polars vs pandas. inside an aws lambda.

Polars vs Pandas. Inside an AWS Lambda. - Confessions of a Data Guy

July 22, 2023 - Remember, I just want to read a bucket of s3 files as easily as possible and do some simple work on a Lambda … I want it to be as easy as it would be with Spark! Firstly, because of the file-by-file iteration we had to do in Pandas, I had my hopes extremely high that Polars in conjunction with pyarrow might be able to simply read a folder.

Medium

medium.com › @kasperjuunge › 20-pandas-operations-translated-to-polars-4b9daba154f5

20 Pandas Operations Translated to Polars | by Kasper Junge | Medium

January 11, 2024 -

Polars: pl.concat([df1, df2])
Pandas: df['A'].apply(lambda x: x*2) Polars: df.with_columns(pl.col('A').map_elements(lambda x: x*2))
Pandas: df.dropna() Polars: df.drop_nulls()
Pandas: df.fillna(value) Polars: df.fill_null(value)
Pandas: df.rename(columns={'A': 'X'}) Polars: df.rename({'A': 'X'})
Pandas: df['A'].unique() Polars: df['A'].unique()
Pandas: df.info() Polars: df.describe()
Pandas: df[df['A'] > 1] Polars: df.filter(pl.col('A') > 1)
Pandas: df.agg({'A': ['sum', 'min'], 'B': ['max', 'mean']}) Polars: df.select(pl.sum('A').alias('A_sum'), pl.min('A').alias('A_min'), pl.max('B').alias('B_max'), pl.mean('B').alias('B_mean'))
Pandas: df['A'].astype('float') Polars: df.with_columns(pl.col('A').cast(pl.Float64))
Pandas: df1.merge(df2, on='key').merge(df3, on='key') Polars: df1.join(df2, on='key').join(df3, on='key')

Stack Overflow

stackoverflow.com › questions › 76822683 › polars-apply-lambda-alternative

python - Polars apply lambda alternative - Stack Overflow

Top answer

1 of 1

Use replace_strict:

import polars as pl

data1 = {"a": [1, 2, 3, 4], "b1": [11, 12, 13, 14], "c1" : [31, 32, 33, 34]}
df1_pl = pl.DataFrame(data1)
print(df1_pl)
weekday = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']

# Map 1..4 to weekday names; replace_strict raises if a value has no mapping.
print(df1_pl.with_columns(
    weekday=pl.col('a').replace_strict({idx: val for idx, val in enumerate(weekday, start=1)})
))
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b1  ┆ c1  │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 11  ┆ 31  │
│ 2   ┆ 12  ┆ 32  │
│ 3   ┆ 13  ┆ 33  │
│ 4   ┆ 14  ┆ 34  │
└─────┴─────┴─────┘
shape: (4, 4)
┌─────┬─────┬─────┬───────────┐
│ a   ┆ b1  ┆ c1  ┆ weekday   │
│ --- ┆ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64 ┆ str       │
╞═════╪═════╪═════╪═══════════╡
│ 1   ┆ 11  ┆ 31  ┆ Monday    │
│ 2   ┆ 12  ┆ 32  ┆ Tuesday   │
│ 3   ┆ 13  ┆ 33  ┆ Wednesday │
│ 4   ┆ 14  ┆ 34  ┆ Thursday  │
└─────┴─────┴─────┴───────────┘

reddit.com › r/rust › polars: computing a new column from multiple columns - there must be a better way

r/rust on Reddit: Polars: Computing a new column from multiple columns - there must be a better way

May 4, 2023 -

I recently decided to use Polars in my side-project and stumbled upon a surprisingly challenging task: computing a new column from two other variables using a function (not Polars expressions) for the computation. I read the data from a CSV file and want to derive more variables from it, and I thought Polars would be a good tool for that.

Because I needed working code rather than good code, I created the solution below, but there must be a better way! But maybe Polars is not for such use-cases and I should use some other crate? If so, please tell me which one.

fn add_col3(df: LazyFrame) -> Result<LazyFrame> {
    let mut col3 = vec![];

    let data = df.clone().collect()?.to_ndarray::<Float64Type>()?;

    for row in data.rows() {
        // I'm aware I could do this loop
        // more efficiently
        let a = row[1]; 
        let b = row[2];

        let c = complex_computation(a, b)?;

        col3.push(c);
    }

    let col3 = Series::new("Value3", col3);
    let df = df.with_column(col3.lit());

    Ok(df)
}

fn complex_computation(a: f64, b: f64) -> Result<f64> {
    ...
    
    Ok(c)
}

Having to clone, collect and convert to ndarray seems very inefficient to me and not really idiomatic. But I'm rather clueless about how I could do this better, and most questions online discuss the Python API of Polars.

In Python I would need to use .struct() and .apply() with a lambda to do that computation. But in Rust .struct() does not seem to exist and .apply() is for one column only.

Has anyone attempted this before and come up with a better solution?

Top answer

1 of 3

You can pack the expression in a struct with as_struct and then apply a custom function.

2 of 3

I'm still migrating to Rust, but the Python solution is pretty reasonable:

df = df.with_columns(
    pl.struct(['col_a', 'col_b'])
    .apply(lambda x: complex_computation(x['col_a'], x['col_b']))
    .alias('col_3')
)