apply was renamed to .map_elements() some time ago.
Previous versions printed a deprecation warning, but it was eventually removed after a grace period.
You're likely looking at the docs for an older version of Polars; the docs site has a "version switcher" for picking the version you have installed.
As for the actual task, you can also do it natively using .dt.to_string():
import datetime
import polars as pl
pl.select(
pl.lit(str(datetime.datetime.now()))
.str.to_datetime()
.dt.to_string("%A")
)
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ Tuesday │
└─────────┘
Answer from jqurious on Stack Overflow
python - Using apply in polars - Stack Overflow
'Expr' object has no attribute 'apply'
pl.Expr.apply was deprecated in favour of pl.Expr.map_elements in Polars release 0.19.0. Recently, pl.Expr.apply was removed in the release of Polars 1.0.0.
You can adapt your code to the new version as follows.
df.with_columns(
pl.col("AH_PROC_REALIZADO")
.map_elements(get_procedure_description, return_dtype=pl.String)
.alias("proced_descr")
)
If you really want to apply a Python function, you can use map_elements(). However, using native Polars expressions is preferable wherever they are available.
In your case I'd suggest looking at replace() or replace_strict().
If you just want to look up the values of the AH_PROC_REALIZADO column, a simple replace_strict() is enough:
df = pl.DataFrame({
"AH_PROC_REALIZADO": ["30408", "410010065", "410010111", "XXXX"]
})
┌───────────────────┐
│ AH_PROC_REALIZADO │
│ ---               │
│ str               │
╞═══════════════════╡
│ 30408             │
│ 410010065         │
│ 410010111         │
│ XXXX              │
└───────────────────┘
df.with_columns(
pl.col("AH_PROC_REALIZADO")
.replace_strict(proceds, default=None)
.alias("proced_descr")
)
┌───────────────────┬────────────────────────────────┐
│ AH_PROC_REALIZADO ┆ proced_descr                   │
│ ---               ┆ ---                            │
│ str               ┆ str                            │
╞═══════════════════╪════════════════════════════════╡
│ 30408             ┆ QUIMIOTERAPIA                  │
│ 410010065         ┆ MASTECTOMIA SIMPLES            │
│ 410010111         ┆ SETORECTOMIA / QUADRANTECTOMIA │
│ XXXX              ┆ null                           │
└───────────────────┴────────────────────────────────┘
The problem with your use case is that, as far as I understand, you want to match on a prefix of the strings in the AH_PROC_REALIZADO column. In that case you could probably adjust the solution to use:
- itertools.groupby() to transform the proceds dictionary into a dictionary of dictionaries whose top-level keys are the key lengths.
- replace_strict() to look up the description for each prefix length.
- coalesce() to combine the results into the final column.
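from itertools import groupby
# groupby() only merges adjacent items, so sort by key length first
by_len = lambda item: len(item[0])
mappings = {k: dict(g) for k, g in groupby(sorted(proceds.items(), key=by_len), by_len)}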
df = pl.DataFrame({
"AH_PROC_REALIZADO": ["30408_____", "410010065_____", "410010111____", "XXXX"]
})
┌───────────────────┐
│ AH_PROC_REALIZADO │
│ ---               │
│ str               │
╞═══════════════════╡
│ 30408_____        │
│ 410010065_____    │
│ 410010111____     │
│ XXXX              │
└───────────────────┘
df.with_columns(
pl.coalesce(
pl.col("AH_PROC_REALIZADO").str.head(k).replace_strict(m, default=None) for k, m in mappings.items()
)
.alias("proced_descr")
)
┌───────────────────┬────────────────────────────────┐
│ AH_PROC_REALIZADO ┆ proced_descr                   │
│ ---               ┆ ---                            │
│ str               ┆ str                            │
╞═══════════════════╪════════════════════════════════╡
│ 30408_____        ┆ QUIMIOTERAPIA                  │
│ 410010065_____    ┆ MASTECTOMIA SIMPLES            │
│ 410010111____     ┆ SETORECTOMIA / QUADRANTECTOMIA │
│ XXXX              ┆ null                           │
└───────────────────┴────────────────────────────────┘
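One detail of the grouping step is worth checking in plain Python: itertools.groupby() only merges adjacent items, so the result depends on the order of the dictionary. The sketch below uses a hypothetical proceds dictionary (entries taken from the output above, deliberately not ordered by key length) and a helper named key_length that is not part of the original answer:

```python
from itertools import groupby

# Hypothetical mapping with keys of mixed lengths, deliberately not ordered by length.
proceds = {
    "410010065": "MASTECTOMIA SIMPLES",
    "30408": "QUIMIOTERAPIA",
    "410010111": "SETORECTOMIA / QUADRANTECTOMIA",
}


def key_length(item):
    """Return the length of the dictionary key in a (key, value) pair."""
    return len(item[0])


# Without sorting, the two 9-character keys form separate groups and the
# later group overwrites the earlier one in the dict comprehension.
unsorted_mappings = {k: dict(g) for k, g in groupby(proceds.items(), key_length)}

# Sorting by the grouping key first keeps every key.
mappings = {
    k: dict(g) for k, g in groupby(sorted(proceds.items(), key=key_length), key_length)
}

print(unsorted_mappings)
print(mappings)
```

If the dictionary is already ordered by key length, both versions give the same result; sorting simply removes that dependency.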
Check your DataFrame with data.columns
It should print something like this
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden whitespace. Then you can rename with
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is something like " Number" or "Number ", with a residual space. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is raised because . notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a quick way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
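To see the effect end to end, here is a small sketch with a made-up frame whose second column name carries a trailing space (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical frame whose second column name has a trailing space.
data = pd.DataFrame({"name": ["a", "b"], "Number ": [1, 2]})

# Wrapping each name in brackets makes stray whitespace visible.
print([f"<{col}>" for col in data.columns])

has_number_before = "Number" in data.columns

# Strip leading and trailing whitespace from every column name.
data.columns = data.columns.str.strip()
has_number_after = "Number" in data.columns

# Attribute access now works because the name matches exactly.
print(data.Number.tolist())
```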
I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.
A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:
##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]
test_var = test.loc[:,['Sex']]
BayesNet = BayesianModel([('Sex','Survived')])
I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions
For example, the error I get for this version of the code:
train_var = train.loc[:,['Survived','Pclass','Sex']]
test_var = test.loc[:,['Pclass']]
BayesNet = BayesianModel([('Sex','Pclass','Survived')])
is this:
AttributeError Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
2 predictions
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'Survived'
Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.
Any help would be greatly appreciated. I know it's a lot.
Double check if there's a space in the column name: 'Survived ' vs 'Survived'. It happens more often than you'd think, especially with a CSV data source.
It's an issue with how you're calling the data and if it's actually there.
train.loc[:,['Survived','Sex']]
tells me that there's a DataFrame (which is from pandas, hence the error) called train, and this line is trying to access parts of that dataframe (it's essentially a table). Specifically, it's trying to access the columns named Survived and Sex.
Similarly, this line tells me there's another dataframe (df) called test with a column named Sex, and this line accesses that data.
test.loc[:,['Sex']]
The error code also informs me of some things
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
There's another df called predictions, built from a dict, which is trying to access information from yet another df called hypothesis. The attribute it's trying to access in the second key of the dict is
hypothesis.Survived.tolist()
which is a way of calling a column from that df. That is, when the predictions line is executed, it's trying to pull all the values from the Survived column of the hypothesis df.
The error is that the df doesn't actually have a column named Survived. So either there's missing data, or you're calling it wrong, or there's a missing reference.
Without knowing more about your code and your question, I can't really extrapolate much more.
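One way to narrow it down is to look at what the df actually contains before accessing the column. A minimal sketch, using a made-up stand-in for hypothesis (the real one comes from code not shown in the question):

```python
import pandas as pd

# Hypothetical stand-in for `hypothesis`: it only holds a 'Pclass' column,
# so attribute access to 'Survived' fails the same way as in the traceback.
hypothesis = pd.DataFrame({"Pclass": [1, 3, 2]})

# List the columns that actually exist.
print(hypothesis.columns.tolist())

try:
    hypothesis.Survived.tolist()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# A membership test on .columns avoids the exception entirely.
survived_present = "Survived" in hypothesis.columns
print(survived_present)
```

If Survived is missing from that list, the problem is in whatever step produced hypothesis, not in the predictions line itself.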