Short answer: change data.columns=[headerName] to data.columns=headerName.
Explanation: when you set data.columns=[headerName], the columns become a MultiIndex object. As a result, log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.
When you set data.columns=headerName, log_df['Product'] is a single column (a Series), so you can use the str attribute.
If for any reason you need to keep the columns as a MultiIndex, there is another solution: first convert log_df['Product'] into a Series. After that, the str attribute is available.
products = pd.Series(df.Product.values.flatten())
include_clique = products[products.str.contains("Product A")]
However, I guess the first solution is what you're looking for.
Answer from hoang tran on Stack Overflow
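To see the difference between the two assignments, here is a minimal sketch. The rows and column names are made up for illustration, and headerName stands in for the list from the question.

```python
import pandas as pd

headerName = ["Product", "Price"]            # hypothetical column names
rows = [["Product A", 1], ["Product B", 2]]  # hypothetical data

# Wrapping the list in another list makes pandas build a MultiIndex.
nested = pd.DataFrame(rows)
nested.columns = [headerName]
nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
nested_selection_type = type(nested["Product"]).__name__  # a DataFrame, so no .str

# Assigning the plain list gives ordinary column labels.
flat = pd.DataFrame(rows)
flat.columns = headerName
mask = flat["Product"].str.contains("Product A")  # works: this is a Series
include_clique = flat[mask]
```

Printing type(df.columns) right after assigning the header is a quick way to tell which of the two cases you are in.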
You get AttributeError: 'DataFrame' object has no attribute ... when you try to access an attribute your dataframe doesn't have.
A common case is when you try to select a column using . instead of [] when the column name contains white space (e.g. 'col1 ').
df.col1 # <--- error
df['col1 '] # <--- no error
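The whitespace case can be reproduced with a small sketch like the one below. The frame is made up, and stripping the labels with df.columns.str.strip() is one possible cleanup, not the only one.

```python
import pandas as pd

df = pd.DataFrame({"col1 ": [1, 2]})  # note the trailing space in the label

# Attribute access looks for a column named exactly "col1" and fails.
try:
    df.col1
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Bracket access with the exact label works.
values = df["col1 "].tolist()

# Stripping whitespace from the labels makes both access styles work.
df.columns = df.columns.str.strip()
stripped_values = df.col1.tolist()
```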
Another common case is when you try to call a Series method on a DataFrame. For example, tolist() is a Series method, and so was map() before pandas 2.1 added DataFrame.map, so they must be called on a column. If you call them on a DataFrame, you'll get
AttributeError: 'DataFrame' object has no attribute 'tolist'
AttributeError: 'DataFrame' object has no attribute 'map'
As hoang tran explains, this is what is happening with OP as well. .str is a Series accessor and it's not implemented for DataFrames.
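A short sketch of the Series-versus-DataFrame distinction, using a made-up two-column frame. It checks for the attributes with hasattr instead of triggering the error, then shows two ways to get at the Series methods.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": ["p", "q"]})

# Neither tolist() nor the .str accessor exists on the DataFrame itself.
frame_has_tolist = hasattr(df, "tolist")
frame_has_str = hasattr(df, "str")

# Both exist on a single column, which is a Series.
column_list = df["a"].tolist()
upper_a = df["a"].str.upper()

# To use .str on every column, apply it column by column.
upper_all = df.apply(lambda col: col.str.upper())
```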
Yet another case is if you have a typo and try to call/access an attribute that's simply not defined; e.g. if you try to call rows() instead of iterrows(), you'll get
AttributeError: 'DataFrame' object has no attribute 'rows'
You can check the full list of attributes using the following comprehension.
[x for x in dir(pd.DataFrame) if not x.startswith('_')]
When you assign column names as df.columns = [['col1', 'col2']], df now has MultiIndex columns, so to access a single column you need to pass a tuple:
df['col1'].str.contains('Product A') # <---- error
df['col1',].str.contains('Product A') # <---- no error; note the trailing comma
In fact, you can pass a tuple to select a column of any MultiIndex dataframe, e.g.
df['level_1_colname', 'level_2_colname'].str.contains('Product A')
You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:
df.columns = df.columns.map('_'.join)
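Both approaches, tuple selection and flattening, can be sketched on a small two-level frame. The level names "item", "name" and "qty" are invented for the example, and '_'.join only works when every level value is a string.

```python
import pandas as pd

df = pd.DataFrame([["Product A", 1], ["Product B", 2]])
df.columns = pd.MultiIndex.from_tuples([("item", "name"), ("item", "qty")])

# A full tuple selects exactly one column, so the result is a Series.
by_tuple = df[("item", "name")].str.contains("Product A")

# Flatten the two levels into single string labels.
df.columns = df.columns.map("_".join)
flat_names = list(df.columns)
by_flat = df["item_name"].str.contains("Product A")
```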
Hi, I'm trying to run a str.split method on a simple Pandas dataframe that has an ID column and a text column of 'object' type, and I get the message AttributeError: 'DataFrame' object has no attribute 'str'. I think the column should be in a Series format to run the str.split method, and I have tried to change the datatype to 'str', but the datatype stays an object. How can I get the column into a Series object to run a Series method?
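The question does not show the code, so the following is only a guess at one common cause: selecting the column with double brackets, which returns a one-column DataFrame. The frame and values are made up. An object dtype is normal for string columns in many pandas versions and is not the problem by itself.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2], "text": ["a b", "c d e"]})

as_frame = df[["text"]]   # double brackets: a one-column DataFrame, no .str
as_series = df["text"]    # single brackets: a Series, .str is available

parts = as_series.str.split()

# If you already hold a one-column DataFrame, squeeze it into a Series.
squeezed = as_frame.squeeze("columns")
parts_from_frame = squeezed.str.split()
```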
python - 'DataFrame' object has no attribute 'str' - Stack Overflow
pandas - Python AttributeError: 'str' object has no attribute 'DataFrame' - Stack Overflow
python - pandas - 'dataframe' object has no attribute 'str' - Stack Overflow
AttributeError: 'DataFrame' object has no attribute 'name'; Various stack overflow / github suggested fixes not working
The error means exactly what it says:
AttributeError: 'str' object has no attribute 'DataFrame'
^                 ^                               ^
the kind of error |                               |
     the thing you tried to use       what was missing from it
The line it's complaining about:
df = pd.DataFrame(date, columns = ['Date'])
     ^  ^
     |  the attribute the error said was missing
     the thing the error said was a string
You said it "has been working no problem until I added a few lines of code above".
Evidently, somewhere in the "few lines of code above", you caused pd to be a string. And sure enough, when we look at those few lines of code, we find:
pd = PDays[j]
^    ^
|    the string that you're making it into
the thing that you're making a string
You are reassigning pd from
import pandas as pd
to
pd = PDays[j]
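The shadowing can be reproduced in a few lines. In this sketch PDays, its contents, and the replacement name p_day are stand-ins, since the full code from the question is not shown.

```python
import pandas as pd

PDays = ["10", "20"]  # hypothetical stand-in for the list in the question
j = 0

pandas_module = pd    # keep a handle so this sketch can restore the name
pd = PDays[j]         # pd is now a plain string, not the pandas module

try:
    pd.DataFrame(["2020-01-01"], columns=["Date"])
    message = None
except AttributeError as exc:
    message = str(exc)

pd = pandas_module    # restore the module

# Using a different variable name avoids the clash entirely.
p_day = PDays[j]
df = pd.DataFrame(["2020-01-01"], columns=["Date"])
```

Renaming the loop variable is the actual fix. The restore step exists only so the sketch can keep running after the error.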
pd.read_csv() already returns a DataFrame, and a DataFrame has no to_dataframe() method.
You can check the type of your variable ds using print(type(ds)); you will see that it is a pandas DataFrame.
As I understand it, you are loading loanapp_c.csv into ds using this code:
ds = pd.read_csv('desktop/python ML/loanapp_c.csv')
Here ds is a DataFrame object. You are calling to_dataframe on an object that is already a DataFrame.
Removing the line dataset = ds.to_dataframe() from your code should fix the error.
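A quick way to confirm this without the original file is to read a small in-memory CSV. The column names and values below are made up, and io.StringIO stands in for the path on disk.

```python
import io
import pandas as pd

csv_text = "income,approved\n1000,1\n2000,0\n"  # made-up data
ds = pd.read_csv(io.StringIO(csv_text))

is_frame = isinstance(ds, pd.DataFrame)         # read_csv returns a DataFrame
has_to_dataframe = hasattr(ds, "to_dataframe")  # no such method on a DataFrame

# No conversion step is needed; use the result directly.
dataset = ds
```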
The first problem could be duplicated column names, so selecting colB returns a DataFrame rather than a Series:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colB','colB','colC'])
print (df)
         colB  colB  colC
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
         colB  colB
0  Example: s    as
1          dd   aaa
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
The solution is to join the columns together:
print (df['colB'].apply(' '.join, axis=1))
0    Example: s as
1           dd aaa
df['colB'] = df.pop('colB').apply(' '.join, axis=1)
df = df[~df['colB'].str.contains('Example:')]
print (df)
   colC    colB
1     3  dd aaa
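If joining the duplicated columns is not what you want, a possible alternative is to detect the duplicates and select one of them by position. This sketch reuses the frame from the example above. Picking the first colB is an arbitrary choice made for illustration.

```python
import pandas as pd

df = pd.DataFrame([["Example: s", "as", 2], ["dd", "aaa", 3]],
                  columns=["colB", "colB", "colC"])

# Detect duplicated labels before relying on df["colB"] being a Series.
labels_unique = df.columns.is_unique
duplicated_names = df.columns[df.columns.duplicated()].tolist()

# Positional selection always returns a single column as a Series.
first_colB = df.iloc[:, 0]
filtered = df[~first_colB.str.contains("Example:")]
```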
The second problem could be a hidden MultiIndex:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_arrays([df.columns])
print (df)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
  colB
0   as
1  aaa
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
The solution is to reassign the first level:
df.columns = df.columns.get_level_values(0)
df = df[~df['colB'].str.contains('Example:')]
print (df)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
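Because a one-level MultiIndex prints exactly like ordinary columns, it helps to check for it explicitly. The check below is a sketch built on the same example frame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([["Example: s", "as", 2], ["dd", "aaa", 3]],
                  columns=["colA", "colB", "colC"])
df.columns = pd.MultiIndex.from_arrays([df.columns])

# print(df) looks normal, but the labels are one-element tuples.
hidden = isinstance(df.columns, pd.MultiIndex) and df.columns.nlevels == 1
first_label = df.columns[0]

# Replace the one-level MultiIndex with a plain Index.
if hidden:
    df.columns = df.columns.get_level_values(0)

col_type = type(df["colA"]).__name__
```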
The third could be a MultiIndex with more than one level:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_product([df.columns, ['a']])
print (df)
         colA  colB  colC
            a     a     a
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
     a
0   as
1  aaa
print (df.columns)
MultiIndex(levels=[['colA', 'colB', 'colC'], ['a']],
           codes=[[0, 1, 2], [0, 0, 0]])
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
The solution is to select from the MultiIndex by tuple:
df1 = df[~df[('colB', 'a')].str.contains('Example:')]
print (df1)
         colA  colB  colC
            a     a     a
0  Example: s    as     2
1          dd   aaa     3
Or reassign the columns back to the first level:
df.columns = df.columns.get_level_values(0)
df2 = df[~df['colB'].str.contains('Example:')]
print (df2)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
Or, as an alternative to the previous step, remove the second level:
df.columns = df.columns.droplevel(1)
df2 = df[~df['colB'].str.contains('Example:')]
print (df2)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
Try this, which keeps only the rows in which no cell contains the string:
df[[~df.iloc[i,:].str.contains('String_to_match').any() for i in range(0,len(df))]]
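The row-by-row comprehension above can also be written column-wise. The sketch below is one possible alternative, not the answerer's method. It converts every column to text first, which means missing values become the literal string 'nan', and it matches the search string literally instead of as a regex. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": ["keep", "String_to_match here"],
                   "b": ["also keep", "x"],
                   "n": [1, 2]})

# Convert everything to text so .str works on numeric columns too.
as_text = df.astype(str)

# One boolean column per original column: does the cell contain the string?
hits = as_text.apply(lambda col: col.str.contains("String_to_match", regex=False))

# Keep the rows where no cell matched.
result = df[~hits.any(axis=1)]
```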