Short answer: change data.columns=[headerName] into data.columns=headerName
Explanation: when you set data.columns=[headerName], the columns become a MultiIndex object. Therefore, your log_df['Product'] is a DataFrame, and a DataFrame has no str attribute.
When you set data.columns=headerName, your log_df['Product'] is a single column (a Series) and you can use the str attribute.
If for some reason you need to keep your columns as a MultiIndex object, there is another solution: first convert your log_df['Product'] into a Series. After that, the str attribute is available.
products = pd.Series(df.Product.values.flatten())
include_clique = products[products.str.contains("Product A")]
However, I guess the first solution is what you're looking for.
Answer from hoang tran on Stack Overflow
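To make the difference between the two assignments concrete, here is a minimal sketch. The header list and the rows are made-up stand-ins for the asker's data, which is not shown in full; only the two ways of assigning columns come from the answer above.

```python
import pandas as pd

header_name = ["Product", "Price"]  # hypothetical header list
rows = [["Product A", 10], ["Product B", 20]]  # hypothetical data

# Wrapping the list in another list yields a one-level MultiIndex.
nested = pd.DataFrame(rows)
nested.columns = [header_name]
nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
nested_selection = nested["Product"]  # a DataFrame, so it has no .str

# Passing the list directly yields an ordinary Index.
flat = pd.DataFrame(rows)
flat.columns = header_name
flat_selection = flat["Product"]  # a Series, so .str is available
matches = flat[flat_selection.str.contains("Product A")]

# Workaround that keeps the MultiIndex: flatten the values into a Series.
products = pd.Series(nested["Product"].to_numpy().ravel())
workaround = products[products.str.contains("Product A")]
```

Note that the workaround returns matching values from a fresh Series, not rows of the original frame, which is one reason the first fix is usually the more convenient one.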
You get AttributeError: 'DataFrame' object has no attribute ... when you try to access an attribute your dataframe doesn't have.
A common case is when you try to select a column using . instead of [] when the column name contains white space (e.g. 'col1 ').
df.col1 # <--- error
df['col1 '] # <--- no error
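A small sketch of the whitespace case, assuming a made-up one-column frame. Stripping the column names with df.columns.str.strip() is one possible cleanup, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({"col1 ": [1, 2, 3]})  # note the trailing space in the name

# Attribute access looks for "col1", which does not exist.
try:
    df.col1
    dot_access_failed = False
except AttributeError:
    dot_access_failed = True

# Bracket access with the exact name works.
bracket_values = df["col1 "].tolist()

# Stripping whitespace from the column names makes both spellings work.
df.columns = df.columns.str.strip()
stripped_values = df.col1.tolist()
```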
Another common case is when you try to call a Series method on a DataFrame. For example, tolist() is a Series method (and so was map() before pandas 2.1 added DataFrame.map()), so it must be called on a column. If you call them on a DataFrame, you'll get
AttributeError: 'DataFrame' object has no attribute 'tolist'
AttributeError: 'DataFrame' object has no attribute 'map'
As hoang tran explains, this is what is happening with OP as well. .str is a Series accessor and it's not implemented for DataFrames.
Yet another case is if you have a typo and try to call/access an attribute that's simply not defined; e.g. if you try to call rows() instead of iterrows(), you'll get
AttributeError: 'DataFrame' object has no attribute 'rows'
You can check the full list of attributes using the following comprehension.
[x for x in dir(pd.DataFrame) if not x.startswith('_')]
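The same comprehension can be wrapped in a small check. The helper name is_dataframe_attribute is made up for this sketch; it only tests membership in dir(pd.DataFrame).

```python
import pandas as pd

def is_dataframe_attribute(name):
    """Return True if `name` is a public attribute of pandas.DataFrame."""
    return not name.startswith("_") and name in dir(pd.DataFrame)

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

has_rows = is_dataframe_attribute("rows")          # typo for iterrows
has_iterrows = is_dataframe_attribute("iterrows")  # the real method
has_tolist = is_dataframe_attribute("tolist")      # Series-only method

# Series methods work once a single column is selected.
column_as_list = df["a"].tolist()
```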
When you assign column names as df.columns = [['col1', 'col2']], df is a MultiIndex dataframe now, so to access each column, you'll need to pass a tuple:
df['col1'].str.contains('Product A') # <---- error
df['col1',].str.contains('Product A') # <---- no error; note the trailing comma
In fact, you can pass a tuple to select a column of any MultiIndex dataframe, e.g.
df['level_1_colname', 'level_2_colname'].str.contains('Product A')
You can also flatten MultiIndex column names by mapping a "flattener" function over them. A common one is '_'.join:
df.columns = df.columns.map('_'.join)
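A short sketch of both approaches on a two-level frame. The level names ("item", "name", "qty") and the data are invented for illustration. The join only works when every label is a string.

```python
import pandas as pd

df = pd.DataFrame(
    [["Product A", 1], ["Product B", 2]],
    columns=pd.MultiIndex.from_tuples([("item", "name"), ("item", "qty")]),
)

# A full tuple key selects one column as a Series, so .str works.
mask = df[("item", "name")].str.contains("Product A")

# Joining each tuple of labels produces flat, single-level column names.
flat = df.copy()
flat.columns = flat.columns.map("_".join)
flat_names = list(flat.columns)
flat_mask = flat["item_name"].str.contains("Product A")
```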
Hi, I'm trying to run a str.split method on a simple Pandas dataframe that has an ID column and a text column of 'object' type, and I get the message AttributeError: 'DataFrame' object has no attribute 'str'. I think the column should be in a Series format to run the str.split method. I have tried to change the datatype to 'str', but the datatype stays an object. How can I get the column into a Series object to run a Series method?
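The question does not show the code, so the cause is not known. One common way to end up with a DataFrame instead of a Series is selecting with double brackets. The sketch below uses made-up data to show that case and how to get back to a Series. The dtype is usually not the issue, since .str needs a Series of text values, not a particular dtype label.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "text": ["a b", "c d e"]})  # hypothetical data

as_frame = df[["text"]]   # double brackets: a one-column DataFrame, no .str
as_series = df["text"]    # single brackets: a Series, .str is available

tokens = as_series.str.split()

# A one-column DataFrame can be turned back into a Series with squeeze.
recovered = as_frame.squeeze(axis="columns")
recovered_tokens = recovered.str.split()
```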
The error means exactly what it says:
AttributeError: 'str' object has no attribute 'DataFrame'
^                 ^                             ^
the kind of error |                             |
                  the thing you tried to use    what was missing from it
The line it's complaining about:
df = pd.DataFrame(date, columns = ['Date'])
     ^  ^
     |  the attribute the error said was missing
     the thing the error said was a string
"has been working no problem until I added a few lines of code above"
Evidently, somewhere in the "few lines of code above", you caused pd to be a string. And sure enough, when we look at those few lines of code, we find:
pd = PDays[j]
^    ^
|    the string that you're making it into
the thing that you're making a string
You are reassigning pd from
import pandas as pd
to
pd = PDays[j]
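The effect of that reassignment can be reproduced in isolation. In this sketch, p_days is a made-up stand-in for the asker's PDays, and the local name pd inside each function plays the role of the module alias.

```python
import pandas

def build_frame(p_days):
    pd = pandas        # stands in for `import pandas as pd`
    pd = p_days[0]     # the reassignment: pd now names a str, not the module
    try:
        return pd.DataFrame(["2020-01-01"], columns=["Date"])
    except AttributeError as exc:
        return str(exc)

def build_frame_fixed(p_days):
    pd = pandas
    p_day = p_days[0]  # a different name leaves pd untouched
    return pd.DataFrame([p_day], columns=["Date"])

message = build_frame(["3 days"])
fixed = build_frame_fixed(["3 days"])
```

Here message holds the same text as the error in the question, and the fix is simply to pick a variable name that does not collide with the alias.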
value_counts is a Series method rather than a DataFrame method (and you are trying to use it on a DataFrame, clean). You need to perform this on a specific column:
clean[column_name].value_counts()
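It doesn't usually make sense to perform value_counts on a whole DataFrame (newer pandas versions do have DataFrame.value_counts, but it counts unique rows rather than individual cells), though I suppose you could apply it to every entry by flattening the underlying values array. The top-level pd.value_counts is deprecated in recent pandas versions, so wrap the flattened values in a Series:
pd.Series(df.values.flatten()).value_counts()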
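A minimal sketch of the per-column and flattened variants, using an invented frame named clean. It also shows count() for comparison, since count() answers a different question (how many non-null cells each column has).

```python
import pandas as pd

clean = pd.DataFrame(
    {"color": ["red", "blue", "red"], "size": ["S", None, "M"]}
)  # hypothetical data

# Frequency of each distinct value in one column (a Series method).
color_counts = clean["color"].value_counts().to_dict()

# Frequency of each value across every cell: flatten, then count.
all_counts = pd.Series(clean.to_numpy().ravel()).value_counts().to_dict()

# count() is different: it reports the number of non-null cells per column.
non_null = clean.count().to_dict()
```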
To count the non-null entries in every column of a dataframe, it's just df.count()
sklearn.datasets is a scikit-learn module that contains the function load_iris().
By default, load_iris() returns an object that holds data, target and other members. To get the actual values, you have to read its data and target attributes.
The file 'iris.csv', by contrast, holds the features and the target together.
FYI: if you set return_X_y=True in load_iris(), you get the features and target directly.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file as DataFrame as mentioned by you:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2
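Here the column labels are the integers 2 and 3, and the real header row ('petal_length', 'petal_width') has been read as the first row of data.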
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not pass header=None, because your CSV file already includes the column names, i.e. the header row.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
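The effect of header=None can be checked without the real file. The sketch below reads a two-row stand-in for iris.csv from a string. The sepal column names and the rows are assumptions, while petal_length, petal_width and species are the names used above.

```python
import io
import pandas as pd

# A tiny stand-in for iris.csv (hypothetical rows and sepal column names).
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)

# header=None treats the header row as data and numbers the columns 0..4.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
first_cell = no_header.iloc[0, 2]

# The default uses the first row as column names.
df = pd.read_csv(io.StringIO(csv_text))
X = df[["petal_length", "petal_width"]]
y = df["species"]
```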
Also, if you want to convert the labels from string to numerical format, use sklearn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
The first possible problem is duplicated column names, so selecting colB returns a DataFrame rather than a Series:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colB','colB','colC'])
print (df)
         colB  colB  colC
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
         colB  colB
0  Example: s    as
1          dd   aaa
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
One solution is to join the duplicated columns together:
print (df['colB'].apply(' '.join, axis=1))
0    Example: s as
1           dd aaa
df['colB'] = df.pop('colB').apply(' '.join, axis=1)
df = df[~df['colB'].str.contains('Example:')]
print (df)
   colC    colB
1     3  dd aaa
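If joining the duplicated columns is not what you want, a possible alternative is to detect the duplicates and select one of them by position. This is a sketch of that alternative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["Example: s", "as", 2], ["dd", "aaa", 3]],
    columns=["colB", "colB", "colC"],
)

# Index.duplicated flags the second and later occurrences of a label.
duplicate_names = df.columns[df.columns.duplicated()].tolist()

# Selecting a duplicated label returns every matching column.
selection_is_frame = isinstance(df["colB"], pd.DataFrame)

# Positional selection always returns a single column as a Series.
first_colb = df.iloc[:, 0]
kept = df.loc[~first_colb.str.contains("Example:")]
```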
The second possible problem is a hidden (single-level) MultiIndex:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_arrays([df.columns])
print (df)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
   colB
0    as
1   aaa
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
The solution is to reassign the first level:
df.columns = df.columns.get_level_values(0)
df = df[~df['colB'].str.contains('Example:')]
print (df)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
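Because a one-level MultiIndex prints exactly like ordinary columns, it can help to test for it explicitly. The helper flatten_single_level_columns below is a made-up name for this sketch. It filters on colA rather than colB so that the filter visibly removes a row.

```python
import pandas as pd

def flatten_single_level_columns(df):
    """Return a copy whose one-level MultiIndex columns become a plain Index."""
    cols = df.columns
    if isinstance(cols, pd.MultiIndex) and cols.nlevels == 1:
        df = df.copy()
        df.columns = cols.get_level_values(0)
    return df

df = pd.DataFrame(
    [["Example: s", "as", 2], ["dd", "aaa", 3]],
    columns=["colA", "colB", "colC"],
)
df.columns = pd.MultiIndex.from_arrays([df.columns])

hidden = isinstance(df.columns, pd.MultiIndex)  # not visible in print(df)
flat = flatten_single_level_columns(df)
filtered = flat[~flat["colA"].str.contains("Example:")]
```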
The third possibility is a genuine multi-level MultiIndex:
df = pd.DataFrame([['Example: s', 'as', 2], ['dd', 'aaa', 3]], columns=['colA','colB','colC'])
df.columns = pd.MultiIndex.from_product([df.columns, ['a']])
print (df)
         colA  colB  colC
            a     a     a
0  Example: s    as     2
1          dd   aaa     3
print (df['colB'])
     a
0   as
1  aaa
print (df.columns)
MultiIndex(levels=[['colA', 'colB', 'colC'], ['a']],
codes=[[0, 1, 2], [0, 0, 0]])
#print (df['colB'].str.contains('Example:'))
#>AttributeError: 'DataFrame' object has no attribute 'str'
The solution is to select the MultiIndex column by tuple:
df1 = df[~df[('colB', 'a')].str.contains('Example:')]
print (df1)
         colA  colB  colC
            a     a     a
0  Example: s    as     2
1          dd   aaa     3
Or reassign the first level back:
df.columns = df.columns.get_level_values(0)
df2 = df[~df['colB'].str.contains('Example:')]
print (df2)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
Or, instead of reassigning, remove the second level:
df.columns = df.columns.droplevel(1)
df2 = df[~df['colB'].str.contains('Example:')]
print (df2)
         colA  colB  colC
0  Example: s    as     2
1          dd   aaa     3
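The three options are alternatives, so each one should start from the two-level frame. The sketch below rebuilds the frame before each option (make_frame is a made-up helper) and filters on colA so that the filter actually drops a row.

```python
import pandas as pd

def make_frame():
    """Build the two-level example frame used above."""
    df = pd.DataFrame(
        [["Example: s", "as", 2], ["dd", "aaa", 3]],
        columns=["colA", "colB", "colC"],
    )
    df.columns = pd.MultiIndex.from_product([df.columns, ["a"]])
    return df

# Option 1: keep the MultiIndex and select with a full tuple.
df = make_frame()
by_tuple = df[~df[("colA", "a")].str.contains("Example:")]

# Option 2: keep only the first level of the column labels.
df = make_frame()
df.columns = df.columns.get_level_values(0)
by_first_level = df[~df["colA"].str.contains("Example:")]

# Option 3: drop the second level.
df = make_frame()
df.columns = df.columns.droplevel(1)
by_droplevel = df[~df["colA"].str.contains("Example:")]
```

All three keep the same row. They differ only in whether the result still has two column levels.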
Try this, which keeps only the rows in which no cell contains the string:
df[[~df.iloc[i,:].str.contains('String_to_match').any() for i in range(0,len(df))]]
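As an alternative to looping over row positions, the same idea can be written column-wise. This is a sketch of a different technique from the one-liner above: it casts every cell to a string first, so numeric columns are searched as text and missing values become the string "nan". The function name drop_rows_containing and the data are made up.

```python
import pandas as pd

def drop_rows_containing(df, text):
    """Drop every row in which any cell, viewed as a string, contains `text`."""
    as_text = df.astype(str)
    hits = as_text.apply(lambda col: col.str.contains(text, regex=False))
    return df[~hits.any(axis=1)]

df = pd.DataFrame(
    {"colA": ["Example: s", "dd"], "colB": ["as", "aaa"], "colC": [2, 3]}
)  # hypothetical data
result = drop_rows_containing(df, "Example:")
```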