Check your DataFrame with data.columns
It should print something like this
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden white spaces..Then you can rename with
data = data.rename(columns={'Number ': 'Number'})
Answer from Merlin on Stack OverflowCheck your DataFrame with data.columns
It should print something like this
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden white spaces..Then you can rename with
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is something like " Number" or "Number ". I'm assuming you might have a residual space in the column name. Please run print "<{}>".format(data.columns[1]) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is caused because . notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a .. pandas columns can also be accessed with a . (e.g. data.col) or with brackets (e.g. ['col'] or [['col1', 'col2']]).
data.columns = data.columns.str.strip() is a fast way to quickly remove leading and trailing spaces from all column names. Otherwise verify the column or attribute is correctly spelled.
Hello
I get an error ‘int’ object has no attribute ‘replace’ in line of code
day = column.replace("-", "")Why is that? Both rows and columns have been assigned as “” strings. Why it says they’re int?
wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
datasetERROR:
<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
2 dataset['target']=df.target
3 dataset
D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5464 return self[name]
-> 5465 return object.__getattribute__(self, name)
5466
5467 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'data'I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What I'm I doing wrong?
As of pandas 2.0, append (previously deprecated) was removed.
You need to use concat instead (for most applications):
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
As noted by @cottontail, it's also possible to use loc, although this only works if the new index is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex:
df.loc[len(df)] = new_row # only use with a RangeIndex!
Why was it removed?
We frequently see new users of pandas try to code like they would do it in pure Python. They use iterrows to access items in a loop (see here why you shouldn't), or append in a way that is similar to python list.append.
However, as noted in pandas' issue #35407, pandas's append and list.append are really not the same thing. list.append is in place, while pandas's append creates a new DataFrame:
I think that we should deprecate Series.append and DataFrame.append. They're making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.
These are also apparently popular methods. DataFrame.append is around the 10th most visited page in our API docs.
Unless I'm mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single concat.
As a consequence, while list.append is amortized O(1) at each step of the loop, pandas' append is O(n), making it inefficient when repeated insertion is performed.
What if I need to repeat the process?
Using append or concat repeatedly is not a good idea (this has a quadratic behavior as it creates a new DataFrame for each step).
In such case, the new items should be collected in a list, and at the end of the loop converted to DataFrame and eventually concatenated to the original DataFrame.
lst = []
for new_row in items_generation_logic:
lst.append(new_row)
# create extension
df_extended = pd.DataFrame(lst, columns=['A', 'B', 'C'])
# or columns=df.columns if identical columns
# concatenate to original
out = pd.concat([df, df_extended])
Disclaimer: this answer seems to attract popularity, but the proposed approach should not be used. append was not changed to _append, _append is a private internal method and append was removed from pandas API. The claim "The append method in pandas look similar to list.append in Python. That's why append method in pandas is now modified to _append." is utterly incorrect. The leading _ only means one thing: the method is private and is not intended to be used outside of pandas' internal code.
In the new version of Pandas, the append method is changed to _append. You can simply use _append instead of append, i.e., df._append(df2).
df = df1._append(df2,ignore_index=True)
Why is it changed?
The append method in pandas looks similar to list.append in Python. That's why the append method in pandas is now modified to _append.
I don't understand... This exact code works for this same dataset look for different "X" variables, but now it is giving me this error...?
My code:
Import os Import pandas as pd
Os.chdir('file locations')
df = pd.read_csv('reading the file.csv') df = df.columns.str.replace(' ', '_')
def alphabet (row): If any(x in ['A', 'B', 'C'] for x in [row['alphanumeric_1'], row['alphanumeric_2]]): return 1 else: return 0
df['alphabet_yn] = df.apply(lambda row: alphabet (row), axis =1)
I don't understand why I am getting the AttributeError on the final line.
I am on my phone, so I apologize for how hard this might be to understand.
I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.
A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:
##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]
test_var = test.loc[:,['Sex']]
BayesNet = BayesianModel([('Sex','Survived')])I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictionsFor example, the error I get for this version of the code:
train_var = train.loc[:,['Survived','Pclass','Sex']]
test_var = test.loc[:,['Pclass']]
BayesNet = BayesianModel([('Sex','Pclass','Survived')])is this:
AttributeError Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
2 predictions
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'Survived'Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.
Any help would be greatly appreciated. I know it's a lot.
Double check if there's a space in the column name. 'Survived ' vs 'Survived' It happens more often than you'd think especially with CSV data source.
It's an issue with how you're calling the data and if it's actually there.
train.loc[:,['Survived','Sex']]
tells me that there's a DataFrame (which is from pandas, hence the error) called train and this line is trying to access parts of that dataframe (it's just a type of an array). Specifically, it's trying to access columns named Survived and Sex.
Similarly, this line tells me there's another dataframe (df) known as test with a column named Sex and this is access that data.
test.loc[:,['Sex']]
The error code also informs me of some things
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
There's another df called predictions that's of dict type which is trying to access information from the another hypothesis df. The attribute it's tryin to access in the second key of the dict is
hypothesis.Survived.tolist()
which is a way of calling a column from that df. That is, when the predictions line is executed, it's trying to pull all the values from the Survived column of the hypothesis df.
The error is that the df doesn't actually have a column named Survived. So either there's missing data, or you're calling it wrong, or there's a missing reference.
Without knowing more about your code and your question, I can't really extrapolate much more.
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
load_iris(), by default return an object which holds data, target and other members in it. In order to get actual values you have to read the data and target content itself.
Whereas 'iris.csv', holds feature and target together.
FYI: If you set return_X_y as True in load_iris(), then you will directly get features and target.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file as DataFrame as mentioned by you:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
2 3
0 petal_length petal_width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
Here the column names are '1' and '2'.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
you should not include header=None as your csv file includes the column names i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df.iloc['species']
Also, if you want to convert labels from string to numerical format use sklearn LabelEncoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)