Using df.apply
pd.Series.str.replace is a Series method, not a DataFrame method. You can use apply to run it on each column (a Series) instead.
dataFrame[headers] = dataFrame[headers].apply(lambda x: x.str.replace(',', ''))
Another option is to use apply on each row (series) with axis=1.
Using df.applymap
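Or you can use applymap, which calls the function on every cell, and use the plain string replace on each value. This only works if every cell is a string. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)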
dataFrame[headers] = dataFrame[headers].applymap(lambda x: x.replace(',', ''))
Using df.replace
You can also use df.replace, which replaces values directly across all selected columns. For a substring replacement like this one you have to set regex=True; otherwise only cells that are exactly equal to ',' are replaced.
dataFrame[headers] = dataFrame[headers].replace(',', '', regex=True)
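A minimal sketch comparing two of the approaches above on a toy DataFrame. The column names and values are made up for illustration; headers mirrors the variable used in the answer.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings with thousands separators.
dataFrame = pd.DataFrame({"price": ["1,200", "3,450"], "qty": ["1,000", "20"]})
headers = ["price", "qty"]

# Column-wise: .str.replace is applied to each Series in turn.
via_apply = dataFrame[headers].apply(
    lambda col: col.str.replace(",", "", regex=False)
)

# Frame-wise: regex=True makes replace match substrings, not whole cells.
via_replace = dataFrame[headers].replace(",", "", regex=True)

print(via_apply)
print(via_apply.equals(via_replace))
```

Both calls return new objects, so the result has to be assigned back to dataFrame[headers] as in the answer.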
Answer from Akshay Sehgal on Stack Overflow
Hello
I get an error ‘int’ object has no attribute ‘replace’ in line of code
day = column.replace("-", "")
Why is that? Both rows and columns have been assigned as “” strings. Why does it say they’re int?
In pandas, the object dtype is used when a column does not hold one clearly defined type.
So I guess that in your column some values are float and some are str. Or maybe you are also dealing with NaN values, which are float objects.
a) Convert the column to string: Are you getting your DataFrame from a CSV or XLS file? Then when you read the file you can specify that the column is of type str, or you can convert the column afterwards.
b) After that, you can apply the string changes and/or deal with the NaN values.
c) Finally, convert the column to float type.
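Steps a) to c) could look roughly like the sketch below. The sample values are invented, and pd.to_numeric with errors="coerce" is one possible way to do the final conversion, not necessarily what the answer had in mind.

```python
import pandas as pd

# Hypothetical mixed column: strings, a float and a missing value (object dtype).
raw = pd.Series(["1,5", 2.0, None, "3"])

# a) Make every value a string so the .str accessor can be used.
as_text = raw.astype(str)

# b) Apply the string fix (here: decimal comma to decimal point).
cleaned = as_text.str.replace(",", ".", regex=False)

# c) Convert to numbers; missing or unparseable entries become NaN.
as_float = pd.to_numeric(cleaned, errors="coerce")
print(as_float)
```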
Maybe it's a very rudimentary method but I would just do
listt = []
for i in data['column_name']:
    listt.append(float(i))
data['FloatData'] = listt
value_counts is a Series method rather than a DataFrame method (and you are trying to use it on a DataFrame, clean). You need to perform this on a specific column:
clean[column_name].value_counts()
It doesn't usually make sense to perform value_counts on a DataFrame, though I suppose you could apply it to every entry by flattening the underlying values array:
pd.value_counts(df.values.flatten())
To count the non-null entries in every column of a DataFrame, use df.count(). Note that this gives one total per column, not the per-value counts that value_counts returns.
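A small sketch of the three variants discussed above, using an invented DataFrame named clean. The flattened version wraps the values in a Series first, which is an assumption made here to stay within Series.value_counts.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `clean` DataFrame.
clean = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "S", "M"]})

# Counts of each distinct value in one column.
per_column = clean["color"].value_counts()

# Counts over every cell, after flattening the underlying array.
all_values = pd.Series(clean.to_numpy().ravel()).value_counts()

# Number of non-null entries per column.
non_null = clean.count()

print(per_column, all_values, non_null, sep="\n\n")
```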
Check your DataFrame with data.columns
It should print something like this
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden whitespace. Then you can rename with
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is something like " Number" or "Number ". I'm assuming you might have a residual space in the column name. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, occurs because dot notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a dot. pandas columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a fast way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
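To make this concrete, here is a short sketch with made-up column names that carry stray spaces. It shows the attribute lookup failing and then succeeding once the names are stripped.

```python
import pandas as pd

# Hypothetical frame whose headers contain a trailing and a leading space.
data = pd.DataFrame({"Number ": [1, 2], " name": ["a", "b"]})

message = None
try:
    data.Number  # the real column is "Number ", so this lookup fails
except AttributeError as exc:
    message = str(exc)
print(message)

# Remove leading and trailing whitespace from every column name.
data.columns = data.columns.str.strip()
total = data.Number.sum()  # attribute access now resolves to the column
print(list(data.columns), total)
```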
"sklearn.datasets" is a scikit package, where it contains a method load_iris().
load_iris() by default returns an object that holds data, target and other members. To get the actual values, you have to read the data and target attributes themselves.
'iris.csv', on the other hand, holds the features and the target together.
FYI: If you set return_X_y to True in load_iris(), you get the features and target directly.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file as DataFrame as mentioned by you:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2
Here the column names are the integers 2 and 3, and the real header row has been read as the first row of data.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
you should not include header=None as your csv file includes the column names i.e. the headers.
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
Also, if you want to convert labels from string to numerical format use sklearn LabelEncoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
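The effect of header=None described above can be reproduced without the real file. This sketch uses a two-row in-memory CSV; the column names are assumed to match the iris.csv in the question.

```python
import io
import pandas as pd

# Hypothetical excerpt of iris.csv, including its header row.
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

# header=None: columns are numbered and the header becomes a data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# Default: the first row supplies the column names.
df = pd.read_csv(io.StringIO(csv_text))
X = df[["petal_length", "petal_width"]]
y = df["species"]

print(no_header.head(2))
print(X, y, sep="\n")
```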
What you are trying to do is not practical - in any language.
First, as others have said, there is no replace() method on a file object.
More importantly you don't seem to understand how text files work. You say: "I saw some answers when they create a new file everytime". Did you consider why they do that?
"I just need to add +1 to an specific line"
Let's take a couple of lines in a text file that you think looks like this:
onetwomumble9
threefourmumble3
But a text file is nothing special, it just goes from the first byte to the last with newlines showing where lines end. It really looks like this:
onetwomumble9\nthreefourmumble3\n
Where \n represents the single newline character.
Let's say that you did replace 9 with 10 and 3 with 4. The problem is that 10 is wider than 9, so the extra character overwrites the newline (or whatever other character) that follows. You end up with this:
onetwomumble10threefourmumble4\n
You lost the newline!
That's why you have to copy the file to update it! The alternative is to use a database (like SQLite) which handles these issues.
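A minimal sketch of the copy-and-replace approach, assuming the target line ends in a run of digits. The helper name increment_line and the demo file are invented for illustration.

```python
import os
import tempfile

def increment_line(path, line_number):
    """Rewrite `path`, adding 1 to the trailing integer on a 1-based line."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for number, line in enumerate(src, start=1):
            if number == line_number:
                text = line.rstrip("\n")
                head = text.rstrip("0123456789")  # everything before the digits
                digits = text[len(head):]         # assumed to be non-empty
                line = head + str(int(digits) + 1) + "\n"
            dst.write(line)
    # Swap the finished copy into place of the original file.
    os.replace(dst.name, path)

# Demo on a throwaway file with the two lines from the example above.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w") as f:
    f.write("onetwomumble9\nthreefourmumble3\n")
increment_line(demo, 1)
with open(demo) as f:
    updated = f.read()
print(repr(updated))
```

Because every line is written out again, the number can grow from one digit to two without touching the newline that follows it.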
There are a few work-arounds. The simplest is to decide on an absolute maximum number of characters, and pad with zeros, for example 0009 gets updated to 0010.
Let's say the file looks like this:
onetwomumble0009
threefourmumble0005
sixsevenmumble0067
eightnonemumber0042
For reading and writing to the same file, you have to maintain the file position yourself. When you read a line it puts the current position to the next line, so to rewrite a line you have to move it back. Example code:
import re
import sys
# Could be done as a one-line lambda, but would be difficult to read
def addit(m):
    num = m.groups()[0]
    new_num = "%04d" % (int(num) + 1)
    return new_num
line_number = 1
with open('gash.txt', 'r+') as uf:
    while True:
        start_pos = uf.tell()  # Get start of line position
        line = uf.readline()
        if not line:
            break
        end_pos = uf.tell()  # Get end of line position - needed if updating more than one line
        # Let's say we update line 2, and we decided on 4 chars
        if line_number == 2:
            # Do the add
            newline = re.sub(r'(\d{4})$', addit, line)
            # Sanity check
            if len(newline) != len(line):
                print("line length changed!", line, file=sys.stderr)
                sys.exit(1)
            # Seek to start of line just read and write
            uf.seek(start_pos)
            uf.write(newline)
            # Seek to start of next line
            uf.seek(end_pos)
            # If we only wanted to update one line, we can:
            break
        line_number += 1
Obviously a file object (_io.TextIOWrapper) does not have a replace method; replace is a string method. I think you're trying to make replacements across the entire file. Here is a quick fix, assuming lines holds the lines you have already read from the file.
file_data = ''.join(lines)
# Now apply your replacements here
file_data = file_data.replace(str(x),str(y))
wine = pd.read_csv("combined.csv", header=0).iloc[:-1]
df = pd.DataFrame(wine)
df
dataset = pd.DataFrame(df.data, columns =df.feature_names)
dataset['target']=df.target
dataset
ERROR:
<ipython-input-27-64122078da92> in <module>
----> 1 dataset = pd.DataFrame(df.data, columns =df.feature_names)
2 dataset['target']=df.target
3 dataset
D:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5464 return self[name]
-> 5465 return object.__getattribute__(self, name)
5466
5467 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'data'
I'm trying to set up a target to proceed with my Multi Linear Regression Project, but I can't even do that. I've already downloaded the CSV file and have it uploaded on a Jupyter Notebook. What am I doing wrong?
I don't understand... This exact code works for this same dataset look for different "X" variables, but now it is giving me this error...?
My code:
import os
import pandas as pd
os.chdir('file locations')
df = pd.read_csv('reading the file.csv')
df = df.columns.str.replace(' ', '_')
def alphabet(row):
    if any(x in ['A', 'B', 'C'] for x in [row['alphanumeric_1'], row['alphanumeric_2']]):
        return 1
    else:
        return 0
df['alphabet_yn'] = df.apply(lambda row: alphabet(row), axis=1)
I don't understand why I am getting the AttributeError on the final line.
I am on my phone, so I apologize for how hard this might be to understand.
You used .Values with a capital V instead of .values. Changing the capital V to a lowercase v should fix the error you're getting.
A pandas DataFrame does not have an attribute named Values; the attribute is values, with a lowercase "v". For further reference, see the pandas documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html
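A quick check of the point about capitalization, using a throwaway DataFrame. Attribute names in Python are case-sensitive, so only the lowercase spelling exists; to_numpy() is shown as well because the pandas documentation recommends it over values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})  # hypothetical data

has_upper = hasattr(df, "Values")  # capital V: no such attribute
arr = df.values                    # lowercase v: the underlying NumPy array
arr_alt = df.to_numpy()            # equivalent, recommended spelling

print(has_upper, arr.tolist(), arr_alt.tolist())
```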
I am in university and am taking a special topics class regarding AI. I have zero knowledge about Python, how it works, or what anything means.
A project for the class involves manipulating Bayesian networks to predict how many and which individuals die upon the sinking of a ship. This is the code I am supposed to manipulate:
##EDIT VARIABLES TO THE VARIABLES OF INTEREST
train_var = train.loc[:,['Survived','Sex']]
test_var = test.loc[:,['Sex']]
BayesNet = BayesianModel([('Sex','Survived')])
I am supposed to add another variable, 'Pclass,' to the mix, paying attention to the order for causation. I have added that variable to every line of this code in every way imaginable and consistently get an error from this line:
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
predictions
For example, the error I get for this version of the code:
train_var = train.loc[:,['Survived','Pclass','Sex']]
test_var = test.loc[:,['Pclass']]
BayesNet = BayesianModel([('Sex','Pclass','Survived')])
is this:
AttributeError Traceback (most recent call last)
<ipython-input-98-16d9eb9451f7> in <module>
----> 1 predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
2 predictions
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'Survived'
Honestly, I have no idea wtf any of this means. I have tried googling this issue and have come up with nothing.
Any help would be greatly appreciated. I know it's a lot.
Double check whether there's a space in the column name: 'Survived ' vs 'Survived'. It happens more often than you'd think, especially with CSV data sources.
It's an issue with how you're calling the data and if it's actually there.
train.loc[:,['Survived','Sex']]
tells me that there's a DataFrame (which is from pandas, hence the error) called train and this line is trying to access parts of that dataframe (it's just a type of an array). Specifically, it's trying to access columns named Survived and Sex.
Similarly, this line tells me there's another DataFrame (df) known as test with a column named Sex, and that the line is accessing that data.
test.loc[:,['Sex']]
The error code also informs me of some things
predictions = pandas.DataFrame({'PassengerId': test.PassengerId,'Survived': hypothesis.Survived.tolist()})
There's another df called predictions, built from a dict, which is trying to access information from another df called hypothesis. The attribute it's trying to access for the second key of the dict is
hypothesis.Survived.tolist()
which is a way of calling a column from that df. That is, when the predictions line is executed, it's trying to pull all the values from the Survived column of the hypothesis df.
The error is that the df doesn't actually have a column named Survived. So either there's missing data, or you're calling it wrong, or there's a missing reference.
Without knowing more about your code and your question, I can't really extrapolate much more.
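One way to confirm the diagnosis before the failing line runs is to inspect the columns directly. The hypothesis frame below is a made-up stand-in, since the real one is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `hypothesis` DataFrame.
hypothesis = pd.DataFrame({"Pclass": [1, 3], "Sex": ["male", "female"]})

# repr() makes stray spaces in column names visible.
print([repr(name) for name in hypothesis.columns])

# List the columns the later code expects but the frame does not have.
expected = ["Survived"]
missing = [name for name in expected if name not in hypothesis.columns]
print("missing columns:", missing)
```

If the column shows up here as missing, hypothesis.Survived will raise the same AttributeError as in the traceback.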