If you're reading in from CSV, you can use the thousands argument of read_csv:
pd.read_csv('foo.tsv', sep='\t', thousands=',')
This method is likely to be more efficient than performing the operation as a separate step.
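As a minimal sketch of the thousands argument, the snippet below parses a small tab-separated table from an in-memory string rather than a file. The sample data and the column names (item, price) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; a real call would pass a file path instead.
raw = "item\tprice\nwidget\t1,200\ngadget\t1,384,496\n"

# thousands="," tells the parser to treat commas inside numbers as separators.
df = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",")

# The price column is parsed as numbers rather than left as strings.
print(df["price"].tolist())
print(df["price"].dtype)
```

Because the separator is handled during parsing, no extra cleanup pass over the column is needed afterwards.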
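To convert a DataFrame that has already been loaded, you need to set the locale first, so that locale.atof recognizes the thousands separator: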
In [ 9]: import locale
In [10]: from locale import atof
In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'
In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
Answer from Andy Hayden on Stack Overflow
You can convert one column at a time like this:
df['colname'] = df['colname'].str.replace(',', '').astype(float)
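A small sketch of the one-column conversion, run on a made-up column so the result can be inspected. The values are illustrative; the column name colname is kept from the answer above.

```python
import pandas as pd

# Hypothetical column of number strings with thousands separators.
df = pd.DataFrame({"colname": ["1,200", "7,000", "1,384,496", "-0.03"]})

# regex=False treats "," as a literal character; every comma is removed,
# so values with several separators are handled as well.
df["colname"] = df["colname"].str.replace(",", "", regex=False).astype(float)

print(df["colname"].tolist())
print(df["colname"].dtype)
```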
python - Error Converting Pandas Dataframe to float with comma - Stack Overflow
How do I remove commas from data frame column - Pandas
python - How to Convert String Numeric with Comma into Float in Pandas Data Frame - Stack Overflow
Python pandas - can't convert string to float (I think b/c of multiple data types in column...)
Use the thousands argument of read_csv:
pd.read_csv('exampleData.csv', thousands=',')
John's solution won't work for numbers with multiple commas, like 1,384,496.
A more scalable solution is to strip the commas from the whole DataFrame at once:
data = data.replace({",": ""}, regex=True)
Then convert the strings to numeric.
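The two steps above can be sketched together as follows. The DataFrame contents are invented for illustration, and applying pd.to_numeric column by column is one possible way to do the final conversion, not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Hypothetical all-string DataFrame, including a value with two commas.
data = pd.DataFrame({"a": ["1,200", "7,000"], "b": ["4,200.00", "1,384,496"]})

# Step 1: remove every comma from every string cell.
data = data.replace({",": ""}, regex=True)

# Step 2: convert each column to a numeric dtype.
data = data.apply(pd.to_numeric)

print(data.dtypes)
print(data)
```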
I have a CSV file with a "Prices" column. Right now the entries look like 1,000 or 12,456. I could probably remove the commas in Excel and re-save, but I want to know how to transform the column to remove non-numeric characters, so that 'object' values like $1,299.99 become the 'float' 1299.99. Thanks
Pandas has a built-in replace method for "object" (string) columns:
df["column"] = df["column"].str.replace(",", "").astype(float)
Alternatively, check out the pandas.to_numeric() function. It does not strip thousands separators on its own, so remove the commas first:
df["column"] = pd.to_numeric(df["column"].str.replace(",", ""))
You can also pass an errors argument to pd.to_numeric() for error handling. See the pandas documentation on it.
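As a rough illustration of the error-handling option, the sketch below uses errors="coerce", which turns values that cannot be parsed into NaN instead of raising. The sample values, including the "n/a" entry, are made up.

```python
import pandas as pd

# Hypothetical column with one entry that is not a number at all.
s = pd.Series(["1,000", "12,456", "n/a"])

# Strip the commas, then let to_numeric replace unparseable values with NaN.
cleaned = pd.to_numeric(s.str.replace(",", "", regex=False), errors="coerce")

print(cleaned)
```

With the default errors="raise", the same call would stop with a ValueError on the "n/a" entry.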
First, make a function that can convert a single string element to a float:
valid = '1234567890.-'  # characters kept: digits, decimal point, minus sign
def sanitize(data):
    # Drop every character that is not in `valid`, then parse what is left.
    return float(''.join(filter(lambda char: char in valid, data)))
Then use the apply method to apply that function to every entry in the column. Reassign to the same column if you want to overwrite your old data.
df['column'] = df['column'].apply(sanitize)
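The same character filtering can also be sketched with a vectorized string replace instead of a per-element function. This is an alternative under stated assumptions rather than the approach from the answer above: the regular expression is my own choice, and the sample values are taken from the question's description of the "Prices" column.

```python
import pandas as pd

# Sample values modeled on the question: commas and a currency symbol.
df = pd.DataFrame({"Prices": ["$1,299.99", "1,000", "12,456"]})

# Remove every character that is not a digit, a decimal point, or a minus sign,
# then convert the remaining text to float.
df["Prices"] = df["Prices"].str.replace(r"[^0-9.\-]", "", regex=True).astype(float)

print(df["Prices"].tolist())
```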
Using replace and pd.to_numeric:
s = pd.DataFrame({'val': ['36,000', '36,000', '36,000', '36,000', '36,000']})
s.replace({',': ''}, regex=True).apply(pd.to_numeric, axis=1)
Out[76]:
     val
0  36000
1  36000
2  36000
3  36000
4  36000
Based on Wen's answer, for a DataFrame whose columns are all strings, you can use:
df.apply(lambda col: pd.to_numeric(col.str.replace(',', '')))