You have to access the str attribute per http://pandas.pydata.org/pandas-docs/stable/text.html
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '')
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace('$', '')
df1['Avg_Annual'] = df1['Avg_Annual'].astype(int)
Alternately:
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '').str.replace('$', '').astype(int)
if you want to prioritize time spent typing over readability.
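For reference, a runnable sketch of the chained version on a made-up frame. One caveat worth knowing: in recent pandas versions Series.str.replace treats the pattern literally by default (regex=False), which is exactly what you want for '$'; passing it explicitly keeps the behavior the same on older versions too.

```python
import pandas as pd

df1 = pd.DataFrame({'Avg_Annual': ['$1,234', '$56,789']})

# regex=False makes '$' a literal character rather than a regex anchor
# (it became the default in pandas 2.0, but passing it is explicit and safe)
df1['Avg_Annual'] = (
    df1['Avg_Annual']
    .str.replace(',', '', regex=False)
    .str.replace('$', '', regex=False)
    .astype(int)
)
print(df1['Avg_Annual'].tolist())  # [1234, 56789]
```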
(Answer above from mechanical_meat on Stack Overflow.)
Shamelessly stolen from this answer... but, that answer is only about changing one character and doesn't complete the coolness: since it takes a dictionary, you can replace any number of characters at once, as well as in any number of columns.
# if you want to operate on multiple columns, put them in a list like so:
cols = ['col1', 'col2', ..., 'colN']
# pass them to df.replace(), specifying each pattern and its replacement:
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)
@shivsn caught that you need to use regex=True; you already knew about replace (but also didn't show trying to use it on multiple columns or both the dollar sign and comma simultaneously).
This answer simply spells out in one place the details I found from others, for those like me (e.g. noobs to Python and pandas). Hope it's helpful.
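A self-contained sketch of the dictionary approach on a toy frame (column names and values invented here):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['$1,000', '$2,500'],
                   'col2': ['$300', '$4,750']})

cols = ['col1', 'col2']
# one pass replaces both characters across every listed column;
# r'\$' escapes the dollar sign because regex=True treats patterns as regexes
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)
df[cols] = df[cols].astype(int)
print(df['col1'].tolist())  # [1000, 2500]
```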
Numeric columns contain no commas, so converting them to strings is not necessary; just use DataFrame.replace with regex=True for substring replacement:
df = df.replace(',','', regex=True)
Or:
df.replace(',','', regex=True, inplace=True)
And finally, convert the string columns to numeric (thanks to @anki_91):
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric, errors='coerce')
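Put together on a small invented frame with one string column and one numeric column:

```python
import pandas as pd

df = pd.DataFrame({'a': ['1,000', '2,500'], 'b': [1.5, 2.5]})

# replace acts on the whole frame; the numeric column is untouched
df = df.replace(',', '', regex=True)

# convert whatever is still object-typed to numbers
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric, errors='coerce')
print(df['a'].tolist())  # [1000, 2500]
```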
Well, you can simply do:
df = df.apply(lambda x: x.str.replace(',', ''))
Note this assumes every column holds strings; .str will raise an AttributeError on numeric columns. Hope it helps!
pd.read_csv() has a thousands argument, which is set to None by default:
data = """
State City Population Poverty_Rate Median_Age
VA XYZ 500,00 10.5% 42
MD ABC 12,345 8.9% .
NY . 987,654 . 41"""
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(data),sep='\s+',thousands=',')
print(df)
State City Population Poverty_Rate Median_Age
0 VA XYZ 50000 10.5% 42
1 MD ABC 12345 8.9% .
2 NY . 987654 . 41
Ideally, what you need to do is replace the string markers and then coerce your string columns into integers/floats.
# using your dict:
int_cols = {"Population": int, "Poverty_Rate": float, "Median_Age": int}
for col in int_cols.keys():
    df[col] = pd.to_numeric(df[col].astype(str).str.replace('%', ''), errors='coerce')
print(df.dtypes)
State object
City object
Population int64
Poverty_Rate float64
Median_Age float64
dtype: object
print(df)
State City Population Poverty_Rate Median_Age
0 VA XYZ 50000 10.5 42.0
1 MD ABC 12345 8.9 NaN
2 NY . 987654 NaN 41.0
Could you try the following? Do a str.replace on the column first, then cast it to an integer:
import pandas as pd
df = pd.DataFrame([
{'value': '123,445'},
{'value': '143,445,788'}
])
df['value'] = df['value'].str.replace(',', '').astype(int)
I have a csv file with a "Prices" column. Right now entries look like 1,000 or 12,456. I could probably remove the commas in Excel and re-save, but I want to know how to transform the column to strip non-numeric characters, so 'objects' like $1,299.99 become 'float' 1299.99. Thanks
Pandas has a built in replace method for "object" columns.
df["column"] = df["column"].str.replace(",","").astype(float)
Alternatively check out the pandas.to_numeric() function- I think this should work.
df["column"] = pd.to_numeric(df["column"])
You can also pass arguments for error handling with the pd.to_numeric() function. See the pandas documentation on it.
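For example, errors='coerce' turns anything unparseable into NaN instead of raising (the sample values here are made up):

```python
import pandas as pd

s = pd.Series(['$1,299.99', 'n/a'])
# strip the literal characters first, then coerce bad values to NaN
cleaned = pd.to_numeric(
    s.str.replace('$', '', regex=False).str.replace(',', '', regex=False),
    errors='coerce',
)
print(cleaned.tolist())  # [1299.99, nan]
```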
First, make a function that can convert a single string element to a float:
valid = '1234567890.' #valid characters for a float
def sanitize(data):
return float(''.join(filter(lambda char: char in valid, data)))
Then use the apply method to apply that function to every entry in the column. Reassign to the same column if you want to overwrite your old data.
df['column'] = df['column'].apply(sanitize)
I have some data where some of the columns have "." as a thousand separator.
I have named the data frame 'testpos'. I have already tried using the gsub-function, but it returns NA-values for every observation
testpos$Tested <- as.numeric(gsub(".","",testpos$Tested))
Does anyone have a better way to do this, or know what I am doing wrong?
Thanks in advance.
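That gsub call fails because "." is a regex wildcard, so every character gets removed and as.numeric() sees an empty string. The same trap exists in pandas; a small sketch of the literal-replacement fix (data invented):

```python
import pandas as pd

s = pd.Series(['1.000', '12.456'])
# regex=False treats '.' as a literal dot instead of "any character"
fixed = pd.to_numeric(s.str.replace('.', '', regex=False))
print(fixed.tolist())  # [1000, 12456]
```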
Notice that this will convert your float dtype to object:
df.DollarAmount.apply(lambda x : "{:,}".format(x))
Out[509]:
0 5,721.48
1 4,000.0
2 4,769.0
3 824.07
4 643.6
5 620.0
Name: DollarAmount, dtype: object
This is a more pandorable way to get the thousands separator.
df['Dollar Amount'] = df['Dollar Amount'].apply('{:,}'.format)
Hey guys, I have a csv file with any number greater than 999 being listed as a string in the form "1,000", including the quotes. I'm trying to get rid of these commas so I can turn them into an integer, but I'm unsure how to do it without touching the other commas used to separate the values. Any suggestions? So far I have thought this out, but it's not quite right:
import pandas as pd
df = pd.read_csv('...', sep=", ")
firstline = True
if firstline:
    firstline = False
else:
    for line in df:
        if "," in line[3]:  # the column with the values
            line[3].replace(",", " ")
Sorry for the formatting I am on phone. Thanks for the help :)
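For what it's worth, a sketch of one way out, using a made-up stand-in for the file: read_csv already respects the double quotes, so the value-separating commas are safe, and thousands=',' strips the grouping commas during parsing, so no manual replace is needed.

```python
import pandas as pd
from io import StringIO

# stand-in for the real file; the quoted fields hold the grouped numbers
data = 'name,price\nwidget,"1,000"\ngadget,"12,456"\n'
df = pd.read_csv(StringIO(data), thousands=',')
print(df['price'].tolist())  # [1000, 12456]
```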