You have to access the str attribute per http://pandas.pydata.org/pandas-docs/stable/text.html
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '')
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace('$', '')
df1['Avg_Annual'] = df1['Avg_Annual'].astype(int)
Alternately:
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '').str.replace('$', '').astype(int)
if you want to prioritize time spent typing over readability.
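For reference, a runnable sketch of the chained version on a made-up frame. One caveat worth knowing: in recent pandas versions Series.str.replace treats the pattern literally by default (regex=False), which is exactly what you want for '$'; passing it explicitly keeps the behavior the same on older versions too.

```python
import pandas as pd

df1 = pd.DataFrame({'Avg_Annual': ['$1,234', '$56,789']})

# regex=False makes '$' a literal character rather than a regex anchor
# (it became the default in pandas 2.0, but passing it is explicit and safe)
df1['Avg_Annual'] = (
    df1['Avg_Annual']
    .str.replace(',', '', regex=False)
    .str.replace('$', '', regex=False)
    .astype(int)
)
print(df1['Avg_Annual'].tolist())  # [1234, 56789]
```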
(Answer above from mechanical_meat on Stack Overflow.)
Shamelessly stolen from this answer... but, that answer is only about changing one character and doesn't complete the coolness: since it takes a dictionary, you can replace any number of characters at once, as well as in any number of columns.
# if you want to operate on multiple columns, put them in a list like so:
cols = ['col1', 'col2', ..., 'colN']
# pass them to df.replace(), specifying each pattern and its replacement:
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)
@shivsn caught that you need to use regex=True; you already knew about replace (but also didn't show trying to use it on multiple columns or both the dollar sign and comma simultaneously).
This answer simply spells out in one place the details I found from others, for those like me (e.g. noobs to Python and pandas). Hope it's helpful.
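A self-contained sketch of the dictionary approach on a toy frame (column names and values invented here):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['$1,000', '$2,500'],
                   'col2': ['$300', '$4,750']})

cols = ['col1', 'col2']
# one pass replaces both characters across every listed column;
# r'\$' escapes the dollar sign because regex=True treats patterns as regexes
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)
df[cols] = df[cols].astype(int)
print(df['col1'].tolist())  # [1000, 2500]
```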
Numeric columns contain no commas, so converting them to strings is not necessary; just use DataFrame.replace with regex=True for substring replacement:
df = df.replace(',','', regex=True)
Or:
df.replace(',','', regex=True, inplace=True)
And finally, convert the string columns to numeric (thanks to @anki_91):
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric, errors='coerce')
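Put together on a small invented frame with one string column and one numeric column:

```python
import pandas as pd

df = pd.DataFrame({'a': ['1,000', '2,500'], 'b': [1.5, 2.5]})

# replace acts on the whole frame; the numeric column is untouched
df = df.replace(',', '', regex=True)

# convert whatever is still object-typed to numbers
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric, errors='coerce')
print(df['a'].tolist())  # [1000, 2500]
```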
Well, you can simply do:
df = df.apply(lambda x: x.str.replace(',', ''))
Note this assumes every column holds strings; .str will raise an AttributeError on numeric columns. Hope it helps!
pd.read_csv() has a thousands argument, which is set to None by default:
data = """
State City Population Poverty_Rate Median_Age
VA XYZ 500,00 10.5% 42
MD ABC 12,345 8.9% .
NY . 987,654 . 41"""
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(data),sep='\s+',thousands=',')
print(df)
State City Population Poverty_Rate Median_Age
0 VA XYZ 50000 10.5% 42
1 MD ABC 12345 8.9% .
2 NY . 987654 . 41
Ideally, what you need to do is replace the string markers and then coerce your string columns into integers/floats.
# using your dict:
int_cols = {"Population": int, "Poverty_Rate": float, "Median_Age": int}
for col in int_cols.keys():
    df[col] = pd.to_numeric(df[col].astype(str).str.replace('%', ''), errors='coerce')
print(df.dtypes)
State object
City object
Population int64
Poverty_Rate float64
Median_Age float64
dtype: object
print(df)
State City Population Poverty_Rate Median_Age
0 VA XYZ 50000 10.5 42.0
1 MD ABC 12345 8.9 NaN
2 NY . 987654 NaN 41.0
Could you try the following? Do a str.replace on the column first, then cast it to an integer:
import pandas as pd
df = pd.DataFrame([
{'value': '123,445'},
{'value': '143,445,788'}
])
df['value'] = df['value'].str.replace(',', '').astype(int)
I have a csv file with a "Prices" column. Right now entries look like 1,000 or 12,456. I could probably remove the commas in Excel and re-save, but I want to know how to transform the column to strip non-numeric characters, so 'objects' like $1,299.99 become 'float' 1299.99. Thanks
Pandas has a built in replace method for "object" columns.
df["column"] = df["column"].str.replace(",","").astype(float)
Alternatively check out the pandas.to_numeric() function- I think this should work.
df["column"] = pd.to_numeric(df["column"])
You can also pass arguments for error handling with the pd.to_numeric() function. See the pandas documentation on it.
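For example, errors='coerce' turns anything unparseable into NaN instead of raising (the sample values here are made up):

```python
import pandas as pd

s = pd.Series(['$1,299.99', 'n/a'])
# strip the literal characters first, then coerce bad values to NaN
cleaned = pd.to_numeric(
    s.str.replace('$', '', regex=False).str.replace(',', '', regex=False),
    errors='coerce',
)
print(cleaned.tolist())  # [1299.99, nan]
```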
First, make a function that can convert a single string element to a float:
valid = '1234567890.' #valid characters for a float
def sanitize(data):
return float(''.join(filter(lambda char: char in valid, data)))
Then use the apply method to apply that function to every entry in the column. Reassign to the same column if you want to overwrite your old data.
df['column'] = df['column'].apply(sanitize)
I have some data where some of the columns have "." as a thousand separator.
I have named the data frame 'testpos'. I have already tried using the gsub-function, but it returns NA-values for every observation
testpos$Tested <- as.numeric(gsub(".","",testpos$Tested))
Does anyone have a better way to do this, or know what I am doing wrong?
Thanks in advance.
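That gsub call fails because "." is a regex wildcard, so every character gets removed and as.numeric() sees an empty string. The same trap exists in pandas; a small sketch of the literal-replacement fix (data invented):

```python
import pandas as pd

s = pd.Series(['1.000', '12.456'])
# regex=False treats '.' as a literal dot instead of "any character"
fixed = pd.to_numeric(s.str.replace('.', '', regex=False))
print(fixed.tolist())  # [1000, 12456]
```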
Notice that this will convert your float dtype to object:
df.DollarAmount.apply(lambda x : "{:,}".format(x))
Out[509]:
0 5,721.48
1 4,000.0
2 4,769.0
3 824.07
4 643.6
5 620.0
Name: DollarAmount, dtype: object
This is a more pandorable way to get the thousands separator.
df['Dollar Amount'] = df['Dollar Amount'].apply('{:,}'.format)
Hey guys, I have a csv file with any number greater than 999 being listed as a string in the form "1,000", including the quotes. I'm trying to get rid of these commas so I can turn them into an integer, but I'm unsure how to do it without touching the other commas used to separate the values. Any suggestions? So far I have thought this out, but it's not quite right:
import pandas as pd
df = pd.read_csv('...', sep=", ")
firstline = True
if firstline:
    firstline = False
else:
    for line in df:
        if "," in line[3]:  # the column with the values
            line[3].replace(",", " ")
Sorry for the formatting I am on phone. Thanks for the help :)
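For what it's worth, a sketch of one way out, using a made-up stand-in for the file: read_csv already respects the double quotes, so the value-separating commas are safe, and thousands=',' strips the grouping commas during parsing, so no manual replace is needed.

```python
import pandas as pd
from io import StringIO

# stand-in for the real file; the quoted fields hold the grouped numbers
data = 'name,price\nwidget,"1,000"\ngadget,"12,456"\n'
df = pd.read_csv(StringIO(data), thousands=',')
print(df['price'].tolist())  # [1000, 12456]
```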