Pass thousands=',' to read_csv to parse those values as thousands-separated numbers:
In [27]:
import pandas as pd
import io
t="""id;value
0;123,123
1;221,323,330
2;32,001"""
pd.read_csv(io.StringIO(t), thousands=r',', sep=';')
Out[27]:
   id      value
0   0     123123
1   1  221323330
2   2      32001
Answer from EdChum on Stack Overflow
The answer to this question should be short:
df=pd.read_csv('filename.csv', thousands=',')
Two points here. I've used pd.read_csv to load a CSV file which has three columns.
I've added headings to the columns when extracting the data (the raw data has none).
Columns 1 and 3 are text, and column 2 is a number.
How can I output the number with separators, e.g. 1,000,000 rather than 1000000?
Also, what's the best way to format this dataframe for inclusion in an email body?
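For the output side of this question, Python's format-spec grouping option can add the separators, and to_html renders the frame for an email body. A minimal sketch; the column names (name, amount, note) are assumptions, not from the question:

```python
import pandas as pd

# Hypothetical frame matching the question's shape: two text columns, one number
df = pd.DataFrame({
    "name": ["alpha", "beta"],     # column 1: text (assumed name)
    "amount": [1000000, 2345678],  # column 2: number (assumed name)
    "note": ["x", "y"],            # column 3: text (assumed name)
})

# Format the numeric column with thousands separators via the ',' format spec
df["amount"] = df["amount"].map(lambda n: f"{n:,}")
print(df)  # amount now shows 1,000,000 / 2,345,678

# For an email body, render the frame as an HTML table and attach it
# as the text/html part of a MIME message
html_table = df.to_html(index=False)
```

Note this turns the column into strings, so do it only for display, after any arithmetic.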
It seems there is a problem with quoting: if the separator is ',' and the thousands character is ',' too, the values have to be quoted in the CSV:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed; use the stdlib io module
import csv

temp = u"""'a','Base Amount'
'11','79,026,695.50'"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp),
                 dtype={'Base Amount': 'float64'},
                 thousands=',',
                 quotechar="'",
                 quoting=csv.QUOTE_ALL,
                 decimal='.',
                 encoding='ISO-8859-1')
print(df)

    a  Base Amount
0  11   79026695.5
temp = u'''"a","Base Amount"
"11","79,026,695.50"'''
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp),
                 dtype={'Base Amount': 'float64'},
                 thousands=',',
                 quotechar='"',
                 quoting=csv.QUOTE_ALL,
                 decimal='.',
                 encoding='ISO-8859-1')
print(df)

    a  Base Amount
0  11   79026695.5
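Going the other direction, to_csv with quoting=csv.QUOTE_ALL produces exactly this kind of fully quoted file, so the comma thousands separator never collides with the field separator. A sketch, not from the original answer:

```python
import csv
import io
import pandas as pd

# Frame whose string values contain commas
df = pd.DataFrame({"a": [11], "Base Amount": ["79,026,695.50"]})

buf = io.StringIO()
df.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)
print(buf.getvalue())
# "a","Base Amount"
# "11","79,026,695.50"
```

The resulting file can be read back with thousands="," and quoting=csv.QUOTE_ALL as shown above.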
First of all, get rid of the comma. Example:
num = '79,026,695.50'
print(num)
# 79,026,695.50
num = num.replace(',', '')
print(num)
# 79026695.50
num = float(num)
In your case, for a whole column (with numpy imported as np):
rawdata['base_amount'] = rawdata['base_amount'].str.replace(',', '').astype(np.float64)
Use the thousands parameter:
df = pd.read_csv("file.csv", parse_dates=['Date'], thousands=',')
Use the converters parameter if you have a special format (needs from datetime import datetime):
converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)
Output:
>>> df
        Date   Number
0 2021-12-30  2345.55
>>> df.dtypes
Date      datetime64[ns]
Number           float64
dtype: object
# data.csv
Date,Number
"Dec 30, 2021","2,345.55"
Otherwise, use the standard parameters:
df = pd.read_csv("data.csv", header=None, parse_dates=[0], thousands=',', quoting=1)
Output:
>>> df
           0        1        2         3        4
0 2021-12-30  1234.11  1654.22  11876.23  1676234
>>> df.dtypes
0    datetime64[ns]
1           float64
2           float64
3           float64
4             int64
dtype: object
I have a few excel files that I want to process but they use a mix of thousands and decimal separators.
For example, 1.000.000 and 50,56 on one file and 1,000,000 and 50.56 on another.
This is the function that I use:
import os
import numpy as np
import pandas as pd

def process_file(data_file):
    try:
        df = pd.read_excel(data_file, header=None, na_values=["", "-", " "],
                           thousands=".", decimal=",")
        print("XLS")
    except Exception:
        df = pd.read_html(data_file, header=None, na_values=["", "-", " "],
                          thousands=".", decimal=",")
        df = df[0]
        print("HTML")
    df = df.iloc[3:]  # , :8]  # Remove header rows
    df.reset_index(drop=True, inplace=True)  # Because we removed top rows
    column_names = ["N", "DATE", "J1", "J2", "J3", "J4", "J5", "J6", "T",
                    "S5+1", "P5+1", "S5", "P5", "S4+1", "P4+1", "S4", "P4",
                    "S3+1", "P3+1", "S3", "P3", "S2+1", "P2+1", "S1+1", "P1+1"]
    df.columns = column_names
    numeric_columns = ["N", "J1", "J2", "J3", "J4", "J5", "J6", "T",
                       "S5+1", "P5+1", "S5", "P5", "S4+1", "P4+1", "S4", "P4",
                       "S3+1", "P3+1", "S3", "P3", "S2+1", "P2+1", "S1+1", "P1+1"]
    df.replace(u"Not available", 0, inplace=True)
    df.replace(np.nan, 0, inplace=True)
    # Convert columns with numbers
    for num_col in numeric_columns:
        try:
            df[num_col] = df[num_col].astype(float)
        except Exception as e:
            print(e)
    df.DATE = pd.to_datetime(df.DATE)
    df.sort_values(by="N", inplace=True)
    return df
When the thousands and decimal are wrong I get an error if I try to convert the columns into numbers.
Is there a way to find out which combination of separators is used in each file ?
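None of the answers below address the detection question directly. One heuristic sketch (my own assumption, not a pandas feature): in a sample value that contains both '.' and ',', the rightmost of the two must be the decimal separator, and the other the thousands separator. A value with only one separator, such as 1.000, stays ambiguous:

```python
def detect_separators(sample):
    """Guess (thousands, decimal) from a numeric string such as '1.234.567,89'.

    Heuristic: the rightmost of '.' and ',' is the decimal separator and the
    other is the thousands separator. Ambiguous when only one is present.
    """
    last_dot = sample.rfind(".")
    last_comma = sample.rfind(",")
    if last_dot > last_comma:
        return ",", "."   # e.g. 1,234,567.89
    if last_comma > last_dot:
        return ".", ","   # e.g. 1.234.567,89
    return None, None     # no separator found

thousands, decimal = detect_separators("1.000.000,50")
# Feed the guess back into the reader, e.g.:
# df = pd.read_excel(data_file, thousands=thousands, decimal=decimal)
```

In practice you would pull a sample value from a known numeric cell of each file before deciding which separator pair to pass.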
There is a thousands parameter for this. Try:
arq_pedido = pd.read_csv('Pedido.csv', delimiter=";", encoding="ISO-8859-1", thousands=".")
You may also wish to set decimal="," to handle decimal numbers correctly.
The read_csv method has parameters for just about every conceivable scenario. You're probably interested in the thousands parameter for the thousands place separator, the decimal parameter for the decimal point, and the sep parameter for the column separator.
import pandas as pd
import io
foobar = io.StringIO("foo;bar \n 1,000; 2.0")
pd.read_csv(foobar, thousands=",", decimal=".", sep=";")
#    foo  bar
# 0  1000  2.0