Casting the full DataFrame:
df = df.astype(str).astype(float)
For a single column, where IDs is the name of the column:
df["IDs"] = df.IDs.astype(str).astype(float)
Test implementation:
from pprint import pprint
import bson
import pandas as pd

df = pd.DataFrame()
y = []
for i in range(1, 6):
    i = i * 2 / 3.5
    y.append(bson.decimal128.Decimal128(str(i)))

pprint(y)
df["D128"] = y
df["D128"] = df.D128.astype(str).astype(float)
print("\n", df)
Output:
[Decimal128('0.5714285714285714'),
Decimal128('1.1428571428571428'),
Decimal128('1.7142857142857142'),
Decimal128('2.2857142857142856'),
Decimal128('2.857142857142857')]
D128
0 0.571429
1 1.142857
2 1.714286
3 2.285714
4 2.857143
Answer from Srce Cde on Stack Overflow to "pandas read_csv column dtype is set to decimal but converts to string".
Different precision calling .astype(str) on float numbers
Related Stack Overflow questions: "Set decimal precision of a pandas dataframe column with a datatype of Decimal" and "Convert floats to ints in Pandas?"
Just use:
df = df.astype(float)
You can also use apply or applymap (which apply element-wise operations), although these are inefficient compared to the previous method.
df = df.applymap(float)
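As a runnable sketch of the equivalence (the column name a is illustrative, and decimal.Decimal stands in for any object-dtype numeric column; note that DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so the element-wise path below uses Series.map per column):

```python
import pandas as pd
from decimal import Decimal

# decimal.Decimal used as a stand-in for any object-dtype numeric column
df = pd.DataFrame({"a": [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]})

out_astype = df.astype(float)                    # vectorized cast
out_map = df.apply(lambda col: col.map(float))   # element-wise cast

print(out_astype.equals(out_map))  # both yield the same float64 frame
```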
I can't reproduce a Decimal128 number on my system. Can you please check whether the next line works for you?
df = df.apply(lambda col: col.astype(str).astype(float) if isinstance(col.iloc[0], bson.decimal128.Decimal128) else col)
It checks whether each column holds Decimal128 values and, if so, converts that column to float.
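Since Decimal128 comes from the third-party bson package, here is a runnable sketch of the same per-column type check using the standard library's decimal.Decimal as a stand-in (the column names d, f, and s are illustrative):

```python
import pandas as pd
from decimal import Decimal

# Mixed frame: one Decimal column, one float column, one string column
df = pd.DataFrame({
    "d": [Decimal("1.5"), Decimal("2.5")],
    "f": [0.1, 0.2],
    "s": ["x", "y"],
})

# Convert only the columns whose values are all Decimal; leave the rest alone
df = df.apply(lambda col: col.astype(str).astype(float)
              if col.map(lambda v: isinstance(v, Decimal)).all()
              else col)

print(df.dtypes)  # d becomes float64; f and s keep their dtypes
```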
This can be changed via the print options for floats; however, it will modify how every float is printed:
pd.set_option('display.float_format', '{:.10f}'.format)
Keep in mind that this only changes how values are printed; the full-precision value is still stored in the DataFrame.
On the other hand, you can restrict decimals with:
df.Value = df.Value.round(4)
But this rounds based on the fifth decimal. A last option would be np.ceil or np.floor, but since these don't support a decimals argument, an approach with multiplication and division is required:
precision = 4
df['Value_ceil'] = np.ceil(df.Value * 10**precision) / (10**precision)
df['Value_floor'] = np.floor(df.Value * 10**precision) / (10**precision)
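A runnable sketch comparing the three variants at four decimal places (the sample values are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": [1.23456, 2.00001]})
precision = 4

df["Value_round"] = df.Value.round(precision)                            # nearest
df["Value_ceil"] = np.ceil(df.Value * 10**precision) / 10**precision     # always up
df["Value_floor"] = np.floor(df.Value * 10**precision) / 10**precision   # always down

# Value_round: [1.2346, 2.0]
# Value_ceil:  [1.2346, 2.0001]
# Value_floor: [1.2345, 2.0]
```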
Fixed the issue; it seems to be related to how Decimal converts from float to decimal. Setting the Value column to data type string and then converting to Decimal got me the result I desired.
from decimal import Decimal
import pandas as pd

def get_df(table_filepath):
    df = pd.read_csv(table_filepath)
    df['Value'] = df['Value'].apply(str)
    df['Value'] = df['Value'].apply(Decimal)
    return df
| Key | Value |
|---|---|
| A | 1.2089 |
| B | 5.6718 |
| B | 7.3084 |
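The underlying cause can be demonstrated with the standard library alone: constructing a Decimal directly from a float exposes the float's binary representation error, while going through str first keeps the value as written.

```python
from decimal import Decimal

# Direct float -> Decimal carries the binary representation error
print(Decimal(1.2089))       # a long string of digits, not exactly 1.2089

# float -> str -> Decimal keeps the value as written
print(Decimal(str(1.2089)))  # Decimal('1.2089')
```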
To modify the float output, do this:
df= pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df
Out[33]:
a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000
pd.options.display.float_format = '{:,.0f}'.format
df
Out[35]:
a
0 0
1 1
2 2
3 3
4 4
Use the pandas.DataFrame.astype(<type>) function to manipulate column dtypes.
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
A B C D
0 0.542447 0.949988 0.669239 0.879887
1 0.068542 0.757775 0.891903 0.384542
2 0.021274 0.587504 0.180426 0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
EDIT:
To handle missing values:
>>> df
A B C D
0 0.475103 0.355453 0.66 0.869336
1 0.260395 0.200287 NaN 0.617024
2 0.517692 0.735613 0.18 0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
For some reason, some of the columns are being loaded as a Decimal rather than as a float - not my team, apparently can't be changed.
Is there a way to identify which columns are Decimal? df[col].dtype just returns "O" which makes it impossible to distinguish from objects using this method.
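One way to identify such columns is to inspect the element types, since the dtype itself is only object. A sketch using decimal.Decimal, with hypothetical column names:

```python
import pandas as pd
from decimal import Decimal

df = pd.DataFrame({
    "a": [Decimal("1.1"), Decimal("2.2")],  # loaded as Decimal (object dtype)
    "b": [1.1, 2.2],                        # plain float64
})

# Keep only object-dtype columns whose every element is a Decimal
decimal_cols = [
    c for c in df.columns
    if df[c].dtype == object
    and df[c].map(lambda v: isinstance(v, Decimal)).all()
]
print(decimal_cols)  # ['a']
```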
The problem is that assigning through .iloc assigns the values but does not change the column dtype.
l = df.columns[2:]
df[l] = df[l].astype(int)
df
0 1 2 3 4
0 1.1 2.1 3 4 5
1 6.1 7.1 8 9 10
One way to solve that is to use .convert_dtypes()
df.iloc[:, 2:] = df.iloc[:, 2:].round()
df = df.convert_dtypes()
print(df)
output:
0 1 2 3 4
0 1.1 2.1 3 4 5
1 6.1 7.1 8 9 10
It will coerce each column of your DataFrame to the best-fitting dtype.
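A self-contained version of the steps above (the sample data mirrors the output shown):

```python
import pandas as pd

df = pd.DataFrame([[1.1, 2.1, 3.0, 4.0, 5.0],
                   [6.1, 7.1, 8.0, 9.0, 10.0]])

# Round the trailing columns, then let pandas pick better-fitting dtypes
df.iloc[:, 2:] = df.iloc[:, 2:].round()
df = df.convert_dtypes()

print(df.dtypes)  # columns 2-4 become Int64; columns 0-1 stay floating
```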