You are passing the whole Series as an argument to the rounding operation. Instead, you need to apply the rounding to each value in the Series. I suggest you use map with a lambda to do it:
from decimal import Decimal, ROUND_HALF_UP

Data['Numerator'] = Data['Numerator'].map(lambda x: Decimal(x).quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
The output we get is as expected:
Code Disaggregation Numerator
0 x a 19.3
1 x b 82.1
2 x Total 101.2
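For reference, here is a self-contained sketch of the approach above; the input values are assumed for illustration, and `Decimal(str(x))` is used so the decimal literal (not the binary float) is what gets quantized:

```python
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd

# assumed example data
Data = pd.DataFrame({
    'Code': ['x', 'x', 'x'],
    'Disaggregation': ['a', 'b', 'Total'],
    'Numerator': [19.25, 82.05, 101.15],
})

# quantize(Decimal('.1')) keeps one decimal place;
# ROUND_HALF_UP breaks ties upward instead of to even
Data['Numerator'] = Data['Numerator'].map(
    lambda x: Decimal(str(x)).quantize(Decimal('.1'), rounding=ROUND_HALF_UP)
)
print(Data)
```

Going through `str(x)` matters: `Decimal(82.05)` captures the exact binary float (slightly below 82.05), which would round down.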
— Answer from Celius Stingher on Stack Overflow
try:
Data['Numerator'] = Data.Numerator.apply(lambda x: round(x, 1))
Change the second argument of round to your desired number of decimal places.
output:
Code Disaggregation Numerator
0 x a 19.3
1 x b 82.1
2 x Total 101.2
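One caveat worth flagging next to this answer: Python's built-in round uses banker's rounding (ties go to the even digit) and is also subject to float representation, so .5 values do not always round up:

```python
# round() resolves ties to the even integer, not upward
print(round(0.5))       # 0
print(round(1.5))       # 2

# and float representation can shift a value off the tie entirely:
# 2.675 is stored as roughly 2.67499..., so it rounds down
print(round(2.675, 2))  # 2.67
```

If half-up behavior is required, the Decimal-based answer above is the safer choice.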
python - Round up half of the hour in pandas - Stack Overflow
timestamps
You need to use dt.round. This is however a bit tricky, as whether xx:30 rounds to the previous or the next hour depends on the parity of the hour itself. You can force the direction by adding or subtracting a small amount of time (here 1 ns):
s = pd.to_datetime(pd.Series(['1/2/2021 3:45', '25/4/2021 12:30',
'25/4/2021 13:30', '12/4/2022 23:45']))
# xx:30 -> rounding depending on the hour parity (default)
s.dt.round(freq='1h')
0 2021-01-02 04:00:00
1 2021-04-25 12:00:00 <- -30min
2 2021-04-25 14:00:00 <- +30min
3 2022-12-05 00:00:00
dtype: datetime64[ns]
# 00:30 -> 00:00 (force down)
s.sub(pd.Timedelta('1ns')).dt.round(freq='1h')
0 2021-01-02 04:00:00
1 2021-04-25 12:00:00
2 2021-04-25 13:00:00
3 2022-12-05 00:00:00
dtype: datetime64[ns]
# 00:30 -> 01:00 (force up)
s.add(pd.Timedelta('1ns')).dt.round(freq='1h')
0 2021-01-02 04:00:00
1 2021-04-25 12:00:00
2 2021-04-25 13:00:00
3 2022-12-05 00:00:00
dtype: datetime64[ns]
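The 1 ns shift above can be wrapped into small helpers so the intent is explicit at the call site (the function names here are mine, not from the answer):

```python
import pandas as pd

def round_half_down(s: pd.Series) -> pd.Series:
    """Round to the hour, sending xx:30 exactly to the previous hour."""
    return s.sub(pd.Timedelta('1ns')).dt.round(freq='1h')

def round_half_up(s: pd.Series) -> pd.Series:
    """Round to the hour, sending xx:30 exactly to the next hour."""
    return s.add(pd.Timedelta('1ns')).dt.round(freq='1h')

s = pd.to_datetime(pd.Series(['2021-04-25 12:30', '2021-04-25 13:30']))
print(round_half_down(s))  # both :30 values go to the earlier hour
print(round_half_up(s))    # both :30 values go to the later hour
```

The 1 ns nudge only affects exact xx:30 ties; every other minute value is far enough from the boundary that the shift cannot change its rounding.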
floats
IIUC, you can use divmod (or numpy.modf) to get the integer and decimal part, then perform simple boolean arithmetic:
s = pd.Series([7.15, 5.25, 22.30, 18.45])
s2, r = s.divmod(1)  # or: r, s2 = np.modf(s) (note np.modf returns fractional part first)
s2[r.ge(0.3)] += 1
s2 = s2.astype(int)
Alternative: using mod and boolean to int equivalence:
s2 = s.astype(int)+s.mod(1).ge(0.3)
output:
0 7
1 5
2 23
3 19
dtype: int64
Note on precision: it is not always easy to compare floats due to floating-point arithmetic. For instance, a strict gt comparison could fail here on a value like 22.30, whose fractional part is not stored exactly. To ensure precision, round to 2 digits first:
s.mod(1).round(2).ge(0.3)
or use integers:
s.mod(1).mul(100).astype(int).ge(30)
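Putting the pieces together, a self-contained recap of the divmod approach with the round-first guard:

```python
import pandas as pd

s = pd.Series([7.15, 5.25, 22.30, 18.45])

# integer part and remainder in one call
s2, r = s.divmod(1)

# bump the integer part up whenever the decimal part is at least .30,
# rounding the remainder first to sidestep float representation issues
s2 = s2.astype(int) + r.round(2).ge(0.3)
print(s2.tolist())
```

The boolean Series from `ge` is added as 0/1, so the result stays an integer dtype.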
Here is a version that works with timestamps:
import numpy as np
import pandas as pd

# dummy data
df = pd.DataFrame({'time': pd.to_datetime([np.random.randint(0, 10**8) for a in range(10)], unit='s')})

def custom_round(row, col, out):
    # applied row-wise: round up to the next hour from minute 30, down otherwise
    if row[col].minute >= 30:
        row[out] = row[col].ceil('H')
    else:
        row[out] = row[col].floor('H')
    return row

df = df.apply(lambda x: custom_round(x, 'time', 'new_time'), axis=1)
# edit: using numpy instead of a row-wise apply
def custom_round(df, col, out):
    df[out] = np.where(df[col].dt.minute >= 30,
                       df[col].dt.ceil('H'),
                       df[col].dt.floor('H'))
    return df

df = custom_round(df, 'time', 'new_time')
I have a dataframe df:
df = pd.DataFrame({"volume": [0.3300, 5.600, 64.0915, 1.730000, 4.123000]})

| volume |
|---|
| 0.3300 |
| 5.600 |
| 64.0915 |
| 1.730000 |
| 4.123000 |
I also have a non-exhaustive dict di:
di = {
0.5: 6.26,
1.0: 6.28,
1.5: 6.36,
2.0: 6.46,
2.5: 6.56,
3.0: 6.66,
3.5: 6.76,
4.0: 6.86,
4.5: 6.96,
5.0: 6.98,
5.5: 7.15
    ...
}

I need to create a new column ["map"] where I map di to df["volume"]:
df["map"] = df["volume"].map(di)
but for that I need to round up each number in df["volume"] to the next 0.5, so the values should look like:
| volume | volume_round_up |
|---|---|
| 0.3300 | 0.5 |
| 5.600 | 6.0 |
| 64.0915 | 64.5 |
| 1.730000 | 2.0 |
| 4.123000 | 4.5 |
How can I do this in a vectorized way?
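The excerpt ends before an accepted answer; one common vectorized approach (my sketch, not quoted from the thread) is to scale by 2, take the ceiling, and scale back down:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"volume": [0.3300, 5.600, 64.0915, 1.730000, 4.123000]})

# ceil(2x) / 2 rounds each value up to the next multiple of 0.5
df["volume_round_up"] = np.ceil(df["volume"] * 2) / 2
print(df)
```

With the rounded column in place, `df["map"] = df["volume_round_up"].map(di)` performs the lookup; values missing from the non-exhaustive dict come back as NaN.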
You can add a tiny value to orig when the decimal part is exactly 0.5. That guarantees that any integer + 0.5 will always round up to the next integer.
import numpy as np
df['round_up'] = np.round(np.where(df['orig'] % 1 == 0.5,
df['orig'] + 0.1,
df['orig']))
print(df)
orig round_up
0 0.500000 1.0
1 1.499999 2.0
2 1.500000 2.0
3 2.500000 3.0
4 3.500000 4.0
5 4.500000 5.0
6 5.500000 6.0
7 6.500000 7.0
Using the decimal module, you could do
import decimal
df = pd.DataFrame(data=[0.5, 1.499999, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5], columns=["orig"])
df.orig = df.orig.apply(
lambda x: decimal.Decimal(x).to_integral_value(rounding=decimal.ROUND_HALF_UP)
)
As @cᴏʟᴅsᴘᴇᴇᴅ pointed out, this is happening because numpy rounds half-values to the nearest even integer (see docs here and a more general discussion here), and pandas uses numpy for most of its numerical work. You can resolve this by rounding the "old-fashioned" way:
import numpy as np
df.anual_jobs = np.floor(df.anual_jobs + 0.5)
or
import pandas as pd
df.anual_jobs = pd.np.floor(df.anual_jobs + 0.5)
(note: the pd.np alias has since been deprecated and removed in newer pandas versions, so prefer importing numpy directly as above)
As @cᴏʟᴅsᴘᴇᴇᴅ pointed out you can also resolve the slice assignment warning by creating your dataframe as a free-standing frame instead of a view on an older dataframe, i.e., execute the following at some point before you assign values into the dataframe:
df = df.copy()
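A quick sketch contrasting the two rounding behaviors described above (the values are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, 2.5, 3.5])

# numpy's default: ties round to the nearest even integer
print(np.round(s).tolist())        # [0.0, 2.0, 2.0, 4.0]

# the "old-fashioned" way: ties always round up
print(np.floor(s + 0.5).tolist())  # [1.0, 2.0, 3.0, 4.0]
```

The `floor(x + 0.5)` form shifts every exact half onto the next integer boundary before truncating, which is exactly half-up rounding for non-negative values.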
If what you want is driven by the half-integer case specifically, use decimal:
from decimal import Decimal, ROUND_HALF_UP
print(Decimal(10.5).quantize(0, ROUND_HALF_UP))
print(Decimal(10.2).quantize(0, ROUND_HALF_UP))
>> 11
>> 10