You can use numpy.allclose:
numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns True if two arrays are element-wise equal within a tolerance.
The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
numpy works well with pandas.Series objects, so if you have two of them - s1 and s2, you can simply do:
np.allclose(s1, s2, atol=...)
Where atol is your tolerance value.
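Here is a minimal sketch of that check. The values in s1 and s2 and the tolerance of 1e-3 are made up for illustration. It also writes the test out by hand as abs(a - b) <= atol + rtol * abs(b), which is the formula quoted above.

```python
import numpy as np
import pandas as pd

# Illustrative data: s2 differs from s1 by at most about 0.0005 per element.
s1 = pd.Series([1.0, 2.0, 3.0])
s2 = pd.Series([1.0004, 1.9995, 3.0005])

# Loose absolute tolerance: every difference is below 1e-3.
close_loose = np.allclose(s1, s2, atol=1e-3)

# Default tolerances (rtol=1e-05, atol=1e-08) are much stricter.
close_default = np.allclose(s1, s2)

# The same test written out by hand, using the documented formula.
rtol, atol = 1e-05, 1e-3
manual = bool((np.abs(s1 - s2) <= atol + rtol * np.abs(s2)).all())

print(close_loose, close_default, manual)
```

With these numbers the loose check and the manual check pass, while the default tolerances reject the pair.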
NumPy works well with pandas Series. However, np.allclose compares values by position and ignores the labels, so one has to be careful with the order of the index (or of both the columns and the index for a pandas DataFrame).
For example:
series_1 = pd.Series(data=[0,1], index=['a','b'])
series_2 = pd.Series(data=[1,0], index=['b','a'])
np.allclose(series_1,series_2)
returns False, even though both series hold the same value for each label.
A workaround is to reorder one series by the index of the other:
np.allclose(series_1, series_2.loc[series_1.index])
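The same idea seems to carry over to a DataFrame, where both rows and columns may be stored in a different order. A small sketch, with df_1 and df_2 as made-up frames:

```python
import numpy as np
import pandas as pd

# Same labelled values, but rows and columns are stored in a different order.
df_1 = pd.DataFrame({'x': [0.0, 1.0], 'y': [2.0, 3.0]}, index=['a', 'b'])
df_2 = pd.DataFrame({'y': [3.0, 2.0], 'x': [1.0, 0.0]}, index=['b', 'a'])

# Positional comparison: np.allclose ignores the labels.
positional = np.allclose(df_1, df_2)

# Reorder df_2 to match df_1's row and column labels, then compare.
aligned = np.allclose(df_1, df_2.loc[df_1.index, df_1.columns])

print(positional, aligned)
```

This assumes both frames carry exactly the same labels; .loc raises a KeyError if df_2 is missing any of them.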
Because float comparison is imprecise, you can combine your comparison with np.isclose using a logical OR (the | operator). isclose takes relative and absolute tolerance parameters (rtol and atol), so the following should work:
df['result'] = df['actual_credit'].ge(df['min_required_credit']) | np.isclose(df['actual_credit'], df['min_required_credit'])
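To see why the extra np.isclose term matters, here is a small sketch with made-up data. The first row uses 0.1 + 0.2, which is stored as a value slightly above 0.3, so a plain >= fails there. The 'naive' column is only there for comparison.

```python
import numpy as np
import pandas as pd

# Illustrative data: 0.1 + 0.2 is stored as 0.30000000000000004.
df = pd.DataFrame({
    'actual_credit': [0.3, 0.5, 0.2],
    'min_required_credit': [0.1 + 0.2, 0.2, 0.3],
})

# Plain >= treats row 0 as failing because of the rounding error.
df['naive'] = df['actual_credit'].ge(df['min_required_credit'])

# OR-ing with np.isclose accepts values that are equal within tolerance.
df['result'] = df['naive'] | np.isclose(df['actual_credit'], df['min_required_credit'])

print(df[['naive', 'result']])
```

Row 0 flips from False to True once the isclose term is added; row 2 stays False because 0.2 really is below 0.3.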
@EdChum's answer works great, but using the pandas.DataFrame.round function is another clean option that works well without the use of numpy.
df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df['result'] = df['actual_credit'].round(1) >= df['min_required_credit'].round(1)
print(df)
   actual_credit  min_required_credit  result
0            0.3                0.400   False
1            0.5                0.200    True
2            0.4                0.401    True
3            0.2                0.300   False
You might consider applying round() to the whole DataFrame to change the data permanently, depending on whether you need that precision. In this example, the OP suggests the extra digits are probably just noise that causes confusion.
df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df = df.round(1)
df['result'] = df['actual_credit'] >= df['min_required_credit']
print(df)
   actual_credit  min_required_credit  result
0            0.3                  0.4   False
1            0.5                  0.2    True
2            0.4                  0.4    True
3            0.2                  0.3   False
You also need to be careful to create a copy of the DataFrame; otherwise csvdata_old will change whenever csvdata is updated in place, since both names point to the same object:
csvdata_old = csvdata.copy()
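A quick sketch of the difference, using a made-up one-column frame as a stand-in for csvdata (the names alias and snapshot are only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the csvdata frame discussed above.
csvdata = pd.DataFrame({'value': [1, 2, 3]})

alias = csvdata            # same object under a second name
snapshot = csvdata.copy()  # independent copy of the data

# In-place update of the original frame.
csvdata.loc[0, 'value'] = 100

print(alias is csvdata)          # the alias is the very same object
print(alias.loc[0, 'value'])     # so it sees the update
print(snapshot.loc[0, 'value'])  # the copy still holds the old value
```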
To check whether they are equal, you can use assert_frame_equal as in this answer:
from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated and was removed in pandas 2.0
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
def frames_equal(csvdata, csvdata_old):  # frames_equal is just an illustrative name
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except Exception:  # apparently AssertionError doesn't catch all
        return False
There was discussion of a better way...
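If the frames only need to match within a tolerance, assert_frame_equal can also do that. The sketch below assumes pandas 1.1 or later, where the rtol and atol parameters are available. The helper name frames_close and the sample frames are made up for illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_close(left, right, rtol=1e-5, atol=1e-8):
    """Return True if both frames match within the given tolerances."""
    try:
        # check_exact=False compares floats with rtol/atol instead of exactly.
        assert_frame_equal(left, right, check_exact=False, rtol=rtol, atol=atol)
        return True
    except AssertionError:
        # Value mismatches raise AssertionError; other errors propagate.
        return False


a = pd.DataFrame({'x': [0.400, 0.2]})
b = pd.DataFrame({'x': [0.401, 0.2]})

print(frames_close(a, b))             # default tolerances
print(frames_close(a, b, atol=0.01))  # looser absolute tolerance
```

Unlike np.allclose, assert_frame_equal also checks the index, column labels and dtypes, so two frames with the same values in a different row order will not compare as equal here.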
Not sure if this is helpful or not, but I whipped together this quick python method for returning just the differences between two dataframes that both have the same columns and shape.
def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
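For what it's worth, here is how that function might be used. The two frames are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd


def get_different_rows(source_df, new_df):
    """Return the rows of new_df that have no exact match in source_df."""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)


# Illustrative frames: only the row with id 2 changed.
source_df = pd.DataFrame({'id': [1, 2, 3], 'grade': ['a', 'b', 'c']})
new_df = pd.DataFrame({'id': [1, 2, 3], 'grade': ['a', 'x', 'c']})

diff = get_different_rows(source_df, new_df)
print(diff)
```

Note that the merge matches rows exactly on all shared columns, so float columns that differ only by rounding noise will show up as changed rows. Rounding both frames first, as described above, is one way around that.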