Use np.isclose, which allows you to precisely control the absolute and relative tolerance of the comparison.
I assume that you only want to compare rows with labels that exist in both dataframes. Rows that exist in one but not the other are ignored. Also, since you use a relative criterion for A, C, G, T, compare(df0,df1) is not the same as compare(df1,df0). It assumes the second parameter is the reference value. This is consistent with how np.isclose works.
import numpy as np
import pandas as pd

def compare(dfa, dfb):
    s = pd.Series(['A', 'C', 'G', 'T'])
    # Keep only the rows whose labels exist in both dataframes
    tmp = dfa.join(dfb, how='inner', lsuffix='_a', rsuffix='_b')

    # The A, C, G, T columns: within 90% of dfb
    lhs = tmp[s + '_a'].values
    rhs = tmp[s + '_b'].values
    compare1 = np.isclose(lhs, rhs, atol=0, rtol=0.9)

    # The uA, uC, uG, uT columns: within 1e-5
    lhs = tmp['u' + s + '_a'].values
    rhs = tmp['u' + s + '_b'].values
    compare2 = np.isclose(lhs, rhs, atol=1e-5, rtol=0)

    # The cmA, cmC, cmG, cmT columns: within 1e-3
    lhs = tmp['cm' + s + '_a'].values
    rhs = tmp['cm' + s + '_b'].values
    compare3 = np.isclose(lhs, rhs, atol=1e-3, rtol=0)

    # Assemble the result: one boolean column per compared column
    data = np.concatenate([compare1, compare2, compare3], axis=1)
    cols = pd.concat([s, 'u' + s, 'cm' + s])
    result = pd.DataFrame(data, columns=cols, index=tmp.index)
    return result

compare(df0, df2)
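
To make the asymmetry mentioned above concrete, here is a minimal sketch. The numbers are made up for illustration and rtol=0.5 is chosen only to keep the arithmetic easy; it relies on the documented test |a - b| <= atol + rtol * |b|.

```python
import numpy as np

# np.isclose(a, b) tests |a - b| <= atol + rtol * |b|, so b acts as the reference.
a, b = 100.0, 190.0

# |100 - 190| = 90 <= 0.5 * 190 = 95, so a counts as close to b.
a_vs_b = np.isclose(a, b, atol=0, rtol=0.5)

# |190 - 100| = 90 > 0.5 * 100 = 50, so b does not count as close to a.
b_vs_a = np.isclose(b, a, atol=0, rtol=0.5)

print(a_vs_b, b_vs_a)
```

Swapping the arguments changes the reference value and therefore the allowed gap, which is why compare(df0, df1) and compare(df1, df0) can disagree.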
For an easy visualization of the result:
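def highlight_false(cell):
    # Leave True cells unstyled and highlight False cells in yellow
    return '' if cell else 'background-color: yellow'

result = compare(df0, df2)
# In pandas 2.1 and later, Styler.applymap is deprecated in favor of Styler.map
result.style.applymap(highlight_false)
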
Answer from Code Different on Stack Overflow

Because floating-point comparison is imprecise, you can OR your comparison with np.isclose. isclose takes relative and absolute tolerance parameters, so the following should work:
df['result'] = df['actual_credit'].ge(df['min_required_credit']) | np.isclose(df['actual_credit'], df['min_required_credit'])
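
As a rough illustration of why the extra np.isclose term matters, the sketch below uses made-up data (not the OP's dataframe) in which one required value is computed as 0.1 + 0.2, which evaluates to 0.30000000000000004. The `naive` column is a hypothetical name added only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0.1 + 0.2 evaluates to 0.30000000000000004 in binary floating point.
df = pd.DataFrame({
    'actual_credit': [0.3, 0.5, 0.2],
    'min_required_credit': [0.1 + 0.2, 0.2, 0.3],
})

# Plain >= rejects the first row because of the rounding error.
df['naive'] = df['actual_credit'].ge(df['min_required_credit'])

# OR-ing with np.isclose accepts values that are equal up to the default tolerances.
df['result'] = df['naive'] | np.isclose(df['actual_credit'], df['min_required_credit'])

print(df)
```

Only the first row changes: it is False under the plain comparison and True once near-equal values are accepted.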
@EdChum's answer works great, but using the pandas.DataFrame.round function is another clean option that works well without the use of numpy.
# Add a small difference at the thousandths place to reproduce the issue
df = pd.DataFrame(
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df['result'] = df['actual_credit'].round(1) >= df['min_required_credit'].round(1)
print(df)
   actual_credit  min_required_credit  result
0            0.3                0.400   False
1            0.5                0.200    True
2            0.4                0.401    True
3            0.2                0.300   False
You might also use round() to change the dataframe itself, depending on whether you need the extra precision. In this example, the OP suggests the extra digits are probably just noise that causes confusion.
# Add a small difference at the thousandths place to reproduce the issue
df = pd.DataFrame(
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df = df.round(1)
df['result'] = df['actual_credit'] >= df['min_required_credit']
print(df)
   actual_credit  min_required_credit  result
0            0.3                  0.4   False
1            0.5                  0.2    True
2            0.4                  0.4    True
3            0.2                  0.3   False
OK you can use np.isclose for this:
In [250]: np.isclose(a, b)
Out[250]:
array([[ True],
       [ True]], dtype=bool)
np.isclose takes a relative tolerance and an absolute tolerance. Their default values are rtol=1e-05 and atol=1e-08, respectively.
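
A short sketch of what those defaults mean in practice, using made-up values: the documented test is |a - b| <= atol + rtol * |b|, so the relative term dominates for large numbers and only atol matters when the reference value is zero.

```python
import numpy as np

# With the defaults, the test is |a - b| <= 1e-08 + 1e-05 * |b|.

# Large values: the relative term dominates (the allowed gap is about 1e5 here).
large = np.isclose(1e10, 1e10 + 1.0)

# Reference value of zero: only atol applies, so 1e-7 is not close to 0.
near_zero_default = np.isclose(1e-7, 0.0)

# Raising atol makes the same pair compare as close.
near_zero_loose = np.isclose(1e-7, 0.0, atol=1e-6)

print(large, near_zero_default, near_zero_loose)
```

If your data contains values near zero, it is usually worth setting atol explicitly rather than relying on the default.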
You can use pandas' built-in assert_frame_equal, which by default compares floating-point columns with a tolerance, much like np.isclose does. The advantage is that you can pass an entire dataframe with mixed column types.
For fine-tuning, see the rtol and atol arguments.
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2)
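
Since assert_frame_equal raises an AssertionError instead of returning a boolean, a small wrapper can be handy. The sketch below is only illustrative: `frames_match` and the sample frames are hypothetical, and it assumes pandas 1.1 or later, where the rtol and atol arguments are available.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames with a string column and a float column.
left = pd.DataFrame({'name': ['a', 'b'], 'value': [1.0, 2.0]})
right = pd.DataFrame({'name': ['a', 'b'], 'value': [1.001, 2.0]})

def frames_match(df1, df2, **kwargs):
    """Return True if assert_frame_equal passes, False if it raises."""
    try:
        assert_frame_equal(df1, df2, **kwargs)
        return True
    except AssertionError:
        return False

strict = frames_match(left, right)            # default tolerances: 0.001 is too large a gap
loose = frames_match(left, right, atol=1e-2)  # looser absolute tolerance: frames match

print(strict, loose)
```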