pandas equals with tolerance

Comparing different pandas dataframes by columns with tolerance variation

stackoverflow.com › questions › 59733859 › comparing-different-pandas-dataframes-by-columns-with-tolerance-variation

Use np.isclose, which allows you to precisely control the absolute and relative tolerance of the comparison.

I assume that you only want to compare rows with labels that exist in both dataframes. Rows that exist in one but not the other are ignored. Also, since you use a relative criterion for A, C, G, T, compare(df0,df1) is not the same as compare(df1,df0). It assumes the second parameter is the reference value. This is consistent with how np.isclose works.

import numpy as np
import pandas as pd

def compare(dfa, dfb):
    s = pd.Series(['A','C','G','T'])
    tmp = dfa.join(dfb, how='inner', lsuffix='_a', rsuffix='_b')

    # The A, C, G, T columns: |a - b| <= 0.9 * |b|, with dfb as the reference
    lhs = tmp[s + '_a'].values
    rhs = tmp[s + '_b'].values
    compare1 = np.isclose(lhs, rhs, atol=0, rtol=0.9)

    # The uA, uC, uG, uT columns: within 1e-5
    lhs = tmp['u' + s + '_a'].values
    rhs = tmp['u' + s + '_b'].values
    compare2 = np.isclose(lhs, rhs, atol=1e-5, rtol=0)

    # The cmA, cmC, cmG, cmT columns: within 1e-3
    lhs = tmp['cm' + s + '_a'].values
    rhs = tmp['cm' + s + '_b'].values
    compare3 = np.isclose(lhs, rhs, atol=1e-3, rtol=0)

    # Assemble the result
    data = np.concatenate([compare1, compare2, compare3], axis=1)
    cols = pd.concat([s, 'u'+s, 'cm'+s])    
    result = pd.DataFrame(data, columns=cols, index=tmp.index)

    return result

compare(df0, df2)

For an easy visualization of the result:

def highlight_false(cell):
    return '' if cell else 'background-color: yellow'

result = compare(df0,df2)
result.style.map(highlight_false)  # pandas >= 2.1; on older versions use result.style.applymap

Answer from Code Different on Stack Overflow
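
The asymmetry mentioned in the answer can be checked directly. This is a minimal sketch, not part of the original answer; the values 1.0 and 2.0 and the tolerance 0.6 are made up for illustration. It relies on the documented np.isclose test, |a - b| <= atol + rtol * |b|, in which the second argument is the reference.

```python
import numpy as np

a, b = 1.0, 2.0  # illustrative values, not from the question's data

# |1 - 2| = 1.0 <= 0.6 * |2| = 1.2, so a is "close" to the reference b
close_ab = np.isclose(a, b, atol=0, rtol=0.6)

# |2 - 1| = 1.0 >  0.6 * |1| = 0.6, so b is not "close" to the reference a
close_ba = np.isclose(b, a, atol=0, rtol=0.6)

print(close_ab, close_ba)
```

Swapping the arguments changes the result only when rtol is non-zero; a purely absolute tolerance (rtol=0) is symmetric.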

Pandas

pandas.pydata.org › docs › reference › api › pandas.testing.assert_frame_equal.html

pandas.testing.assert_frame_equal — pandas 3.0.1 documentation

Absolute tolerance. Only used when check_exact is False. ... Specify object name being compared, internally used to show appropriate assertion message. ... Equivalent method for asserting Series equality. ... Check DataFrame equality. ... This example shows comparing two DataFrames that are equal but with columns of differing dtypes. >>> from pandas.testing import assert_frame_equal >>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}) >>> df2 = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
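
As a rough illustration of the rtol and atol parameters described in this snippet, the sketch below wraps assert_frame_equal in a hypothetical helper, frames_close, that returns a boolean instead of raising. The helper name and the sample frames are assumptions made for this example.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
right = pd.DataFrame({"a": [1.0, 2.0 + 1e-4], "b": [3.0, 4.0]})

def frames_close(x, y, atol):
    """Hypothetical helper: True if assert_frame_equal passes with this atol."""
    try:
        # rtol=0 so that only the absolute tolerance decides the outcome
        assert_frame_equal(x, y, check_exact=False, rtol=0, atol=atol)
    except AssertionError:
        return False
    return True

loose = frames_close(left, right, atol=1e-3)   # difference of 1e-4 is within 1e-3
strict = frames_close(left, right, atol=1e-6)  # difference of 1e-4 exceeds 1e-6
print(loose, strict)
```

assert_frame_equal is meant for tests, so catching AssertionError like this is a convenience rather than a recommended production pattern.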

Pandas

pandas.pydata.org › docs › dev › reference › api › pandas.testing.assert_frame_equal.html

pandas.testing.assert_frame_equal — pandas 3.0.0rc1+92.g17b66cc0ad documentation

Absolute tolerance. Only used when check_exact is False. ... Specify object name being compared, internally used to show appropriate assertion message. ... Equivalent method for asserting Series equality. ... Check DataFrame equality. ... This example shows comparing two DataFrames that are equal but with columns of differing dtypes. >>> from pandas.testing import assert_frame_equal >>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}) >>> df2 = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

Stack Overflow

stackoverflow.com › questions › 59733859 › comparing-different-pandas-dataframes-by-columns-with-tolerance-variation

python - Comparing different pandas dataframes by columns with tolerance variation - Stack Overflow

Pandas

pandas.pydata.org › pandas-docs › version › 1.1 › reference › api › pandas.testing.assert_frame_equal.html

pandas.testing.assert_frame_equal — pandas 1.1.5 documentation

Absolute tolerance. Only used when check_exact is False. New in version 1.1.0. ... Specify object name being compared, internally used to show appropriate assertion message. ... Equivalent method for asserting Series equality. ... Check DataFrame equality. ... This example shows comparing two DataFrames that are equal but with columns of differing dtypes. >>> from pandas._testing import assert_frame_equal >>> df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) >>> df2 = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

GitHub

github.com › pandas-dev › pandas › issues › 54677

ENH: add `atol` to pd.DataFrame.compare() · Issue #54677 · pandas-dev/pandas

August 21, 2023 - Args: df1 (pd.DataFrame): The left dataframe df2 (pd.DataFrame): The right dataframe atol (float): Absolute tolerance Returns: pd.DataFrame: A dataframe with the differences between the two frames """ diff_df = pd.DataFrame(index=df1.index, columns=df1.columns) for col in df1.columns: if check_cols_are_numeric(df1, df2, col): diff_df[col] = tolerance_compare(df1, df2, atol, col) else: diff_df[col] = exact_compare(df1, df2, col) diff_df = remove_rows_cols_all_na(diff_df) diff_colums = diff_df.columns right_df = df2[diff_colums] diff_df = diff_df.merge( right_df, left_index=True, right_index=Tru

Author JonahBreslow
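
The snippet above is truncated, so the following is not the issue author's code. It is a much smaller sketch of the same idea, under the assumptions that both frames are fully numeric, share the same index and columns, and contain no NaN. The helper name differing_cells is hypothetical.

```python
import numpy as np
import pandas as pd

def differing_cells(df1, df2, atol):
    """Hypothetical helper: keep only df1 values that differ from df2 by more than atol."""
    close = np.isclose(df1.to_numpy(dtype=float), df2.to_numpy(dtype=float),
                       rtol=0, atol=atol)
    differs = pd.DataFrame(~close, index=df1.index, columns=df1.columns)
    # Blank out matching cells, then drop rows and columns with no differences left.
    return df1.where(differs).dropna(how="all").dropna(axis=1, how="all")

df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
df2 = pd.DataFrame({"a": [1.0, 2.5], "b": [3.0 + 1e-9, 4.0]})

diff = differing_cells(df1, df2, atol=1e-6)
print(diff)
```

Only the cell in row 1, column a survives, because the 1e-9 difference in column b falls inside the tolerance.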

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.equals.html

pandas.DataFrame.equals — pandas 3.0.1 documentation

DataFrames df and exactly_equal have the same types and values for their elements and column labels, which will return True.
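
DataFrame.equals takes no tolerance argument, so float noise makes it return False. One possible workaround is sketched below with a hypothetical helper, equals_with_tolerance. It assumes all columns are numeric and, unlike equals, ignores dtype differences. It passes equal_nan=True because equals also treats NaNs in the same location as equal.

```python
import numpy as np
import pandas as pd

def equals_with_tolerance(left, right, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: like DataFrame.equals, but approximate on numeric values."""
    # Labels must match exactly; this also guarantees matching shapes.
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    close = np.isclose(left.to_numpy(dtype=float), right.to_numpy(dtype=float),
                       rtol=rtol, atol=atol, equal_nan=True)
    return bool(close.all())

df = pd.DataFrame({"x": [0.1 + 0.2, np.nan]})
other = pd.DataFrame({"x": [0.3, np.nan]})

exact = df.equals(other)                  # 0.1 + 0.2 is not exactly 0.3
approx = equals_with_tolerance(df, other)
print(exact, approx)
```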

Stack Overflow

stackoverflow.com › questions › 33626443 › comparing-floats-in-a-pandas-column

python - Comparing floats in a pandas column - Stack Overflow

Top answer

1 of 4

56

Because float comparison is imprecise, you can OR your comparison with np.isclose. isclose takes relative and absolute tolerance parameters, so the following should work:

df['result'] = df['actual_credit'].ge(df['min_required_credit']) | np.isclose(df['actual_credit'], df['min_required_credit'])
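
To see the effect of that one-liner, here is a small self-contained sketch. The column names follow the answer, but the sample values are invented so that one row fails a plain >= purely because of float representation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "actual_credit": [0.3, 0.5, 0.2],
    # 0.1 + 0.2 evaluates to 0.30000000000000004, slightly above 0.3
    "min_required_credit": [0.1 + 0.2, 0.2, 0.3],
})

plain = df["actual_credit"].ge(df["min_required_credit"])
tolerant = plain | np.isclose(df["actual_credit"], df["min_required_credit"])

print(plain.tolist())
print(tolerant.tolist())
```

The first row flips from False to True once np.isclose is OR-ed in; the last row stays False because 0.2 is genuinely below 0.3.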

2 of 4

10

@EdChum's answer works great, but using the pandas.DataFrame.round function is another clean option that works well without the use of numpy.

df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])

df['result'] = df['actual_credit'].round(1) >= df['min_required_credit'].round(1)
print(df)

   actual_credit  min_required_credit  result
0            0.3                0.400   False
1            0.5                0.200    True
2            0.4                0.401    True
3            0.2                0.300   False

You might consider using round() to edit your dataframe permanently, depending on whether you need that precision. In this example, the OP suggests the extra digits are probably just noise that causes confusion.

df = pd.DataFrame(  # adding a small difference at the thousandths place to reproduce the issue
    data=[[0.3, 0.4], [0.5, 0.2], [0.400, 0.401], [0.2, 0.3]],
    columns=['actual_credit', 'min_required_credit'])
df = df.round(1)
df['result'] = df['actual_credit'] >= df['min_required_credit']
print(df)

   actual_credit  min_required_credit  result
0            0.3                  0.4   False
1            0.5                  0.2    True
2            0.4                  0.4    True
3            0.2                  0.3   False
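
One caveat with rounding is that it is not the same as a tolerance: two values that are very close can still land on opposite sides of a rounding boundary. The sketch below uses made-up values to contrast the two approaches.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0.449, 0.400])
s2 = pd.Series([0.451, 0.401])

# 0.449 rounds to 0.4 but 0.451 rounds to 0.5, although they differ by only 0.002
rounded_equal = (s1.round(1) == s2.round(1)).tolist()

# An absolute tolerance of 0.01 treats both pairs as equal
within_tol = np.asarray(np.isclose(s1, s2, rtol=0, atol=0.01)).tolist()

print(rounded_equal)
print(within_tol)
```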

Pandas

pandas.pydata.org › docs › reference › api › pandas.testing.assert_series_equal.html

pandas.testing.assert_series_equal — pandas 3.0.1 documentation

Absolute tolerance. Only used when check_exact is False. ... Specify object name being compared, internally used to show appropriate assertion message. ... Whether to check index equivalence. If False, then compare only values. ... If True, ignore the order of the index. Must be False if check_index is False. Note: same labels must be with the same data. ... Check that two Indexes are equal. ... Check that two DataFrames are equal. ... >>> from pandas import testing as tm >>> a = pd.Series([1, 2, 3, 4]) >>> b = pd.Series([1, 2, 3, 4]) >>> tm.assert_series_equal(a, b)

GitHub

github.com › pola-rs › polars › issues › 1167

Inexact number comparison in DataFrame.frame_equal · Issue #1167 · pola-rs/polars

May 5, 2021 - In pandas assert_frame_equals has the check_exact flag which allows for inexact number comparisons with tolerances defined by the rtol (relative tolerance) and atol (absolute tolerance). This would be nice to have in polars as well, sinc...

Author tversteeg

GitHub

gist.github.com › bmweiner › 1b7b837feb280057918ccd1e83f0898b

Compare equality of two pandas dataframes · GitHub

Compare equality of two pandas dataframes. GitHub Gist: instantly share code, notes, and snippets.

Stack Overflow

stackoverflow.com › questions › 33549193 › pandas-dataframe-comparison-and-floating-point-precision

python - Pandas Dataframe Comparison and Floating Point Precision - Stack Overflow

Top answer

1 of 2

22

OK you can use np.isclose for this:

In [250]:
np.isclose(a,b)

Out[250]:
array([[ True],
       [ True]], dtype=bool)

np.isclose takes relative tolerance and absolute tolerance. These have default values: rtol=1e-05, atol=1e-08 respectively
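
A quick sketch of what those defaults mean in practice, using illustrative numbers only. Near zero the relative term contributes almost nothing, so the small default atol is what decides the result.

```python
import numpy as np

# Around 1.0 the default rtol=1e-05 dominates: the allowed gap is about 1e-5
near_one = np.isclose(1.0, 1.0 + 1e-6)
far_one = np.isclose(1.0, 1.0001)

# Against 0.0 the relative term is rtol * 0 = 0, so only atol=1e-08 applies
tiny_vs_zero = np.isclose(1e-9, 0.0)
small_vs_zero = np.isclose(1e-7, 0.0)

# Raising atol makes the comparison against zero more forgiving
small_vs_zero_loose = np.isclose(1e-7, 0.0, atol=1e-6)

print(near_one, far_one, tiny_vs_zero, small_vs_zero, small_vs_zero_loose)
```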

2 of 2

4

You can use Pandas built-in assert_frame_equal, that automagically performs the numpy isclose() for floating point columns. The advantage is that you can pass an entire dataframe with mixed column types.

For fine tuning see arguments rtol and atol.

from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2)

Pandas

pandas.pydata.org › pandas-docs › version › 1.5.0rc0 › reference › api › pandas.testing.assert_frame_equal.html

pandas.testing.assert_frame_equal — pandas 1.5.0rc0 documentation

Absolute tolerance. Only used when check_exact is False. New in version 1.1.0. ... Specify object name being compared, internally used to show appropriate assertion message. ... Equivalent method for asserting Series equality. ... Check DataFrame equality. ... This example shows comparing two DataFrames that are equal but with columns of differing dtypes. >>> from pandas.testing import assert_frame_equal >>> df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) >>> df2 = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

Pandas

pandas.pydata.org › docs › reference › api › pandas.testing.assert_index_equal.html

pandas.testing.assert_index_equal — pandas documentation

Absolute tolerance. Only used when check_exact is False. ... Specify object name being compared, internally used to show appropriate assertion message. ... Check that two Series are equal. ... Check that two DataFrames are equal. ... >>> from pandas import testing as tm >>> a = pd.Index([1, 2, 3]) >>> b = pd.Index([1, 2, 3]) >>> tm.assert_index_equal(a, b)

NumPy

numpy.org › doc › stable › reference › generated › numpy.isclose.html

numpy.isclose — NumPy v2.4 Manual

>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True) array([ True, True])

Apache

spark.apache.org › docs › latest › api › python › reference › pyspark.pandas › api › pyspark.pandas.testing.assert_frame_equal.html

pyspark.pandas.testing.assert_frame_equal — PySpark 4.1.1 documentation

Absolute tolerance. Only used when check_exact is False. ... Specify object name being compared, internally used to show appropriate assertion message. ... Equivalent method for asserting Series equality. ... Check DataFrame equality. ... This example shows comparing two DataFrames that are equal but with columns of differing dtypes. >>> from pyspark.pandas.testing import assert_frame_equal >>> df1 = ps.DataFrame({'a': [1, 2], 'b': [3, 4]}) >>> df2 = ps.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})

Saturn Cloud

saturncloud.io › blog › what-is-float-comparison-in-pandas-and-how-to-do-it

What Is Float Comparison in Pandas and How to Do It | Saturn Cloud Blog

September 9, 2023 - Here are some common techniques: np.isclose() is a NumPy function that returns a boolean array indicating whether two arrays or values are element-wise equal within some tolerance value.

Davidamos

ww25.davidamos.dev › the-right-way-to-compare-floats-in-python

The Right Way to Compare Floats in Python | by David Amos

Sling Academy

slingacademy.com › article › pandas-checking-equality-of-2-dataframes-element-wise

Pandas: Checking equality of 2 DataFrames (element-wise) - Sling Academy

Pandas offers pd.testing.assert_frame_equal(), which allows for a comparison with a specified tolerance.

Appsloveworld

appsloveworld.com › pandas › 100 › 198 › comparing-two-data-frames-with-a-given-tolerance-range

[Code]-Comparing two data frames with a given tolerance range-pandas

Comparing two data frames with a given tolerance range · how to concat two data frames with different column names in pandas? - python · Merging two data frames into a new one with unique items marked with 1 or 0 · Merge two data frames with the closest number into a single row using pandas?

Medium

medium.com › @whyamit101 › understanding-assert-frame-equal-in-pandas-89aeda00e089

Understanding assert_frame_equal() in Pandas | by why amit | Medium

February 26, 2025 - df4 = df1.set_index('A') # Changes index to column 'A' # This fails because indexes are different # assert_frame_equal(df1, df4) # Ignore index type mismatch assert_frame_equal(df1, df4, check_index_type=False) # ✅ Passes · If your index doesn’t matter, disable this check. ... Sometimes, floating-point operations introduce tiny differences due to precision errors. Instead of failing over minuscule differences, you can allow a tolerance range.