An efficient way to do this is to compare the first value with the rest, and use all:
def is_unique(s):
    a = s.to_numpy()  # s.values (pandas<0.24)
    return (a[0] == a).all()
is_unique(df['counts'])
# False
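For reproducibility, the calls above and below assume a frame along these lines (the column name counts comes from the shared example; the exact values are hypothetical, chosen so that neither column is constant):
import numpy as np
import pandas as pd

# hypothetical data matching the outputs shown: neither column is constant
df = pd.DataFrame({'counts': [1, 2, 2], 'other': [3, 3, 4]})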
Although the most intuitive idea might be to count the number of unique values and check that there is only one, this has needlessly high complexity for what we're trying to do. Numpy's np.unique, called by pandas' nunique, sorts the underlying arrays, which has an average complexity of O(n·log(n)) using quicksort (the default). The approach above is O(n).
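As a quick illustration of where the sorting cost comes from, np.unique returns the sorted unique values:
np.unique([3, 1, 2, 1])
# array([1, 2, 3])  <- the O(n·log(n)) sort happens here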
The difference in performance becomes more obvious when we're applying this to an entire dataframe (see below).
For an entire dataframe
In the case of wanting to perform the same task on an entire dataframe, we can extend the above by setting axis=0 in all:
def unique_cols(df):
    a = df.to_numpy()  # df.values (pandas<0.24)
    return (a[0] == a).all(0)
For the shared example, we'd get:
unique_cols(df)
# array([False, False])
Here's a benchmark of the above methods compared with some other approaches, such as using nunique (for a pd.Series):
import perfplot

s_num = pd.Series(np.random.randint(0, 1_000, 1_100_000))

perfplot.show(
    setup=lambda n: s_num.iloc[:int(n)],
    kernels=[
        lambda s: s.nunique() == 1,
        lambda s: is_unique(s)
    ],
    labels=['nunique', 'first_vs_rest'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)

And below are the timings for a pd.DataFrame. Let's also compare with a numba approach, which is especially useful here since we can take advantage of short-circuiting as soon as we see a value that differs from the first in a given column (note: the numba approach will only work with numerical data):
from numba import njit

@njit
def unique_cols_nb(a):
    n_cols = a.shape[1]
    out = np.zeros(n_cols, dtype=np.int32)
    for i in range(n_cols):
        init = a[0, i]
        for j in a[1:, i]:
            if j != init:
                break
        else:
            out[i] = 1
    return out
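A quick sanity check of the numba kernel on a tiny (hypothetical) numerical array, where the first column is constant and the second is not:
import numpy as np

a = np.array([[1., 5.],
              [1., 6.],
              [1., 7.]])
unique_cols_nb(a).astype(bool)
# array([ True, False])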
If we compare the three methods:
df = pd.DataFrame(np.concatenate([np.random.randint(0, 1_000, (500_000, 200)),
                                  np.zeros((500_000, 10))], axis=1))

perfplot.show(
    setup=lambda n: df.iloc[:int(n), :],
    kernels=[
        lambda df: (df.nunique(0) == 1).values,
        lambda df: unique_cols_nb(df.values).astype(bool),
        lambda df: unique_cols(df)
    ],
    labels=['nunique', 'unique_cols_nb', 'unique_cols'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)

Update using np.unique
len(np.unique(df.counts)) == 1
# False
Or
len(set(df.counts.tolist())) == 1
Or
df.counts.eq(df.counts.iloc[0]).all()
# False
Or
df.counts.std() == 0
# False
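All four checks behave the same way; here's a minimal sketch on a hypothetical column (note that the std() == 0 variant only works for numeric data and can be bitten by floating-point precision):
import numpy as np
import pandas as pd

df = pd.DataFrame({'counts': [1, 2, 2]})  # not all equal

len(np.unique(df.counts)) == 1         # False
len(set(df.counts.tolist())) == 1      # False
df.counts.eq(df.counts.iloc[0]).all()  # False
df.counts.std() == 0                   # False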
I think the cleanest way is to check all columns against the first column using eq:
In [11]: df
Out[11]:
   a  b  c  d
0  C  C  C  C
1  C  C  A  A
2  A  A  A  A

In [12]: df.iloc[:, 0]
Out[12]:
0    C
1    C
2    A
Name: a, dtype: object

In [13]: df.eq(df.iloc[:, 0], axis=0)
Out[13]:
      a     b      c      d
0  True  True   True   True
1  True  True  False  False
2  True  True   True   True
Now you can use all (if they are all equal to the first item, they are all equal):
In [14]: df.eq(df.iloc[:, 0], axis=0).all(1)
Out[14]:
0     True
1    False
2     True
dtype: bool
Compare the array against its first column and check whether each row is all True. Here is the same solution in numpy, for better performance:
a = df.values
b = (a == a[:, [0]]).all(axis=1)
print(b)
# [ True  True False]
And if you need a Series:
s = pd.Series(b, index=df.index)
Comparing solutions:
data = [[10, 10, 10], [12, 12, 12], [10, 12, 10]]
df = pd.DataFrame(data, columns=['Col1', 'Col2', 'Col3'])
df = pd.concat([df] * 10000, ignore_index=True)
# [30000 rows x 3 columns]
# jez - numpy array
In [14]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
141 µs ± 3.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# jez - Series
In [15]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
    ...: pd.Series(b, index=df.index)
169 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Andy Hayden
In [16]: %%timeit
    ...: df.eq(df.iloc[:, 0], axis=0).all(axis=1)
2.22 ms ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Wen1
In [17]: %%timeit
    ...: list(map(lambda x: len(set(x)) == 1, df.values))
56.8 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# K.-Michael Aye
In [18]: %%timeit
    ...: df.apply(lambda x: len(set(x)) == 1, axis=1)
686 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Wen2
In [19]: %%timeit
    ...: df.nunique(1).eq(1)
2.87 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
You also need to be careful to create a copy of the DataFrame, otherwise csvdata_old will be updated along with csvdata (since both names point to the same object):
csvdata_old = csvdata.copy()
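A minimal illustration of the aliasing problem, using toy data under the question's variable name:
import pandas as pd

csvdata = pd.DataFrame({'a': [1, 2]})
alias = csvdata               # same object: changes show up in both names
csvdata_old = csvdata.copy()  # independent copy

csvdata.loc[0, 'a'] = 99
alias.loc[0, 'a']        # 99 -- the alias changed too
csvdata_old.loc[0, 'a']  # 1  -- the copy did not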
To check whether they are equal, you can use assert_frame_equal as in this answer:
from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
def frames_equal(df1, df2):
    try:
        assert_frame_equal(df1, df2)
        return True
    except Exception:  # apparently AssertionError doesn't catch all
        return False
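Hypothetical usage (the name frames_equal is ours, not from pandas):
frames_equal(csvdata, csvdata_old)
# True if the frames match, False otherwise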
There was discussion of a better way...
Not sure if this is helpful or not, but I whipped together this quick python method for returning just the differences between two dataframes that both have the same columns and shape.
def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe."""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
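For example, with two small hypothetical frames that differ in one row:
import pandas as pd

source = pd.DataFrame({'id': [1, 2, 3], 'val': ['a', 'b', 'c']})
new = pd.DataFrame({'id': [1, 2, 3], 'val': ['a', 'B', 'c']})

get_different_rows(source, new)
# returns the single row (id=2, val='B') that changed in new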
I need to test whether all values in a column (for all columns) in my pandas dataframe are equal, and if so, delete those columns. Note that all the values in the dataframe are strings and not integers.
I've looked online elsewhere but either don't understand the answers, or the answer isn't quite what I'm looking for, and being new to this, I don't know how to modify it.
Could anyone offer some advice?
You can use assert_frame_equal and not check the dtype of the columns.
# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2, check_dtype=False)
Using @Divakar's elegant idea: numpy's allclose() will do the main trick for numbers:
In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3

In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0

In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
     ...:  &
     ...:  df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
     ...: )
Out[130]: True
select_dtypes() helps you separate the string columns from all the other (numeric) dtypes.
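Wrapped up as a reusable sketch (the function name is ours), combining both checks from In [130]:
import numpy as np

def frames_close(df1, df2):
    # numeric columns compared with a tolerance, object columns exactly
    numeric_ok = np.allclose(df1.select_dtypes(exclude=[object]),
                             df2.select_dtypes(exclude=[object]))
    object_ok = df1.select_dtypes(include=[object]).equals(
        df2.select_dtypes(include=[object]))
    return numeric_ok and object_ok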