An efficient way to do this is by comparing the first value with the rest, and using all:

def is_unique(s):
    a = s.to_numpy() # s.values (pandas<0.24)
    return (a[0] == a).all()

is_unique(df['counts'])
# False

Although the most intuitive idea might be to count the number of unique values and check whether there is only one, this has a needlessly high complexity for what we're trying to do. NumPy's np.unique, called by pandas' nunique, sorts the underlying array, which has an average complexity of O(n·log n) with quicksort (the default). The approach above is O(n).
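To make the complexity claim concrete, here's a minimal sketch (not from the original answer) contrasting the two checks on deterministic arrays; is_unique_sorting stands in for what nunique does internally:

```python
import numpy as np

def is_unique(a):
    # O(n): compare every element against the first
    return (a[0] == a).all()

def is_unique_sorting(a):
    # O(n log n): np.unique sorts the array first, like nunique does
    return len(np.unique(a)) == 1

a_const = np.full(1_000_000, 7)      # constant array
a_mixed = np.arange(1_000_000) % 10  # repeating 0..9

print(is_unique(a_const), is_unique_sorting(a_const))  # True True
print(is_unique(a_mixed), is_unique_sorting(a_mixed))  # False False
```

Both checks agree on the result; they only differ in how much work they do to get there.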

The difference in performance becomes more obvious when we're applying this to an entire dataframe (see below).


For an entire dataframe

To perform the same task on an entire dataframe, we can extend the above by setting axis=0 in all:

def unique_cols(df):
    a = df.to_numpy() # df.values (pandas<0.24)
    return (a[0] == a).all(0)

For the shared example, we'd get:

unique_cols(df)
# array([False, False])
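Since the question's df isn't reproduced here, a self-contained example with a hypothetical two-column frame might look like:

```python
import pandas as pd

def unique_cols(df):
    a = df.to_numpy()  # df.values (pandas<0.24)
    return (a[0] == a).all(0)

# hypothetical frame: 'const' holds a single value throughout, 'counts' does not
df = pd.DataFrame({'const': [5, 5, 5], 'counts': [1, 2, 1]})
print(unique_cols(df))  # [ True False]
```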

Here's a benchmark of the above methods compared with some other approaches, such as using nunique (for a pd.Series):

s_num = pd.Series(np.random.randint(0, 1_000, 1_100_000))

perfplot.show(
    setup=lambda n: s_num.iloc[:int(n)], 

    kernels=[
        lambda s: s.nunique() == 1,
        lambda s: is_unique(s)
    ],

    labels=['nunique', 'first_vs_rest'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)


And below are the timings for a pd.DataFrame. Let's also compare a numba approach, which is especially useful here since we can short-circuit as soon as we see a value different from the first in a given column (note: the numba approach only works with numerical data):

from numba import njit

@njit
def unique_cols_nb(a):
    n_cols = a.shape[1]
    out = np.zeros(n_cols, dtype=np.int32)
    for i in range(n_cols):
        init = a[0, i]
        for j in a[1:, i]:
            if j != init:
                break
        else:
            out[i] = 1
    return out
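If numba isn't available, the same short-circuiting loop can be sketched as plain Python (much slower without compilation, but it shows the logic):

```python
import numpy as np

def unique_cols_py(a):
    # same logic as unique_cols_nb, minus the @njit compilation
    n_cols = a.shape[1]
    out = np.zeros(n_cols, dtype=np.int32)
    for i in range(n_cols):
        init = a[0, i]
        for j in a[1:, i]:
            if j != init:
                break          # short-circuit on the first mismatch
        else:
            out[i] = 1         # loop finished: the column is constant
    return out

a = np.array([[1, 3], [1, 4], [1, 5]])
print(unique_cols_py(a).astype(bool))  # [ True False]
```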

If we compare the three methods:

df = pd.DataFrame(np.concatenate([np.random.randint(0, 1_000, (500_000, 200)), 
                                  np.zeros((500_000, 10))], axis=1))

perfplot.show(
    setup=lambda n: df.iloc[:int(n),:], 

    kernels=[
        lambda df: (df.nunique(0) == 1).values,
        lambda df: unique_cols_nb(df.values).astype(bool),
        lambda df: unique_cols(df) 
    ],

    labels=['nunique', 'unique_cols_nb', 'unique_cols'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)

Answer from yatu on Stack Overflow

Another answer from the same thread:

Update using np.unique:

len(np.unique(df.counts)) == 1
# False

Or:

len(set(df.counts.tolist())) == 1
# False

Or:

df.counts.eq(df.counts.iloc[0]).all()
# False

Or (numeric data only):

df.counts.std() == 0
# False
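Collected as one runnable snippet, with a hypothetical counts column standing in for the question's data, all four checks agree:

```python
import numpy as np
import pandas as pd

# hypothetical frame standing in for the question's df
df = pd.DataFrame({'counts': [1, 2, 1, 3]})

print(len(np.unique(df.counts)) == 1)         # False
print(len(set(df.counts.tolist())) == 1)      # False
print(df.counts.eq(df.counts.iloc[0]).all())  # False
print(df.counts.std() == 0)                   # False (numeric data only)
```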
A related question, "Pandas Dataframe Find Rows Where all Columns Equal", asks for the row-wise check: a boolean per row that is True when every column in that row holds the same value. The top answer:
I think the cleanest way is to check all columns against the first column using eq:

In [11]: df
Out[11]: 
   a  b  c  d
0  C  C  C  C
1  C  C  A  A
2  A  A  A  A

In [12]: df.iloc[:, 0]
Out[12]: 
0    C
1    C
2    A
Name: a, dtype: object

In [13]: df.eq(df.iloc[:, 0], axis=0)
Out[13]: 
      a     b      c      d
0  True  True   True   True
1  True  True  False  False
2  True  True   True   True

Now you can use all (if they are all equal to the first item, they are all equal):

In [14]: df.eq(df.iloc[:, 0], axis=0).all(1)
Out[14]: 
0     True
1    False
2     True
dtype: bool
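Reconstructing the frame from the console session above, the whole check fits in a few lines:

```python
import pandas as pd

# same data as the In [11] output above
df = pd.DataFrame({'a': ['C', 'C', 'A'],
                   'b': ['C', 'C', 'A'],
                   'c': ['C', 'A', 'A'],
                   'd': ['C', 'A', 'A']})

# True for rows where every column equals the first column
same_row = df.eq(df.iloc[:, 0], axis=0).all(1)
print(same_row.tolist())  # [True, False, True]
```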
A second answer compares the array against its first column and checks whether every element in each row is True; the same idea in NumPy, for better performance:

a = df.values
b = (a == a[:, [0]]).all(axis=1)
print (b)
[ True  True False]

And if you need a Series:

s = pd.Series(b, index=df.index)
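A self-contained version of the NumPy approach, using the same data as the benchmark that follows:

```python
import pandas as pd

df = pd.DataFrame([[10, 10, 10], [12, 12, 12], [10, 12, 10]],
                  columns=['Col1', 'Col2', 'Col3'])

a = df.to_numpy()
b = (a == a[:, [0]]).all(axis=1)  # compare each row against its first column
s = pd.Series(b, index=df.index)
print(s.tolist())  # [True, True, False]
```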

Comparing solutions:

data = [[10,10,10],[12,12,12],[10,12,10]]
df = pd.DataFrame(data,columns=['Col1','Col2','Col3'])

#[30000 rows x 3 columns]
df = pd.concat([df] * 10000, ignore_index=True)

#jez - numpy array
In [14]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
141 µs ± 3.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

#jez - Series 
In [15]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
    ...: pd.Series(b, index=df.index)
169 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

#Andy Hayden
In [16]: %%timeit
    ...: df.eq(df.iloc[:, 0], axis=0).all(axis=1)
2.22 ms ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#Wen1
In [17]: %%timeit
    ...: list(map(lambda x : len(set(x))==1,df.values))
56.8 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#K.-Michael Aye
In [18]: %%timeit
    ...: df.apply(lambda x: len(set(x)) == 1, axis=1)
686 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Wen2    
In [19]: %%timeit
    ...: df.nunique(1).eq(1)
2.87 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)