pandas set intersection

pandas.pydata.org › docs › reference › api › pandas.Index.intersection.html

pandas.Index.intersection — pandas 3.0.3 documentation

>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')

stackoverflow.com › questions › 18079563 › finding-the-intersection-between-two-series-in-pandas

python - Finding the intersection between two series in Pandas - Stack Overflow

kdnuggets.com › 2019 › 11 › set-operations-applied-pandas-dataframes.html

1 of 6

Place both series in Python's set container, then use the set intersection method:

set(s1).intersection(set(s2))

and then transform back to list if needed.

Just noticed pandas in the tag. Can translate back to that:

pd.Series(list(set(s1).intersection(set(s2))))

From comments I have changed this to a more Pythonic expression, which is shorter and easier to read:

pd.Series(list(set(s1) & set(s2)))

should do the trick, except if the index data is also important to you.

I added list(...) to convert the set before passing it to pd.Series, because pandas does not accept a set as direct input for a Series.
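As a minimal sketch of the caveat about index data: the set-based approach discards the original index (and any duplicates), whereas filtering with Series.isin keeps the index of the first Series. The sample values here are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative sample data (an assumption, not taken from the question).
s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# Set-based intersection: the original index and duplicates are lost.
# Set iteration order is not guaranteed, so sort for a stable result.
common = pd.Series(sorted(set(s1) & set(s2)))

# Index-preserving alternative: keep the rows of s1 whose values occur in s2.
common_with_index = s1[s1.isin(s2)]

print(common.tolist())                   # values only
print(common_with_index.index.tolist())  # positions of those values in s1
```

The isin variant is not a true set operation: it keeps duplicates in s1 and returns them in s1's order.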

2 of 6

Setup:

s1 = pd.Series([4,5,6,20,42])
s2 = pd.Series([1,2,3,5,42])

Timings:

%%timeit
pd.Series(list(set(s1).intersection(set(s2))))
10000 loops, best of 3: 57.7 µs per loop

%%timeit
pd.Series(np.intersect1d(s1,s2))
1000 loops, best of 3: 659 µs per loop

%%timeit
pd.Series(np.intersect1d(s1.values,s2.values))
10000 loops, best of 3: 64.7 µs per loop

So the numpy solution can be comparable to the set solution even for small series, if one uses the values explicitly.
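To reproduce this comparison outside IPython, a rough sketch using the standard timeit module could look like the following. The helper names via_set and via_numpy_values are hypothetical, and the absolute timings will depend on the machine and the pandas and NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])


def via_set():
    # Python set intersection, converted back to a Series.
    return pd.Series(list(set(s1).intersection(set(s2))))


def via_numpy_values():
    # np.intersect1d on the underlying arrays; the result is sorted and unique.
    return pd.Series(np.intersect1d(s1.values, s2.values))


runs = 1000
for fn in (via_set, via_numpy_values):
    total_seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {total_seconds / runs * 1e6:.1f} microseconds per call")
```

Note that np.intersect1d always returns sorted values, while the order coming out of a Python set is not guaranteed.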

KDnuggets

Set Operations Applied to Pandas DataFrames - KDnuggets

P ∩ S, the intersection of P and S, is the set of elements that are in both P and S. Now, only Elizabeth appears, because she is the only one in both sets. P − S, the difference of P and S, is the set that includes all elements that are in P but not in S: ... It is important to remark that ...

pandas.pydata.org › pandas-docs › stable › reference › api › pandas.Index.intersection.html

pandas.Index.intersection — pandas 3.0.2 documentation

>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Index([3, 4], dtype='int64')

Statology

statology.org › home › how to find the intersection between series in pandas

How to Find the Intersection Between Series in Pandas

January 20, 2022 - The result is a set that contains the values 4, 5, and 10. These are the only three values that are in both the first and second Series. Also note that this syntax works with pandas Series that contain strings:

import pandas as pd

#create two Series
series1 = pd.Series(['A', 'B', 'C', 'D', 'E'])
series2 = pd.Series(['A', 'B', 'B', 'B', 'F'])

#find intersection between the two series
set(series1) & set(series2)

{'A', 'B'}

Towards Data Science

towardsdatascience.com › home › latest › finding the intersection of python sets: a practical use case

Finding the Intersection of Python Sets: A Practical Use Case | Towards Data Science

March 5, 2025 - We will use the popular Tableau sample superstore dataset and run a simple market basket analysis to find out what products tend to be purchased together from the superstore, using the set.intersection() method! You can download the sample superstore data from here. This free, public dataset contains information about a superstore’s products, sales, customer purchase histories, etc. from 2014 to 2017. We are interested in finding out what product sub-categories are likely to be purchased together. Let’s first import the data into a pandas dataframe.

pandas.pydata.org › pandas-docs › version › 0.22 › generated › pandas.Index.intersection.html

pandas.Index.intersection — pandas 0.22.0 documentation

>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

W3Schools

w3schools.com › python › ref_set_intersection.asp

Python Set intersection() Method

The intersection() method returns a set that contains the elements common to two or more sets.

Arab Psychology

scales.arabpsychology.com › home › how to find common values between pandas series

How To Find Common Values Between Pandas Series

December 1, 2025 - We will detail the methodology, provide concrete coding examples, and explore how this technique applies equally well to both numerical and string-based data housed within Pandas structures. The primary method demonstrated here uses the Python set() conversion coupled with the bitwise AND operator (&), which acts as the intersection operator for Python sets.

Spark By {Examples}

sparkbyexamples.com › home › pandas › find intersection between two series in pandas?

Find Intersection Between Two Series in Pandas? - Spark By {Examples}

March 27, 2024 - We can find the intersection between the two Pandas Series in different ways. Intersection means common elements of given Series. In this article, I will

stackoverflow.com › questions › 45239540 › intersection-of-sets-as-columns-in-pandas

python - Intersection of sets as columns in pandas - Stack Overflow

geeksforgeeks.org › intersection-of-two-dataframe-in-pandas-python

1 of 2

Working with sets, lists and dicts in pandas is a bit problematic, because pandas works best with scalars:

df['k'] = [x[0] & x[1] for x in zip(df['i'], df['j'])]
print (df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}

df['k'] = [x[0].intersection(x[1]) for x in zip(df['i'], df['j'])]
print (df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}

Solution with apply:

df['k'] = df.apply(lambda x: x['i'].intersection(x['j']), axis=1)
print (df)
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}
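For reference, a self-contained sketch that builds a frame like the one printed above and checks that the zip-based and apply-based approaches agree. The column names k_zip and k_apply are hypothetical, introduced only to hold both results side by side.

```python
import pandas as pd

# Rebuild a frame with set-valued (object dtype) columns, as in the output above.
df = pd.DataFrame({
    "i": [{1, 2, 3, 4}] * 4,
    "j": [{2, 3}, {1}, {4}, {3, 4}],
})

# Row-wise intersection with a list comprehension over the two columns.
df["k_zip"] = [a & b for a, b in zip(df["i"], df["j"])]

# The same computation through DataFrame.apply along rows.
df["k_apply"] = df.apply(lambda row: row["i"].intersection(row["j"]), axis=1)

print(df)
```

The list comprehension avoids the per-row Series construction that apply performs, so it is usually the cheaper of the two on large frames.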

2 of 2

You can reproduce the set intersection using set differences. The intersection between A and B is equal to A minus the elements of A that are not in B. (You can do the same symmetrically, starting from B.)

So, you can use the Series sub method to compute the set differences element by element:

df['k'] = df['i'].sub(df['i'].sub(df['j']))
# df['k'] = df['j'].sub(df['j'].sub(df['i'])) # equivalent

Which gives the expected output:

df
Out[11]: 
              i       j       k
0  {1, 2, 3, 4}  {2, 3}  {2, 3}
1  {1, 2, 3, 4}     {1}     {1}
2  {1, 2, 3, 4}     {4}     {4}
3  {1, 2, 3, 4}  {3, 4}  {3, 4}

GeeksforGeeks

Intersection of two dataframe in Pandas - Python - GeeksforGeeks

July 26, 2020 - Intersection of Two data frames in Pandas can be easily calculated by using the pre-defined function merge(). This function takes both the data frames as argument and returns the intersection between them.
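A minimal sketch of that approach, assuming two small frames with identical column names (the data is made up for illustration). With no on argument, merge joins on every column the frames share, so an inner join keeps only rows that match in all of them.

```python
import pandas as pd

# Hypothetical sample frames with the same columns.
df1 = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "age": [30, 25, 41]})
df2 = pd.DataFrame({"name": ["Bob", "Cara", "Dan"], "age": [25, 40, 33]})

# Inner join on all shared columns: rows must match in both name and age.
common_rows = pd.merge(df1, df2, how="inner")

print(common_rows)
```

Unlike a true set intersection, merge does not deduplicate: if a row occurs several times in either frame, the result can contain repeated rows, so drop_duplicates() may be needed afterwards.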

GeeksforGeeks

geeksforgeeks.org › python-pandas-index-intersection

Python | Pandas Index.intersection() | GeeksforGeeks

December 17, 2018 -

# find intersection and maintain
# ordering of labels based on idx1
idx1.intersection(idx2)

Note : The missing values in both the indexes are considered common to each other. ... Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

pandas.pydata.org › pandas-docs › version › 0.18 › generated › pandas.Index.intersection.html

pandas.Index.intersection — pandas 0.18.1 documentation

>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

Dontusethiscode

dontusethiscode.com › blog › 2024-03-06_indexes_and_sets.html

Python Set vs Pandas.Index

An underappreciated feature of Index objects is that they also implement the set vocabulary:

from pandas import Index

idx1 = Index(['a', 'b', 'c', 'd'])
idx2 = Index(['c', 'd', 'e', 'f'])

print(
    f'{idx1.union(idx2) = }',
    f'{idx1.intersection(idx2) = }',
    f'{idx1.difference(idx2) = }',
    f'{idx2.difference(idx1) = }',
    f'{idx1.symmetric_difference(idx2) = }',
    sep='\n',
)

stackoverflow.com › questions › 68077122 › performance-pandas-index-intersection-vs-set-intersection

python - Performance: Pandas index.intersection() vs set intersection - Stack Overflow

discuss.python.org › python help

1 of 2

An Index in pandas is a NumPy array. As such, it will perform worse on set operations than a Python set, which is optimized for them: its underlying implementation is a hash map, which reduces the time complexity of checking whether a value is in the set to O(1).

A NumPy array is optimized for fast traversal, so an operation that is a set operation in name only, and is implemented quite differently, will never be as fast.

In your particular situation, the gain may be the elegance of calling a single method instead of using an expression that is somewhat more cryptic at first glance.

2 of 2

The accepted answer is wrong in equating a Pandas Index to a NumPy array. In reality, a Pandas Index is based on a hash table, which is why it can only contain hashable objects (or is supposed to).

The Pandas internal method is slower because it is not optimised for intersecting unordered Indexes. If you look at the source code as of 2024 (version 2.2), you will see that ix_a.intersection(ix_b) makes several fast-path checks and then falls back to building an indexer from ix_b to ix_a (or the other way around; I am not sure). In other words, to answer the question

what elements do ix_a and ix_b have in common?

it first answers the question

where are elements of ix_a located in ix_b?

which is a more difficult question and requires more work than needed.

Now, if your Indexes are ordered (their elements are monotonically increasing or decreasing), then ix_a.intersection(ix_b) will outperform Python's built-in sets (in some cases for sure) by taking a fast path that exploits the order. I suppose Pandas just traverses both arrays in a "merge-sort" fashion.
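A small sketch for experimenting with the ordered case. The index contents are assumptions chosen for illustration, and whether a fast path is actually taken is the answer's claim about pandas internals, not something this code verifies. It only checks that both approaches agree and prints rough timings.

```python
import timeit

import numpy as np
import pandas as pd

# Two monotonically increasing indexes (illustrative data).
ix_a = pd.Index(np.arange(0, 1000, 2))  # even numbers
ix_b = pd.Index(np.arange(0, 1000, 3))  # multiples of 3

via_index = ix_a.intersection(ix_b)
via_set = set(ix_a) & set(ix_b)

# Both approaches should find the same elements (the multiples of 6).
assert set(via_index) == via_set

runs = 200
t_index = timeit.timeit(lambda: ix_a.intersection(ix_b), number=runs)
t_set = timeit.timeit(lambda: set(ix_a) & set(ix_b), number=runs)
print(f"Index.intersection: {t_index / runs * 1e6:.1f} microseconds per call")
print(f"set intersection:   {t_set / runs * 1e6:.1f} microseconds per call")
```

The set timing here includes converting each Index to a set on every call, which is part of the real cost when the data starts out in pandas.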

Python.org

Find the intersection of one list and one dataframe - Python Help - Discussions on Python.org

October 19, 2023 -

df = pd.DataFrame({'a': [7,8,9], 'b':[4,5,6]})
y = [11,1,6]

I want to find the intersection of df['b'] and y, but this

[x for x in y if x in df['b']]

gives me [1] but I expect [6]. In order to get [6], I need

[x for x in y if x in list(df['b'])]

Why doesn't my first way give me a [6]?
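The behaviour in the question comes from the fact that the in operator on a Series tests membership in the index labels (here 0, 1, 2), not in the values. A short sketch using the question's data shows the difference. The variable names by_label, by_value and by_set are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [7, 8, 9], "b": [4, 5, 6]})
y = [11, 1, 6]

# `x in series` checks the index labels 0, 1, 2, so only 1 matches.
by_label = [x for x in y if x in df["b"]]

# Checking against the underlying values gives the intended result.
by_value = [x for x in y if x in df["b"].values]

# Converting to a set first also works and gives fast lookups for long lists.
b_values = set(df["b"])
by_set = [x for x in y if x in b_values]

print(by_label, by_value, by_set)
```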

pandas.pydata.org › pandas-docs › version › 0.23 › generated › pandas.Index.intersection.html

pandas.Index.intersection — pandas 0.23.1 documentation

>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

stackoverflow.com › questions › 46427558 › pandas-multiple-column-intersection

python - Pandas multiple column intersection - Stack Overflow