A full implementation of what you want can be found here:

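# Each row becomes a frozenset; `jaccard` is assumed to return the Jaccard similarity of two sets
series_set = df.apply(frozenset, axis=1)
# Compare every row set against every other row set (n x n result)
new_df = series_set.apply(lambda a: series_set.apply(lambda b: jaccard(a,b)))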
Answer from Sebastian Mendez on Stack Overflow
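The snippet above leaves `jaccard` and `df` undefined. A minimal sketch of how the pieces could fit together, assuming a small made-up DataFrame and a hypothetical `jaccard` helper (neither comes from the original answer):

```python
import pandas as pd

def jaccard(a, b):
    # Hypothetical helper: intersection size over union size.
    union = len(a | b)
    return len(a & b) / union if union else 1.0

# Made-up data: each row is treated as a set of values.
df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 3, 3], "z": [3, 4, 4]})

series_set = df.apply(frozenset, axis=1)
# The outer apply returns one Series per row, so the result is a square DataFrame.
new_df = series_set.apply(lambda a: series_set.apply(lambda b: jaccard(a, b)))
print(new_df)
```

Each cell `new_df.iloc[i, j]` then holds the similarity between row `i` and row `j`.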
2 of 2
3

You could get rid of the nested apply by vectorizing your function. First, get all pairwise combinations of rows, then pass them to a vectorized version of your function:

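import numpy as np
import pandas as pd

def jaccard_similarity_score(a, b):
    # Intersection size over union size (union computed by inclusion-exclusion)
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

# One frozenset per row, then a cross join through a constant dummy key
i = df.apply(frozenset, axis=1).to_frame()
j = i.assign(foo=1)
k = j.merge(j, on='foo').drop(columns='foo')  # recent pandas needs the keyword form here
k.columns = ['A', 'B']

# np.vectorize wraps a Python-level loop; it is a convenience, not compiled code
fnc = np.vectorize(jaccard_similarity_score)
y = fnc(k['A'], k['B']).reshape(len(df), -1)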
y
array([[ 1. ,  0.5,  0.5,  0.5,  0.2,  0.2],
       [ 0.5,  1. ,  0.5,  0.2,  0.5,  0.2],
       [ 0.5,  0.5,  1. ,  0.2,  0.2,  0.5],
       [ 0.5,  0.2,  0.2,  1. ,  0.5,  0.5],
       [ 0.2,  0.5,  0.2,  0.5,  1. ,  0.5],
       [ 0.2,  0.2,  0.5,  0.5,  0.5,  1. ]])
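The dummy-key merge is a way of building a cross join. Since pandas 1.2, `merge` also accepts `how="cross"`, which does the same thing without the helper column. A sketch under the assumption of a small made-up DataFrame (the asker's data is not shown here):

```python
import numpy as np
import pandas as pd

def jaccard_similarity_score(a, b):
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

# Made-up data standing in for the asker's DataFrame.
df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 3, 3], "z": [3, 4, 4]})

sets = df.apply(frozenset, axis=1).to_frame(name="A")
# how="cross" pairs every left row with every right row.
pairs = sets.merge(sets.rename(columns={"A": "B"}), how="cross")

fnc = np.vectorize(jaccard_similarity_score)
y = fnc(pairs["A"], pairs["B"]).reshape(len(df), -1)
print(y)
```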

This is already faster, but let's see if we can get even faster.


Using senderle's fast cartesian_product:

def cartesian_product(*arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    # One axis per input array, plus a last axis selecting which input the value came from
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)


i = df.apply(frozenset, 1).values
j = cartesian_product(i, i)
y = fnc(j[:, 0], j[:, 1]).reshape(-1, len(df))

y

array([[ 1. ,  0.5,  0.5,  0.5,  0.2,  0.2],
       [ 0.5,  1. ,  0.5,  0.2,  0.5,  0.2],
       [ 0.5,  0.5,  1. ,  0.2,  0.2,  0.5],
       [ 0.5,  0.2,  0.2,  1. ,  0.5,  0.5],
       [ 0.2,  0.5,  0.2,  0.5,  1. ,  0.5],
       [ 0.2,  0.2,  0.5,  0.5,  0.5,  1. ]])
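Both versions above still call the Python function once per pair. A different technique, not taken from either answer, is to encode each row as a 0/1 membership vector and get all intersection sizes from one matrix product. A sketch, assuming made-up data and non-empty rows:

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the asker's DataFrame.
df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 3, 3], "z": [3, 4, 4]})
sets = df.apply(frozenset, axis=1)

# Every distinct value that appears in any row.
universe = sorted(set().union(*sets))
# membership[i, j] is 1 when row i contains universe[j].
membership = np.array([[v in s for v in universe] for s in sets], dtype=int)

inter = membership @ membership.T            # pairwise intersection sizes
sizes = membership.sum(axis=1)               # size of each row set
union = sizes[:, None] + sizes[None, :] - inter
y = inter / union                            # assumes no row is empty
print(y)
```

The membership matrix costs memory proportional to rows times distinct values, so this trades space for fewer Python-level calls.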
r/learnpython on Reddit: Convert dataframe rows into sets
June 30, 2021

How can I convert my pandas dataframe into this format?

   sets items  weight  value
0  set1     a       9     10
1  set1     b      14    100
2  set2     c       5     69
3  set2     d       4    100

Outcome I'm looking for:

set1 = (("a", 9, 10), ("b", 14, 100))
set2 = (("c", 5, 69), ("d", 4, 100))

print(set1)
set1 = (("a", 9, 10), ("b", 14, 100))

Top answer
1 of 2
3
I'm curious what you want this for! I've assumed your dataframe comes with each of your variables in its own column, so this starts by combining each row into a tuple. Then it aggregates the tuples belonging to each set.

import pandas as pd

# Make your example dataframe
data = [["set1","a",9,10], ["set1","b",14,100], ["set2","c",5,69], ["set2","d",4,100]]
df = pd.DataFrame(columns=["Set","var1","var2","var3"], data=data)
# Turn your columns into tuples (index=False keeps the row index out of each record)
df["tuple"] = list(df[["var1","var2","var3"]].to_records(index=False))
# Combine
df = df.groupby("Set")["tuple"].agg(lambda x: [y for y in x]).reset_index()
2 of 2
2
Note that in your example your set1 and set2 are actually tuples, not sets. I'll assume tuples/lists are what you want, and that you just mean "set" in the mathematical sense as a collection of related elements rather than an actual Python set.

I don't think it's possible to use the values in a column as variable names, which is what it seems you want to do with your sets column. Someone will correct me if I'm wrong on that, I hope.

However, the logic for creating nested groups based on the sets column is as follows:

>>> [g.drop(columns=['sets']).values.tolist() for _, g in df.groupby('sets')]
[[['a', 9, 10], ['b', 14, 100]], [['c', 5, 69], ['d', 4, 100]]]

Alternatively, if you do want to be able to query your data by set name, different from but similar to print(set1), you can do it this way:

>>> sets = df.set_index('sets').groupby('sets').apply(pd.Series.tolist)
>>> sets
sets
set1    [[a, 9, 10], [b, 14, 100]]
set2    [[c, 5, 69], [d, 4, 100]]
>>> print(sets['set1'])
[['a', 9, 10], ['b', 14, 100]]
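If the goal is the nested-tuple shape from the question, keyed by set name, a dictionary is one way to stand in for the separate `set1` and `set2` variables. A minimal sketch that rebuilds the asker's example frame; the name `groups` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "sets": ["set1", "set1", "set2", "set2"],
    "items": ["a", "b", "c", "d"],
    "weight": [9, 14, 5, 4],
    "value": [10, 100, 69, 100],
})

# One tuple of row tuples per group; name=None yields plain tuples.
groups = {
    name: tuple(g.drop(columns="sets").itertuples(index=False, name=None))
    for name, g in df.groupby("sets")
}
print(groups["set1"])
```

Looking up `groups["set1"]` then plays the role of `print(set1)` in the question.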
Discussions

python - Create a set from a series in pandas - Stack Overflow
I have a dataframe extracted from Kaggle's San Francisco Salaries: https://www.kaggle.com/kaggle/sf-salaries and I wish to create a set of the values of a column, for instance 'Status'. This is what I have tried but it brings a list of all the records instead of the set (sf is how I name the data frame). ... According to this webpage, this should work. How to construct a set out of list items in python... More on stackoverflow.com
๐ŸŒ stackoverflow.com
pandas - Convert data frame into set using python - Stack Overflow
Dataframe is a 2D data structure, but set can not have a nested set so you can say its 1D structure ... I have edited my question with data frame. For upset plot I need my output in set ... Yes. I have other columns in csv files but need to work with only INSTANCE_ID so created data frame with that column only ... Since you want to convert ... More on stackoverflow.com
๐ŸŒ stackoverflow.com
Python set to array and dataframe - Stack Overflow
The above code converts the set without a problem into a numpy array. But when I try to create a DataFrame from it I get the following error: ValueError: DataFrame constructor not properly called! So is there any way to convert a python set/nested set into a numpy array/dictionary so I can ... More on stackoverflow.com
๐ŸŒ stackoverflow.com
python - How to convert list into set in pandas? - Stack Overflow
I have a dataframe as below: date uids 0 2018-11-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] 1 2018-11-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] ... More on stackoverflow.com
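For a column whose cells are lists, as in the question above, one common approach is to apply `set` to each cell. A sketch with made-up, shortened data:

```python
import pandas as pd

# Made-up, shortened version of the frame in the question.
df = pd.DataFrame({
    "date": ["2018-11-23", "2018-11-24"],
    "uids": [[0, 1, 2, 2], [3, 3, 4]],
})

# Each list cell becomes a Python set; duplicates within a cell are dropped.
df["uids"] = df["uids"].apply(set)
print(df)
```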
๐ŸŒ stackoverflow.com
๐ŸŒ
IncludeHelp
includehelp.com โ€บ python โ€บ create-a-set-from-a-series-in-pandas.aspx
Python - Create a set from a series in pandas
To create a set from a series in pandas, you have to first find the unique elements using the series.unique() method and then convert it into a set by using the set() method which is an inbuilt method in Python.
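The two routes mentioned in these results, going through `Series.unique()` first or passing the Series straight to `set()`, can be compared on a small made-up Series (the column name `Status` is borrowed from the question above; the values are invented):

```python
import pandas as pd

s = pd.Series(["PT", "FT", "PT", "FT", "FT"], name="Status")

via_unique = set(s.unique())  # deduplicate in pandas, then build the set
direct = set(s)               # let set() do the deduplication

print(via_unique, direct)
```

Both give the same set here; `set()` iterates the Series values, not its index.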
Finxter
5 Best Ways to Convert a Pandas DataFrame to a Set - Be on the Right Side of Change
February 18, 2024 - One straightforward way to convert DataFrame values to a set is to select a specific series (column) and then pass it to the built-in Python set() function.
Finxter
5 Best Ways to Convert Pandas DataFrame Column Values to a Set - Be on the Right Side of Change
February 19, 2024 - This method involves directly converting the column values into a list and then casting it to a set. The set() function is a Python built-in that creates a set from an iterable. This method is straightforward and the go-to for a quick conversion. ... import pandas as pd # Creating a pandas ...
Spark By {Examples}
Create a Set From a Series in Pandas - Spark By {Examples}
March 27, 2024 - We can create a set from a series of pandas by using set(), Series.unique() function. The set object is used to store multiple items which are
GeeksforGeeks
Create A Set From A Series In Pandas - GeeksforGeeks
July 23, 2025 - We can directly apply the set() function to the pandas Series; it automatically converts the Series into a set. ... In conclusion, creating a set from a Pandas Series in Python is a useful technique for data manipulation.
Note.nkmk.me
Convert between pandas DataFrame/Series and Python list | note.nkmk.me
January 24, 2024 - Transpose 2D list in Python (swap rows and columns)

print(pd.DataFrame(zip(*l_2d)))
#     0   1
# 0   0  30
# 1  10  40
# 2  20  50

...

Row names can be specified with the index argument, and column names with the columns argument.

print(pd.Series(l_1d, index=['X', 'Y', 'Z']))
# X     0
# Y    10
# Z    20
# dtype: int64

print(pd.DataFrame(l_2d, index=['X', 'Y'], columns=['A', 'B', 'C']))
#     A   B   C
# X   0  10  20
# Y  30  40  50

...

It is also possible to set or change the index and columns after creating a Series or a DataFrame.
Finxter
5 Best Ways to Convert Python Set to DataFrame - Be on the Right Side of Change
February 21, 2024 - In the above code, we build a list ... each element of the set in a separate row under the 'Fruit' column. By creating a pandas Series object from the set, we can easily convert it into a DataFrame....
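The snippet's own code is truncated, so the following is only a sketch of the list-based route it describes, with a made-up set of fruit names. Sets are unordered, so the set is sorted into a list first to give the rows a stable order:

```python
import pandas as pd

fruits = {"apple", "banana", "cherry"}  # made-up example set

# sorted() returns a list, which gives the rows a deterministic order.
df = pd.DataFrame({"Fruit": sorted(fruits)})
print(df)
```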
KDnuggets
Set Operations Applied to Pandas DataFrames - KDnuggets
In this tutorial you will learn that set operations are one of the best and most natural techniques you can choose to perform such a task. Suppose you have two DataFrames, named P and S, which respectively contain the names and emails from students enrolled in two different courses, SQL and Python.
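The tutorial's code is not reproduced in the snippet. As a rough sketch of how set operations can be expressed on two DataFrames named P and S, using made-up names and emails and standard pandas calls (`merge`, `concat`, `drop_duplicates`):

```python
import pandas as pd

# Made-up enrolment data for two courses.
P = pd.DataFrame({"name": ["Ana", "Bo", "Cy"],
                  "email": ["ana@x.org", "bo@x.org", "cy@x.org"]})
S = pd.DataFrame({"name": ["Bo", "Cy", "Di"],
                  "email": ["bo@x.org", "cy@x.org", "di@x.org"]})

# Intersection: rows present in both frames.
both = P.merge(S, on=["name", "email"])

# Union: all rows from either frame, without duplicates.
either = pd.concat([P, S]).drop_duplicates().reset_index(drop=True)

# Difference: rows of P that do not appear in S.
flagged = P.merge(S, on=["name", "email"], how="left", indicator=True)
only_p = flagged.loc[flagged["_merge"] == "left_only", ["name", "email"]]
```

The `indicator=True` flag adds a `_merge` column recording which side each row came from, which is what makes the difference step possible.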
FavTutor
8 Ways to Convert List to Dataframe in Python (with code)
December 4, 2023 - Understand how to convert list to dataframe using various methods in Python. Learn 8 different methods along with codes.
Pandas
pandas.DataFrame.astype - pandas 3.0.3 documentation
This method allows the conversion ... DataFrames and Series, to the specified dtype. It supports casting entire objects to a single data type or applying different data types to individual columns using a mapping. ... Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast ...