convert dataframe to set python

stackoverflow.com › questions › 47545052 › convert-dataframe-rows-to-python-set

Answer from Sebastian Mendez on Stack Overflow

pandas - Convert dataframe rows to Python set - Stack Overflow

Top answer

1 of 2

A full implementation of what you want can be found here:

# jaccard(a, b) is the asker's own similarity function; it is not defined in this answer
series_set = df.apply(frozenset, axis=1)
new_df = series_set.apply(lambda a: series_set.apply(lambda b: jaccard(a, b)))
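The snippet above relies on a `jaccard` function that the answer does not show. As a minimal sketch, assuming `jaccard` means the usual intersection-over-union similarity and using a small made-up DataFrame (both are assumptions, not taken from the thread), the nested apply could be exercised like this:

```python
import pandas as pd


def jaccard(a, b):
    """Assumed definition: size of the intersection divided by size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


# Hypothetical example data; each row is treated as a set of its values.
df = pd.DataFrame([[1, 2], [2, 3], [1, 3]])

# One frozenset per row, then an all-pairs similarity matrix.
series_set = df.apply(frozenset, axis=1)
new_df = series_set.apply(lambda a: series_set.apply(lambda b: jaccard(a, b)))
print(new_df)
```

The result is a square DataFrame indexed by the original row labels on both axes, with 1.0 on the diagonal.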

2 of 2

You could get rid of the nested apply by vectorizing your function. First, get all pairwise combinations, then pass them to a vectorized version of your function:

def jaccard_similarity_score(a, b):
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

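import numpy as np

i = df.apply(frozenset, axis=1).to_frame()
j = i.assign(foo=1)
# Cross join on a dummy key; drop() needs columns= (or axis=1) as a keyword in pandas 2.x
k = j.merge(j, on='foo').drop(columns='foo')
k.columns = ['A', 'B']

fnc = np.vectorize(jaccard_similarity_score)
y = fnc(k['A'], k['B']).reshape(len(df), -1)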

y
array([[ 1. ,  0.5,  0.5,  0.5,  0.2,  0.2],
       [ 0.5,  1. ,  0.5,  0.2,  0.5,  0.2],
       [ 0.5,  0.5,  1. ,  0.2,  0.2,  0.5],
       [ 0.5,  0.2,  0.2,  1. ,  0.5,  0.5],
       [ 0.2,  0.5,  0.2,  0.5,  1. ,  0.5],
       [ 0.2,  0.2,  0.5,  0.5,  0.5,  1. ]])

This is already faster, but let's see if we can get even faster.

Using senderle's fast cartesian_product:

def cartesian_product(*arrays):
    # Build every combination of the input arrays, one combination per output row
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)

i = df.apply(frozenset, axis=1).values
j = cartesian_product(i, i)
y = fnc(j[:, 0], j[:, 1]).reshape(-1, len(df))

y

array([[ 1. ,  0.5,  0.5,  0.5,  0.2,  0.2],
       [ 0.5,  1. ,  0.5,  0.2,  0.5,  0.2],
       [ 0.5,  0.5,  1. ,  0.2,  0.2,  0.5],
       [ 0.5,  0.2,  0.2,  1. ,  0.5,  0.5],
       [ 0.2,  0.5,  0.2,  0.5,  1. ,  0.5],
       [ 0.2,  0.2,  0.5,  0.5,  0.5,  1. ]])
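For comparison, here is a sketch of the same all-pairs computation using `merge(how='cross')`, which pandas supports from version 1.2 onward, instead of a dummy join key. This is a swapped-in variant rather than the answer's own code, and the small DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd


def jaccard_similarity_score(a, b):
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))


# Hypothetical example data; each row is treated as a set of its values.
df = pd.DataFrame([[1, 2], [2, 3], [1, 3]])

# One frozenset per row, in a single-column frame named 'A'.
sets = df.apply(frozenset, axis=1).rename('A').to_frame()

# Cross join: every row of 'A' paired with every row of 'B'.
pairs = sets.merge(sets.rename(columns={'A': 'B'}), how='cross')

# np.vectorize is a convenience wrapper around a Python loop.
fnc = np.vectorize(jaccard_similarity_score)
y = fnc(pairs['A'], pairs['B']).reshape(len(df), -1)
print(y)
```

The matrix is symmetric with ones on the diagonal, so for large inputs only the upper triangle strictly needs computing.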

reddit.com › r/learnpython › convert dataframe rows into sets

r/learnpython on Reddit: Convert dataframe rows into sets

June 30, 2021 -

How can I convert my pandas dataframe into this format?

   sets items  weight  value
0  set1     a       9     10
1  set1     b      14    100
2  set2     c       5     69
3  set2     d       4    100

Outcome I'm looking for:

set1 = (("a", 9, 10), ("b", 14, 100))
set2 = (("c", 5, 69), ("d", 4, 100))

print(set1)
set1 = (("a", 9, 10), ("b", 14, 100))

Top answer

1 of 2

I'm curious what you want this for! I've assumed your dataframe comes with each of your variables in its own column, so this starts by combining each row into a tuple. Then it aggregates the tuples belonging to each set.

import pandas as pd

# Make your example dataframe
data = [["set1", "a", 9, 10], ["set1", "b", 14, 100], ["set2", "c", 5, 69], ["set2", "d", 4, 100]]
df = pd.DataFrame(columns=["Set", "var1", "var2", "var3"], data=data)

# Turn your columns into tuples (index=False keeps the row index out of each tuple;
# to_records() would include it by default)
df["tuple"] = list(df[["var1", "var2", "var3"]].itertuples(index=False, name=None))

# Combine the tuples belonging to each set
df = df.groupby("Set")["tuple"].agg(list).reset_index()

2 of 2

Note that in your example your set1 and set2 are actually tuples, not sets. I'll assume tuples/lists are what you want, and that you just mean "set" in the mathematical sense as a collection of related elements rather than an actual Python set.

I don't think it's possible to use the values in a column as variable names, which is what it seems you want to do with your sets column. Someone will correct me if I'm wrong on that, I hope. However, the logic for creating nested groups based on the sets column is as follows:

>>> [g.drop(columns=['sets']).values.tolist() for _, g in df.groupby('sets')]
[[['a', 9, 10], ['b', 14, 100]], [['c', 5, 69], ['d', 4, 100]]]

Alternatively, if you do want to be able to query your data by set name, different from but similar to print(set1), you can do it this way:

>>> sets = df.set_index('sets').groupby('sets').apply(pd.Series.tolist)
>>> sets
sets
set1    [[a, 9, 10], [b, 14, 100]]
set2    [[c, 5, 69], [d, 4, 100]]
>>> print(sets['set1'])
[['a', 9, 10], ['b', 14, 100]]
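A plain dict keyed by set name is one common alternative to creating variables from column values. The sketch below rebuilds the question's example frame and produces the tuple-of-tuples shape the asker described; the name `groups` is made up for illustration.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    'sets': ['set1', 'set1', 'set2', 'set2'],
    'items': ['a', 'b', 'c', 'd'],
    'weight': [9, 14, 5, 4],
    'value': [10, 100, 69, 100],
})

# Map each set name to a tuple of row tuples (without the 'sets' column or the index).
groups = {
    name: tuple(g.drop(columns='sets').itertuples(index=False, name=None))
    for name, g in df.groupby('sets')
}

print(groups['set1'])
```

Looking up `groups['set1']` then plays the role of the `set1` variable in the question.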

Discussions

python - Create a set from a series in pandas - Stack Overflow

I have a dataframe extracted from Kaggle's San Francisco Salaries: https://www.kaggle.com/kaggle/sf-salaries and I wish to create a set of the values of a column, for instance 'Status'. This is what I have tried, but it returns a list of all the records instead of the set (sf is how I name the data frame). ... According to this webpage, this should work. How to construct a set out of list items in python...

pandas - Convert data frame into set using python - Stack Overflow

A DataFrame is a 2D data structure, but a set cannot contain a nested set, so you can say it's a 1D structure ... I have edited my question with data frame. For upset plot I need my output in set ... Yes. I have other columns in csv files but need to work with only INSTANCE_ID so created data frame with that column only ... Since you want to convert ...

Python set to array and dataframe - Stack Overflow

The above code converts the set without a problem into a numpy array. But when I try to create a DataFrame from it I get the following error: ValueError: DataFrame constructor not properly called! So is there any way to convert a python set/nested set into a numpy array/dictionary so I can ...

python - How to convert list into set in pandas? - Stack Overflow

I have a dataframe as below: date uids 0 2018-11-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] 1 2018-11-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] ...

IncludeHelp

includehelp.com › python › create-a-set-from-a-series-in-pandas.aspx

Python - Create a set from a series in pandas

To create a set from a series in pandas, you can first find the unique elements using the Series.unique() method and then convert the result into a set with set(), a built-in Python function. Passing the series straight to set() also works.

Finxter

blog.finxter.com › home › learn python blog › 5 best ways to convert a pandas dataframe to a set

5 Best Ways to Convert a Pandas DataFrame to a Set - Be on the Right Side of Change

February 18, 2024 - One straightforward way to convert DataFrame values to a set is to select a specific series (column) and then pass it to the built-in Python set() function.
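What "a set from a DataFrame" means depends on whether you want one column's values, every value in the frame, or each row as a unit. A minimal sketch of the three, using a made-up frame (the variable names are illustrative, not from the article):

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({'a': [1, 2, 2], 'b': [3, 3, 4]})

# Distinct values of a single column.
col_set = set(df['a'])

# Distinct values across the whole frame (flatten to a 1D list first).
all_values = set(df.to_numpy().ravel().tolist())

# Distinct rows, each row as a hashable tuple.
row_set = set(df.itertuples(index=False, name=None))

print(col_set, all_values, row_set)
```

Rows must be converted to tuples (or frozensets) because lists and Series are not hashable and cannot be set members.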

Stack Overflow

stackoverflow.com › questions › 39551566 › create-a-set-from-a-series-in-pandas

python - Create a set from a series in pandas - Stack Overflow

Top answer

1 of 2

120

If you only need the unique values, you can just use the unique method, which returns them as an array. If you want a Python set, use set(some_series).

In [1]: s = pd.Series([1, 2, 3, 1, 1, 4])

In [2]: s.unique()
Out[2]: array([1, 2, 3, 4])

In [3]: set(s)
Out[3]: {1, 2, 3, 4}

If you have a DataFrame, just select the series out of it first ( some_data_frame['<col_name>'] ).

2 of 2

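With a large series that contains many duplicates, set(some_series) can be slow: every element of the series is iterated and hashed in Python, so the execution time grows with the full length of the series.

A better practice is set(some_series.unique()), which deduplicates inside pandas first so that only the unique values are passed to set().

In a simple example, the difference in execution time was about 16x.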
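The answer's own benchmark is not reproduced here. As a rough sketch for measuring the difference yourself, the following compares both approaches on a made-up series of 100,000 integers drawn from 100 distinct values; the sizes and names are assumptions, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: many rows, few distinct values.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, size=100_000))

# Both approaches yield the same set of values.
direct = set(s)
via_unique = set(s.unique())

# Time each approach over a few repetitions.
t_direct = timeit.timeit(lambda: set(s), number=5)
t_unique = timeit.timeit(lambda: set(s.unique()), number=5)
print(f"set(s): {t_direct:.4f}s  set(s.unique()): {t_unique:.4f}s")
```

The gap is largest when the series has many duplicates; when nearly every value is distinct, the extra unique() pass buys little.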

Finxter

blog.finxter.com › 5-best-ways-to-convert-pandas-dataframe-column-values-to-a-set

5 Best Ways to Convert Pandas DataFrame Column Values to a Set – Be on the Right Side of Change

February 19, 2024 - This method involves directly converting the column values into a list and then casting it to a set. The set() function is a Python built-in that creates a set from an iterable. This method is straightforward and the go-to for a quick conversion. ... import pandas as pd # Creating a pandas ...

Stack Overflow

stackoverflow.com › questions › 68994580 › convert-data-frame-into-set-using-python

pandas - Convert data frame into set using python - Stack Overflow

Top answer

1 of 1

Since you want to convert the values in the columns to set, you can use series.agg and pass set as the aggregate:

file1 = df['INSTANCE_ID'].agg(set)

It will get you the values in the column as a set. You can do the same for all the dataframes

SAMPLE RUN

>>> df =  pd.DataFrame({'INSTANCE_ID': [random.randint(0,3) for _ in range(5)]})
>>> df
   INSTANCE_ID
0            0
1            1
2            0
3            1
4            0

>>> df['INSTANCE_ID'].agg(set)
{0, 1}

Since you want the union of all the sets, a better option is to concatenate the column values from all the dataframes first, then create the set:

result = pd.concat([df1['INSTANCE_ID'], df2['INSTANCE_ID'], ...., dfn['INSTANCE_ID']]).agg(set)
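As a concrete sketch of the union step with just two made-up frames (`df1`, `df2`, and the `frames` list are illustrative), concatenating first and calling `set().union(...)` on the individual columns give the same result:

```python
import pandas as pd

# Hypothetical example frames sharing an INSTANCE_ID column.
df1 = pd.DataFrame({'INSTANCE_ID': [0, 1, 1]})
df2 = pd.DataFrame({'INSTANCE_ID': [1, 2, 3]})
frames = [df1, df2]

# Concatenate the columns, then build one set from the combined values.
result = set(pd.concat([f['INSTANCE_ID'] for f in frames]))

# Equivalent: union the columns directly, without concatenating.
same = set().union(*(f['INSTANCE_ID'] for f in frames))

print(result)
```

Keeping the frames in a list avoids writing each one out by hand when there are many of them.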

Spark By {Examples}

sparkbyexamples.com › home › pandas › create a set from a series in pandas

Create a Set From a Series in Pandas - Spark By {Examples}

March 27, 2024 - We can create a set from a series of pandas by using set(), Series.unique() function. The set object is used to store multiple items which are

Stack Overflow

stackoverflow.com › questions › 52082100 › python-set-to-array-and-dataframe

Python set to array and dataframe - Stack Overflow

Top answer

1 of 4

Pandas can't build a DataFrame directly from a set (dicts are OK; you can use pd.DataFrame.from_dict(s) for those).

What you need to do is to convert your set into a list and then convert to DataFrame:

import pandas as pd

s = {12,34,78,100}
s = list(s)
print(pd.DataFrame(s))

2 of 4

You can use list(s):

import pandas as p
s = {12,34,78,100}
df = p.DataFrame(list(s))
print(df)
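Because a set has no defined order, the row order produced by `list(s)` is not guaranteed. A small sketch, assuming you want a reproducible order and a named column (the column name `value` is made up):

```python
import pandas as pd

s = {12, 34, 78, 100}

# Sort the set into a list so the row order is deterministic.
df = pd.DataFrame({'value': sorted(s)})
print(df)
```

Sorting only works when the elements are mutually comparable; for mixed types, `list(s)` is the fallback.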

GeeksforGeeks

geeksforgeeks.org › pandas › create-a-set-from-a-series-in-pandas

Create A Set From A Series In Pandas - GeeksforGeeks

July 23, 2025 - We can directly apply set() function to the pandas series, the set function automatically convert the pandas series into a set. ... In conclusion, creating a set from a Pandas Series in Python is a useful technique for data manipulation.

Stack Overflow

stackoverflow.com › questions › 33125611 › how-to-convert-list-into-set-in-pandas

python - How to convert list into set in pandas? - Stack Overflow

Top answer

1 of 2

You can use the apply method of the DataFrame API:

df['uids'] = df.apply(lambda row: set(row['uids']), axis=1)

Or apply set to the column directly (great thanks to EdChum):

df['uids'] = df['uids'].apply(set)

You can find more information about apply method here.

Examples of use

df = pd.DataFrame({'A': [[1,2,3,4,5,1,1,1], [2,3,4,2,2,2,3,3]]})
df = df['A'].apply(set)

Output:

>>> df
0    set([1, 2, 3, 4, 5])
1          set([2, 3, 4])
Name: A, dtype: object

Or:

>>> df = pd.DataFrame({'A': [[1,2,3,4,5,1,1,1], [2,3,4,2,2,2,3,3]]})
>>> df['A'] = df.apply(lambda row: set(row['A']), axis=1)
>>> df
                      A
0  set([1, 2, 3, 4, 5])
1        set([2, 3, 4])

2 of 2

For anyone who wants to know the fastest way to convert list into set in Pandas:

Method 1:

df['uids'] = df.apply(lambda row: set(row['uids']), axis=1)

Method 2:

df['uids'] = df['uids'].apply(set)

Method 3:

df['uids'] = df['uids'].map(set)

I ran timeit with repeat(50, 5) on a DataFrame with 4000 rows:

Method 1 - mean:  0.13299, min:  0.12723
Method 2 - mean:  0.01319, min:  0.01207
Method 3 - mean:  0.01261, min:  0.01164
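The three methods differ in speed but should produce the same values. A quick sketch that checks this on a tiny made-up frame (the names `m1`, `m2`, `m3` are illustrative):

```python
import pandas as pd

# Hypothetical example data: a column of lists with duplicates.
df = pd.DataFrame({'uids': [[0, 1, 1, 2], [3, 3, 4]]})

m1 = df.apply(lambda row: set(row['uids']), axis=1)  # Method 1: row-wise apply
m2 = df['uids'].apply(set)                           # Method 2: Series.apply
m3 = df['uids'].map(set)                             # Method 3: Series.map

print(m1.tolist(), m2.tolist(), m3.tolist())
```

Method 1 builds a Series object for every row before calling the lambda, which is consistent with it being the slowest in the timings above.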

Note.nkmk.me

note.nkmk.me › home › python › pandas

Convert between pandas DataFrame/Series and Python list | note.nkmk.me

January 24, 2024 - Transpose 2D list in Python (swap rows and columns) print(pd.DataFrame(zip(*l_2d))) # 0 1 # 0 0 30 # 1 10 40 # 2 20 50 ... Row names can be specified with the index argument, and column names with the columns argument. print(pd.Series(l_1d, index=['X', 'Y', 'Z'])) # X 0 # Y 10 # Z 20 # dtype: int64 print(pd.DataFrame(l_2d, index=['X', 'Y'], columns=['A', 'B', 'C'])) # A B C # X 0 10 20 # Y 30 40 50 ... It is also possible to set or change the index and columns after creating a Series or a DataFrame.

Finxter

blog.finxter.com › 5-best-ways-to-convert-python-set-to-dataframe

5 Best Ways to Convert Python Set to DataFrame – Be on the Right Side of Change

February 21, 2024 - In the above code, we build a list ... each element of the set in a separate row under the ‘Fruit’ column. By creating a pandas Series object from the set, we can easily convert it into a DataFrame....

KDnuggets

kdnuggets.com › 2019 › 11 › set-operations-applied-pandas-dataframes.html

Set Operations Applied to Pandas DataFrames - KDnuggets

In this tutorial you will learn that set operations are one of the best and most natural techniques you can choose to perform such a task. Suppose you have two DataFrames, named P and S, which respectively contain the names and emails from students enrolled in two different courses, SQL and Python.
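The tutorial's own code is not included in this excerpt. As a hedged sketch of what set operations on two such frames can look like in pandas, with made-up student data and assuming both frames share the same columns:

```python
import pandas as pd

# Hypothetical students in two courses; P and S share the columns name and email.
P = pd.DataFrame({'name': ['Ann', 'Bob'],
                  'email': ['ann@example.com', 'bob@example.com']})
S = pd.DataFrame({'name': ['Bob', 'Cid'],
                  'email': ['bob@example.com', 'cid@example.com']})

# Union: stack both frames and drop repeated rows.
union = pd.concat([P, S]).drop_duplicates().reset_index(drop=True)

# Intersection: inner join on all shared columns.
intersection = P.merge(S)

# Difference (in P but not in S): left join with an indicator column.
difference = (P.merge(S, how='left', indicator=True)
                .query("_merge == 'left_only'")
                .drop(columns='_merge'))

print(union, intersection, difference, sep='\n\n')
```

These treat each full row as the set element, so two students count as the same only if every column matches.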

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.html

pandas.DataFrame — pandas 3.0.3 documentation - PyData |

Data type to force. Only a single dtype is allowed. If None, infer. If data is DataFrame then is ignored.

FavTutor

favtutor.com › blogs › list-to-dataframe-python

8 Ways to Convert List to Dataframe in Python (with code)

December 4, 2023 - Understand how to convert list to dataframe using various methods in Python. Learn 8 different methods along with codes.

Pandas

pandas.pydata.org › docs › reference › api › pandas.DataFrame.astype.html

pandas.DataFrame.astype — pandas 3.0.3 documentation

This method allows the conversion ... DataFrames and Series, to the specified dtype. It supports casting entire objects to a single data type or applying different data types to individual columns using a mapping. ... Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast ...

Stack Overflow

stackoverflow.com › questions › 53965508 › python-dataframe-how-to-convert-set-column-to-list

python dataframe how to convert set column to list - Stack Overflow

Top answer

1 of 2

Use apply:

tdf['c'] = tdf['b'].apply(list)

This works because apply calls list on each element; calling list on the whole column converts the column itself rather than each set one by one.

Or do:

tdf['c'] = tdf['b'].map(list)

2 of 2

You could do:

import pandas as pd

data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)

tdf['c'] = [list(e) for e in tdf.b]

print(tdf)
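Since sets are unordered, `list` gives each row's elements in arbitrary order. If a stable order matters, one option is to map `sorted` over the column instead, as in this small sketch built on the same example data:

```python
import pandas as pd

data = [{'a': [1, 2, 3], 'b': {11, 22, 33}}, {'a': [2, 3, 4], 'b': {111, 222}}]
tdf = pd.DataFrame(data)

# sorted() returns a list, so each set becomes an ordered list.
tdf['c'] = tdf['b'].map(sorted)
print(tdf)
```

This assumes the elements within each set can be compared with one another.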