Pandas DataFrame columns are Pandas Series when you pull them out, which you can then call x.tolist() on to turn them into a Python list. Alternatively you cast it with list(x).
Copyimport pandas as pd
data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)
print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")
col_one_list = df['one'].tolist()
col_one_arr = df['one'].to_numpy()
print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
Output:
DataFrame:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
column types:
one float64
two int64
dtype: object
col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>
col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
Answer from Ben on Stack OverflowPandas DataFrame columns are Pandas Series when you pull them out, which you can then call x.tolist() on to turn them into a Python list. Alternatively you cast it with list(x).
Copyimport pandas as pd
data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)
print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")
col_one_list = df['one'].tolist()
col_one_arr = df['one'].to_numpy()
print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
Output:
DataFrame:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
column types:
one float64
two int64
dtype: object
col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>
col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
This returns a numpy array:
Copyarr = df["cluster"].to_numpy()
This returns a numpy array of unique values:
Copyunique_arr = df["cluster"].unique()
You can also use numpy to get the unique values, although there are differences between the two methods:
Copyarr = df["cluster"].to_numpy()
unique_arr = np.unique(arr)
How to deal with list values in Pandas Dataframe?
Extracting only numerical value from a list in a Pandas DataFrame
Mean of a list of Series [Pandas]
How do I parse out a Pandas list-like column into a list, and then explode the row?
Videos
I have a csv file where some of the data cells contains a list, such as ['2.93' '1.02'] (a string). When reading it into a panda Dataframe, what should I do to make sure the entry in the data cell is a list containing float type numbers [2.93 1.02] ? I have tried using the eval() function, but it failed to detect that it is two separate numbers, it just returns ['2.931.02'], which is a list containing a string.