Top answer
1 of 13
113

Can I ask why not just do it by slicing the data frame? Something like

import pandas as pd
import numpy as np

#create some data with Names column
data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                     'Ob1': np.random.rand(16), 'Ob2': np.random.rand(16)})

#create unique list of names
UniqueNames = data.Names.unique()

#create a data frame dictionary to store your data frames
DataFrameDict = {elem : pd.DataFrame() for elem in UniqueNames}

for key in DataFrameDict.keys():
    DataFrameDict[key] = data[data.Names == key]

Hey presto, you have a dictionary of data frames, just as (I think) you want them. Need to access one? Just enter

DataFrameDict['Joe']
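As a sketch of a more idiomatic route to the same dictionary: `groupby` already partitions the frame by each unique value, so the loop above collapses into a single expression (same example column and names as above):

```python
import pandas as pd
import numpy as np

# same example data as above
data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                     'Ob1': np.random.rand(16), 'Ob2': np.random.rand(16)})

# groupby yields (key, sub-frame) pairs, which dict() consumes directly
DataFrameDict = dict(tuple(data.groupby('Names')))
```

`DataFrameDict['Joe']` then holds the four 'Joe' rows, just like in the loop version.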
2 of 13
79

Firstly, your approach is inefficient because appending to the list row by row will be slow: the list has to be periodically regrown when there is insufficient space for the new entry. List comprehensions are better in this respect, as the size is determined up front and allocated once.

However, I think fundamentally your approach is a little wasteful as you have a dataframe already so why create a new one for each of these users?

I would sort the dataframe by column 'name', set the index to be this and if required not drop the column.

Then generate a list of all the unique entries. You can then perform a lookup using these entries, and crucially, if you are only querying the data, use the selection criteria to return a view on the dataframe without incurring a costly data copy.

Use pandas.DataFrame.sort_values and pandas.DataFrame.set_index:

# sort the dataframe by the 'name' column (row-wise, i.e. the default axis=0)
df.sort_values(by='name', inplace=True)

# set the index to be this and don't drop
df.set_index(keys=['name'], drop=False, inplace=True)

# get a list of names
names = df['name'].unique().tolist()

# now we can perform a lookup on a 'view' of the dataframe
joe = df.loc[df.name=='joe']

# now you can query all 'joes'
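The steps above can be sketched end-to-end on a small hypothetical frame (the 'name' column and its values are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'name': ['mary', 'joe', 'bob', 'joe'],
                   'score': [10, 20, 30, 40]})

# sort by 'name' and make it the index, keeping the column around
df = df.sort_values(by='name').set_index('name', drop=False)

# with a sorted (non-unique) index, a label lookup returns
# every matching row in one go
joes = df.loc['joe']
```

`joes` is itself a DataFrame here, because the label 'joe' matches more than one row.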
Top answer
1 of 11
359

Use np.array_split:

Docstring:
Split an array into multiple sub-arrays.

Please refer to the ``split`` documentation.  The only difference
between these functions is that ``array_split`` allows
`indices_or_sections` to be an integer that does *not* equally
divide the axis.
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8), 'D' : np.random.randn(8)})

In [4]: print(df)
     A      B         C         D
0  foo    one -0.174067 -0.608579
1  bar    one -0.860386 -1.210518
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468

In [5]: np.array_split(df, 3)
Out[5]: 
[     A    B         C         D
0  foo  one -0.174067 -0.608579
1  bar  one -0.860386 -1.210518
2  foo  two  0.614102  1.689837,
      A      B         C         D
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861,
      A      B         C         D
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468]
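As the docstring notes, `array_split` accepts a section count that does not evenly divide the length; the leftover rows simply land in smaller trailing sections. A minimal illustration on a plain array:

```python
import numpy as np

# 8 elements into 3 sections: the extra element goes to the first section,
# giving sizes 3, 3, 2 rather than raising like np.split would
parts = np.array_split(np.arange(8), 3)
sizes = [len(p) for p in parts]  # [3, 3, 2]
```

The same size pattern applies when the input is a DataFrame, as in the answer above.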
2 of 11
90

I wanted to do the same, and I first had problems with the split function, then with installing pandas 0.15.2, so I went back to my old version and wrote a little function that works very well. I hope this helps!

# input - df: a DataFrame, chunk_size: the chunk size
# output - a list of DataFrames
# purpose - splits the DataFrame into smaller chunks
def split_dataframe(df, chunk_size=10000):
    chunks = []
    # ceiling division, so an exact multiple of chunk_size
    # doesn't produce an empty trailing chunk
    num_chunks = (len(df) + chunk_size - 1) // chunk_size
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
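For the common row-chunking case, the same result can also be had with a single list comprehension over `iloc` slices, stepping by the chunk size (a sketch, not from the answer above; the frame and sizes are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'x': range(25)})
chunk_size = 10

# slice the frame every chunk_size rows; the last chunk may be shorter
chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
# len(chunks) == 3, with 10, 10 and 5 rows
```

Because `range` already stops at `len(df)`, there is no chunk-count arithmetic to get wrong.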
Top answer
1 of 12
832

TL;DR version:

For the simple case of:

  • I have a text column with a delimiter and I want two columns

The simplest solution is:

df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

You must use expand=True if your strings have a non-uniform number of splits and you want None to replace the missing values.

Notice how, in either case, the .tolist() method is not necessary. Neither is zip().

In detail:

Andy Hayden's solution is most excellent in demonstrating the power of the str.extract() method.

But for a simple split over a known separator (like, splitting by dashes, or splitting by whitespace), the .str.split() method is enough[1]. It operates on a column (Series) of strings, and returns a column (Series) of lists:

>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})
>>> df

      AB
0  A1-B1
1  A2-B2
>>> df['AB_split'] = df['AB'].str.split('-')
>>> df

      AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]

[1]: If you're unsure what the first two parameters of .str.split() do, I recommend the docs for the plain Python version of the method.

But how do you go from:

  • a column containing two-element lists

to:

  • two columns, each containing the respective element of the lists?

Well, we need to take a closer look at the .str attribute of a column.

It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:

>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df

   U
0  A
1  B
2  C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df

   U  L
0  A  a
1  B  b
2  C  c

But it also has an "indexing" interface for getting each element of a string by its index:

>>> df['AB'].str[0]

0    A
1    A
Name: AB, dtype: object

>>> df['AB'].str[1]

0    1
1    2
Name: AB, dtype: object

Of course, this indexing interface of .str doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:

>>> df['AB'].str.split('-', n=1).str[0]

0    A1
1    A2
Name: AB, dtype: object

>>> df['AB'].str.split('-', n=1).str[1]

0    B1
1    B2
Name: AB, dtype: object

Then, it's a simple matter of taking advantage of Python tuple unpacking of iterables (note that unpacking the .str accessor like this was deprecated in pandas 0.25 and removed in 1.0, so on modern pandas prefer the expand=True approach below) to do

>>> df['A'], df['B'] = df['AB'].str.split('-', n=1).str
>>> df

      AB  AB_split   A   B
0  A1-B1  [A1, B1]  A1  B1
1  A2-B2  [A2, B2]  A2  B2

Of course, getting a DataFrame out of splitting a column of strings is so useful that the .str.split() method can do it for you with the expand=True parameter:

>>> df['AB'].str.split('-', n=1, expand=True)

    0   1
0  A1  B1
1  A2  B2

So, another way of accomplishing what we wanted is to do:

>>> df = df[['AB']]
>>> df

      AB
0  A1-B1
1  A2-B2

>>> df.join(df['AB'].str.split('-', n=1, expand=True).rename(columns={0:'A', 1:'B'}))

      AB   A   B
0  A1-B1  A1  B1
1  A2-B2  A2  B2

The expand=True version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn't deal well with splits of different lengths:

>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2', 'A3-B3-C3']})
>>> df
         AB
0     A1-B1
1     A2-B2
2  A3-B3-C3
>>> df['A'], df['B'], df['C'] = df['AB'].str.split('-')
Traceback (most recent call last):
  [...]    
ValueError: Length of values does not match length of index
>>> 

But expand=True handles it nicely by placing None in the columns for which there aren't enough "splits":

>>> df.join(
...     df['AB'].str.split('-', expand=True).rename(
...         columns={0:'A', 1:'B', 2:'C'}
...     )
... )
         AB   A   B     C
0     A1-B1  A1  B1  None
1     A2-B2  A2  B2  None
2  A3-B3-C3  A3  B3    C3
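Since expand=True already pads short rows with None, the join/rename dance can be shortened by assigning the expanded frame straight into new columns (same example frame as above):

```python
import pandas as pd

df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2', 'A3-B3-C3']})

# expand=True yields as many columns as the longest split;
# rows with fewer pieces are padded with None
df[['A', 'B', 'C']] = df['AB'].str.split('-', expand=True)
```

This relies on knowing the maximum number of pieces up front, so the column list on the left matches the expanded width.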
2 of 12
188

There might be a better way, but this here's one approach:

                            row
    0       00000 UNITED STATES
    1             01000 ALABAMA
    2  01001 Autauga County, AL
    3  01003 Baldwin County, AL
    4  01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
   fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
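The tolist() round trip in this answer can be avoided with expand=True, which assigns both pieces back in one step (a sketch on the same shape of data, using the first three rows):

```python
import pandas as pd

df = pd.DataFrame({'row': ['00000 UNITED STATES',
                           '01000 ALABAMA',
                           '01001 Autauga County, AL']})

# split on the first space only; expand=True returns a two-column frame
df[['fips', 'row']] = df['row'].str.split(' ', n=1, expand=True)
```

Here 'row' is overwritten in place with the remainder of the string, while 'fips' receives the leading code.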