Use Series.str.split and select the first values of the lists by indexing:
df = pd.DataFrame({'col':['45+2','98+3','90+5']})
df['new'] = df['col'].str.split('+').str[0]
print (df)
col new
0 45+2 45
1 98+3 98
2 90+5 90
Or use Series.str.extract to get the first integers from the values:
df['new'] = df['col'].str.extract(r'(\d+)')
print (df)
col new
0 45+2 45
1 98+3 98
2 90+5 90
You can use a lambda function to do this.
df1 = pd.DataFrame(data=['45+2', '98+3', '90+5'], columns=['col'])
print(df1)
col
0 45+2
1 98+3
2 90+5
Delete the unwanted parts from the strings in the "col" column:
df1['col'] = df1['col'].map(lambda x: x.split('+')[0])
print(df1)
col
0 45
1 98
2 90
Can I ask why not just do it by slicing the data frame? Something like:
# create some data with a Names column
import numpy as np
import pandas as pd

data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                     'Ob1': np.random.rand(16),
                     'Ob2': np.random.rand(16)})
# create a unique list of names
UniqueNames = data.Names.unique()

# create a data frame dictionary to store your data frames
DataFrameDict = {elem: pd.DataFrame() for elem in UniqueNames}

for key in DataFrameDict.keys():
    DataFrameDict[key] = data[:][data.Names == key]
Hey presto, you have a dictionary of data frames, just as (I think) you want them. Need to access one? Just enter:
DataFrameDict['Joe']
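The same dictionary can also be built in a single dict comprehension; a minimal sketch, assuming the data frame created above:

# one pass: map each unique name to the matching slice of the frame
DataFrameDict = {name: data[data.Names == name] for name in data.Names.unique()}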
Firstly, your approach is inefficient because appending to the list row by row will be slow: the list has to be grown periodically when there is insufficient space for a new entry. List comprehensions are better in this respect, as the size is determined up front and allocated once.
However, I think your approach is fundamentally a little wasteful: you already have a DataFrame, so why create a new one for each of these users?
I would sort the DataFrame by column 'name', set the index to be this and, if required, not drop the column.
Then generate a list of all the unique entries; you can then perform a lookup using these entries, and crucially, if you are only querying the data, use the selection criteria to return a view on the DataFrame without incurring a costly data copy.
Use pandas.DataFrame.sort_values and pandas.DataFrame.set_index:
# sort the dataframe
df.sort_values(by='name', inplace=True)
# set the index to be this and don't drop
df.set_index(keys=['name'], drop=False, inplace=True)
# get a list of names
names=df['name'].unique().tolist()
# now we can perform a lookup on a 'view' of the dataframe
joe = df.loc[df.name=='joe']
# now you can query all 'joes'
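Since the index was set to 'name', the lookup described above can also go through the index directly instead of a boolean mask; a minimal sketch, assuming the df and names variables from the snippet above:

# select every row whose index label is 'joe' (no scan of the 'name' column needed)
joe = df.loc['joe']

# or build the per-name lookup for all unique names at once
by_name = {n: df.loc[n] for n in names}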
Use np.array_split:
Docstring:
Split an array into multiple sub-arrays.
Please refer to the ``split`` documentation. The only difference
between these functions is that ``array_split`` allows
`indices_or_sections` to be an integer that does *not* equally
divide the axis.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ...:                           'foo', 'bar', 'foo', 'foo'],
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'two', 'one', 'three'],
   ...:                    'C' : np.random.randn(8), 'D' : np.random.randn(8)})
In [4]: print(df)
A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468
In [5]: np.array_split(df, 3)
Out[5]:
[ A B C D
0 foo one -0.174067 -0.608579
1 bar one -0.860386 -1.210518
2 foo two 0.614102 1.689837,
A B C D
3 bar three -0.284792 -1.071160
4 foo two 0.843610 0.803712
5 bar two -1.514722 0.870861,
A B C D
6 foo one 0.131529 -0.968151
7 foo three -1.002946 -0.257468]
I wanted to do the same, but I first had problems with the split function, then problems installing pandas 0.15.2, so I went back to my old version and wrote a little function that works very well. I hope this can help!
# input  - df: a DataFrame, chunk_size: the chunk size
# output - a list of DataFrames
# purpose - splits the DataFrame into smaller chunks
def split_dataframe(df, chunk_size=10000):
    chunks = list()
    # round up so a partial final chunk is kept, without adding an empty one
    num_chunks = (len(df) + chunk_size - 1) // chunk_size
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
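A quick usage sketch, assuming a DataFrame df already exists:

# split df into pieces of at most 3 rows each
pieces = split_dataframe(df, chunk_size=3)
print(len(pieces), [len(p) for p in pieces])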
Use iloc:
df1 = datasX.iloc[:, :72]
df2 = datasX.iloc[:, 72:]
(iloc docs)
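To make the positions concrete on a small frame (datasX above stands for the original DataFrame, and 72 is just the column position where it is cut), a minimal illustrative sketch with a 4-column frame split at position 2:

import numpy as np
import pandas as pd

datasX = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('wxyz'))

df1 = datasX.iloc[:, :2]   # all rows, columns before position 2 (w, x)
df2 = datasX.iloc[:, 2:]   # all rows, columns from position 2 onwards (y, z)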
Use np.split(..., axis=1):
Demo:
In [255]: df = pd.DataFrame(np.random.rand(5, 6), columns=list('abcdef'))
In [256]: df
Out[256]:
a b c d e f
0 0.823638 0.767999 0.460358 0.034578 0.592420 0.776803
1 0.344320 0.754412 0.274944 0.545039 0.031752 0.784564
2 0.238826 0.610893 0.861127 0.189441 0.294646 0.557034
3 0.478562 0.571750 0.116209 0.534039 0.869545 0.855520
4 0.130601 0.678583 0.157052 0.899672 0.093976 0.268974
In [257]: dfs = np.split(df, [4], axis=1)
In [258]: dfs[0]
Out[258]:
a b c d
0 0.823638 0.767999 0.460358 0.034578
1 0.344320 0.754412 0.274944 0.545039
2 0.238826 0.610893 0.861127 0.189441
3 0.478562 0.571750 0.116209 0.534039
4 0.130601 0.678583 0.157052 0.899672
In [259]: dfs[1]
Out[259]:
e f
0 0.592420 0.776803
1 0.031752 0.784564
2 0.294646 0.557034
3 0.869545 0.855520
4 0.093976 0.268974
np.split() is pretty flexible - let's split the original DF into 3 DFs at the columns with indices [2, 3]:
In [260]: dfs = np.split(df, [2,3], axis=1)
In [261]: dfs[0]
Out[261]:
a b
0 0.823638 0.767999
1 0.344320 0.754412
2 0.238826 0.610893
3 0.478562 0.571750
4 0.130601 0.678583
In [262]: dfs[1]
Out[262]:
c
0 0.460358
1 0.274944
2 0.861127
3 0.116209
4 0.157052
In [263]: dfs[2]
Out[263]:
d e f
0 0.034578 0.592420 0.776803
1 0.545039 0.031752 0.784564
2 0.189441 0.294646 0.557034
3 0.534039 0.869545 0.855520
4 0.899672 0.093976 0.268974
TL;DR version:
For the simple case of:
- I have a text column with a delimiter and I want two columns
The simplest solution is:
df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)
You must use expand=True if your strings have a non-uniform number of splits and you want None to replace the missing values.
Notice how, in either case, the .tolist() method is not necessary. Neither is zip().
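For instance, a minimal sketch of that one-liner on made-up data (the column names and values here are just for illustration):

>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1 B1', 'A2 B2', 'A3']})
>>> df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)
>>> df
      AB   A     B
0  A1 B1  A1    B1
1  A2 B2  A2    B2
2     A3  A3  None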
In detail:
Andy Hayden's solution is most excellent in demonstrating the power of the str.extract() method.
But for a simple split over a known separator (like splitting by dashes, or splitting by whitespace), the .str.split() method is enough¹. It operates on a column (Series) of strings, and returns a column (Series) of lists:
>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})
>>> df
AB
0 A1-B1
1 A2-B2
>>> df['AB_split'] = df['AB'].str.split('-')
>>> df
AB AB_split
0 A1-B1 [A1, B1]
1 A2-B2 [A2, B2]
¹: If you're unsure what the first two parameters of .str.split() do, I recommend the docs for the plain Python version of the method.
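As a one-line illustration of those two parameters in plain Python (maxsplit corresponds to pandas' n):

>>> 'A1-B1-C1'.split('-', 1)   # sep='-', maxsplit=1
['A1', 'B1-C1']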
But how do you go from:
- a column containing two-element lists
to:
- two columns, each containing the respective element of the lists?
Well, we need to take a closer look at the .str attribute of a column.
It's a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:
>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df
U
0 A
1 B
2 C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df
U L
0 A a
1 B b
2 C c
But it also has an "indexing" interface for getting each element of a string by its index:
>>> df['AB'].str[0]
0 A
1 A
Name: AB, dtype: object
>>> df['AB'].str[1]
0 1
1 2
Name: AB, dtype: object
Of course, this indexing interface of .str doesn't really care if each element it's indexing is actually a string, as long as it can be indexed, so:
>>> df['AB'].str.split('-', n=1).str[0]
0 A1
1 A2
Name: AB, dtype: object
>>> df['AB'].str.split('-', n=1).str[1]
0 B1
1 B2
Name: AB, dtype: object
Then, it's a simple matter of taking advantage of Python's tuple unpacking of iterables to do:
>>> df['A'], df['B'] = df['AB'].str.split('-', n=1).str
>>> df
AB AB_split A B
0 A1-B1 [A1, B1] A1 B1
1 A2-B2 [A2, B2] A2 B2
Of course, getting a DataFrame out of splitting a column of strings is so useful that the .str.split() method can do it for you with the expand=True parameter:
>>> df['AB'].str.split('-', n=1, expand=True)
0 1
0 A1 B1
1 A2 B2
So, another way of accomplishing what we wanted is to do:
>>> df = df[['AB']]
>>> df
AB
0 A1-B1
1 A2-B2
>>> df.join(df['AB'].str.split('-', n=1, expand=True).rename(columns={0:'A', 1:'B'}))
AB A B
0 A1-B1 A1 B1
1 A2-B2 A2 B2
The expand=True version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn't deal well with splits of different lengths:
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2', 'A3-B3-C3']})
>>> df
AB
0 A1-B1
1 A2-B2
2 A3-B3-C3
>>> df['A'], df['B'], df['C'] = df['AB'].str.split('-')
Traceback (most recent call last):
[...]
ValueError: Length of values does not match length of index
>>>
But expand=True handles it nicely by placing None in the columns for which there aren't enough "splits":
>>> df.join(
... df['AB'].str.split('-', expand=True).rename(
... columns={0:'A', 1:'B', 2:'C'}
... )
... )
AB A B C
0 A1-B1 A1 B1 None
1 A2-B2 A2 B2 None
2 A3-B3-C3 A3 B3 C3
There might be a better way, but here's one approach:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
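Given the expand=True behaviour shown earlier, the same result can be had without going through .tolist(); a sketch, assuming the same df with its single row column:

# 'fips' is added as a new column; 'row' is replaced by the remainder of the string
df[['fips', 'row']] = df['row'].str.split(' ', n=1, expand=True)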