UPDATE: memory-saving method - first set a new index with a gap for the new row:

In [30]: df
Out[30]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

if we want to insert a new row between indexes 1 and 2, we split the index at position 2:

In [31]: idxs = np.split(df.index, 2)

set a new index (with gap at position 2):

In [32]: df.set_index(idxs[0].union(idxs[1] + 1), inplace=True)

In [33]: df
Out[33]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
3    D    E    1
4    E    F    1

insert new row with index 2:

In [34]: df.loc[2] = ['X','X',2]

In [35]: df
Out[35]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
3    D    E    1
4    E    F    1
2    X    X    2

sort index:

In [38]: df.sort_index(inplace=True)

In [39]: df
Out[39]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    X    X    2
3    D    E    1
4    E    F    1
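
The In [30]-In [39] walkthrough above can be folded into a small helper. A minimal sketch (the function name insert_row_with_gap and the toy frame are my own, not from the original answer):

```python
import numpy as np
import pandas as pd

def insert_row_with_gap(df, pos, row):
    """Insert `row` at integer position `pos` of a default RangeIndex
    by shifting the tail indexes up by one and filling the gap."""
    idxs = np.split(df.index, [pos])               # split the index at `pos`
    df = df.set_index(idxs[0].union(idxs[1] + 1))  # open a gap at `pos`
    df.loc[pos] = row                              # fill the gap
    return df.sort_index()

df = pd.DataFrame({'Col1': list('ABDE'), 'Col2': list('BCEF'),
                   'Col3': [1, 1, 1, 1]})
out = insert_row_with_gap(df, 2, ['X', 'X', 2])
print(out)
```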

PS: you can also insert a DataFrame instead of a single row, using df.append(new_df):

In [42]: df
Out[42]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

In [43]: idxs = np.split(df.index, 2)

In [45]: new_df = pd.DataFrame([['X', 'X', 10], ['Y','Y',11]], columns=df.columns)

In [49]: new_df.index += idxs[1].min()

In [51]: new_df
Out[51]:
  Col1 Col2  Col3
2    X    X    10
3    Y    Y    11

In [52]: df = df.set_index(idxs[0].union(idxs[1]+len(new_df)))

In [53]: df
Out[53]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
4    D    E    1
5    E    F    1

In [54]: df = df.append(new_df)

In [55]: df
Out[55]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
4    D    E    1
5    E    F    1
2    X    X   10
3    Y    Y   11

In [56]: df.sort_index(inplace=True)

In [57]: df
Out[57]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    X    X   10
3    Y    Y   11
4    D    E    1
5    E    F    1
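
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the PS example above fails on current pandas. The same trick works with pd.concat instead; a sketch reproducing the toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': list('ABDE'), 'Col2': list('BCEF'),
                   'Col3': [1, 1, 1, 1]})
new_df = pd.DataFrame([['X', 'X', 10], ['Y', 'Y', 11]], columns=df.columns)

idxs = np.split(df.index, 2)
new_df.index += idxs[1].min()                            # new rows -> 2, 3
df = df.set_index(idxs[0].union(idxs[1] + len(new_df)))  # tail -> 4, 5

df = pd.concat([df, new_df]).sort_index()  # replaces df.append(new_df)
print(df)
```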

OLD answer:

One (among many) way to achieve that would be to split your DF and concatenate it together with the needed DF in the desired order:

Original DF:

In [12]: df
Out[12]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

let's split it into two parts ([0:1], [2:end]):

In [13]: dfs = np.split(df, [2])

In [14]: dfs
Out[14]:
[  Col1 Col2 Col3
 0    A    B    1
 1    B    C    1,   Col1 Col2 Col3
 2    D    E    1
 3    E    F    1]

now we can concatenate it together with a new DF in the desired order:

In [15]: pd.concat([dfs[0], pd.DataFrame([['C','D', 1]], columns=df.columns), dfs[1]], ignore_index=True)
Out[15]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    C    D    1
3    D    E    1
4    E    F    1
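
The split-and-concat steps above can be wrapped in a reusable function. A minimal sketch (insert_rows is my own name, and iloc slicing stands in for np.split):

```python
import pandas as pd

def insert_rows(df, pos, new_rows):
    """Return a copy of `df` with `new_rows` inserted before integer
    position `pos`, renumbering the index afterwards."""
    new_df = pd.DataFrame(new_rows, columns=df.columns)
    return pd.concat([df.iloc[:pos], new_df, df.iloc[pos:]],
                     ignore_index=True)

df = pd.DataFrame({'Col1': list('ABDE'), 'Col2': list('BCEF'),
                   'Col3': [1, 1, 1, 1]})
result = insert_rows(df, 2, [['C', 'D', 1]])
print(result)
```
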
Answer from MaxU - stand with Ukraine on Stack Overflow
Top answer
1 of 1
5

Top answer
1 of 2
42

Use the timeits, Luke!

Conclusion
List comprehensions perform the best on smaller amounts of data because they incur very little overhead, even though they are not vectorized. On the other hand, on larger data, loc and numpy.where perform better: vectorisation wins the day.

Keep in mind that the applicability of a method depends on your data, the number of conditions, and the data type of your columns. My suggestion is to test various methods on your data before settling on an option.

One sure take away from here, however, is that list comprehensions are pretty competitive—they're implemented in C and are highly optimised for performance.


Benchmarking code, for reference. Here are the functions being timed:

def numpy_where(df):
    return df.assign(is_rich=np.where(df['salary'] >= 50, 'yes', 'no'))

def list_comp(df):
    return df.assign(is_rich=['yes' if x >= 50 else 'no' for x in df['salary']])

def loc(df):
    df = df.assign(is_rich='no')
    df.loc[df['salary'] >= 50, 'is_rich'] = 'yes'
    return df
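
As a quick sanity check on toy data (the definitions are repeated so the snippet runs standalone, and the loc variant uses >= 50 so all three conditions match):

```python
import numpy as np
import pandas as pd

def numpy_where(df):
    return df.assign(is_rich=np.where(df['salary'] >= 50, 'yes', 'no'))

def list_comp(df):
    return df.assign(is_rich=['yes' if x >= 50 else 'no' for x in df['salary']])

def loc(df):
    df = df.assign(is_rich='no')
    df.loc[df['salary'] >= 50, 'is_rich'] = 'yes'
    return df

# salaries chosen away from the 50 boundary so every variant must agree
df = pd.DataFrame({'salary': [10, 49, 51, 99]})
a, b, c = numpy_where(df), list_comp(df), loc(df)
print(a['is_rich'].tolist())  # ['no', 'no', 'yes', 'yes']
```
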
2 of 2
3

Another method is to use the pandas mask() method (or, depending on the use case, where()). First initialize a Series with a default value (chosen as "no") and replace some of the values depending on a condition (a little like a mix between loc[] and numpy.where()).

df['is_rich'] = pd.Series('no', index=df.index).mask(df['salary']>50, 'yes')

It is probably the fastest option. For example, for a frame with 10 million rows, the mask() option is about 40% faster than the loc option.¹

I also updated the perfplot benchmark in cs95's answer to compare how the mask method performs against the other methods.

[perfplot benchmark plot omitted]
¹ The benchmark result that compares mask with loc:

def mask(df):
    return df.assign(is_rich=pd.Series('no', index=df.index).mask(df['salary']>50, 'yes'))

df = pd.DataFrame({'salary': np.random.rand(10_000_000)*100})

%timeit mask(df)
# 391 ms ± 3.87 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)

%timeit loc(df)
# 558 ms ± 75.6 ms per loop (mean ± std. dev. of 10 runs, 100 loops each)
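
For reference, Series.where is the logical inverse of mask (it replaces values where the condition is False), so the same column can be built either way. A small sketch on my own toy frame:

```python
import pandas as pd

df = pd.DataFrame({'salary': [30, 70, 50, 90]})

# mask: replace where the condition is True
via_mask = pd.Series('no', index=df.index).mask(df['salary'] > 50, 'yes')
# where: keep where the condition is True, replace elsewhere
via_where = pd.Series('yes', index=df.index).where(df['salary'] > 50, 'no')

print(via_mask.tolist())  # ['no', 'yes', 'no', 'yes']
```
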
Discussions

python - Pandas - add value at specific iloc into new dataframe column - Stack Overflow
I have a large dataframe containing lots of columns. For each row/index in the dataframe I do some operations, read in some ancillary data, etc. and get a new value. Is there a way to add that new ... More on stackoverflow.com
How to add rows to the dataframe based on a condition?
I think there are 3 steps here:

1. Look to the previous row and the next row.
2. Filter based on the condition.
3. Add rows based on the values.

To accomplish the first step, you can create new columns that use the shift() function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html):

df['prevrow'] = df.eventtype.shift()
df['nextrow'] = df.eventtype.shift(-1)

You also need to do the same thing for the user:

df['prevuser'] = df.user.shift()
df['nextuser'] = df.user.shift(-1)

For the second step, filter the data where the condition you listed is true. I did not 100% follow your logic for the condition, so maybe I got it mixed up, but it could look something like this.

If the current event is 1 and there's no previous event:

sameasprev = df[df.user == df.prevuser]
sameasprev = sameasprev[sameasprev.prevrow == sameasprev.eventtype]
sameasprev = sameasprev[sameasprev.eventtype == 1]

If the current event is a 0 and there's no next event:

sameasnext = df[df.user == df.nextuser]
sameasnext = sameasnext[sameasnext.nextrow == sameasnext.eventtype]
sameasnext = sameasnext[sameasnext.eventtype == 0]

Finally, you want to add the rows to the dataframe. You will need to modify the values as you required.

sameasprev.eventtype = 0
sameasnext.eventtype = 1
newdf = pd.concat([df, sameasprev]).reset_index()
newdf = pd.concat([newdf, sameasnext]).reset_index()
r/learnpython - August 30, 2022
Dataquest
Add a Column in a Pandas DataFrame Based on an If-Else Condition
March 6, 2023 - To accomplish this, we can use a function called np.select(). We'll give it two arguments: a list of our conditions, and a corresponding list of the values we'd like to assign to each row in our new column.
ProjectPro
How to insert a new column based on condition in Python? -
August 23, 2023 - The apply() method shows you how to create a new column in a Pandas based on condition. The apply() method takes a function as an argument and applies that function to each row in the DataFrame.
Stack Overflow
Pandas/Python add a row based on condition - Stack Overflow
Write the expiration date next to each customer id. Then when accessing the data just check if the current date is before the expiration date. For example:

new_df = pd.DataFrame()
df = YOURDATA
groups = df.groupby("customerid")
for group in groups:
    if len(group) < 2:  # your condition
        df2 = pd.DataFrame( ADD YOUR DATA HERE )
        new_df.append(df2, ignore_index=True)

at the end you can combine new_df and df with concat: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Reddit
r/learnpython on Reddit: Pandas - Add value to column IF row is in another
March 11, 2022 -

Hi there,

I've got a little bit stuck with Pandas. I'm trying to add a new column to my database, and then give each row of the database a 1-10 value based on whether it's present in another df.

So my main dataframe is divided into 10 sub dataframes (df1, df2, df3...) and each row in the dataframe has a unique 'GroupID' value. I'd like to give every entry in the main dataframe a different value in a new column (called Segment), so if its present in df1, give it the value 1 for example.

I hope I've explained this as clear as possible, and many thanks in advance!

This is what I've written, but it's returning an error

if Attempt[Attempt['Group ID'].isin(df1['Group ID'])]:
    Attempt["Segment"] = '1'
elif Attempt[Attempt['Group ID'].isin(df2['Group ID'])]:
    Attempt["Segment"] = '2'
elif Attempt[Attempt['Group ID'].isin(df3['Group ID'])]:
    Attempt["Segment"] = '3'
elif Attempt[Attempt['Group ID'].isin(df4['Group ID'])]:
    Attempt["Segment"] = '4'
elif Attempt[Attempt['Group ID'].isin(df5['Group ID'])]:
    Attempt["Segment"] = '5'
elif Attempt[Attempt['Group ID'].isin(df6['Group ID'])]:
    Attempt["Segment"] = '6'
elif Attempt[Attempt['Group ID'].isin(df7['Group ID'])]:
    Attempt["Segment"] = '7'
elif Attempt[Attempt['Group ID'].isin(df8['Group ID'])]:
    Attempt["Segment"] = '8'
elif Attempt[Attempt['Group ID'].isin(df9['Group ID'])]:
    Attempt["Segment"] = '9'
elif Attempt[Attempt['Group ID'].isin(df10['Group ID'])]:
    Attempt["Segment"] = '10'

Top answer
1 of 3
2
This is interesting. I'm not sure why pandas makes this so hard. Membership testing with pandas objects is just tricky. One way would be to convert each row of each df (main + subs) into a tuple so that the whole row is hashable (this will simplify membership testing), then convert the sub dfs each into a set of rows. Here's how to do that.

dfs = {'main': main_df.copy(), 'sub1': sub_df1.copy(), 'sub2': sub_df2.copy(), 'sub3': sub_df3.copy(), ...}
for df_name, df in dfs.items():
    df = df.apply(tuple, axis=1)
    if df_name.startswith('sub'):
        df = set(df.values)
    dfs[df_name] = df

From here, you can use df.apply to loop over the rows of your "tuplified" main df, and test the membership of each row against each sub df. This is probably best done using a custom function. Here's how to do that.

def get_membership(main_row, dfs):
    for i, (df_name, sub_df) in enumerate(dfs.items()):
        if df_name == 'main':
            continue
        if main_row in sub_df:
            return i

main_df['membership'] = dfs['main'].apply(get_membership, args=(dfs,))

I haven't tested this, but I think that main df rows which are not in any of the sub dfs will be assigned the value None, or possibly NaN. You'll need to adjust the variable and column names here obviously, but that should give you the skeleton you need.

Edit: Although I speculated that this will be faster than your implementation, I'll stop short of saying it's fast, period. The actual membership testing of a single row should be approximately O(1) (or maybe 10*O(1), since there are 10 sub dfs). But the conversion of every row to a tuple and then of all rows to a set may be costly, depending on how many rows you have. I still think it will be faster than your solution though, which would still be a win. You'll just have to test and see.
2 of 3
2
While all of these other provided solutions might work, I would argue that they are "unpandonic". This can be easily solved with a merge.

import pandas as pd

main = pd.DataFrame([[1, 99], [2, 88], [3, 77], [4, 66], [5, 55], [6, 44]], columns=['groupid', 'value'])
sub1 = pd.DataFrame([[1, 9], [2, 8]], columns=['groupid', 'val'])
sub2 = pd.DataFrame([[3, 7], [4, 6]], columns=['groupid', 'val'])
sub3 = pd.DataFrame([[5, 5], [6, 4]], columns=['groupid', 'val'])

subs = [sub1, sub2, sub3]
subs = pd.concat([sub.assign(segment=i + 1) for i, sub in enumerate(subs)])
result = main.merge(subs[['groupid', 'segment']], on='groupid', how='left')

If a groupid can show up in multiple sub dfs and you want to assign the first segment, you can add this:

result = result.drop_duplicates('groupid')
Find elsewhere
datagy
Set Pandas Conditional Column Based on Values of Another Column • datagy
February 23, 2022 - Let's see how we can accomplish this using numpy's .select() method. Similar to the method above to use .loc to create a conditional column in Pandas, we can use the numpy .select() method.
Reddit
r/learnpython on Reddit: How to add rows to the dataframe based on a condition?
August 30, 2022 -

I have a dataframe that, after grouping by User and Time, looks like below:

User | Time | Event type
A | xx:xx | 1
A | yy:yy | 0
B | xx:xx | 0
B | yy:yy | 1
B | zz:zz | 0

Now I want to check the current, the previous and the next event.

  1. If the current event is 1 and there's no previous event, I want to add a 0 event manually. So a new row.

  2. If the current event is a 0 and there’s no next event, I want to add a 1 event manually.

So basically i can have complete 0-1 sessions.

Is there any way to achieve this?

Python Forum
Pandas, Assign a value in a row, to another column based on a condition
I'm trying to put a particular value in the data frame row to other rows also if it matches a condition. However i'm not getting what is expected. Can someone help to get the value as shown in the expected value column. Thanks in advance! ...
Like Geeks
Add Rows to Pandas DataFrame Based on Conditions
July 6, 2024 - Learn how to add rows to a Pandas DataFrame based on specific conditions. Including multiple conditions, nested statements, and string matching.
Top answer
1 of 3
358

I think you can use loc if you need to update two columns to the same value:

df1.loc[df1['stream'] == 2, ['feat','another_feat']] = 'aaaa'
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2        aaaa         aaaa
c       2        aaaa         aaaa
d       3  some_value   some_value

If you need to update them separately, one option is to use:

df1.loc[df1['stream'] == 2, 'feat'] = 10
print df1
   stream        feat another_feat
a       1  some_value   some_value
b       2          10   some_value
c       2          10   some_value
d       3  some_value   some_value

Another common option is to use numpy.where:

df1['feat'] = np.where(df1['stream'] == 2, 10,20)
print df1
   stream  feat another_feat
a       1    20   some_value
b       2    10   some_value
c       2    10   some_value
d       3    20   some_value

EDIT: If you need to divide all columns except stream where the condition is True, use:

print df1
   stream  feat  another_feat
a       1     4             5
b       2     4             5
c       2     2             9
d       3     1             7

#filter columns all without stream
cols = [col for col in df1.columns if col != 'stream']
print cols
['feat', 'another_feat']

df1.loc[df1['stream'] == 2, cols ] = df1 / 2
print df1
   stream  feat  another_feat
a       1   4.0           5.0
b       2   2.0           2.5
c       2   1.0           4.5
d       3   1.0           7.0

If working with multiple conditions, it is possible to use multiple numpy.where calls or numpy.select:

df0 = pd.DataFrame({'Col':[5,0,-6]})

df0['New Col1'] = np.where((df0['Col'] > 0), 'Increasing', 
                          np.where((df0['Col'] < 0), 'Decreasing', 'No Change'))

df0['New Col2'] = np.select([df0['Col'] > 0, df0['Col'] < 0],
                            ['Increasing',  'Decreasing'], 
                            default='No Change')

print (df0)
   Col    New Col1    New Col2
0    5  Increasing  Increasing
1    0   No Change   No Change
2   -6  Decreasing  Decreasing
2 of 3
5

You can do the same with .ix, like this:

In [1]: df = pd.DataFrame(np.random.randn(5,4), columns=list('abcd'))

In [2]: df
Out[2]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484 -0.905302 -0.435821  1.934512
3  0.266113 -0.034305 -0.110272 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

In [3]: df.ix[df.a>0, ['b','c']] = 0

In [4]: df
Out[4]: 
          a         b         c         d
0 -0.323772  0.839542  0.173414 -1.341793
1 -1.001287  0.676910  0.465536  0.229544
2  0.963484  0.000000  0.000000  1.934512
3  0.266113  0.000000  0.000000 -0.720599
4 -0.522134 -0.913792  1.862832  0.314315

EDIT

After the extra information, the following will return all columns - where some condition is met - with halved values:

>> condition = df.a > 0
>> df[condition][[i for i in df.columns.values if i not in ['a']]].apply(lambda x: x/2)
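
Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the answer above no longer runs as written. On current pandas the same conditional assignment is done with .loc. A sketch mirroring the example above (seeded so the frame is reproducible):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))

# .loc takes the place of the removed .ix for
# boolean row selection combined with label column selection
df.loc[df.a > 0, ['b', 'c']] = 0
print(df)
```
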
GeeksforGeeks
Python | Creating a Pandas dataframe column based on a given condition - GeeksforGeeks
October 30, 2025 -

import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011', '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
print(df)

def set_value(event, mapping):
    return mapping[event]

event_dict = {'Music': 1500, 'Comedy': 1200, 'Poetry': 800}
df['Price'] = df['Event'].apply(set_value, args=(event_dict,))
print(df)
Saturn Cloud
How to update values in a specific row in a Python Pandas DataFrame | Saturn Cloud Blog
January 4, 2024 - If we need to update multiple values in the same row, we can use the .loc or .iloc methods to select the row as before, but this time we will pass a dictionary of column names and values to the .loc or .iloc methods. The keys of the dictionary should correspond to the column names, and the values should be the new values we want to set.

import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# update the values in the second row
df.loc[1] = {'A': 10, 'B': 20}
print(df)
Medium
Pandas: How to change value based on condition | by Jack Dong | Analytics Vidhya | Medium
January 17, 2024 - Pandas' loc can create a boolean mask, based on a condition. It can either just be selecting rows and columns, or it can be used to filter dataframes.

example_df.loc[example_df["column_name1"] condition, "column_name2"] = value
Pandas
pandas.DataFrame.add — pandas 3.0.1 documentation
Get Addition of dataframe and other, element-wise (binary operator add) · Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd
Top answer
1 of 4
2

You could use groupby to check if 'C' and 'D' are in the 'col_name' column and add them if not.

df = pd.DataFrame([
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'A', 'test_col1': 1, 'test_col2': 'String1'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'B', 'test_col1': 2, 'test_col2': 'String12'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'C', 'test_col1': 'remain_constant_3', 'test_col2': 'String13'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'D', 'test_col1': 'remain_constant_3', 'test_col2': 'String14'},
    {'owner': 'svc', 'name': 'time1', 'col_name': 'E', 'test_col1': 5, 'test_col2': 'String1123'},
])

for g,g_hold in df.groupby('name'):
    if 'C' not in g_hold['col_name'].tolist():
        df = df.append({'owner':'svc','name':g,'col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},ignore_index=True)
    if 'D' not in g_hold['col_name'].tolist():
        df = df.append({'owner':'svc','name':g,'col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},ignore_index=True)

print(df.sort_values(['name','col_name']))
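
DataFrame.append used above was removed in pandas 2.0; on current pandas the same idea can collect the missing rows first and add them with a single pd.concat. A sketch over the same sample data:

```python
import pandas as pd

df = pd.DataFrame([
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'A', 'test_col1': 1, 'test_col2': 'String1'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'B', 'test_col1': 2, 'test_col2': 'String12'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'C', 'test_col1': 'remain_constant_3', 'test_col2': 'String13'},
    {'owner': 'svc', 'name': 'dmn_dmn', 'col_name': 'D', 'test_col1': 'remain_constant_3', 'test_col2': 'String14'},
    {'owner': 'svc', 'name': 'time1', 'col_name': 'E', 'test_col1': 5, 'test_col2': 'String1123'},
])

missing = []
for g, g_hold in df.groupby('name'):
    have = set(g_hold['col_name'])
    if 'C' not in have:
        missing.append({'owner': 'svc', 'name': g, 'col_name': 'C',
                        'test_col1': 'remain_constant_3', 'test_col2': 'String13'})
    if 'D' not in have:
        missing.append({'owner': 'svc', 'name': g, 'col_name': 'D',
                        'test_col1': 'remain_constant_3', 'test_col2': 'String14'})

# one concat at the end replaces the per-group df.append calls
df = pd.concat([df, pd.DataFrame(missing)], ignore_index=True)
print(df.sort_values(['name', 'col_name']))
```
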


2 of 4
0

A better way is to reindex against the full MultiIndex of combinations:

import pandas as pd

df = pd.DataFrame({"owner": ["owner"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()],
                                   names=["owner", "name", "col_name"])
result = df.set_index(["owner", "name", "col_name"]).reindex(index).reset_index()
print(result)
Spark By {Examples}
Pandas Add Column based on Another Column - Spark By {Examples}
July 7, 2025 - In this article, I have explained how to add/append a column to the Pandas DataFrame based on the values of another column using multiple functions and also I explained how to use basic arithmetic operations and other functions. Happy learning!! ... Pandas select columns based on condition.