Use str.contains to build a boolean mask, then numpy.where:
m = df['a'].str.contains('foo') & (df['b'] == 'bar')
print(m)
0     True
1    False
2    False
dtype: bool

df['new'] = np.where(m, 'yes', 'no')
print(df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz   no
Or, if you also need to check column b for substrings:
m = df['a'].str.contains('foo') & df['b'].str.contains('bar')
df['new'] = np.where(m, 'yes', 'no')
print(df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz  yes
If you need a custom function instead (this will be slower on bigger DataFrames):
def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

print(df.apply(somefunction, axis=1))
0    yes
1     no
2     no
dtype: object
def somefunction(row):
    if 'foo' in row['a'] and 'bar' in row['b']:
        return 'yes'
    return 'no'

print(df.apply(somefunction, axis=1))
0    yes
1     no
2    yes
dtype: object
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)

def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [269]: %timeit df['new'] = df.apply(somefunction, axis=1)
10 loops, best of 3: 60.7 ms per loop

In [270]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
100 loops, best of 3: 3.25 ms per loop

df = pd.concat([df]*10000).reset_index(drop=True)

def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [272]: %timeit df['new'] = df.apply(somefunction, axis=1)
1 loop, best of 3: 614 ms per loop

In [273]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
10 loops, best of 3: 23.5 ms per loop
Answer from jezrael on Stack Overflow.
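One case the answer above does not cover: if the column holds missing values, str.contains returns NaN for them, which breaks the boolean mask. A minimal sketch (reusing the example's column names a and b) of the na and case parameters of str.contains:

```python
import numpy as np
import pandas as pd

# Sample frame with a missing value; column names follow the example above.
df = pd.DataFrame({'a': ['foo', None, 'FOOBAR'],
                   'b': ['bar', 'bar', 'barfoo']})

# na=False treats missing values as non-matches; case=False ignores case.
m = df['a'].str.contains('foo', na=False, case=False) & df['b'].str.contains('bar')
df['new'] = np.where(m, 'yes', 'no')
print(df['new'].tolist())  # ['yes', 'no', 'yes']
```

Without na=False, the None row would yield NaN in the mask and the & operation would raise or propagate missing values.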
Your exception probably comes from writing
if row['a'].str.contains('foo') == True
Inside apply with axis=1, row['a'] is a plain Python string, so it has neither the .str accessor nor a .contains method; test membership with the in operator instead:
if 'foo' in row['a']:
I have a dataframe with a few million rows of names and accompanying columns with relevant info. I want to narrow down the dataframe to only include names from a list of 2,000 names. What's the best method of going about this when I have middle names and states to help distinguish between duplicate names?
Here's an example of the list of names:
John Smith Alabama R
John Smith Alabama
Jeremy Smith Washington P
What I want to do is first match the name and state to the dataframe if there is a middle initial match (the last letter in the list name if there is a middle name). If not, then I would just like to match by the name and state.
Here's what I tried so far:
df2 <- df[grep(paste(list_of_names, collapse = "|"), df$name_state_middle_initial),]
However, I'm only getting complete string matches with the above code. Any help would be great!
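The question above is asked in R, but the two-pass logic it describes (match on name + state + middle initial first, then fall back to name + state alone) can be sketched with pandas merges. All names here (columns name, state, mi, the frames df and targets) are illustrative assumptions, with the name fields already split into columns:

```python
import pandas as pd

# Hypothetical frames: df is the big table, targets the short name list;
# targets['mi'] is None where the list entry has no middle initial.
df = pd.DataFrame({'name': ['John Smith', 'John Smith', 'Jeremy Smith'],
                   'state': ['Alabama', 'Alabama', 'Washington'],
                   'mi': ['R', 'B', 'P']})
targets = pd.DataFrame({'name': ['John Smith', 'John Smith', 'Jeremy Smith'],
                        'state': ['Alabama', 'Alabama', 'Washington'],
                        'mi': ['R', None, 'P']})

# Pass 1: strict match on name + state + middle initial.
strict = df.merge(targets, on=['name', 'state', 'mi'])

# Pass 2: name + state only, for targets that carry no middle initial.
loose = df.merge(targets[targets['mi'].isna()][['name', 'state']],
                 on=['name', 'state'])

# Combine and drop rows matched by both passes.
result = pd.concat([strict, loose]).drop_duplicates()
print(result)
```

This keeps both Alabama John Smiths (via the initial-less list entry) while the R-initial entry and Jeremy Smith P match strictly.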
- startswith and in return a Boolean.
- The in operator is a test of membership.
- This can be performed with a list-comprehension or filter.
- Using a list-comprehension, with in, is the fastest implementation tested.
- If case is not an issue, consider mapping all the words to lowercase: l = list(map(str.lower, l)).
- Tested with python 3.11.0

filter:
- Using filter creates a filter object, so list() is used to show all the matching values in a list.
l = ['ones', 'twos', 'threes']
wanted = 'three'
# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))
# using in
result = list(filter(lambda x: wanted in x, l))
print(result)
[out]:
['threes']
list-comprehension
l = ['ones', 'twos', 'threes']
wanted = 'three'
# using startswith
result = [v for v in l if v.startswith(wanted)]
# using in
result = [v for v in l if wanted in v]
print(result)
[out]:
['threes']
Which implementation is faster?
- Tested in Jupyter Lab using the words corpus from nltk v3.7, which has 236736 words.
- Words with 'three': ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']
from nltk.corpus import words
wanted = 'three'

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
%timeit list(filter(lambda x: wanted in x, words.words()))
%timeit [v for v in words.words() if v.startswith(wanted)]
%timeit [v for v in words.words() if wanted in v]

%timeit results:
62.8 ms ± 816 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
53.8 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
56.9 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
47.5 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
A simple, direct answer:
test_list = ['one', 'two', 'threefour']
r = [s for s in test_list if s.startswith('three')]
print(r[0] if r else 'nomatch')
Result:
threefour
Not sure what you want to do in the non-matching case. r[0] is exactly what you asked for if there is a match, but it's undefined if there is no match. The print deals with this, but you may want to do so differently.
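One way to make the non-matching case explicit without building the whole list first is next() on a generator expression with a default, which also stops at the first match:

```python
test_list = ['one', 'two', 'threefour']

# next() returns the first element of the generator,
# or the given default when nothing matches.
first = next((s for s in test_list if s.startswith('three')), 'nomatch')
print(first)  # threefour

no_hit = next((s for s in test_list if s.startswith('xyz')), 'nomatch')
print(no_hit)  # nomatch
```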
I am facing the problem of a very long-running for loop.
There are two Python lists (A and B):
A contains around 170,000 strings with lengths between 1 and 100 characters. B contains around 3,000 strings with the same length variety.
Now I need to find the items from list A that contain at least one item from list B.
Since each string from A needs to be compared with each string from B, this results in 510,000,000 comparisons, which seems computationally too expensive.
What possibilities are there to speed this up?
I don't want to stop after the first match, as there could be more matches. The goal is to store all matches in some new variable/db.
Pseudo-code:
A = []  # length: 170,000 (strings)
B = []  # length: 3,000 (strings)

for item in A:
    for element in B:
        if element in item:
            print("store the item which contains the element to db")

# Some sample content
A[0] = "This is some random text in which I want to find words"
A[1] = "It is just some random text"
...
B[0] = "text"
B[1] = "some random text"
...
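One common way to cut this cost is to compile all B entries into a single regex alternation, so each string from A is scanned once instead of 3,000 times. A sketch, assuming plain substring (not regex) semantics are wanted, hence re.escape:

```python
import re

# Small stand-ins for the real A (170,000 items) and B (3,000 items).
A = ["This is some random text in which I want to find words",
     "It is just some random text",
     "Nothing of interest here"]
B = ["text", "some random text"]

# One alternation over all needles; re.escape keeps them literal substrings.
pattern = re.compile("|".join(map(re.escape, B)))

# Keep every item of A that contains at least one needle.
matches = [item for item in A if pattern.search(item)]
print(matches)
```

For very many needles, a dedicated multi-pattern matcher such as an Aho-Corasick automaton (e.g. the third-party pyahocorasick package) scales better still, since its cost per scanned string does not grow with the number of needles.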