Try this one:

def partialMatch(stringIn1, stringIn2):
    # Split the first string
    string1split = stringIn1.split('/')
    previousIndex = None
    listOut = []
    for item in stringIn2.split('/'):
        # Try to find an index. Raises ValueError if it doesn't exist.
        try:
            index = string1split.index(item)…

Answer from JordanCClark on forum.inductiveautomation.com
Python.org
discuss.python.org › ideas
Partial string matches in structural pattern matching - Ideas - Discussions on Python.org
July 20, 2023 - Hi there, I'd love to be able to use partial string matches:

match text:
    case "prefix_" + cmd:  # checking prefix
        print("got", cmd)
    case "Hello " + last_name + ", " + first_name + "!":  # more complex e…
Mindee
mindee.com › blog of mindee › partial string matching: jaccard, sub-string percentage, levenshtein distance and regex
Boost Data Precision with Partial String Matching Techniques in Python
July 8, 2025 - Using maximum length: A score of 100% is possible only when the two strings are exactly the same. Here is a Python implementation of this method using difflib:

from difflib import SequenceMatcher

def longest_common_substring(s1: str, s2: str) -> str:
    """Computes the longest common substring of s1 and s2"""
    seq_matcher = SequenceMatcher(isjunk=None, a=s1, b=s2)
    match = seq_matcher.find_longest_match(0, len(s1), 0, len(s2))
    if match.size:
        return s1[match.a : match.a + match.size]
    else:
        return ""

def longest_common_substring_percentage(s1 : str, s2 : str) -> float:
    """Computes the longest common substring percentage of s1 and s2"""
    assert min(len(s1), len(s2)) > 0, "One of the given string is empty"
    return len(longest_common_substring(s1, s2))/min(len(s1), len(s2))
Top answer
1 of 2
3

Use contains for boolean mask and then numpy.where:

m = df['a'].str.contains('foo') & (df['b'] == 'bar')
print (m)
0     True
1    False
2    False
dtype: bool

df['new'] = np.where(m, 'yes', 'no')
print (df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz   no

Or, if you also need to check column b for substrings:

m = df['a'].str.contains('foo') & df['b'].str.contains('bar')
df['new'] = np.where(m, 'yes', 'no')
print (df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz  yes

If you need a custom function, note that it will be slower on a larger DataFrame:

def somefunction (row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

print (df.apply(somefunction, axis=1))
0    yes
1     no
2     no
dtype: object

def somefunction (row):
    if 'foo' in row['a']  and  'bar' in row['b']:
        return 'yes'
    return 'no'

print (df.apply(somefunction, axis=1))
0    yes
1     no
2    yes
dtype: object

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

def somefunction (row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [269]: %timeit df['new'] = df.apply(somefunction, axis=1)
10 loops, best of 3: 60.7 ms per loop

In [270]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
100 loops, best of 3: 3.25 ms per loop

df = pd.concat([df]*10000).reset_index(drop=True)

def somefunction (row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [272]: %timeit df['new'] = df.apply(somefunction, axis=1)
1 loop, best of 3: 614 ms per loop

In [273]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
10 loops, best of 3: 23.5 ms per loop
2 of 2
1

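Your exception probably comes from writing

if row['a'].str.contains('foo')==True

Inside apply with axis=1, row['a'] is a plain Python string, which has neither a .str accessor nor a contains method. Use the in operator instead:

if 'foo' in row['a']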
Note.nkmk.me
note.nkmk.me › home › python
String Comparison in Python (Exact/Partial Match, etc.) | note.nkmk.me
April 29, 2025 - If they are equal, True is returned; otherwise, False is returned.

print('abc' == 'abc')  # True
print('abc' == 'xyz')  # False
...

This operation is case-sensitive, as are other comparison operators and methods.
Finxter
blog.finxter.com › home › learn python blog › how to find a partial string in a python list?
How to Find a Partial String in a Python List? - Be on the Right Side of Change
September 10, 2022 - The most Pythonic way to find a ... s]. ...

def partial(lst, query):
    return [s for s in lst if query in s]

# Example 1:
print(partial(['hello', 'world', 'python'], 'pyth'))
# ['python']

# Example 2:
print(partial(['aaa', 'aa', 'a'], 'a'))
# ['aaa', 'aa', 'a']

# Example 3: ...
Medium
chestadhingra25.medium.com › partial-string-matching-and-deduplication-using-python-1d79d67a0b7c
Partial String Matching and DeDuplication using Python | by Chesta Dhingra | Medium
January 30, 2023 - In this article we'll look at finding duplicates in textual data via exact or partial matching. The idea came while I was working on a project where I had to deduplicate company names and addresses in a dataframe; such duplicates make the data quite inconsistent. Manual cleaning is easy on small data, but it does not scale to big data. In that case Python functions and libraries such as fuzzywuzzy and multiprocessing come in handy.
Reddit
reddit.com › r/learnpython › regex question - getting a partial match
r/learnpython on Reddit: Regex question - getting a partial match
October 30, 2018 -

Hi, for the life of me, I do not know why I am getting a partial match for this regex

I want to match and print out "FOO-2334" but I am only getting back "FOO"

It has something to do with the hyphen...I think.

Any hints please?

import re
myStr = "FOO-2334 is an id"
matches = re.findall(r'(FOO|BAR)-[\d]{4}', myStr)
for m in matches:
    print (f"{m}")
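A likely explanation, based on the documented behaviour of `re.findall`: when the pattern contains exactly one capturing group, `findall` returns that group's text rather than the whole match, so the hyphen is not the culprit. The sketch below contrasts the capturing group with a non-capturing group `(?:...)`; the variable names are illustrative only:

```python
import re

my_str = "FOO-2334 is an id"

# One capturing group: findall returns only what the group captured.
group_only = re.findall(r'(FOO|BAR)-\d{4}', my_str)

# Non-capturing group: findall returns the whole match.
whole_match = re.findall(r'(?:FOO|BAR)-\d{4}', my_str)

print(group_only)
print(whole_match)
```

Wrapping the entire pattern in a single group, or iterating over `re.finditer` and calling `.group(0)`, would be alternative ways to get the full match.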

Top answer
1 of 5
53
  • startswith and in, return a Boolean.
  • The in operator is a test of membership.
  • This can be performed with a list-comprehension or filter.
  • Using a list-comprehension, with in, is the fastest implementation tested.
  • If case is not an issue, consider mapping all the words to lowercase.
    • l = list(map(str.lower, l)).
  • Tested with python 3.11.0

filter:

  • Using filter creates a filter object, so list() is used to show all the matching values in a list.
l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']

list-comprehension

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']

Which implementation is faster?

  • Tested in Jupyter Lab using the words corpus from nltk v3.7, which has 236736 words
  • Words with 'three'
    • ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']
from nltk.corpus import words

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
%timeit list(filter(lambda x: wanted in x, words.words()))
%timeit [v for v in words.words() if v.startswith(wanted)]
%timeit [v for v in words.words() if wanted in v]

%timeit results

62.8 ms ± 816 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
53.8 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
56.9 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
47.5 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2 of 5
10

A simple, direct answer:

test_list = ['one', 'two','threefour']
r = [s for s in test_list if s.startswith('three')]
print(r[0] if r else 'nomatch')

Result:

threefour

Not sure what you want to do in the non-matching case. r[0] is exactly what you asked for if there is a match, but it's undefined if there is no match. The print deals with this, but you may want to do so differently.
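One possible way to handle the non-matching case differently is `next()` with a default value, which also stops scanning at the first hit instead of building the whole list. This is a sketch, and using `None` as the fallback is an assumption about what the caller wants:

```python
test_list = ['one', 'two', 'threefour']

# next() pulls the first match from the generator expression;
# the second argument is returned when nothing matches.
first = next((s for s in test_list if s.startswith('three')), None)
missing = next((s for s in test_list if s.startswith('nine')), None)

print(first)
print(missing)
```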

Reddit
reddit.com › r/rstats › what's the best method of partial string matching?
r/rstats on Reddit: What's the best method of partial string matching?
May 4, 2022 -

I have a dataframe with a few million rows of names and accompanying columns with relevant info. I want to narrow down the dataframe to only include names from a list of 2,000 names. What's the best method of going about this when I have middle names and states to help distinguish between duplicate names?

Here's an example of the list of names:

John Smith Alabama R
John Smith Alabama
Jeremy Smith Washington P

What I want to do is first match the name and state to the dataframe if there is a middle initial match (the last letter in the list name if there is a middle name). If not, then I would just like to match by the name and state.

Here's what I tried so far:

df2 <- df[grep(paste(list_of_names, collapse = "|"), df$name_state_middle_initial),]

However, I'm only getting complete string matches with the above code. Any help would be great!

Narkive
tutor.python.narkive.com › ES985mq5 › partial-string-matching-in-list-comprehension
[Tutor] partial string matching in list comprehension?
One way is to use a helper function to do the test:

In [1]: junkList =["interchange", "ifferen", "thru"]
In [2]: lst = ["My skull hurts", "Drive the thruway", "Interchangability is not my forte"]
In [3]: def hasJunk(s):
   ...:     for junk in junkList:
   ...:         if junk in s:
   ...:             return True
   ...:     return ...
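The excerpt is cut off, so the following is not the original poster's code. It is a minimal sketch of the same helper idea written with `any()`, reusing the sample data from the excerpt; the names `has_junk` and `clean` are hypothetical:

```python
junk_list = ["interchange", "ifferen", "thru"]
lst = ["My skull hurts", "Drive the thruway", "Interchangability is not my forte"]


def has_junk(s: str) -> bool:
    # True as soon as any junk fragment occurs in s (case-sensitive).
    return any(junk in s for junk in junk_list)


# The helper keeps the list comprehension itself short and readable.
clean = [s for s in lst if not has_junk(s)]
print(clean)
```

Note that the check is case-sensitive and purely literal, so "Interchangability" is not caught by "interchange".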
Strengejacke
strengejacke.github.io › sjmisc › reference › str_find.html
Find partial matching and close distance elements in strings — str_find • sjmisc
str_find(string, pattern, precision = 2, partial = 0, verbose = FALSE) ... Character vector with string elements. ... String that should be matched against the elements of string.
CSDN
devpress.csdn.net › python › 62fd9a7e7e66823466192a32.html
How to retrieve partial matches from a list of strings_python_Mangs-Python
begins, ends, or contains) a certain string. But how can you return the element itself, instead of True or False ... The in operator is a test of membership. This can be performed with a list-comprehension or filter. Using a list-comprehension, with in, is the fastest implementation tested. If case is not an issue, consider mapping all the words to lowercase. ... Using filter creates a filter object, so list() is used to show all the matching values in a list.
Medium
medium.com › @MehrazHossainRumman › how-to-filter-data-by-partial-match-in-python-38b194df943e
How to Filter Data by Partial Match in Python | by Mehraz Hossain Rumman | Medium
August 11, 2024 - Output: The function returns a new list containing only the dictionaries that match the search criteria.

Example Usage

Let's see how the function works in practice:

result = filter_by_partial_match(data_list, 'name', 'ali')
print(result)
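The article's implementation of `filter_by_partial_match` is not shown in this excerpt. The sketch below is one plausible shape for such a function, not the author's code: the case-insensitive comparison, the handling of missing keys, and the sample `data_list` are all assumptions made for illustration.

```python
def filter_by_partial_match(items, key, query):
    """Return the dicts whose value under `key` contains `query` (case-insensitive)."""
    needle = query.lower()
    # A missing key is treated as an empty string, so it never matches.
    return [d for d in items if needle in str(d.get(key, "")).lower()]


# Hypothetical sample data, invented for this example.
data_list = [
    {"name": "Alice", "city": "Paris"},
    {"name": "Bob", "city": "Berlin"},
    {"name": "Natalie", "city": "Rome"},
]

result = filter_by_partial_match(data_list, "name", "ali")
print(result)
```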
Stack Overflow
stackoverflow.com › questions › 68089019 › how-to-get-partial-match-of-a-substring-in-a-list-of-string
python - How to get partial match of a substring in a list of string? - Stack Overflow
I wanted to see if the second list contains any word from the first list (even partial matches as it contains dot and hyphen). How can I achieve this? ... This is practically a duplicate of Check if substring is in a list of strings?, you just need to add a loop over the first list.
Top answer
1 of 3
8

You can use the almost-ready-to-be-everyones-regex package with fuzzy matching:

>>> import regex
>>> bigString = "AGAHKGHKHASNHADKRGHFKXXX_I_AM_THERE_XXXXXMHHGRFSAHGSKHASGKHGKHSKGHAK"
>>> regex.search('(?:I_AM_HERE){e<=1}',bigString).group(0)
'I_AM_THERE'

Or:

>>> bigString = "AGAH_I_AM_HERE_RGHFKXXX_I_AM_THERE_XXX_I_AM_NOWHERE_EREXXMHHGRFS"
>>> print(regex.findall('I_AM_(?:HERE){e<=3}',bigString))
['I_AM_HERE', 'I_AM_THERE', 'I_AM_NOWHERE']

The regex module is a third-party package; it has not been added to the Python standard library.

If you have pip, just type pip install regex or pip3 install regex.


Answer to comment Is there a way to know the best out of the three in your second example? How to use BESTMATCH flag here?

Either use the best match flag (?b) to get the single best match:

print(regex.search(r'(?b)I_AM_(?:ERE){e<=3}', bigString).group(0))
# I_AM_THE

Or combine with difflib or take a levenshtein distance with a list of all acceptable matches to the first literal:

import regex

def levenshtein(s1,s2):
    if len(s1) > len(s2):
        s1,s2 = s2,s1
    distances = range(len(s1) + 1)
    for index2,char2 in enumerate(s2):
        newDistances = [index2+1]
        for index1,char1 in enumerate(s1):
            if char1 == char2:
                newDistances.append(distances[index1])
            else:
                newDistances.append(1 + min((distances[index1],
                                             distances[index1+1],
                                             newDistances[-1])))
        distances = newDistances
    return distances[-1]

bigString = "AGAH_I_AM_NOWHERE_HERE_RGHFKXXX_I_AM_THERE_XXX_I_AM_HERE_EREXXMHHGRFS"
cl=[(levenshtein(s,'I_AM_HERE'),s) for s in regex.findall('I_AM_(?:HERE){e<=3}',bigString)]

print(cl)
print([t[1] for t in sorted(cl, key=lambda t: t[0])])

print(regex.search(r'(?e)I_AM_(?:ERE){e<=3}', bigString).group(0))

Prints:

[(3, 'I_AM_NOWHERE'), (1, 'I_AM_THERE'), (0, 'I_AM_HERE')]
['I_AM_HERE', 'I_AM_THERE', 'I_AM_NOWHERE']
2 of 3
0

Here is a bit of a hacky way to do it with difflib:

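from difflib import get_close_matches

# smallString and bigString are assumed to be defined as in the question
window = len(smallString) + 1  # allow for longer matches
# + 1 so the final full-length chunk is included
chunks = [bigString[i:i+window] for i in range(len(bigString) - window + 1)]
get_close_matches(smallString, chunks, 1)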

Output:

['_I_AM_THERE']
Reddit
reddit.com › r/learnpython › how to efficiently match strings between two big lists with python? (510.000.000 comparisons)
r/learnpython on Reddit: How to efficiently match strings between two big lists with python? (510.000.000 comparisons)
October 6, 2020 -

I am facing the problem of a very long running for loop.

There are two python lists (A and B):

A contains around 170.000 strings with lengths between 1 and 100 characters. B contains around 3.000 strings with the same variety of lengths.

Now I need to find the items from list A that contain at least one item from list B.

Since each string from A needs to be compared with each string from B, this results in 510.000.000 comparisons. This seems computationally too expensive.

What possibilities are there to speed things up?
I don't want to stop after the first match as there could be more matches. The goal is to store all matches in some new variable/db.

Pseudo-code:

# Sample content (the real lists hold about 170.000 and 3.000 strings)
A = [
    "This is some random text in which I want to find words",
    "It is just some random text",
]
B = [
    "text",
    "some random text",
]

for item in A:
    for element in B:
        if element in item:
            print("store the item which contains the element to db")
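One option that may be worth measuring is a regex prefilter: a single compiled alternation of all elements answers "does this item contain anything from B?" inside the regex engine, and only the items that pass pay for the full inner scan. This is a sketch under assumptions, not a benchmarked solution. It assumes B is non-empty (an empty alternation matches everything), the third sample string is invented, and for thousands of patterns a dedicated multi-pattern algorithm such as Aho-Corasick may do better.

```python
import re

A = [
    "This is some random text in which I want to find words",
    "It is just some random text",
    "Nothing relevant here",  # invented non-matching sample
]
B = ["text", "some random text"]

# re.escape keeps any regex metacharacters in B literal.
any_element = re.compile("|".join(re.escape(e) for e in B))

matches = {}
for item in A:
    # Cheap rejection of items that contain no element at all.
    if any_element.search(item):
        # Full scan only for candidates, so overlapping elements
        # such as "text" and "some random text" are all recorded.
        matches[item] = [e for e in B if e in item]

for item, found in matches.items():
    print(item, "->", found)
```

The prefilter only pays off when most items in A contain nothing from B; if nearly every item matches, the work is the same as the nested loop plus the regex search.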