You may try using word-boundaries around your text. Something like:
\bhello\b
You can find the demo of the above regex in here.
Sample implementation in Python
import re

def find_string(file_name, word):
    # re.escape keeps any regex metacharacters in the word literal
    pattern = re.compile(r"\b{}\b".format(re.escape(word)))
    with open(file_name, 'r') as a:
        for line in a:
            line = line.rstrip()
            if pattern.search(line):
                return True
    return False

if find_string('myfile.txt', 'hello'):
    print("found")
else:
    print("not found")
You can find the sample run of the above implementation in here.
Answer from user7571182 on Stack Overflow
Do you want something like this? Sorry if not.
import re

with open('regex.txt', 'r') as a:
    word = "hello"
    for line in a:
        line = line.rstrip()
        if re.search(r"({})".format(word), line):
            print(f'{line} ->>>> match!')
        else:
            print(f'{line} ->>>> not match!')
text file:
hello world #does not match
hello #match
test here
teste hello here
[output]
hello world #does not match ->>>> match!
hello #match ->>>> match!
test here ->>>> not match!
teste hello here ->>>> match!
You can use the word-boundaries of regular expressions. Example:
import re

s = '98787This is correct'
for words in ['This is correct', 'This', 'is', 'correct']:
    if re.search(r'\b' + words + r'\b', s):
        print('{0} found'.format(words))
That yields:
is found
correct found
For an exact match, replace the \b assertions with ^ and $ to restrict the match to the beginning and end of the line.
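A minimal sketch of that whole-line idea, assuming the goal is to accept only lines that consist of the word and nothing else. It uses re.fullmatch, which behaves like anchoring the pattern at both ends; is_exact_line and the sample lines are hypothetical names and data chosen for illustration.

```python
import re

def is_exact_line(line, word):
    # re.escape treats the word literally; fullmatch requires the
    # whole (stripped) line to match, not just a part of it.
    return re.fullmatch(re.escape(word), line.strip()) is not None

lines = ["hello world", "hello", "test here", "teste hello here"]
exact = [line for line in lines if is_exact_line(line, "hello")]
print(exact)  # ['hello']
```

With \b instead, "hello world" and "teste hello here" would also be reported, because a word boundary only constrains the characters next to the word, not the rest of the line.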
Use the comparison operator == instead of in:
if text == 'This is correct':
    print("Correct")
This checks whether the whole string is exactly 'This is correct'. If it isn't, the comparison evaluates to False.
Goal:
I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python.
I'd also like to know the location of these occurrences in the longer string.
To define what a close match is, I'd like to set a threshold, e.g. number of edits if using the edit distance as the metric.
I'd also like the code to give a matching score (the one that is likely used to determine if a candidate substring is over the matching threshold I set).
How can I do so in Python?
Example:
long_string = """1. Bob likes classical music very much. 2. This is classic music! 3. This is a classic musical. It has a lot of classical musics. """
query_string = "classical music"
I'd like the Python code to find "classical music" and possibly "classic music", "classic musical" and "classical musics" depending on the string matching threshold I set.
Research: I found "Checking fuzzy/approximate substring existing in a longer string, in Python?", but that question focuses on the best match only (i.e., not all occurrences), and its answers either also focus on the best match, don't work on multi-word query strings (since the question only had a single-word query string), or return an incorrect score (an exact match doesn't get a perfect score).
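One possible approach, sketched with the standard library only. It assumes candidates start and end on word boundaries and contain the same number of words as the query, which is a simplification. The helper names edit_distance and fuzzy_find are hypothetical, and the score (1 minus the edit distance divided by the longer length) is just one reasonable choice, not a standard definition.

```python
import re

def edit_distance(a, b):
    # Classic Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def fuzzy_find(query, text, max_edits):
    # Slide a window of as many words as the query has over the text.
    words = list(re.finditer(r"\w+", text))
    n = len(query.split())
    if n == 0:
        return []
    matches = []
    for i in range(len(words) - n + 1):
        start, end = words[i].start(), words[i + n - 1].end()
        candidate = text[start:end]
        dist = edit_distance(query, candidate)
        if dist <= max_edits:
            score = 1 - dist / max(len(query), len(candidate))
            matches.append((start, end, candidate, dist, score))
    return matches

long_string = """1. Bob likes classical music very much. 2. This is classic music! 3. This is a classic musical. It has a lot of classical musics. """
query_string = "classical music"

matches = fuzzy_find(query_string, long_string, max_edits=2)
for start, end, candidate, dist, score in matches:
    print(start, end, repr(candidate), dist, round(score, 3))
```

With a threshold of 2 edits this reports "classical music" (distance 0), "classic music" (distance 2) and "classical musics" (distance 1); "classic musical" needs 4 edits, so it only appears if the threshold is raised. An exact match gets a score of 1.0 under this definition.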
You could use a regular expression for this, reading the strings and defining the rules for them. In this particular case, the value is followed by either a separator or the end of the string, so the following code might solve your problem:
import re
# Sample string representing the text to search
string = "featureSetCombination: 1 \n featureSetCombination: 10"
re.findall(r"featureSetCombination:[1-9][$|\s|.|,|;]", string)
>> ['featureSetCombination:1,']
As you can see, it finds the first occurrence but not the second.
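One caveat: inside a character class, $ and | are literal characters, so a class like the one above cannot stand for "end of string". A hedged alternative is a negative lookahead that rejects a following digit; this sketch assumes the value to match is exactly 1 and that optional whitespace may follow the colon, as in the sample string.

```python
import re

# Sample string representing the text to search
string = "featureSetCombination: 1 \n featureSetCombination: 10"

# (?!\d) succeeds only if the next character is not a digit, which also
# covers the end of the string, so "1" matches but "10" does not.
matches = re.findall(r"featureSetCombination:\s*1(?!\d)", string)
print(matches)  # ['featureSetCombination: 1']
```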
Have you looked into the string method find? Here is a tutorial from W3Schools. It gives useful examples of the syntax for using this method: https://www.w3schools.com/python/ref_string_find.asp
If the sequence is as you listed in your question, the find method will give you the first result that matches the search criteria. You can end the string with a dot and include that dot in the search term to find the exact match. I hope this helps!
Alternatively, I would look into regular expressions for more flexible solutions.
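A small sketch of how find behaves, using a made-up sample sentence. Note that find works on plain substrings, so it also hits the search term inside longer words; including the trailing dot in the search term only helps when the text really ends that way.

```python
text = "Hello, locally local test local."

first = text.find("local")       # index of the first hit, inside "locally"
with_dot = text.find("local.")   # only the occurrence followed by a dot
missing = text.find("global")    # find returns -1 when nothing matches

print(first, with_dot, missing)  # 7 26 -1
```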
For this kind of thing, regexps are very useful:
import re
print(re.findall('\\blocal\\b', "Hello, locally local test local."))
# ['local', 'local']
\b means word boundary, basically. It can be a space, punctuation, etc.
Edit for comment:
print(re.sub('\\blocal\\b', '*****', "Hello, LOCAL locally local test local.", flags=re.IGNORECASE))
# Hello, ***** locally ***** test *****.
You can remove flags=re.IGNORECASE if you don't want to ignore the case, obviously.
You can use a simple function like the one below.
import re

def find_word(text, search):
    # re.escape keeps regex metacharacters in the search term literal
    result = re.findall('\\b' + re.escape(search) + '\\b', text, flags=re.IGNORECASE)
    return len(result) > 0

Usage:
text = "Hello, LOCAL locally local test local."
search = "local"
if find_word(text, search):
    print("i Got it...")
else:
    print(":(")
This is another approach you could take to complete your task. It may be helpful, although it doesn't follow your current approach very closely.
The test.txt file I fed as input had four sentences:
This is a special cat. And this is a special dog. That's an average cat. But better than that loud dog.
When you run the program, pass the text file as an argument. On the command line, that would look something like:
python file.py test.txt
This is the accompanying file.py:
import sys

key = input("Please enter the word you wish to search for: ")

# Read the file named on the command line, e.g. python file.py test.txt
with open(sys.argv[1]) as f:
    content = f.read()

# Split the text into sentences on periods, then compare each word with the key
for sentence in content.split("."):
    if key in sentence.split():
        print(sentence.strip())
For the keyword "cat", this returns:
This is a special cat
That's an average cat
(note the periods are no longer there).
I think if you, in the strings outside text, put spaces like this:
'(.+) ' + text + ' (.+)'
That would do the trick, if I correctly understand what is going on in the code.
Hi, I don't get why when I use str.contains to get exact matches from a list of keywords, the output still contains partial matches. Here is an extract of what I have (I'm only including one keyword in the list for the example):
keyword= ['SE.TER.ENRL']
subset = df[df['Code'].str.contains('|'.join(keyword), case=False, na=False)]
Output: ['SE.TER.ENRL' 'SE.TER.ENRL.FE' 'SE.TER.ENRL.FE.ZS']
Does anyone know how to get around this?
Thanks!
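A likely cause: str.contains treats the joined keywords as a regular expression and reports a match anywhere in the string, so 'SE.TER.ENRL' is also found inside the longer codes (and each unescaped . matches any character). The sketch below shows the difference with only the standard re module, using the codes from the question as sample data. In pandas itself, df['Code'].isin(keyword) (case-sensitive) or df['Code'].str.fullmatch(...) with an escaped pattern should give the same whole-string behaviour, though that is not tested here.

```python
import re

keywords = ["SE.TER.ENRL"]
codes = ["SE.TER.ENRL", "SE.TER.ENRL.FE", "SE.TER.ENRL.FE.ZS"]

# Escape each keyword so "." is a literal dot, then join the alternatives.
pattern = re.compile("|".join(re.escape(k) for k in keywords), flags=re.IGNORECASE)

# search() matches anywhere in the string, like str.contains does.
partial = [c for c in codes if pattern.search(c)]
# fullmatch() requires the entire string to match one of the keywords.
exact = [c for c in codes if pattern.fullmatch(c)]

print(partial)  # ['SE.TER.ENRL', 'SE.TER.ENRL.FE', 'SE.TER.ENRL.FE.ZS']
print(exact)    # ['SE.TER.ENRL']
```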