Simple string operation:
mywords = ("xxx", "yyy", "zzz")
all(x in mystring for x in mywords)
If word boundaries are relevant (i.e. you want to match zzz but not Ozzzy):
import re
all(re.search(r"\b" + re.escape(word) + r"\b", mystring) for word in mywords)
Answer from Tim Pietzcker on Stack Overflow
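To see the difference between the two checks side by side, here is a minimal sketch. The helper name contains_all and its whole_words parameter are made up for illustration and are not part of the answer.

```python
import re

def contains_all(text, words, whole_words=False):
    """Return True if every word in `words` occurs in `text` (hypothetical helper)."""
    if whole_words:
        # \b ... \b only matches when the word is not glued to other word characters
        return all(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words)
    # plain substring test
    return all(w in text for w in words)

mywords = ("xxx", "yyy", "zzz")
mystring = "xxx yyy Ozzzy"

print(contains_all(mystring, mywords))                    # True: "zzz" is a substring of "Ozzzy"
print(contains_all(mystring, mywords, whole_words=True))  # False: "zzz" is not a word of its own
```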
I'd use all and re.search for finding matches.
>>> import re
>>> words = ('xxx', 'yyy', 'zzz')
>>> text = "sdfjhgdsf zzz sdfkjsldjfds yyy dfgdfgfd xxx"
>>> all(re.search(w, text) for w in words)
True
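One caveat with passing the words straight to re.search is that each word is interpreted as a pattern. The sketch below, using made-up sample words, shows how that can produce false positives and how re.escape avoids them.

```python
import re

# Sample data chosen for illustration: both words contain regex metacharacters
words = ("a.c", "1+1")
text = "abc 11"

# Unescaped: "." matches any character and "+" means "one or more"
unescaped = all(re.search(w, text) for w in words)

# Escaped: every word is searched for literally
escaped = all(re.search(re.escape(w), text) for w in words)

print(unescaped)  # True, although neither word literally occurs in the text
print(escaped)    # False
```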
python - How can I create a regex from a list of words? - Stack Overflow
regex - How to match any string from a list of strings in regular expressions in python? - Stack Overflow
python - List of all words matching regular expression - Stack Overflow
regex - Python regular expression match multiple words anywhere - Stack Overflow
I'd like to be able to match strings which meet these criteria:
['foo' OR 'bar' OR 'Python'] AND ['me', OR 'you' OR 'we']
Use lookaheads. ^(?=.*foo|.*bar|.*Python)(?=.*me|.*you|.*we)
Add \b around the words (e.g. \bfoo\b) if you want them as isolated words; otherwise you get matches like fool.
https://regex101.com/r/dnqSjr/1
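As a rough Python sketch of the lookahead idea with word boundaries added: the name has_both and the sample sentences are invented for illustration. Note that . does not match newlines by default, so this assumes single-line input.

```python
import re

# Each lookahead starts at the beginning of the string and scans ahead
# for any whole word from its group.
both_groups = re.compile(
    r"^(?=.*\b(?:foo|bar|Python)\b)"
    r"(?=.*\b(?:me|you|we)\b)"
)

def has_both(text):
    """True if text contains a word from each group (hypothetical helper)."""
    return both_groups.search(text) is not None

print(has_both("Python is for you"))   # True
print(has_both("fool me once"))        # False: "fool" is not "foo"
print(has_both("we like bar charts"))  # True
```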
If you want to use regex, you have to construct the regex string in your code from the lists.
You have to string all the words together using the regex '|' (or) operator, and that might not be the most efficient solution, depending on the length of the word lists.
But let's start with the base regex:
\bAWORD\b
This will match "AWORD". \b means word boundary, meaning we don't match partial words. Instead of AWORD we can use a list here: (word1|word2|...etc).
You can construct this list with Python, like so:
import re
word_list1 = ['foo', 'bar', 'Python']
word_list2 = ['me', 'you', 'we']
words1 = '|'.join(word_list1)
words2 = '|'.join(word_list2)
regex = r'\b(?:{})\b'
test_str = "foo is a me word"
# True only if a whole word from each list occurs in test_str
found = (re.search(regex.format(words1), test_str) is not None and
         re.search(regex.format(words2), test_str) is not None)
.format just inserts the '|'-separated words into the regex in place of '{}'. I am sure there is a more "pythonic" way of doing this, but this is the regex way. :)
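The same idea can be wrapped in a function that accepts any number of word lists. This is only a sketch: the name matches_all_groups is made up, and re.escape is added on the assumption that the words should be matched literally.

```python
import re

def matches_all_groups(text, *word_lists):
    """True if text contains at least one whole word from every list (hypothetical helper)."""
    for words in word_lists:
        # Build \b(?:w1|w2|...)\b with each word escaped
        pattern = r'\b(?:{})\b'.format('|'.join(map(re.escape, words)))
        if re.search(pattern, text) is None:
            return False
    return True

word_list1 = ['foo', 'bar', 'Python']
word_list2 = ['me', 'you', 'we']

print(matches_all_groups("foo is a me word", word_list1, word_list2))  # True
print(matches_all_groups("fool me once", word_list1, word_list2))      # False
```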
Yes, I believe this is possible.
To get you started, this is how I would break down the problem.
Calculate the root by finding the longest possible string that matches the start of all of the declined values:
>>> root = ''
>>> for c in hablar['yo']:
...     if all(v.startswith(root + c) for v in hablar.itervalues()):
...         root += c
...     else:
...         break
...
>>> root
'habl'
Whatever's left of the words makes a list of endings.
>>> endings = [v[len(root):] for v in hablar.itervalues()]
>>> print endings
['abas', 'aba', 'abais', 'aba', '\xc3\xa1bamos', 'aban', 'abas']
You may then want to weed out the duplicates:
>>> unique_endings = set(endings)
>>> print unique_endings
set(['abas', 'abais', '\xc3\xa1bamos', 'aban', 'aba'])
Then join these endings together with pipes:
>>> conjoined_endings = '|'.join(unique_endings)
>>> print conjoined_endings
abas|abais|ábamos|aban|aba
Forming the regular expression is a simple matter of combining the root and the conjoined_endings string in parentheses:
>>> final_regex = '{}({})'.format(root, conjoined_endings)
>>> print final_regex
habl(abas|abais|ábamos|aban|aba)
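The session above is Python 2. For Python 3, the steps might be condensed as in the sketch below. It swaps the manual loop for os.path.commonprefix, which compares strings character by character, and it sorts the endings longest first and uses a non-capturing group. Those are choices made here, not part of the answer. The hablar dictionary is assumed to hold the conjugations shown elsewhere in this thread.

```python
import os.path
import re

# Assumed input: the imperfect forms of "hablar"
hablar = {
    'yo': 'hablaba', 'tú': 'hablabas', 'él': 'hablaba',
    'nosotros': 'hablábamos', 'vosotros': 'hablabais',
    'ellos': 'hablaban', 'vos': 'hablabas',
}

forms = list(hablar.values())

# Longest prefix shared by all forms
root = os.path.commonprefix(forms)

# Unique endings, longest first so longer alternatives are tried before shorter ones
endings = sorted({f[len(root):] for f in forms}, key=lambda e: (-len(e), e))

final_regex = '{}(?:{})'.format(re.escape(root), '|'.join(map(re.escape, endings)))

print(root)  # habl
print(all(re.fullmatch(final_regex, f) for f in forms))  # True
```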
I think you need a less clever approach:
>>> x={
... 'yo': 'hablaba',
... 'tú': 'hablabas',
... 'él': 'hablaba',
... 'nosotros': 'hablábamos',
... 'vosotros': 'hablabais',
... 'ellos': 'hablaban',
... 'vos': 'hablabas',
... }
>>> x
{'t\xc3\xba': 'hablabas', 'yo': 'hablaba', 'vosotros': 'hablabais', '\xc3\xa9l': 'hablaba', 'nosotros': 'habl\xc3\xa1bamos', 'ellos': 'hablaban', 'vos': 'hablabas'}
>>> x.values
<built-in method values of dict object at 0x20e6490>
>>> x.values()
['hablabas', 'hablaba', 'hablabais', 'hablaba', 'habl\xc3\xa1bamos', 'hablaban', 'hablabas']
>>> "|".join(x.values())
'hablabas|hablaba|hablabais|hablaba|habl\xc3\xa1bamos|hablaban|hablabas'
If you just join the dictionary values with the alternation operator |, it should do what you want.
Join the list on the pipe character |, which represents different options in regex.
import re
string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))
Output: ['fun']
You cannot use match, as it only matches at the start of the string.
With search you will get only the first match, so use findall instead.
Also use a lookahead if you have overlapping matches that do not start at the same point.
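A small sketch of how the three calls differ, plus what the lookahead buys you for overlapping matches. The sample strings and the 'ana'/'nan' word list are made up for illustration.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
pattern = re.compile('|'.join(map(re.escape, string_lst)))
x = "I love to have fun in the sun."

print(pattern.match(x))           # None: the string does not start with one of the words
print(pattern.search(x).group())  # fun: only the first hit
print(pattern.findall(x))         # ['fun', 'sun']: every non-overlapping hit

# Overlapping matches: a plain alternation consumes text, a lookahead does not
plain = re.findall(r"ana|nan", "banana")
overlapping = re.findall(r"(?=(ana|nan))", "banana")
print(plain)        # ['ana']
print(overlapping)  # ['ana', 'nan', 'ana']
```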
regex module has named lists (sets actually):
#!/usr/bin/env python
import regex as re # $ pip install regex
p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')
Here words is just a name, you can use anything you like instead.
The .search() method is used instead of putting .* before/after the named list.
To emulate named lists using stdlib's re module:
#!/usr/bin/env python
import re
words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')
re.escape() is used to escape regex meta-characters such as .*? inside individual words (to match the words literally).
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives; compare:
>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolor sit? amet...')
['Lorem', 'ipsum', 'dolor', 'amet']
Take note that in Python 3, where all strings are Unicode, this will also find words that use non-ASCII letters:
>>> import re
>>> myre = re.compile(r"\w{4,}")
>>> myre.findall('Lorem, ipsum! dolör sit? amet...')
['Lorem', 'ipsum', 'dolör', 'amet']
In Python 2, you'd have to use
>>> myre = re.compile(r"\w{4,}", re.UNICODE)
>>> myre.findall(u'Lorem, ipsum! dolör sit? amet...')
[u'Lorem', u'ipsum', u'dol\xf6r', u'amet']
That is a typical use case for list comprehensions in Python, which can be used for filtering:
import re
pattern = re.compile(r'\w+')  # assumed here: any pattern that matches whole words
text = 'Lorem ipsum dolor sit amet'
result = [word for word in pattern.findall(text) if len(word) > 3]
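For comparison, here is a quick sketch that runs the regex quantifier approach and the list-comprehension filter on the same text. The \w+ pattern used for the filter variant is an assumption, since the original pattern is not shown.

```python
import re

text = 'Lorem, ipsum! dolor sit? amet...'

# Variant 1: let the regex enforce the minimum length
by_regex = re.findall(r"\w{4,}", text)

# Variant 2: match every word, then filter by length in Python
by_filter = [word for word in re.findall(r"\w+", text) if len(word) > 3]

print(by_regex)   # ['Lorem', 'ipsum', 'dolor', 'amet']
print(by_filter)  # ['Lorem', 'ipsum', 'dolor', 'amet']
```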
You've got a few problems there.
First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.
Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on each side, not just those sides (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).
Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.
So:
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
Try this:
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']
a|b means match either a or b
\b represents a word boundary
re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string
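If the goal is to count how often each operator appears regardless of case, something like the following sketch could work. The use of collections.Counter and the mixed-case sample sentence are additions for illustration.

```python
import re
from collections import Counter

s = "These are oranges AND apples and pears, but not pinapples or .."

# One shared pair of \b anchors around a non-capturing group; re.IGNORECASE handles "AND"
operators = re.compile(r"\b(?:and|or|not)\b", re.IGNORECASE)

# Normalise case before counting
counts = Counter(m.lower() for m in operators.findall(s))
print(counts["and"], counts["or"], counts["not"])  # 2 1 1
```

The "or" inside "oranges" is not counted because \b requires a word boundary on both sides.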
You need to turn your fruit list into the string apple|banana|peach|plum|pineapple|kiwi so that it is a valid regex. The following should do this for you:
fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile('|'.join(fruit_list))
As ridgerunner pointed out in comments, you will probably want to add word boundaries to the regex, otherwise the regex will match on words like plump since they have a fruit as a substring.
fruit = re.compile(r'\b(?:%s)\b' % '|'.join(fruit_list))
Lastly, if the strings in fruit_list could contain special characters, you will probably want to use re.escape.
'|'.join(map(re.escape, fruit_list))
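Putting the three pieces together (alternation, word boundaries, escaping), a minimal sketch might look like this. The sample sentence is made up.

```python
import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']

# Escape each fruit, join with |, and wrap the group in word boundaries
fruit = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, fruit_list))))

print(fruit.findall("a plump plum and a pineapple"))  # ['plum', 'pineapple']
```

Neither "plump" nor the "apple" inside "pineapple" is reported, because both fail the word-boundary check.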
As you want exact matches, no real need for regex imo...
fruits = ['apple', 'cherry']
sentences = ['green apple', 'yellow car', 'red cherry']
for s in sentences:
    if any(f in s for f in fruits):
        print(s, 'contains a fruit!')
# green apple contains a fruit!
# red cherry contains a fruit!
EDIT: If you need access to the strings that matched:
from itertools import compress
fruits = ['apple', 'banana', 'cherry']
s = 'green apple and red cherry'
list(compress(fruits, (f in s for f in fruits)))
# ['apple', 'cherry']
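Note that f in s is a substring test, so 'apple' would also be found in 'pineapple'. If whole words are needed without regex, one option is to compare against the split words, as in this sketch. The helper name fruits_in is hypothetical, and it assumes words are separated by whitespace with no attached punctuation.

```python
fruits = {'apple', 'cherry'}
sentences = ['green apple', 'pineapple pie', 'red cherry']

def fruits_in(sentence):
    """Return the fruits that appear as whole whitespace-separated words (hypothetical helper)."""
    return sorted(fruits & set(sentence.split()))

for s in sentences:
    print(s, '->', fruits_in(s))
# green apple -> ['apple']
# pineapple pie -> []
# red cherry -> ['cherry']
```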