Since your capture groups explicitly require one character on either side of the common word, the pattern looks for space, word, space, and when it doesn't find another space, it fails.
In this case, since you don't want to match next to all the characters that word boundaries would accept (period, apostrophe, etc.), you need a bit of trickery with lookaheads, lookbehinds, and non-capturing groups. Try this:
(?:^|(?<= ))(one|common|word|or|another)(?:(?= )|$)
http://regex101.com/r/cM9hD8
Word boundaries are still simpler to implement, so for reference, you could also do this (though it would also match words next to ', ., etc.):
\b(one|common|word|or|another)\b
Answer from brandonscript on Stack Overflow
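As a rough illustration (a sketch using Python's re module, with a made-up sample sentence), the two patterns differ in how they treat punctuation next to a word:

```python
import re

WORDS = r"(one|common|word|or|another)"

# Space-delimited version: the word must sit between spaces or string edges.
spaced = re.compile(r"(?:^|(?<= ))" + WORDS + r"(?:(?= )|$)")
# Word-boundary version: punctuation next to the word is accepted too.
bounded = re.compile(r"\b" + WORDS + r"\b")

sample = "one's word or another."  # hypothetical test sentence

spaced_hits = spaced.findall(sample)    # "one's" and "another." are skipped
bounded_hits = bounded.findall(sample)  # all four words are found
print(spaced_hits, bounded_hits)
```

Which one is preferable depends on whether punctuation directly next to a word should count as a delimiter.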
You can use (?:[\s]|^)(one|common|word|or|another)(?=[\s]|$) instead.
It will not match inside one's, someone, etc.
I'd like to be able to match strings which meet these criteria:
['foo' OR 'bar' OR 'Python'] AND ['me' OR 'you' OR 'we']
Use lookaheads: ^(?=.*foo|.*bar|.*Python)(?=.*me|.*you|.*we)
Add \b around the words (e.g. \bfoo\b) if you want them to match only as isolated words; otherwise you get matches like fool.
https://regex101.com/r/dnqSjr/1
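A minimal Python sketch of the lookahead approach, with word boundaries added. The helper name has_both and the sample strings are made up for illustration:

```python
import re

# Each lookahead asserts that at least one word from its group occurs
# somewhere in the string; both assertions must hold at position 0.
both_groups = re.compile(
    r"^(?=.*\b(?:foo|bar|Python)\b)(?=.*\b(?:me|you|we)\b)"
)

def has_both(text):
    """Hypothetical helper: True if text has a whole word from each group."""
    return both_groups.search(text) is not None

print(has_both("Python and you"))  # a word from each group
print(has_both("fool me once"))    # "fool" is not the isolated word "foo"
print(has_both("foo alone"))       # nothing from the second group
```

Note that . does not match newlines by default, so this sketch assumes single-line input.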
If you want to use regex, you have to construct the regex string in your code from the lists.
You have to string all the words together using the regex '|' (or) operator, and that might not be the most efficient solution, depending on the length of the word lists.
But let's start with the base regex:
\bAWORD\b
This will match "AWORD". \b means word boundary, meaning we don't match partial words. Instead of AWORD we can use a list here: (word1|word2|...etc).
You can construct this list in Python, like so:
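import re
word_list1 = ['foo', 'bar', 'Python']
word_list2 = ['me', 'you', 'we']
# Join each list into a regex alternation, e.g. foo|bar|Python
words1 = '|'.join(word_list1)
words2 = '|'.join(word_list2)
regex = r'\b(?:{})\b'
test_str = "foo is a me word"
# True only if at least one word from each list is found
result = (re.search(regex.format(words1), test_str) is not None and
          re.search(regex.format(words2), test_str) is not None)
print(result)  # True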
.format just inserts the '|'-separated words into the regex in place of '{}'. I am sure there is a more "pythonic" way of doing this, but this is the regex way. :)
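One possible tidier variant, sketched under the assumption that the words may contain regex metacharacters: escape each word with re.escape and loop over any number of lists. The function name matches_every_list is made up for this example.

```python
import re

def matches_every_list(text, *word_lists):
    """Return True if text contains at least one whole word from each list."""
    for words in word_lists:
        # re.escape keeps characters such as '+' or '.' literal.
        alternation = '|'.join(re.escape(w) for w in words)
        if not re.search(r'\b(?:{})\b'.format(alternation), text):
            return False
    return True

list1 = ['foo', 'bar', 'Python']
list2 = ['me', 'you', 'we']
print(matches_every_list("foo is a me word", list1, list2))
print(matches_every_list("foo is a word", list1, list2))
```

Keep in mind that \b only behaves as expected when each word starts and ends with a word character.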
I suggest bookmarking the MSDN Regular Expression Quick Reference.
You want to achieve a case-insensitive match for the word "rocket" surrounded by non-alphanumeric characters. A regex that would work would be:
\W*((?i)rocket(?-i))\W*
What it does is look for zero or more (*) non-word (\W) characters, followed by a case-insensitive version of rocket ((?i)rocket(?-i)), followed again by zero or more (*) non-word characters (\W). The extra parentheses around the rocket-matching term assign the match to a separate group, so the word rocket will be in match group 1.
UPDATE 1:
Matt said in the comments that this regex is to be used in Python. Python has a slightly different syntax. To achieve the same result in Python, use this regex and pass the re.IGNORECASE flag to the compile or match function.
\W*(rocket)\W*
On Regex101 this can be simulated by entering "i" in the textbox next to the regex input.
UPDATE 2: Ismael has mentioned that the regex is not quite correct, as it might match "1rocket1". He posted a much better solution, namely:
(?:^|\W)rocket(?:$|\W)
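To see the difference in Python (a small sketch; the sample strings are invented), compare the two patterns on input where rocket is glued to digits:

```python
import re

loose = re.compile(r"\W*(rocket)\W*", re.IGNORECASE)
strict = re.compile(r"(?:^|\W)rocket(?:$|\W)", re.IGNORECASE)

# \W* may match zero characters, so the loose pattern accepts "1rocket1".
loose_hit = loose.search("1rocket1") is not None
# The strict pattern needs a non-word character or a string edge on both sides.
strict_hit = strict.search("1rocket1") is not None
strict_ok = strict.search("Launch the Rocket!") is not None
print(loose_hit, strict_hit, strict_ok)
```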
I think the look-aheads are overkill in this case, and you would be better off using word boundaries with the ignorecase option:
\brocket\b
In other words, in Python:
>>> import re
>>> x="rocket's"
>>> y="rocket1."
>>> c=re.compile(r"\brocket\b",re.I) # with the ignorecase option
>>> c.findall(y)
[]
>>> c.findall(x)
['rocket']
You're looking for something that can be found by a regexp (a word), but which should additionally obey some constraint.
In this case the constraint is a form of subset-relation:
(require 'cl-lib) ; provides cl-every
(defun string-subset-p (s1 s2)
  "Return t, if S1 is a subset of S2, when viewed as char-sets."
  (let ((s2-chars (append s2 nil)))
    (cl-every (lambda (ch)
                (memq ch s2-chars))
              (append s1 nil))))
When put together (in the most trivial way):
(defun search-word-containg-chars-forward (chars)
  (interactive "sChars: ")
  (while (and (re-search-forward "\\w+")
              (not (string-subset-p chars (match-string 0))))))
More efficient implementations for the string-subset-p function are left as an exercise to the reader. Though, chances are that it won't really matter.
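For comparison, the same idea can be sketched outside Emacs. This Python version is an illustration only, not part of the answer above: it finds candidate words with a regexp and then applies the subset constraint using sets. The function name and sample text are made up.

```python
import re

def words_containing_chars(chars, text):
    """Return the words in text that contain every character in chars."""
    required = set(chars)
    # Find candidate words with a regexp, then filter by the subset test.
    return [w for w in re.findall(r"\w+", text) if required <= set(w)]

found = words_containing_chars("dls", "dollars dollar solid sold")
print(found)
```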
Here's one way to implement some equivalent to the "AND"ing of regexp needed for this specific application.
The word at point is first sorted by character in a temporary buffer, so that dollars becomes adllors. That temporary sorted string is then matched against a pattern of any number of word characters followed by d, then any number of word characters followed by l, then any number of word characters followed by s, then any number of trailing word characters. If that match succeeds, the word is highlighted; otherwise it is left untouched.
To do this over the whole buffer, do M-x my/match-word-whole-buffer.
(defun my/match-word ()
  "Matches words containing all chars d, l, s in any order: dollars solid
Match will fail if a word is missing any of those characters. e.g. dollar"
  (interactive)
  (let ((this-word (thing-at-point 'word)) ; get the word at point
        (match))
    (with-temp-buffer
      (insert this-word)
      (sort-regexp-fields nil "\\w" "\\&" (point-min) (point-max)) ; sort chars in word
      (beginning-of-buffer)
      ;; Now that the chars are sorted alphabetically, you can search for
      ;; the letters in alphabetical order: d, l, s
      (if (looking-at "\\w*[d]+\\w*[l]+\\w*[s]+\\w*")
          (setq match t)
        (setq match nil)))
    (when match
      (highlight-symbol-at-point))))
(defun my/match-word-whole-buffer ()
  (interactive)
  (beginning-of-buffer)
  (forward-word)
  (while (not (eobp))
    (when (string-match "\\w\\{3,\\}" (thing-at-point 'word))
      (my/match-word))
    (forward-word)))
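The sort-then-match trick is not specific to Emacs. As a hedged sketch in Python (the function name and sample words are invented), sort the letters of each word and match the required letters in alphabetical order:

```python
import re

def contains_all_sorted(word, required="dls"):
    """Sort the word's letters, then match the required letters in order."""
    sorted_word = ''.join(sorted(word))  # "dollars" -> "adllors"
    # Build \w*d+\w*l+\w*s+\w* from the required letters, sorted the same way.
    middle = r"\w*".join(re.escape(c) + "+" for c in sorted(required))
    pattern = r"\w*" + middle + r"\w*"
    return re.fullmatch(pattern, sorted_word) is not None

matching = [w for w in ["dollars", "solid", "dollar"] if contains_all_sorted(w)]
print(matching)
```

A plain set comparison is usually simpler; this version only mirrors the regexp-based approach described above.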
Join the list on the pipe character |, which represents different options in regex.
import re
string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))
Output: ['fun']
You cannot use match, as it only matches at the start of the string.
Using search, you will get only the first match, so use findall instead.
Also use a lookahead if you have overlapping matches that do not start at the same point.
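A small sketch of the overlapping case (the word list and text here are invented for illustration): without the lookahead, findall consumes the first match and misses a second word that starts inside it.

```python
import re

words = ['abc', 'bcd']  # hypothetical words that overlap in the text
alternation = '|'.join(words)
text = "abcd"

# Plain alternation consumes "abc", so "bcd" is never tried at index 1.
plain = re.findall(alternation, text)
# The lookahead is zero-width, so every start position is still examined.
overlapping = re.findall("(?=(" + alternation + "))", text)
print(plain, overlapping)
```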
The regex module has named lists (sets, actually):
#!/usr/bin/env python
import regex as re # $ pip install regex
p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')
Here words is just a name; you can use anything you like instead.
The .search() method is used instead of adding .* before/after the named list.
To emulate named lists using stdlib's re module:
#!/usr/bin/env python
import re
words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')
re.escape() is used to escape regex meta-characters such as .*? inside individual words (to match the words literally).
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives. Compare:
>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
A simple solution would be to just remove elements from the set as you find matches, and check if the set is empty at the end.
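// Remove from a copy, since a set cannot be modified while it is being iterated
Set<String> remaining = targetValues.clone();
for (String targetValue : targetValues)
    if (result.containsIgnoreCase(targetValue))
        remaining.remove(targetValue);
// Every target value was found if nothing is left over
Boolean matchesAll = remaining.isEmpty();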
As a regular expression, you could do this:
Pattern p = Pattern.compile('(?i)(one|two|three)');
Set<String> expected = new Set<String>{'one','two','three'};
Set<String> matches = new Set<String>();
Matcher m = p.matcher(result);
while(m.find()) {
    matches.add(m.group(0).toLowerCase());
}
if(matches == expected) {
    // All matches were found //
}
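The same collect-and-compare idea, sketched in Python rather than the language above (the function name and sample strings are assumptions for illustration):

```python
import re

def contains_all(text, expected):
    """True if every expected word occurs in text, ignoring case."""
    pattern = '|'.join(re.escape(w) for w in sorted(expected))
    # Collect the distinct matches, lowercased, and compare with the target set.
    found = {m.lower() for m in re.findall(pattern, text, re.IGNORECASE)}
    return found == set(expected)

targets = {'one', 'two', 'three'}
print(contains_all("One, two and THREE", targets))
print(contains_all("one and two", targets))
```

As in the snippet above, the expected words are assumed to be lowercase, and no word boundaries are applied.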