fnmatch is pretty simple in Python -- however, it will return True whether there is one word or a hundred between the words you've put the wildcard between.
I'd like to be more specific than this -- and be able to use some kind of wildcard-matching library that lets me specify HOW MANY words I want a wildcard to stand for.
So if I used: "the * cat", it would ONLY include single words like "the ugly cat" or "the furry cat"
But if I used something like: "the ** cat", it would include ONLY two words like "the very ugly cat" or "the extremely furry cat"
Is there any python library that allows this kind of fine-tuned wildcard functionality?
Thanks!
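For reference, the behaviour described above can be sketched by hand with the re module (a hand-rolled sketch, not a dedicated library; it assumes whitespace-separated words and that a token made entirely of stars is the wildcard):

```python
import re

def word_wildcard(pattern):
    """Turn a pattern like 'the * cat' into a compiled regex where a run
    of '*' stands for exactly that many whitespace-separated words.
    Illustrative sketch only: assumes space-delimited tokens."""
    parts = []
    for token in pattern.split():
        if set(token) == {'*'}:
            # one \S+ per star: '*' -> exactly 1 word, '**' -> exactly 2
            parts.append(r'\s+'.join([r'\S+'] * len(token)))
        else:
            parts.append(re.escape(token))
    return re.compile(r'\s+'.join(parts) + r'\Z')

print(bool(word_wildcard('the * cat').match('the ugly cat')))        # True
print(bool(word_wildcard('the * cat').match('the very ugly cat')))   # False
print(bool(word_wildcard('the ** cat').match('the very ugly cat')))  # True
```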
Use fnmatch:
import fnmatch
lst = ['this','is','just','a','test']
filtered = fnmatch.filter(lst, 'th?s')
If you want to allow _ as a wildcard, just replace all underscores with '?' (for one character) or * (for multiple characters).
If you want your users to use even more powerful filtering options, consider allowing them to use regular expressions.
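As a stepping stone between the two, fnmatch.translate converts a shell-style pattern into an equivalent regular expression string, so users can keep writing wildcards while you use the re machinery underneath (a small sketch using only the standard library):

```python
import fnmatch
import re

# fnmatch.translate turns a shell-style pattern into a regex string,
# which can then be compiled and reused like any other regex.
regex = re.compile(fnmatch.translate('th?s'))

words = ['this', 'is', 'just', 'a', 'test', 'thus']
matches = [w for w in words if regex.match(w)]
print(matches)  # ['this', 'thus']
```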
Regular expressions are probably the easiest solution to this problem:
import re
regex = re.compile('th.s')
l = ['this', 'is', 'just', 'a', 'test']
matches = [string for string in l if regex.match(string)]
Regex, like the accepted answer suggests, is one way of handling the problem. However, if you only need a simpler pattern (such as Unix shell-style wildcards), then the built-in fnmatch library can help:
Expressions:
* - matches everything
? - matches any single character
[seq] - matches any character in seq
[!seq] - matches any character not in seq
So for example, trying to find anything that would match with localhost:
import fnmatch
my_pattern = "http://localhost*"
name_to_check = "http://localhost:8080"
fnmatch.fnmatch(name_to_check, my_pattern) # True
The nice part of this is that / is not considered a special character, so for filename/URL matching this works out quite well without having to pre-escape all slashes!
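As a quick illustration of that point (a minimal check, using nothing beyond fnmatch itself):

```python
import fnmatch

# '*' in fnmatch happily crosses '/' -- unlike shell globbing,
# where a glob star normally stops at directory separators.
print(fnmatch.fnmatch('http://localhost:8080/api/users', 'http://localhost*'))  # True
print(fnmatch.fnmatch('a/b/c.txt', 'a*.txt'))  # True: the '*' spans '/b/'
```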
It looks like you're essentially implementing a subset of regular expressions. Luckily, Python has a library for that built-in! If you're not familiar with how regular expressions (or, as their friends call them, regexes) work, I highly recommend you read through the documentation for them.
In any event, the function re.search is, I think, exactly what you're looking for. It takes a pattern to match as its first argument and the string to match it in as its second. If the pattern is found, search returns a match object, which conveniently has a .start() method that returns the index at which the match starts.
To use the data from your example:
import re
start_index = re.search(r'x.z', 'xxxxxgzg').start()
Note that in regexes, . (not *) is the single-character wildcard, so you'll have to replace any * in the pattern you're using.
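That replacement step can be sketched like this (a hedged sketch: it assumes each '*' in the input should stand for exactly one character, and escapes everything else so other regex metacharacters stay literal):

```python
import re

def wildcard_to_regex(pattern):
    # Escape regex metacharacters first, then map each '*' (which
    # re.escape turns into '\*') to '.', the one-character regex wildcard.
    return re.escape(pattern).replace(r'\*', '.')

match = re.search(wildcard_to_regex('x*z'), 'xxxxxgzg')
print(match.start())  # 4
```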
I would suggest using the input string and replace to generate a simple regular expression:
>>> '1**0*'.replace('*', '[01]')
'1[01][01]0[01]'
Now that can be used in whatever way you want:
>>> import re
>>> pattern = '1**0*'.replace('*', '[01]')
>>> bool(re.match(pattern, '00000'))
False
>>> bool(re.match(pattern, '10000'))
True
If you aren't familiar with regular expressions, you might want to read a tutorial or two. But the fundamental idea is that any one of the characters between brackets is allowed. So a [01] matches either a 1 or a 0, as you requested in your question.
I'd use zip instead of regular expressions. It lines up the characters of both strings and lets you loop through each pair.
def verify(pat, inp):
    for n, h in zip(pat, inp):
        if n == '*':
            if h not in ('0', '1'):
                return False
        elif h not in ('0', '1'):
            return False
        elif n != h:
            return False
    return True
# Example use:
>>> verify('**1', '001')
True
>>> verify('**1', '101')
True
>>> verify('**1', '000')
False
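One caveat with the zip approach: zip stops at the shorter of the two sequences, so verify above silently accepts inputs whose length differs from the pattern. A length guard (a small optional addition, not part of the original answer) rules that out:

```python
def verify_strict(pat, inp):
    # zip() truncates to the shorter argument, so compare lengths first;
    # otherwise '**1' would also "match" '0011' or '00'.
    if len(pat) != len(inp):
        return False
    return all(p == c or (p == '*' and c in '01') for p, c in zip(pat, inp))

print(verify_strict('**1', '001'))   # True
print(verify_strict('**1', '0011'))  # False: lengths differ
```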
A shorter way of doing it, by @DSM:
def verify(n, h):
return all(c0 == c1 or (c0 == '*' and c1 in '01') for c0, c1 in zip(n, h))
# or even shorter
verify = lambda n,h: all(c0 == c1 or (c0 == '*' and c1 in '01') for c0, c1 in zip(n, h))