regex match list of strings c

regex - How to match any string from a list of strings in regular expressions in python? - Stack Overflow

Join the list on the pipe character |, which represents different options in regex.

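import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."

# findall() returns the captured group; the lookahead wrapper allows overlapping matches
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))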

Output: ['fun']

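You cannot use re.match(), because it only matches at the start of the string, and re.search() returns only the first match, so use re.findall() instead.

Wrapping the alternation in a lookahead matters if you have overlapping matches that start at different positions: the lookahead consumes no characters, so findall() can report each of them.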

The third-party regex module has named lists (actually sets):

#!/usr/bin/env python
import regex as re # $ pip install regex

p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')

Here words is just a name; you can use anything you like instead.
The .search() method is used instead of putting .* before and after the named list.

To emulate named lists using stdlib's re module:

#!/usr/bin/env python
import re

words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')

re.escape() escapes regex metacharacters such as .*? inside individual words, so the words are matched literally.
sorted() puts the longest words first among the alternatives, which emulates the regex module's behavior; the standard re module tries alternatives from left to right and takes the first one that matches. Compare:

>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
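Since the question here is about C, a rough counterpart of the join-and-escape approach can be sketched with the POSIX regcomp()/regexec() API from <regex.h>. The helper names build_alternation() and contains_any() are hypothetical, and the escape set is the list of ERE special characters from POSIX. No longest-first sort is done: POSIX asks regexec() for the leftmost-longest overall match, so alternative order should matter less than with Python's re, although that depends on the implementation conforming.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* The ERE special characters listed by POSIX; a backslash makes each literal. */
static const char ERE_SPECIAL[] = ".[\\()*+?{|^$";

/* Hypothetical helper: join `words` (count >= 1) into "(w1|w2|...)", escaping
 * each word so it is matched literally, like re.escape() above.
 * Returns a malloc'd pattern the caller must free, or NULL if out of memory. */
char *build_alternation(const char *const words[], size_t count)
{
    size_t len = 3;                       /* '(' + ')' + terminating NUL */
    for (size_t i = 0; i < count; i++)
        len += 2 * strlen(words[i]) + 1;  /* worst case: every char escaped, plus '|' */

    char *pattern = malloc(len);
    if (pattern == NULL)
        return NULL;

    char *out = pattern;
    *out++ = '(';
    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            *out++ = '|';
        for (const char *p = words[i]; *p != '\0'; p++) {
            if (strchr(ERE_SPECIAL, *p) != NULL)
                *out++ = '\\';
            *out++ = *p;
        }
    }
    *out++ = ')';
    *out = '\0';
    return pattern;
}

/* Hypothetical helper: 1 if any word occurs in `text`, 0 if none does, -1 on error. */
int contains_any(const char *const words[], size_t count, const char *text)
{
    char *pattern = build_alternation(words, count);
    if (pattern == NULL)
        return -1;

    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    free(pattern);
    if (rc != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);  /* searches anywhere, like .search() */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, contains_any(words, 4, "I love to have fun.") returns 1, and a listed word such as "a.b" only matches a literal dot because it is escaped.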

Stack Exchange

tips - Resource for generating shortest regex given a list of strings - Code Golf Stack Exchange

The Machine Learning Lab at the University of Trieste, Italy, wrote a web app to solve this exact problem, available at: Automatic Generation of Regular Expressions from Examples based on "genetic programming" and published a paper about it. Based on the fact that it had to come from the Machine Learning Lab of a university in Italy, and was worth publishing a paper about, it is probably a pretty hard problem to solve. Doesn't sound like the type of question for Code Golf SE, but I'm new here, so I wouldn't be the expert on that.

I would approach this problem by searching for common subsequences in each pair of strings in the list, and then building regular expressions that include those subsequences. This would be somewhat like the Re-Pair algorithm, but it would build a grammar from a set of strings instead of one string.

For instance, these strings:

["red wolf", "red fox", "gray wolf", "gray fox"]

could be combined into two regular expressions:

"red (wolf|fox)", "gray (wolf|fox)"

which can be combined into one regular expression:

"(red|gray) (wolf|fox)"

But this problem is probably NP-complete, since it is similar to the smallest grammar problem.
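Whichever way the combined expression is produced, it is worth checking it against the original list. A minimal sketch in C, assuming POSIX <regex.h>; accepts_exactly() is a hypothetical helper. For whole-string matching the combined pattern needs ^ and $ anchors, otherwise a string such as "red wolf!" would also be accepted.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` (a POSIX ERE) matches every string
 * in `yes` and none in `no`, 0 otherwise, or -1 if it fails to compile. */
int accepts_exactly(const char *pattern, const char *yes[], size_t nyes,
                    const char *no[], size_t nno)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int ok = 1;
    for (size_t i = 0; ok && i < nyes; i++)
        ok = regexec(&re, yes[i], 0, NULL, 0) == 0;  /* must match */
    for (size_t i = 0; ok && i < nno; i++)
        ok = regexec(&re, no[i], 0, NULL, 0) != 0;   /* must not match */

    regfree(&re);
    return ok;
}
```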

Discussions

How can use RegEx to match a list of strings in C#? - Stack Overflow

I need to find all the regex matches from a list of strings. For example, I need to be able to take the string "This foo is a foobar" and match any instances of either "foo" or "bar". What would the ...

stackoverflow.com

November 11, 2011

c# - How to filter a list of strings matching a pattern - Stack Overflow

I have a list of strings (file names actually) and I'd like to keep only those that match a filter expression like: *_Test.txt. What would be the best to achieve this? Here is the answer that I ...

stackoverflow.com

regex - Match list of words without the list of chars around - Stack Overflow

I have this regex (?:$|^| )(one|common|word|or|another)(?:$|^| ) which matches fine unless the two words are next to each other. One one's more word'word common word or another word more another ...

stackoverflow.com

c# - Is it possible to write a regex that does one search then uses its results to do another search? - Software Engineering Stack Exchange

Basically, if an item matches the pattern in the first list, I want to search for that letter prefix and number suffix in the second list. Is there a term for using regex to do this kind of search? I know that I can just write my own search by using string manipulation to get the letter and ...

softwareengineering.stackexchange.com

regex - How do you find all matches in regexes with C? - Stack Overflow

Assuming you're using POSIX regcomp/regexec, each call to regexec will only find the first match in the string. To find the next, use the end position of the first match (the 0th entry of the regmatch_t array filled by regexec) as an offset to apply to the string before searching it again. Repeat until you have no more matches. You can write a function to do this if you want.
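The loop described above might look like this; count_matches() is a hypothetical name. Two details the answer does not spell out are choices made in this sketch: REG_NOTBOL is passed on later calls so that ^ does not match in the middle of the string, and an empty match advances the cursor by one character so the loop cannot spin forever.

```c
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of `pattern` (a POSIX
 * ERE) in `text` by re-running regexec() from the end of the previous match.
 * Returns the count, or -1 if the pattern fails to compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int count = 0;
    int eflags = 0;
    const char *cursor = text;
    regmatch_t m;            /* m.rm_so/m.rm_eo are offsets relative to cursor */

    while (regexec(&re, cursor, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > m.rm_so) {
            cursor += m.rm_eo;               /* continue after the match */
        } else if (cursor[m.rm_eo] != '\0') {
            cursor += m.rm_eo + 1;           /* empty match: skip one character */
        } else {
            break;                           /* empty match at the very end */
        }
        eflags = REG_NOTBOL;  /* cursor is no longer the start of the string */
    }
    regfree(&re);
    return count;
}
```

For instance, count_matches("foo|bar", "This foo is a foobar") finds "foo", "foo" and "bar".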

The C standard library (as specified by ISO/IEC 9899) does not include a regular expression module, so you will need to use an external library. Good choices include regex.h from GNU libc, as detailed in /questions/635756 and PCRE, as detailed in /questions/1421785.

O'Reilly

3.10. Retrieve a List of All Matches - Regular Expressions Cookbook, 2nd Edition [Book]

August 27, 2012 - Construct a Regex object if you want to use the same regular expression with a large number of strings: Dim RegexObj As New Regex("\d+") Dim MatchList = RegexObj.Matches(SubjectString)

Authors: Jan Goyvaerts, Steven Levithan

Published 2012

Pages 609

How can use RegEx to match a list of strings in C#? - Stack Overflow

I'm a little unsure of what your actual question is. To match "foo" or "bar", you'd simply want "foo|bar" for your pattern. If you want to do this to a list of strings, you'd likely want to check each string individually—you could join the strings first and check that, but I'm not sure this would be of much use. If you want to get the exact text that matched your pattern, you should surround the pattern in parentheses—such as "([fg]oo|[bt]ar)", which would match "foo", "goo", "bar", or "tar"—then use the Groups property of the Match object to retrieve these captures, so you can determine exactly which word matched. Groups[1] is the first captured value (that is, the value in the first set of parentheses in your pattern). Groups[0] is the entire match. You can also name your captures—"(?<word>[fg]oo|[bt]ar)"—and refer to them by name—Groups["word"]. I would recommend reading through the documentation on regular expression language elements.

As for sanitizing the input, there is no input that will "break" the regex. It might prevent a match, but that's really kinda what regexes are all about, isn't it?

c# - How to filter a list of strings matching a pattern - Stack Overflow

You probably want to use a regular expression for this if your patterns are going to be complex....

You could either use a proper regular expression as your filter (e.g. for your specific example it would be new Regex(@"^.*_Test\.txt$")) or you could apply a conversion algorithm.

Either way, you could then just use LINQ to apply the regex.

For example:

var myRegex=new Regex(@"^.*_Test\.txt$");
List<string> resultList=files.Where(myRegex.IsMatch).ToList();

Some people may think the above answer is incorrect, but you can use a method group instead of a lambda. If you wish to use the full lambda, you would write:

var myRegex=new Regex(@"^.*_Test\.txt$");
List<string> resultList=files.Where(f => myRegex.IsMatch(f)).ToList();

Or, without LINQ:

List<string> resultList=files.FindAll(delegate(string s) { return myRegex.IsMatch(s);});

If you were converting the filter, a simple conversion would be:

 var myFilter="*_Test.txt";
 var myRegex=new Regex("^" + myFilter.Replace("*",".*") +"$");

You could then also have filters like "*Test*.txt" with this method.

However, if you went down this conversion route, you would need to make sure you escaped all the special regular expression characters, e.g. "." becomes @"\.", "(" becomes @"\(", etc.

Edit: the example replace is TOO simple because it doesn't convert the ".", so it would also match "fish_Testxtxt". At least escape the ".":

string myFilter="*_Test.txt";
foreach(char x in @"\+?|{[()^$.#") {
  myFilter = myFilter.Replace(x.ToString(),@"\"+x.ToString());
}
Regex myRegex=new Regex(string.Format("^{0}$",myFilter.Replace("*",".*")));
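The same conversion can be sketched in C with POSIX regcomp()/regexec(). glob_to_ere() and filter_names() are hypothetical helpers; as in the C# edit, only * is treated as a wildcard, and every POSIX ERE special character is escaped so that "fish_Testxtxt" is rejected. POSIX also offers fnmatch() in <fnmatch.h> for shell-style patterns, which avoids the conversion entirely.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: translate a filter in which '*' means "any run of
 * characters" into an anchored POSIX ERE, escaping every ERE special character.
 * The caller frees the result; NULL means out of memory. */
char *glob_to_ere(const char *filter)
{
    static const char special[] = ".[\\()*+?{|^$";
    /* Each input char becomes at most two ("\x" or ".*"), plus '^', '$' and NUL. */
    char *out = malloc(2 * strlen(filter) + 3);
    if (out == NULL)
        return NULL;

    char *o = out;
    *o++ = '^';
    for (const char *f = filter; *f != '\0'; f++) {
        if (*f == '*') {
            *o++ = '.';
            *o++ = '*';
        } else {
            if (strchr(special, *f) != NULL)
                *o++ = '\\';
            *o++ = *f;
        }
    }
    *o++ = '$';
    *o = '\0';
    return out;
}

/* Hypothetical helper: keep only the names matching `filter`. Survivors are
 * moved to the front of `names` and their count is returned (-1 on error). */
int filter_names(const char *names[], int n, const char *filter)
{
    char *pattern = glob_to_ere(filter);
    if (pattern == NULL)
        return -1;

    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    free(pattern);
    if (rc != 0)
        return -1;

    int kept = 0;
    for (int i = 0; i < n; i++)
        if (regexec(&re, names[i], 0, NULL, 0) == 0)
            names[kept++] = names[i];
    regfree(&re);
    return kept;
}
```

glob_to_ere("*_Test.txt") produces ^.*_Test\.txt$, the same pattern as the hand-written C# regex.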

Have you tried LINQ:

List<string> resultList = files.Where(x => x.EndsWith("_Test.txt")).ToList();

or if you are running this on some old/legacy .NET version (< 3.5):

List<string> resultList = files.FindAll(delegate(string s) { 
    return s.EndsWith("_Test.txt"); 
});
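The equivalent non-regex check in C is a plain suffix comparison; ends_with() is a hypothetical helper, not a standard library function.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `s` ends with `suffix`, the C analogue of
 * s.EndsWith("_Test.txt") without any regex. */
bool ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s);
    size_t lx = strlen(suffix);
    /* Compare the last lx characters of s with the suffix. */
    return ls >= lx && strcmp(s + (ls - lx), suffix) == 0;
}
```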

Mozilla

Regular expressions - JavaScript - MDN Web Docs

2 weeks ago - For example, to match a single "a" followed by zero or more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this pattern will match the substring "abbbbc". The following pages provide lists of the different special characters that fit into each category, along with descriptions and examples.
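The MDN example can be reproduced with POSIX regexec() in C, which reports the match as byte offsets (rm_so, rm_eo). A minimal sketch; first_match() is a hypothetical helper.

```c
#include <regex.h>

/* Hypothetical helper: find the first match of `pattern` (a POSIX ERE) in
 * `subject` and store its byte offsets in *so and *eo.
 * Returns 1 on a match, 0 on no match, -1 if the pattern fails to compile. */
int first_match(const char *pattern, const char *subject, int *so, int *eo)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int found = regexec(&re, subject, 1, &m, 0) == 0;
    if (found) {
        *so = (int)m.rm_so;   /* offset of the first matched character */
        *eo = (int)m.rm_eo;   /* offset just past the last matched character */
    }
    regfree(&re);
    return found;
}
```

first_match("ab*c", "cbbabbbbcdebc", &so, &eo) sets so to 3 and eo to 9, which is the substring "abbbbc".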

regex - Match list of words without the list of chars around - Stack Overflow

Since your non-capturing groups explicitly consume one character on either side of the word, the regex looks for space, word, space. When two words are adjacent, the single space between them is consumed by the first match, so the second word has no space left in front of it and fails to match.

In this case, since you don't want to match all the characters that word boundaries would catch (period, apostrophe, etc.), you need a bit of trickery with lookaheads, lookbehinds, and non-capturing groups. Try this:

(?:^|(?<= ))(one|common|word|or|another)(?:(?= )|$)

http://regex101.com/r/cM9hD8

Word boundaries are still simpler to implement, so for reference's sake, you could also do this (though it would also treat ', ., etc. as word boundaries):

\b(one|common|word|or|another)\b
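POSIX extended regular expressions in C have no lookbehind or lookahead, so the first pattern cannot be ported directly. One way to get the same effect, sketched below under the assumption that the listed words contain no spaces, is to search for the alternation and check the neighbouring characters in C. count_space_delimited() is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count the words of `alternation` (e.g.
 * "(one|common|word|or|another)") that occur as whole space-delimited tokens
 * of `text`, emulating (?:^|(?<= ))(...)(?:(?= )|$). The delimiting space is
 * never consumed, so adjacent words are both found.
 * Assumes the listed words contain no spaces. Returns -1 on a bad pattern. */
int count_space_delimited(const char *alternation, const char *text)
{
    regex_t re;
    if (regcomp(&re, alternation, REG_EXTENDED) != 0)
        return -1;

    int count = 0;
    size_t offset = 0;
    regmatch_t m;
    while (text[offset] != '\0' &&
           regexec(&re, text + offset, 1, &m, offset > 0 ? REG_NOTBOL : 0) == 0) {
        size_t start = offset + (size_t)m.rm_so;
        size_t end = offset + (size_t)m.rm_eo;
        int left_ok = start == 0 || text[start - 1] == ' ';
        int right_ok = text[end] == '\0' || text[end] == ' ';
        if (end > start && left_ok && right_ok) {
            count++;
            offset = end;          /* resume right after the word */
        } else if (text[start] == '\0') {
            break;                 /* empty match at the end: nothing left */
        } else {
            offset = start + 1;    /* rejected: retry one character later */
        }
    }
    regfree(&re);
    return count;
}
```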

It will not match one's, someone, etc.

Check DEMO

Stack Exchange

c# - Is it possible to write a regex that does one search then uses its results to do another search? - Software Engineering Stack Exchange

Regular expressions work on strings, not on a "string list" and not multiple string lists. Wherever you need to process more than one string, you will typically need some environmental code to do the processing. For your example, this code has to apply the regex to every element of the first list, then collect the results and use this results to process the second list.

That said, the usual approach to apply a regexp to a list of strings is to concatenate them with a separator character such as newline. To concatenate two lists and distinguish them, you would need at least a special "magic" character or word, not part of either list, to separate the first list from the second. Using such magic can cause maintenance headaches if you are not very careful; nevertheless, combined with backreferences, it can be used to solve your problem.

For example, numbered backreferences like \1 to \9 refer to capturing groups found earlier. Let's assume you used "###" as a separator for the two lists; a regexp along the lines of

  ^([A-Z])\W*([0-9]+)$.*###.*^(\1\W*\+\+\W*\2)$
    ^         ^          ^     ^           ^
    |         |          |     |           |
   first    second       |    backrefs to first
   group    group        |    or second group
                       lists
                       separator

might be a first approximation for what you are looking for (beware of bugs, I did not test it). Put this into a global regexp search, then it should produce all pairs of matches which fit to your constraints.

As a final remark: the resulting code may be very compact, nevertheless harder to maintain (and probably slower) than a more explicit solution where you process the two lists individually.
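For comparison, the more explicit two-pass version might look like this in C. The entry formats ("A 12" in the first list, "A ++ 12" in the second) are assumptions read off the untested regex above, [^[:alnum:]_] stands in for \W (which is not part of standard POSIX EREs), and count_pairs() is a hypothetical helper.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical two-pass version of the backreference regex above: for every
 * entry of `first` shaped like "A 12", build a second pattern from the captured
 * letter and number and count the entries of `second` shaped like "A ++ 12".
 * Returns the number of pairs found, or -1 if the first pattern fails to compile. */
int count_pairs(const char *first[], int n1, const char *second[], int n2)
{
    regex_t outer;
    if (regcomp(&outer, "^([A-Z])[^[:alnum:]_]*([0-9]+)$", REG_EXTENDED) != 0)
        return -1;

    int pairs = 0;
    regmatch_t m[3];
    for (int i = 0; i < n1; i++) {
        if (regexec(&outer, first[i], 3, m, 0) != 0)
            continue;

        /* A letter and digits need no escaping inside an ERE. */
        char inner_src[128];
        int w = snprintf(inner_src, sizeof inner_src,
                         "^%.*s[^[:alnum:]_]*\\+\\+[^[:alnum:]_]*%.*s$",
                         (int)(m[1].rm_eo - m[1].rm_so), first[i] + m[1].rm_so,
                         (int)(m[2].rm_eo - m[2].rm_so), first[i] + m[2].rm_so);
        if (w < 0 || (size_t)w >= sizeof inner_src)
            continue;                      /* number too long for the buffer */

        regex_t inner;
        if (regcomp(&inner, inner_src, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        for (int j = 0; j < n2; j++)
            if (regexec(&inner, second[j], 0, NULL, 0) == 0)
                pairs++;
        regfree(&inner);
    }
    regfree(&outer);
    return pairs;
}
```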

Scaler

How to use regex in C?- Scaler Topics

May 4, 2023 - POSIX regex is a widely known library for C; its functions and types are declared in the regex.h header file and are primarily used to implement regular expressions. The table below lists the POSIX character classes, their equivalent character representations, and a description of what each class matches. ... [:cntrl:] matches control characters in the given target string.

r/regex on Reddit: match string only if part of a list

December 3, 2024 -

**** RESOLVED ****

Hi,

I’m not sure if this is possible:

I’m looking for specific strings that contain an "a" with this regex: (flavour is c# (.net))

([^\s]+?)a([^\s]+?)\b

but they should only match if the found word is part of a list. Some kind of opposite of negative lookbehind.

So the above regex captures all kind of strings with "a" in them, but it should only match if the string is part of

"fass" or "arbecht" as I need to replace the a by some other string.

example: it should match "verfassen" or "verarbeit" but not "passen"

Best regards,

Pascal

Edit: Solution:

These two versions work fine and credits and many thanks go to:

u/gumnos: \b(?=\S*(?:fass|arbeit))(\S*?)a(\S*)\b

u/rainshifter (with some editing to match what I really need): (?<=(?:\b(?=\w*(?:fass|arbeit))|\G(?<!^))\w*)(\S*?)a(\S*)\b

I'm not sure why "verarbeit" should match as it doesn't contain either "fass" or "arbecht"…

Perhaps “verarbeit” in the example above, should be “verarbecht”; a typo error?
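The accepted patterns rely on lookahead, which POSIX EREs in C do not have. A rough equivalent, assuming the text has already been split into single words, tests the condition with its own regex and then replaces the first "a", as the lazy (\S*?)a in the accepted pattern does. replace_first_a() is a hypothetical helper.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. The condition (?=\S*(?:fass|arbeit)) is tested first with
 * its own regex; only then is the first 'a' of `word` replaced by `repl`.
 * Returns 1 and fills `out` if the word qualified, 0 if it did not (out is left
 * untouched), -1 on error or if `out` is too small. */
int replace_first_a(const char *word, const char *repl, char *out, size_t outsz)
{
    regex_t cond;
    if (regcomp(&cond, "fass|arbeit", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int qualifies = regexec(&cond, word, 0, NULL, 0) == 0;
    regfree(&cond);
    if (!qualifies)
        return 0;

    /* Never NULL here: both "fass" and "arbeit" contain an 'a'. */
    const char *a = strchr(word, 'a');
    int n = snprintf(out, outsz, "%.*s%s%s", (int)(a - word), word, repl, a + 1);
    return (n >= 0 && (size_t)n < outsz) ? 1 : -1;
}
```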

Quora

What is matching a list of regex to every string in a list one by one (Python)? - Quora

Answer (1 of 2): The answer is (as usual) you just do that. Suppose I have a list of regular expression objects made with re.compile called pats and a list of strings called strs. Then:

for s in strs:
    x = [p.match(s) for p in pats]
    # ... do something with x

x is a list of Nones or ...
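A C version of the same idea, assuming POSIX <regex.h>: compile each pattern once, then test every string against every pattern. match_matrix() is a hypothetical helper. Unlike Python's p.match(), regexec() searches anywhere in the string, so a pattern needs a leading ^ to get match-at-start behavior.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: compile each pattern once, then test every string
 * against every pattern. matches[s * npats + p] is set to 1 if pattern p
 * matches string s, else 0. Returns 0 on success, -1 if a pattern fails to
 * compile or memory runs out. */
int match_matrix(const char *pats[], size_t npats,
                 const char *strs[], size_t nstrs, int matches[])
{
    regex_t *compiled = malloc((npats > 0 ? npats : 1) * sizeof *compiled);
    if (compiled == NULL)
        return -1;

    size_t ncompiled = 0;
    while (ncompiled < npats &&
           regcomp(&compiled[ncompiled], pats[ncompiled], REG_EXTENDED | REG_NOSUB) == 0)
        ncompiled++;

    int status = (ncompiled == npats) ? 0 : -1;
    for (size_t s = 0; status == 0 && s < nstrs; s++)
        for (size_t p = 0; p < npats; p++)
            matches[s * npats + p] = regexec(&compiled[p], strs[s], 0, NULL, 0) == 0;

    for (size_t i = 0; i < ncompiled; i++)   /* free only what was compiled */
        regfree(&compiled[i]);
    free(compiled);
    return status;
}
```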

regex - C Regular Expressions: Extracting the Actual Matches - Stack Overflow

There are quite a lot of regular expression packages, but yours seems to match the one in POSIX: regcomp() etc.

The two structures it defines in <regex.h> are:

regex_t containing at least size_t re_nsub, the number of parenthesized subexpressions.
regmatch_t containing at least regoff_t rm_so, the byte offset from start of string to start of substring, and regoff_t rm_eo, the byte offset from start of string of the first character after the end of substring.

Note that 'offsets' are not pointers but indexes into the character array.

The execution function is:

int regexec(const regex_t *restrict preg, const char *restrict string, size_t nmatch, regmatch_t pmatch[restrict], int eflags);

Your printing code should be:

for (int i = 0; i <= r.re_nsub; i++)
{
    int start = m[i].rm_so;
    int finish = m[i].rm_eo;
//  strcpy(matches[ind], ("%.*s\n", (finish - start), p + start));  // Based on question
    sprintf(matches[ind], "%.*s\n", (finish - start), p + start);   // More plausible code
    printf("Storing:  %.*s\n", (finish - start), matches[ind]);     // Print once
    ind++;
    printf("%.*s\n", (finish - start), p + start);                  // Why print twice?
}

Note that the code should be upgraded to ensure that the string copy (via sprintf()) does not overflow the target string — maybe by using snprintf() instead of sprintf(). It is also a good idea to mark the start and end of a string in the printing. For example:

    printf("<<%.*s>>\n", (finish - start), p + start);

This makes it a whole heap easier to see spaces etc.
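Putting that advice together, a sketch that copies the whole match and each capture group into fixed-size buffers with snprintf(), so an over-long group is truncated rather than overflowing. extract_groups(), MAX_GROUPS and MAX_LEN are hypothetical; groups that did not participate (rm_so == -1) are stored as empty strings.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 10   /* hypothetical limits for this sketch */
#define MAX_LEN    64

/* Copy the whole match (entry 0) and each capture group of `pattern` in
 * `subject` into `out`, truncating instead of overflowing.
 * Returns the number of entries written, 0 if there is no match, -1 on error. */
int extract_groups(const char *pattern, const char *subject, char out[][MAX_LEN])
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    size_t n = re.re_nsub + 1;   /* group 0 plus every parenthesized group */
    if (n > MAX_GROUPS)
        n = MAX_GROUPS;          /* extra groups are silently dropped */

    int written = 0;
    if (regexec(&re, subject, n, m, 0) == 0) {
        for (size_t i = 0; i < n; i++) {
            if (m[i].rm_so == -1)
                out[i][0] = '\0';
            else
                snprintf(out[i], MAX_LEN, "%.*s",
                         (int)(m[i].rm_eo - m[i].rm_so), subject + m[i].rm_so);
        }
        written = (int)n;
    }
    regfree(&re);
    return written;
}
```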

[In future, please attempt to provide an MCVE (Minimal, Complete, Verifiable Example) or SSCCE (Short, Self-Contained, Correct Example) so that people can help more easily.]

This is an SSCCE that I created, probably in response to another SO question in 2010. It is one of a number of programs I keep that I call 'vignettes'; little programs that show the essence of some feature (such as POSIX regexes, in this case). I find them useful as memory joggers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <regex.h>

#define tofind    "^DAEMONS=\\(([^)]*)\\)[ \t]*$"

int main(int argc, char **argv)
{
    FILE *fp;
    char line[1024];
    int retval = 0;
    regex_t re;
    regmatch_t rm[2];
    //this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
    const char *filename = "/etc/rc.conf";

    if (argc > 1)
        filename = argv[1];

    if (regcomp(&re, tofind, REG_EXTENDED) != 0)
    {
        fprintf(stderr, "Failed to compile regex '%s'\n", tofind);
        return EXIT_FAILURE;
    }
    printf("Regex: %s\n", tofind);
    printf("Number of captured expressions: %zu\n", re.re_nsub);

    fp = fopen(filename, "r");
    if (fp == 0)
    {
        fprintf(stderr, "Failed to open file %s (%d: %s)\n", filename, errno, strerror(errno));
        regfree(&re);
        return EXIT_FAILURE;
    }

    while ((fgets(line, 1024, fp)) != NULL)
    {
        line[strcspn(line, "\n")] = '\0';
        if ((retval = regexec(&re, line, 2, rm, 0)) == 0)
        {
            printf("<<%s>>\n", line);
            // Complete match
            printf("Line: <<%.*s>>\n", (int)(rm[0].rm_eo - rm[0].rm_so), line + rm[0].rm_so);
            // Match captured in (...) - the \( and \) match literal parenthesis
            printf("Text: <<%.*s>>\n", (int)(rm[1].rm_eo - rm[1].rm_so), line + rm[1].rm_so);
            char *src = line + rm[1].rm_so;
            char *end = line + rm[1].rm_eo;
            while (src < end)
            {
                size_t len = strcspn(src, " ");
                if (src + len > end)
                    len = end - src;
                printf("Name: <<%.*s>>\n", (int)len, src);
                src += len;
                src += strspn(src, " ");
            }
        }
    }
    fclose(fp);
    regfree(&re);
    return EXIT_SUCCESS;
}

This was designed to find a particular line starting DAEMONS= in a file /etc/rc.conf (but you can specify an alternative file name on the command line). You can adapt it to your purposes easily enough.

Since g++ regex is bugged until who knows when, you can use my code instead (License: AGPL, no warranty, your own risk, ...)

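#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/**
 * regexp (License: AGPL3 or higher)
 * @param re extended POSIX regular expression
 * @param nmatch maximum number of matches (full match plus submatches), at least 1
 * @param str string to match
 * @return NULL on error or no match, otherwise an array of nmatch+1 char pointers.
 * You have to free() the first element (string storage) and then the array itself.
 * The second element is the string matching the full regex, then come the submatches
 * ("" for a group that did not participate).
*/
char **regexp(const char *re, int nmatch, const char *str) {
  char **result = NULL;
  char *storage, *out;
  regex_t regex;
  regmatch_t *match;
  size_t total = 0;
  int i;

  if (nmatch < 1)
    return NULL;

  match = malloc(nmatch * sizeof(*match));
  if (!match) {
    fprintf(stderr, "Out of memory !\n");
    return NULL;
  }

  if (regcomp(&regex, re, REG_EXTENDED) != 0) {
    fprintf(stderr, "Failed to compile regex '%s'\n", re);
    free(match);
    return NULL;
  }

  if (regexec(&regex, str, nmatch, match, 0) != 0) {
#ifdef DEBUG
    fprintf(stderr, "String '%s' does not match regex '%s'\n", str, re);
#endif
    goto done;
  }

  /* Copy each (sub)match into its own NUL-terminated slot of one buffer,
     so nested groups do not truncate the enclosing match. */
  for (i = 0; i < nmatch; ++i)
    if (match[i].rm_so >= 0)
      total += (size_t)(match[i].rm_eo - match[i].rm_so) + 1;

  storage = malloc(total + 1);
  result = malloc((nmatch + 1) * sizeof(*result));
  if (!storage || !result) {
    fprintf(stderr, "Out of memory !\n");
    free(storage);
    free(result);
    result = NULL;
    goto done;
  }

  result[0] = storage;
  out = storage;
  for (i = 0; i < nmatch; ++i) {
    if (match[i].rm_so >= 0) {
      size_t len = (size_t)(match[i].rm_eo - match[i].rm_so);
      memcpy(out, str + match[i].rm_so, len);
      out[len] = '\0';
      result[i + 1] = out;
#ifdef DEBUG
      printf("%s\n", out);
#endif
      out += len + 1;
    } else {
      result[i + 1] = "";
    }
  }

done:
  regfree(&regex);
  free(match);
  return result;
}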

Wikipedia

Regular expression - Wikipedia

February 28, 2026 - A match is made, not when all the atoms of the string are matched, but rather when all the pattern atoms in the regex have matched. The idea is to make a small pattern of characters stand for a large number of possible strings, rather than compiling a large list of all the literal possibilities.

Pyladiespdx

Get Comfortable with List Comprehensions and Regex | Pyladiespdx

group() : won’t change matching behavior, but gives you the ability to extract the pattern captured inside as a logical group ex: would give you the ability to extract either the email name or email host from the pattern match, where: ... findall() : a very useful re module function that finds all of the matches and returns them as a list of strings

regex - Regexp in C - match group - Stack Overflow

I assume your regex_match is some combination of regcomp and regexec. To enable grouping, you need to call regcomp with the REG_EXTENDED flag, but without the REG_NOSUB flag (in the third argument).

regex_t compiled;
regcomp(&compiled, "(match1)|(match2)|(match3)", REG_EXTENDED);

Then allocate space for the groups. The number of groups is stored in compiled.re_nsub. Pass this number to regexec:

size_t ngroups = compiled.re_nsub + 1;
regmatch_t *groups = malloc(ngroups * sizeof(regmatch_t));
regexec(&compiled, str, ngroups, groups, 0);

Now, the first invalid group is the one with a -1 value in both its rm_so and rm_eo fields:

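size_t nmatched;
for (nmatched = 0; nmatched < ngroups; nmatched++)
    if (groups[nmatched].rm_so == -1)
        break;

nmatched is then the number of leading groups (group 0, the whole match, included) that took part in the match. With an alternation like (match1)|(match2)|(match3), though, a group that did not participate can be followed by one that did, so to find which alternative matched, test every group's rm_so rather than stopping at the first -1. Add your own error checking.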
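To find which alternative of "(match1)|(match2)|(match3)" matched, a sketch can check every group rather than stopping at the first -1; which_alternative() is a hypothetical helper built on the same regcomp()/regexec() calls.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return the 1-based index of the first capture group that
 * took part in the match of `pattern` against `str` (for a flat alternation such
 * as "(match1)|(match2)|(match3)" that is the alternative that matched),
 * 0 if nothing matched, or -1 on a compile or allocation error. */
int which_alternative(const char *pattern, const char *str)
{
    regex_t compiled;
    if (regcomp(&compiled, pattern, REG_EXTENDED) != 0)
        return -1;

    size_t ngroups = compiled.re_nsub + 1;   /* +1 for group 0, the whole match */
    regmatch_t *groups = malloc(ngroups * sizeof *groups);
    if (groups == NULL) {
        regfree(&compiled);
        return -1;
    }

    int found = 0;
    if (regexec(&compiled, str, ngroups, groups, 0) == 0) {
        for (size_t i = 1; i < ngroups; i++) {
            if (groups[i].rm_so != -1) {     /* -1 marks a group that did not participate */
                found = (int)i;
                break;
            }
        }
    }
    free(groups);
    regfree(&compiled);
    return found;
}
```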

You could have them give you an array of strings that contain your regexps and test each one of them.

// count is the number of regexps provided; regex_match() is the asker's own helper
int give_me_number_of_regex_group(const char *needle, const char **regexps, int count) {
  for (int i = 0; i < count; ++i) {
    if (regex_match(needle, regexps[i])) {
      return i;
    }
  }
  return -1; // didn't match any
}

Or am I overlooking something?

regex - How to use regular expressions in C - Stack Overflow