Monday, 22 May 2017

Generating a list of values a regex COULD match in Python



I'm trying to use a regex as an input, and from there generate all the possible values that the regex would match.



So, for example, if the regex is "three-letter words starting with a, and ending in c," then the code would generate a list with the values [aac, abc, acc, adc, a1c....].



Is there an easy way to do this? I'm using python.


Answer




Here's a brute force solution that should work. It has a running time of O(L^max_length) (where L is the size of the alphabet), so use it at your own risk.



def all_matching_strings(alphabet, max_length, regex):
"""Find the list of all strings over 'alphabet' of length up to 'max_length' that match 'regex'"""

if max_length == 0: return

L = len(alphabet)
for N in range(1, max_length+1):
indices = [0]*N

for z in xrange(L**N):
r = ''.join(alphabet[i] for i in indices)
if regex.match(r):
yield(r)

i = 0
indices[i] += 1
while (i indices[i] = 0
i += 1

if i
return


example usage:



alphabet = 'abcdef1234567890'
import re
regex = re.compile('f*[1-3]+$')

for r in all_matching_strings(alphabet, 5, regex):
print r


which would output all strings up to length 5, starting with a sequence of f's, and then a non empty sequence of 1-3, then ending:



1
2
3
f1

11
21
31
f2
12
22
32
f3
13
23

33
ff1
[more output omitted...]

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?

Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...