Wednesday 24 February 2016

java - Using capturing groups with reluctant, greedy, and possessive quantifiers



I was practicing Java regular expressions with the Oracle tutorial. To understand greedy, reluctant, and possessive quantifiers better, I created some examples. My question is how these quantifiers work when they are applied to capturing groups. I don't understand their behavior in that position; for example, the reluctant quantifier looks as if it doesn't work at all. Also, I searched a lot on the Internet and only saw expressions like (.*?). Is there a reason why people usually use quantifiers with that syntax, and not something like "(.foo)??"?



Here is the reluctant example:





Enter your regex: (.foo)??
Enter input string to search: xfooxxxxxxfoo
I found the text "" starting at index 0 and ending at index 0.
I found the text "" starting at index 1 and ending at index 1.
I found the text "" starting at index 2 and ending at index 2.
I found the text "" starting at index 3 and ending at index 3.
I found the text "" starting at index 4 and ending at index 4.
I found the text "" starting at index 5 and ending at index 5.
I found the text "" starting at index 6 and ending at index 6.
I found the text "" starting at index 7 and ending at index 7.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.
I found the text "" starting at index 10 and ending at index 10.
I found the text "" starting at index 11 and ending at index 11.
I found the text "" starting at index 12 and ending at index 12.
I found the text "" starting at index 13 and ending at index 13.




With the reluctant quantifier, shouldn't it show "xfoo" for index 0 and 4? And here is the possessive one:




Enter your regex: (.foo)?+
Enter input string to search: afooxxxxxxfoo
I found the text "afoo" starting at index 0 and ending at index 4.
I found the text "" starting at index 4 and ending at index 4.
I found the text "" starting at index 5 and ending at index 5.
I found the text "" starting at index 6 and ending at index 6.
I found the text "" starting at index 7 and ending at index 7.
I found the text "" starting at index 8 and ending at index 8.
I found the text "xfoo" starting at index 9 and ending at index 13.
I found the text "" starting at index 13 and ending at index 13.




And with the possessive quantifier, shouldn't it try the input only once? I'm especially confused by this one, because it seems to try every possibility.




Thanks in advance !


Answer



The regex engine checks (basically) every character of your string one by one, starting from the left, trying to make them fit in your pattern. It returns the first match it finds.



A reluctant quantifier applied to a subpattern means that the regex engine will give priority to (as in, try first) the following subpattern.



See what happens step by step with .*?b on aabab:



aabab  # we try to make '.*?' match zero '.', skipping it directly to try and
^      # ... match b: that doesn't work (we're on an 'a'), so we reluctantly
       # ... backtrack and match one '.' with '.*?'
aabab  # again, we by default try to skip the '.' and go straight for b:
 ^     # ... again, doesn't work. We reluctantly match two '.' with '.*?'
aabab  # FINALLY there's a 'b'. We can skip the '.' and move forward:
  ^    # ... the 'b' in '.*?b' matches, regex is over, 'aab' is the overall match
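The trace above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name ReluctantTraceDemo and the helper firstMatch are made up for illustration, and the greedy pattern .*b is added purely for contrast.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantTraceDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Reluctant: '.*?' grows one character at a time until 'b' can match.
        System.out.println(firstMatch(".*?b", "aabab")); // prints aab
        // Greedy: '.*' takes everything first, then gives back until 'b' can match.
        System.out.println(firstMatch(".*b", "aabab"));  // prints aabab
    }
}
```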


In your pattern, there's no equivalent to the b. The (.foo) is optional, so the engine gives priority to the following part of the pattern, which is nothing. "Nothing" matches an empty string, so an overall match is found right away at every position, and it's always an empty string.
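To see this in code, here is a small sketch that lists every match, roughly the way the tutorial harness does. The class name OptionalGroupDemo and the helper allMatches are hypothetical, and the pattern (.foo)??$ is not from the question; it is only there to show that the reluctant group does match once something after it forces it to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Hypothetical helper: collects every match as "\"text\" start-end".
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add("\"" + m.group() + "\" " + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Reluctant and nothing after the group: only empty matches, one per position.
        System.out.println(allMatches("(.foo)??", input).size()); // prints 14
        // Greedy: the group is tried first, so "xfoo" shows up at 0-4 and 9-13.
        System.out.println(allMatches("(.foo)?", input));
        // Reluctant, but '$' cannot match at index 0, so the group has to match.
        System.out.println(allMatches("(.foo)??$", "xfoo").get(0)); // prints "xfoo" 0-4
    }
}
```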







Regarding the possessive quantifiers, you're confused about what they do. They have no direct effect on the number of matches. It's not clear what tool you use to apply your regex, but it looks for global matches, and that's why it doesn't stop at the first match.
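A short sketch of what possessive does change: it refuses to give back what it has matched. The class PossessiveDemo, its helpers, and the patterns with a trailing .foo are made up for illustration; only (.foo)?+ and the input afooxxxxxxfoo come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Hypothetical helper: true if regex matches anywhere in input.
    static boolean findsAny(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: counts matches the way a find() loop reports them.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Greedy: the group first takes "xfoo", then gives it back so '.foo' can match.
        System.out.println(findsAny("(.foo)?.foo", "xfoo"));  // prints true
        // Possessive: the group keeps "xfoo", so the trailing '.foo' can never match.
        System.out.println(findsAny("(.foo)?+.foo", "xfoo")); // prints false
        // The number of matches comes from the find() loop, not from the quantifier.
        System.out.println(countMatches("(.foo)?+", "afooxxxxxxfoo")); // prints 8
    }
}
```

The count of 8 is the same as the eight lines the harness printed in the question: calling find() repeatedly is what produces them.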



See http://www.regular-expressions.info/possessive.html for more info on them.



Also, as HamZa pointed out, https://stackoverflow.com/a/22944075 is becoming a great reference for regex related questions.

