Sunday 1 May 2016

java - What exactly does .*? do in regex? ".*?([a-m/]*).*"




For ".*?([a-m/]*).*" matching the string "fall/2005", I thought ".*" would match any character 0 or more times. However, since there is a ? following .*, I assumed it only matches 0 or 1 repetitions. So I thought .*? would match 'f', but I'm wrong.



What is wrong in my logic?


Answer



The ? here acts as a modifier on the quantifier, if I can call it that: it makes .* lazy (also called reluctant), so it matches as few characters as possible while still letting the rest of the pattern match.



In fall/2005, the first .*? is first tried with zero characters, just before the f. At that position ([a-m/]*) can already match, so .*? stays empty and ([a-m/]*) matches fall/. Since ([a-m/]*) cannot match the 2, the last part of the pattern, .*, matches what's left of the string, meaning 2005.




In contrast, with .*([a-m/]*).* the first .* matches as much as possible (meaning the whole string) and would only give characters back if the rest of the pattern could not match otherwise. But the other quantifiers can match 0 characters as well, so no backtracking is needed: the first .* alone matches the whole string (termed 'greedy') and ([a-m/]*) captures nothing.
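The difference between the two patterns can be checked with a short Java sketch. The class name LazyVsGreedyGroups and the helper group1 are made up for this illustration, and matches() is assumed because the question is about matching the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class LazyVsGreedyGroups {
    // Returns the text captured by group 1 when the regex matches the whole input.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match for " + regex);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String input = "fall/2005";
        // Lazy: .*? takes nothing, so the group captures "fall/".
        System.out.println("[" + group1(".*?([a-m/]*).*", input) + "]"); // prints [fall/]
        // Greedy: the first .* takes the whole string, so the group is empty.
        System.out.println("[" + group1(".*([a-m/]*).*", input) + "]");  // prints []
    }
}
```

Both patterns match the whole string; only the way the characters are split between the three parts changes.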






Maybe a different example will help.



.*ab


In:




aaababaaabab


Here, .* will match as many characters as possible and then back off until ab can match. Thus, .* will match aaababaaab and the remaining ab will be matched by ab.



.*?ab


In:




aaababaaabab


Here, .*? will match as few characters as possible, growing one character at a time until ab can match right after it. The first occurrence of ab is here:



aaababaaabab
  ^^



And so, .*? matches aa while ab matches the ab that follows.
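To see what the quantified part itself consumes in each case, here is a small Java sketch. The parentheses around .* and .*? are added only so the consumed text can be read back as group 1, and the class name LazyVsGreedyFind and helper consumed are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class LazyVsGreedyFind {
    // Returns what group 1 captured in the first match, or null if nothing matched.
    static String consumed(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "aaababaaabab";
        // Greedy: takes everything, then backs off to the last "ab".
        System.out.println(consumed("(.*)ab", input));  // prints aaababaaab
        // Lazy: grows from nothing and stops at the first "ab".
        System.out.println(consumed("(.*?)ab", input)); // prints aa
    }
}
```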

