For the pattern ".*?([a-m/]*).*" matched against the string "fall/2005", I thought the ".*" would match any character 0 or more times. However, since there is a ? following the .*, I assumed it would match only 0 or 1 repetitions. So I expected .*? to match 'f', but I'm wrong.
What is wrong in my logic?
Answer
The ? here acts as a 'modifier' on the quantifier, if I can call it that: it makes .* match as few characters as possible (termed 'lazy') while still letting the rest of the pattern match.
In fall/2005, the first .*? only needs to reach the point where ([a-m/]*) can start matching, which is just before f. Hence, .*? matches 0 characters and ([a-m/]*) matches fall/. Since ([a-m/]*) cannot match any more, the next part of the pattern, .*, matches what is left of the string, namely 2005.
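The split described above can be checked with a short sketch. Python's re module is an assumption here (the question does not name a regex engine), and the extra parentheses around the first and last parts are added only so that each piece can be inspected:

```python
import re

# Extra groups around .*? and the trailing .* are added for inspection only;
# they do not change what the original pattern matches.
lazy_match = re.fullmatch(r"(.*?)([a-m/]*)(.*)", "fall/2005")
lazy_parts = lazy_match.groups()
print(lazy_parts)  # ('', 'fall/', '2005')
```

The first group is empty because the lazy quantifier stops as soon as the rest of the pattern can succeed.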
In contrast, with .*([a-m/]*).* the first .* matches as much as possible first (meaning the whole string) and gives characters back only if the rest of the pattern cannot match otherwise. Here the other quantifiers can match 0 characters as well, so nothing has to be given back and the first .* alone matches the whole string (termed 'greedy').
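For comparison, here is a sketch of the greedy version, again assuming Python's re module and with parentheses added around each part purely for inspection:

```python
import re

# Same string, but the first quantifier is greedy (no trailing ?).
greedy_match = re.fullmatch(r"(.*)([a-m/]*)(.*)", "fall/2005")
greedy_parts = greedy_match.groups()
print(greedy_parts)  # ('fall/2005', '', '')
```

The first group takes everything, and the two remaining parts are satisfied with empty matches.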
Maybe a different example will help.
.*ab
In:
aaababaaabab
Here, .* will match as many characters as possible and then try to match ab. Thus, .* will match aaababaaab and the remainder will be matched by ab.
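A quick way to confirm this, sketched in Python (an assumed choice of engine; the capture group around .* is added only to expose what it consumed):

```python
import re

text = "aaababaaabab"
greedy_ab = re.match(r"(.*)ab", text)
greedy_prefix = greedy_ab.group(1)   # what .* consumed
greedy_span = greedy_ab.span()       # extent of the whole match
print(greedy_prefix)  # aaababaaab
print(greedy_span)    # (0, 12)
```

The greedy .* first takes the entire string, then backs off two characters so that the final ab can match.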
.*?ab
In:
aaababaaabab
Here, .*? will match as little as possible until it can match ab in that regex. The first occurrence of ab is here:
aaababaaabab
  ^^
And so, .*? matches aa while ab matches ab.
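The lazy case can be checked the same way. This is a sketch in Python (an assumed engine), with a capture group added around .*? for inspection:

```python
import re

text = "aaababaaabab"
lazy_ab = re.match(r"(.*?)ab", text)
lazy_prefix = lazy_ab.group(1)   # what .*? consumed
lazy_span = lazy_ab.span()       # extent of the whole match
print(lazy_prefix)  # aa
print(lazy_span)    # (0, 4)
```

The lazy .*? grows one character at a time and stops at the first position where ab follows, so the overall match ends after four characters rather than twelve.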