Tuesday, 12 July 2016

get value of all perl regex capture groups



The issue: I'm coding a library that receives a user-supplied regex containing an unknown number of capture groups, to be run against other input, and I want to extract the values of all capture groups concatenated into one string (for further processing elsewhere).



It is trivial if the number of capture groups is known in advance, as I can just list them:



#!/usr/bin/perl -w
my $input = `seq -s" " 100 200`;

my $user_regex =
qr/100(.*)103(.*)107(.*)109(.*)111(.*)113(.*)116(.*)120(.*)133(.*)140(.*)145/;

if ($input =~ $user_regex) { print "$1 $2 $3 $4 $5 $6 $7 $8 $9 $10\n"; }


correctly produces (ignore the extra whitespace):



 101 102   104 105 106   108   110   112   114 115   117 118 119 
121 122 123 124 125 126 127 128 129 130 131 132

134 135 136 137 138 139 141 142 143 144


However, if there are more than 10 capture groups, I lose data unless I modify the code. As the number of capture groups is unknown, I currently go with hundreds of manually specified variables ("$1" to "$200") under the no warnings pragma and hope that is enough, but it does not seem particularly clean or robust.



Ideally, I'd like something that works like values %+ does for named capture groups, but for unnamed capture groups. Is that possible in Perl 5.24? Or what less kludgy approach would you recommend for retrieving the contents of all numbered capture groups?


Answer



Maybe you can just capture into an array?



my @captured = $input =~ $user_regex;

if( @captured ) { print join " ", @captured; print "\n"; }
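The list-context approach has two edge cases worth handling: a successful match against a regex with no capture groups returns the list (1), and groups that did not take part in the match come back as undef. The following is a minimal sketch, not the original answer's code; the helper name join_captures and its separator argument are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns one string
# holding every numbered capture, joined by $sep, or undef on no match.
sub join_captures {
    my ($input, $regex, $sep) = @_;
    $sep = ' ' unless defined $sep;

    # In list context a successful match returns ($1, $2, ...).
    my @captured = $input =~ $regex;
    return unless @captured;            # the match failed

    # With zero capture groups a successful match returns (1);
    # $#+ is the number of groups in the last successful match.
    return '' if $#+ == 0;

    # Groups that did not participate in the match are undef.
    return join $sep, map { defined $_ ? $_ : '' } @captured;
}

print join_captures('abc', qr/(.)(.)(.)/), "\n";    # prints "a b c"
```

Mapping undef to an empty string keeps the group positions stable in the joined output and avoids "uninitialized value" warnings.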


If you absolutely must use the numbered capture variables, use eval:



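my $input = "abc";
my $re = qr/(.)(.)(.)/;
if ($input =~ $re) {
    my $num = 1;
    # String eval builds the variable name ($1, $2, ...) at run time;
    # the loop stops at the first undefined capture.
    print "captured \$$num = " . eval("\$$num") . "\n" and $num++
        while eval "defined \$$num";
}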


Or just:



my $input = "abc";
my $re = qr/(.)(.)(.)/;
if ($input =~ $re) {
    my $num = 1;
    # $$num treats the number in $num as a variable name (a symbolic reference).
    print "captured \$$num = $$num\n" and $num++ while defined $$num;
}


...but this last example relies on symbolic references, so it doesn't work under use strict.
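If the numbered groups must be walked one by one under use strict, the built-in offset arrays @- and @+ avoid both eval and symbolic references. This is a sketch under the assumption that the original input string is still available; the function name captures_by_offset is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the value of every numbered capture group,
# reconstructed from the match offsets instead of from $1, $2, ...
sub captures_by_offset {
    my ($input, $re) = @_;
    return unless $input =~ $re;

    # $#+ is the number of capture groups in the last successful match.
    # $-[$n] and $+[$n] are the start and end offsets of group $n;
    # $-[$n] is undef when group $n did not take part in the match.
    return map {
        defined $-[$_] ? substr($input, $-[$_], $+[$_] - $-[$_]) : undef
    } 1 .. $#+;
}

my @groups = captures_by_offset("abc", qr/(.)(.)(.)/);
for my $num (1 .. @groups) {
    my $value = $groups[$num - 1];
    printf "captured \$%d = %s\n", $num, defined $value ? $value : '(undef)';
}
```

Unlike the while defined loops, this does not stop at the first group that failed to participate, so later groups are still reported.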

