simple regex question
Danny
daniel.robert at acm.org
Thu Jan 11 16:06:42 EST 2007
Quoting Dwight E Chadbourne <dwighte.chadbourne at stopandshop.com>:
> Hi all. I want the 20 digit hash in this text.
>
> d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e
> 4:name12:xxxxxxxxxxxxee5:flagsd20:min_request_intervali3600eee
>
> How do I get only the xxxxxxxxxxxxxxxxxxxx and not the preceding
> identifier?
I can't give a definitive answer without knowing whether the value is
always exactly 20 characters long, or whether you just want the full
text between the second and third ':'.
So, using regex:
1) Assuming 20 characters immediately following the second ':'
^([^:]*:){2}(.{20}).*$
This will capture your desired value in backreference #2, so if you
were using Perl (assuming your original content was in '$string'):
$string =~ s/^([^:]*:){2}(.{20}).*$/$2/;
2) The full text between the second and third ':'
^([^:]*:){2}([^:]*):.*$
Again, this will put everything between the second and third ':' into
backreference #2, to be used in the same fashion as the previous example.
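For anyone who wants to try both patterns without Perl, here is a minimal sketch using 'sed -E' (extended regular expressions, available in GNU and BSD sed). The variable names 'line', 'fixed' and 'between' are made up for illustration, and the input is only the first line of the quoted sample. Note that on this sample the second pattern also picks up the "d8" that sits before the third ':'.

```shell
# Sample input: first line of the quoted text.
line='d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e'

# Pattern 1: exactly 20 characters after the second ':'.
fixed=$(printf '%s\n' "$line" | sed -E 's/^([^:]*:){2}(.{20}).*$/\2/')

# Pattern 2: everything between the second and third ':'.
between=$(printf '%s\n' "$line" | sed -E 's/^([^:]*:){2}([^:]*):.*$/\2/')

echo "$fixed"    # the 20 x's only
echo "$between"  # the 20 x's followed by "d8"
```

If the value is not guaranteed to be 20 characters long, neither pattern is reliable by itself, which is why the question of what the input really looks like matters.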
One of the other responders mentioned using 'awk' via the command-line
to isolate the content between the second and third ':'. You could
use 'cut' to accomplish the same thing.
echo "d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e" | cut -d':' -f3
This specifies that the field delimiter is ':' and that you want the
third field isolated.
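As a rough sketch of how the 'cut' approach could be narrowed to just the hash: on the sample input the third field ends in "d8", so a second 'cut -c1-20' can keep only the first 20 characters. This assumes the hash is always exactly 20 characters long; the variable names 'line', 'field' and 'hash' are illustrative only.

```shell
# Sample input: first line of the quoted text.
line='d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e'

# Field 3 is the text between the second and third ':'.
field=$(printf '%s\n' "$line" | cut -d':' -f3)

# Assumption: the hash is always 20 characters, so keep characters 1-20.
hash=$(printf '%s\n' "$field" | cut -c1-20)

echo "$hash"
```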
I hope this was helpful,
-Danny Robert
daniel.robert at acm.org
P.S.: This is my first post to this user list, having moved to Boston
about a year ago. Just thought I'd say "hi".