[TriLUG] [hopefully] quickie help with a regex
Kevin Hunter
hunteke at earlham.edu
Thu Aug 12 00:05:20 EDT 2010
At 4:58pm -0500 Tue, 10 Aug 2010, wejii wrote:
> This seems to work for ascii letters, digits, and whitespace -
>
> preg_match_all('/\'[a-zA-Z0-9\s]*\'/', $input, $result);
>
> Other characters like &, $, @, ... may be included if you want.
> I am uncomfortable using the dot "." feature for several reasons. Here is one:
>
> From O'Reilly's "Regular Expression Cookbook"
>
> "Dot abuse
> The dot is the most abused regular expression feature. ‹\d\d.\d\d.\d\d› is not a good
> way to match a date. It does match 05/16/08 just fine, but it also matches 99/99/99.
> Worse, it matches 12345678."
You've just said this in so many words, but I'd like to expand on what
the Regex Cookbook has said. "Dot abuse" is just that, abuse. However,
it's not abuse when one needs to use it. The real culprit is folks not
taking an extra 5 minutes to think about what they're matching and to
characterize the problem exactly. (As usual, PEBKAC, right? Lazy
programmers.)
When I want to match /whatever/ is between two quotes, then this *is*
what I want:
/"(.*?)"/s
But if I only want a subset of everything, then I need to specify that:
/"([\w\s]*?)"/ # easy method
/"([A-Za-z0-9\s]*?)"/ # nearly equivalent; \w also matches "_"
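The difference between "anything between the quotes" and "only a subset" can be checked with a few sample strings. What follows is a minimal sketch in Python rather than the PHP used earlier in the thread; the names ANYTHING, WORDISH, EXPLICIT, and inner are made up for illustration, and re.DOTALL is assumed here as the stand-in for the /s modifier.

```python
import re

# re.DOTALL plays the role of the /s modifier: the dot also matches newlines.
ANYTHING = re.compile(r'"(.*?)"', re.DOTALL)
# \w covers letters, digits, and underscore; re.ASCII keeps it to ASCII only.
WORDISH = re.compile(r'"([\w\s]*?)"', re.ASCII)
# Explicit class: letters, digits, whitespace, and no underscore.
EXPLICIT = re.compile(r'"([A-Za-z0-9\s]*?)"')


def inner(pattern, quoted):
    """Return the text between the quotes if the whole string matches, else None."""
    m = pattern.fullmatch(quoted)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in ['"hello world"', '"two\nlines"', '"snake_case"', '"R&D"']:
        print(
            repr(sample),
            inner(ANYTHING, sample) is not None,
            inner(WORDISH, sample) is not None,
            inner(EXPLICIT, sample) is not None,
        )
```

The dot version accepts every sample, the \w version rejects only "R&D", and the explicit class also rejects "snake_case" because of the underscore.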
The example the Regex Cookbook gives is correct because it's talking
about misuse of the dot operator; it is *not* saying that the dot is a
bad or ill-thought-out regex operator.
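The Cookbook's date example is easy to reproduce. Here is a small Python sketch; the stricter pattern with an explicit separator class is only one possible tightening, chosen here for illustration, and the names DOTTED and EXPLICIT_SEP are hypothetical.

```python
import re

# The dot stands in for the separator, so any character (even a digit) passes.
DOTTED = re.compile(r'\d\d.\d\d.\d\d')
# Separator restricted to "/", ".", or "-".
EXPLICIT_SEP = re.compile(r'\d\d[/.\-]\d\d[/.\-]\d\d')

if __name__ == "__main__":
    for candidate in ['05/16/08', '99/99/99', '12345678']:
        print(
            candidate,
            bool(DOTTED.fullmatch(candidate)),
            bool(EXPLICIT_SEP.fullmatch(candidate)),
        )
```

The explicit separator class rejects 12345678 but still accepts 99/99/99; ruling that out means characterizing the valid day, month, and year ranges as well.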
Kevin