[TriLUG] [hopefully] quickie help with a regex

Kevin Hunter hunteke at earlham.edu
Thu Aug 12 00:05:20 EDT 2010


At 4:58pm -0500 Tue, 10 Aug 2010, wejii wrote:
> This seems to work for ascii letters, digits, and whitespace -
>
>     preg_match_all('/\'[a-zA-Z0-9\s]*\'/', $input, $result);
>
> Other characters like &, $, @, ... may be included if you want.
> I am uncomfortable using the dot "." feature for several reasons. Here is one:
>
>  From O'Reilly's "Regular Expression Cookbook"
>
> "Dot abuse
> The dot is the most abused regular expression feature. ‹\d\d.\d\d.\d\d› is not a good
> way to match a date. It does match 05/16/08 just fine, but it also matches 99/99/99.
> Worse, it matches 12345678."

You've just said this in so many words, but I'd like to expand on what 
the Regex Cookbook has said.  "Dot abuse" is just that, abuse.  However, 
it's not abuse when one needs to use it.  The real culprit is folks not 
taking an extra 5 minutes to think about what they're matching, and 
exactly characterizing the problem.  (As usual, PEBKAC, right?  Lazy 
programmers.)

When I want to match /whatever/ is between two quotes, then this *is* 
what I want:

/"(.*?)"/s
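For anyone who wants to try this, here is a minimal sketch in Python. The choice of language is mine, not part of the original thread. I am assuming Python's re.DOTALL as the counterpart of the /s flag; the sample string and the names ANY_QUOTED and sample are made up for illustration.

```python
import re

# Python analogue of /"(.*?)"/s: re.DOTALL lets the dot match newlines,
# which is what the /s flag does in Perl-style regexes.
ANY_QUOTED = re.compile(r'"(.*?)"', re.DOTALL)

# Hypothetical input: one quoted string contains a newline.
sample = 'say "hello, world!" and "two\nlines"'

# The lazy quantifier *? stops at the first closing quote it can reach.
quoted = ANY_QUOTED.findall(sample)
print(quoted)  # ['hello, world!', 'two\nlines']

# Without DOTALL the dot refuses to cross the newline, so the second
# quoted string is no longer found.
quoted_no_dotall = re.findall(r'"(.*?)"', sample)
print(quoted_no_dotall)  # ['hello, world!']
```

The dot is doing exactly the right job here: the requirement really is "anything at all", so nothing narrower would be correct.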

But if I only want a subset of everything, then I need to specify that:

/"([\w\s]*?)"/           # easy method
/"([A-Za-z0-9\s]*?)"/    # nearly equivalent (\w also matches _)
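A rough sketch of how the two restricted patterns behave, again in Python as an assumed stand-in. I have put the quantifier inside the capture group so the whole string is captured, and I use re.ASCII so that \w means [A-Za-z0-9_] as it does by default in PCRE. The sample input and variable names are hypothetical.

```python
import re

# \w is letters, digits, and underscore; re.ASCII keeps it to ASCII only.
WORDY = re.compile(r'"([\w\s]*?)"', re.ASCII)
# Explicit class: the same set minus the underscore.
STRICT = re.compile(r'"([A-Za-z0-9\s]*?)"')

# Hypothetical input: plain text, an underscore, and a disallowed '&'.
sample = '"abc 123", "snake_case", "a&b"'

wordy_hits = WORDY.findall(sample)
strict_hits = STRICT.findall(sample)

print(wordy_hits)   # ['abc 123', 'snake_case']
print(strict_hits)  # ['abc 123']
```

Neither pattern accepts "a&b", which is the point: the character class states which characters are allowed instead of letting the dot wave everything through.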

The example the Regex Cookbook gives is correct because it's talking 
about mis-use of the dot operator, *not* that the dot is a bad or 
ill-thought-out regex operator.
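To make the Cookbook's date example concrete, here is a small sketch in Python (my choice of language). LOOSE is the quoted pattern; TIGHT is only my assumption of a first-step fix that names the separator, not something taken from the book.

```python
import re

# The quoted pattern: a dot stands in for each separator.
LOOSE = re.compile(r'\d\d.\d\d.\d\d')
# Assumed first-step fix: spell out the separator instead of using a dot.
TIGHT = re.compile(r'\d\d/\d\d/\d\d')

# The three strings mentioned in the quoted passage.
candidates = ['05/16/08', '99/99/99', '12345678']

loose_hits = [c for c in candidates if LOOSE.fullmatch(c)]
tight_hits = [c for c in candidates if TIGHT.fullmatch(c)]

print(loose_hits)  # ['05/16/08', '99/99/99', '12345678']
print(tight_hits)  # ['05/16/08', '99/99/99']
```

Naming the separator gets rid of 12345678 but still lets 99/99/99 through; rejecting that one means characterizing the digits as well, which is the same "think about what you're matching" exercise.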

Kevin



