[Dev] perl: how to parse quoted comma separated values?

Dana Smith dev@trilug.org
Mon, 14 Jan 2002 14:00:54 -0500


1) According to the Perl Cookbook, there's Text::ParseWords.

2) On Google I found Text::CSV.

3) The book "Mastering Regular Expressions" contains this magic:

sub parse_csv {
   my $text = shift;      # record containing comma-separated values
   my @new  = ();

   # the first part groups the phrase inside the quotes.
   # see explanation of this pattern in MRE
   push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?    # quoted field, may contain commas
       | ([^,]+),?                          # unquoted field with content
       | ,                                  # empty field
   }gx;
   push(@new, undef) if substr($text, -1,1) eq ',';
   return @new;      # list of values that were comma-separated
}
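
For option 1, here is a rough sketch of how Text::ParseWords might be
applied to the two sample records from the question. The variable names
(@records, @parsed) are made up for illustration. One caveat, as far as I
know: quotewords() also treats single quotes as quoting characters, so an
apostrophe in the data (e.g. O'Brien) can trip it up. Treat this as a
starting point, not a complete CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Sample records copied from the original question.
my @records = (
    '"King, Jr","Martin","Luther","etc"',
    '"Washington","George",,"foo"',
);

my @parsed;    # hypothetical name: one array ref of fields per record
for my $record (@records) {
    # quotewords(DELIM, KEEP, LINE): DELIM is treated as a regex,
    # and a false KEEP strips the surrounding quotes from each field.
    my @fields = quotewords(',', 0, $record);
    push @parsed, [@fields];

    # Print the fields separated by '|' so embedded commas stay visible.
    print join('|', map { defined $_ ? $_ : '' } @fields), "\n";
}
```

The comma inside "King, Jr" should survive as part of the first field, and
the empty third field of the second record should still occupy a slot in
the array.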

> Dana L. Smith
> Alternate Route Studios
> (919) 531-4116
> Dana.Smith@altroutestudios.com
> http://www.altroutestudios.com
> 


-----Original Message-----
From: Jeremy P [mailto:jeremyp@pobox.com]
Sent: Monday, January 14, 2002 1:09 PM
To: dev@trilug.org
Subject: [Dev] perl: how to parse quoted comma separated values?



Hi TriLug dev types,

Hope this question is appropriate for this forum.  I'm not a real
developer-type, but I use scripting to help with various system admin tasks.
For a particular task, I need to parse in records from a CSV
(comma-separated values) file using perl.  The data files have records
like this:

"King, Jr","Martin","Luther","etc"
"Washington","George",,"foo"

I'd like to read each record into an array, getting rid of the double-quotes
and the ',' delimiters in the process.

Obviously a simple statement like this won't work:
	@fields = split(',');
because of the fields with commas in the data ("King, Jr")

I can't do things that match on '","' because that wouldn't apply to the
empty fields or the first and last fields.

Anyone done simple parsing like this before?  Is there something glaringly
obvious that I'm missing?  I poked around on CPAN and there are a bunch of
CSV routines, including even a DBD driver, but that seems way overkill.  
Maybe I should just delve into using one of those modules.

Thanks for any advice,

Jeremy Portzer
Durham Tech Community College
