[TriLUG] Awk question

James Olin Oden james.oden at gmail.com
Tue Aug 7 16:02:03 EDT 2007


>
> Of course, this just begs a competition^H^H^H^Hcomparison
> between awk versions and perl versions (and hey, why not
> other versions too!).  It would be very cool to actually get some
> hard numbers on a problem like this that would answer
> the question of which is faster: awk or perl.
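
#!/usr/bin/perl
use strict;
use warnings;

# Usage: pass the data file as the only argument.
# Each record looks like "key - a - b - c - total".
my $f = shift;
die "Usage: $0 datafile\n" unless defined $f;
my %keyMap;     # key => record (hash ref)
my @keyList;    # records in the order their keys first appeared

open(my $fh, '<', $f) or die "Could not open file '$f': $!";
while (my $line = <$fh>) {
        chomp($line);
        my @fields = split(/\s*-\s*/, $line);
        my $key = $fields[0];
        if (!exists $keyMap{$key}) {
                # First record for this key: store it and remember its position.
                $keyMap{$key} = {
                        'key'   => $key,
                        'a'     => $fields[1],
                        'b'     => $fields[2],
                        'c'     => $fields[3],
                        'total' => $fields[4],
                };
                push @keyList, $keyMap{$key};
        } else {
                # Duplicate key: only the total accumulates.
                $keyMap{$key}->{'total'} += $fields[4];
        }
}
close($fh);

# Print one line per key, in first-seen order.
foreach my $rec (@keyList) {
        printf("%s - %d - %d - %d - %0.2f\n",
                $rec->{'key'},
                $rec->{'a'},
                $rec->{'b'},
                $rec->{'c'},
                $rec->{'total'}
        );
}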

###################

Hacked together, no comments, but it works.  My feeling is that as
the data files get larger and there are more duplicate keys (such
that you would have multiple records to total) it will become more
efficient, due to the use of the hash to retrieve the record for
adding to the total (Perl hashes are hash tables, so a lookup by key
takes roughly constant time on average).

An awk script that did not use an associative array would need to
have its data presorted by key, which involves another process
dragging down the efficiency.  If you didn't do this, you would have
to search through the list until you found the right key to add the
total to.  Awk does have associative arrays, though, so the same
one-pass approach should work there as well.  But I am not an awk
expert ("awk" is for "awkward" (-;).
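
For comparison, here is a minimal sketch of the same one-pass totaling
in awk, using its associative arrays and no presorting.  The function
name sum_by_key and the sample records are made up for illustration,
and the field separator assumes the same "key - a - b - c - total"
layout that the Perl script splits on.  It is untimed, so it says
nothing yet about which of the two is faster.

```sh
#!/bin/sh
# Sum field 5 per key in one pass, printing keys in first-seen order.
# sum_by_key is a hypothetical name; it reads the named files, or stdin.
sum_by_key() {
    awk -F '[ \t]*-[ \t]*' '
        !($1 in total) {            # first record for this key
            order[++n] = $1         # remember arrival order
            a[$1] = $2; b[$1] = $3; c[$1] = $4
        }
        { total[$1] += $5 }         # associative-array lookup by key
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                printf "%s - %d - %d - %d - %.2f\n", k, a[k], b[k], c[k], total[k]
            }
        }
    ' "$@"
}

# Demo with made-up sample records.
sum_by_key <<'EOF'
alpha - 1 - 2 - 3 - 10.50
beta - 4 - 5 - 6 - 1.25
alpha - 1 - 2 - 3 - 0.25
EOF
```

With the sample records above, the demo prints "alpha - 1 - 2 - 3 - 10.75"
followed by "beta - 4 - 5 - 6 - 1.25".  Like the Perl version, it keeps
the a, b and c values from the first record seen for each key.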

I'm sure simpler, more efficient perl scripts could be written too.

On the point of efficiency though, one thing I'm not doing is
validating the data.  It would be easy to add this, and I think that
to some degree validation is just as important as efficiency, at
least in some cases (but maybe not yours).
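
As a rough idea of what that validation could look like, the sketch
below filters the input before it is totaled.  What counts as a valid
record is an assumption here: exactly five fields, non-negative
integers in fields 2 through 4, and a non-negative decimal number in
field 5.  The function name valid_records and the sample lines are
made up as well.

```sh
#!/bin/sh
# Pass well-formed records through on stdout; report the rest on stderr.
# valid_records is a hypothetical name; the validity rules are assumptions.
valid_records() {
    awk -F '[ \t]*-[ \t]*' '
        NF == 5 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ &&
        $5 ~ /^[0-9]+(\.[0-9]+)?[ \t]*$/ { print; next }
        # Anything else is reported with its line number and dropped.
        { printf "line %d rejected: %s\n", NR, $0 | "cat 1>&2" }
    ' "$@"
}

# Demo with made-up sample lines: the second and third are rejected.
valid_records <<'EOF'
alpha - 1 - 2 - 3 - 10.50
not a record at all
beta - 4 - 5 - 6 - oops
gamma - 7 - 8 - 9 - 2
EOF
```

The surviving lines could be written to a file and handed to the
totaling script.  The check costs an extra pass over the data, so it
trades a little speed for not silently adding garbage into a total.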

Cheers...james
>
> Cheers,
> Tanner