[TriLUG] MySQL - Perl

Mon Jul 9 17:55:31 EDT 2007

Hi All,
I have a question regarding an issue we are having with Perl & MySQL
and I was wondering if anyone wanted to take a stab at it.

We have a Perl script that parses text files we receive daily (around
1K to 25K records) into a MySQL database.

The queries we issue from Perl perform the following actions: (All
queries are limited to the records we just imported.)

* Delete duplicate records within the database that have the same
batch_id, product_code, and account_value, and retain only one record.

* Examine the remaining records for sets that have the same batch_id
and product_code, total the product_cost field for the set, place that
amount in the record with the numerically highest account_value, then
delete all of the other records.

* Export all of the remaining records to a text file.

* Mark the records as 'processed' within the database.

The problem is this:  With this exact scenario, we imported around
15,000 records today.  After deleting duplicates and consolidating
records, we exported around 5,000.  Perl & MySQL took around 2 hrs to
perform this task.

Now I could have this all wrong, however, I feel that this operation
should have taken far, FAR, less time.  And, to compound things, our
volume on this account is about to increase by around 150%.

The delete_duplicates routine takes over an hour to run, and the
summarize_accounts and export code takes over an hour.  The import is
very quick.

So, I was wondering if anyone would like to throw an idea out there as
to what they believe the most efficient (read 'speedy') method would
be to perform these tasks.

Best Regards,
Mark.