[TriLUG] Clusters, performance, etc...
mfreeze at gmail.com
Mon Nov 7 20:48:36 EST 2005
You guys are way ahead of me on some of the hardware questions. However,
to try to answer some of them:
I have a script that controls the following actions:
1. Runs a C++ program that I wrote that opens a text file (the 50 - 100 MB
file that I mentioned), reads each line sequentially, and splits the data
into two output files after performing numerous checks and calculations on
the data (e.g. checking the validity of the zip code, making sure it matches
the state, calculating amounts due, etc.)
2. Converts the second file into a dBase file
3. Runs another C++ program on the first file that examines each record in
the file and compares it to another database (using proprietary code
libraries supplied by our software vendor), correcting any bad info in the
address, adding a ZIP+4, adding carrier route info, etc.
4. Looks for another text file to process
5. Appends all processed text files together
6. Appends all dBase files into one
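
As a rough illustration of the kind of work step 1 does, here is a minimal
C++ sketch. It is not the actual program. The pipe-delimited record layout
(name|state|zip|amount), the names splitFields, isValidZip and splitRecords,
and the rule for which output a record goes to are all assumptions made for
the example, since the post does not describe the file format or how the
data is divided between the two files.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Split one line into fields on a delimiter.
// Assumed (hypothetical) layout: name|state|zip|amount
std::vector<std::string> splitFields(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Very basic zip check: exactly five digits. A real check would also
// confirm that the zip belongs to the stated state.
bool isValidZip(const std::string& zip) {
    if (zip.size() != 5) return false;
    for (unsigned char c : zip) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

struct SplitCounts {
    std::size_t accepted = 0;
    std::size_t rejected = 0;
};

// Read records one line at a time and write each to one of two outputs.
// The accept/reject split is an assumption made for illustration only.
SplitCounts splitRecords(std::istream& in,
                         std::ostream& accepted,
                         std::ostream& rejected) {
    SplitCounts counts;
    std::string line;
    while (std::getline(in, line)) {
        const std::vector<std::string> fields = splitFields(line, '|');
        if (fields.size() == 4 && isValidZip(fields[2])) {
            accepted << line << '\n';
            ++counts.accepted;
        } else {
            rejected << line << '\n';
            ++counts.rejected;
        }
    }
    return counts;
}
```

With real files, the three streams would be a std::ifstream and two
std::ofstream objects. Because each record is handled without reference to
any other record, a loop of this shape can in principle be run on separate
input files at the same time.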
As I said in my previous post, each 100 MB text file takes about 1 hour to
run. Most of this time is spent on step 3.
So, would clustering speed up this process, which sometimes takes 3 - 4 hours?