[TriLUG] Clusters, performance, etc...
Michael Alan Dorman
mdorman at debian.org
Mon Nov 7 15:25:40 EST 2005
Mark Freeze <mfreeze at gmail.com> writes:
> I have a friend who runs a business like mine and we have the same
> basic setup. We normally receive files from customers that may be 50
> to 100 MB. We run programs on these files that parse text, create
> databases, purge records, and so on. Normal database
> stuff. Converting and parsing records with the software that I have
> written usually runs for about 1 hour on the larger files and we may
> have 2 or 3 of these files each time a customer transmits data to us.
You haven't given enough information to even make a good guess. To
make a good assessment, you would need to answer two questions:
1. Are there dependencies between those files---that is, must you
process A.txt before B.txt before C.txt?
2. Is there some shared resource that would be required by all
systems doing processing---that is, would all the data from all
three have to be stored in a single database, or is the data for
each totally independent?
> My friend says that he is considering clustering Linux boxes
> together to improve the speed of the processing and he figures that
> he can cut processing time in half. Now I may be in for a public
> spanking, but I did not think that clustering would have that much
> of an effect on this type of operation.
It could. It very much depends on the nature of the job(s).
If your jobs are loosely coupled---that is, they don't have
dependencies and they don't make demands on the same resources at the
same time---then throwing more machines at the process could scale.
How well it scales is going to depend on what your current
bottleneck is, among other things.
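To put rough numbers on the loosely coupled case, here is a minimal
Python sketch. It assumes things the original question doesn't tell
us: that each file is an indivisible job, that the jobs are fully
independent, and that nothing shared (a database, a disk, the
network) becomes the bottleneck. The function name estimate_makespan
and the relative "speed" figures are made up for illustration.

```python
def estimate_makespan(job_minutes, machine_speeds):
    """Estimate wall-clock minutes to finish a batch of independent jobs.

    job_minutes: run time of each job on a reference machine.
    machine_speeds: relative speed of each machine (1.0 = reference).
    Assumes jobs cannot be split and share no resources (hypothetical).
    """
    # Time at which each machine finishes the work assigned to it so far.
    busy_until = [0.0] * len(machine_speeds)
    # Greedy heuristic: take the longest jobs first and give each one to
    # the machine that would finish it earliest.
    for minutes in sorted(job_minutes, reverse=True):
        finish = [busy_until[i] + minutes / speed
                  for i, speed in enumerate(machine_speeds)]
        best = min(range(len(finish)), key=finish.__getitem__)
        busy_until[best] = finish[best]
    # The batch is done when the last machine is done.
    return max(busy_until)


if __name__ == "__main__":
    files = [60, 60, 60]  # three files, roughly an hour each
    for n in (1, 2, 3, 5):
        print(n, "machines:", estimate_makespan(files, [1.0] * n), "minutes")
```

Under those assumptions, three one-hour files take 180 minutes on one
machine, 120 on two, and 60 on three. A fourth or fifth machine buys
nothing unless a single file can be split, and a machine half as fast
as the reference takes two hours per file. If the jobs contend for a
shared resource, the real numbers will be worse than this estimate.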
> Also, he is not talking about clustering new, workhorse p4
> machines... He is talking about clustering up about 4 or 5 p3 & p4
> machines that he has as spares. From the things that I have read
> (including the link that someone posted the other day) I think that
> he has a misconception of clustering.
> Am I way off base? Will clustering have this dramatic of an effect?
Without more information, it's impossible to say.
The piano is firewood, Times Square is a dream -- Tom Waits