[TriLUG] Building a beowulf (or other) cluster

Justis Peters jtrilug at indythinker.com
Mon Mar 28 12:43:07 EDT 2011


On 03/28/2011 11:07 AM, Joseph Mack NA3T wrote:
> On Mon, 28 Mar 2011, Ron Kelley wrote:
>
>> I would like to install some sort of distributed process management 
>> tool so we can enable N-nodes to run the computations simultaneously.
>
> You only use a beowulf if the job cannot fit inside a single 
> machine/node. This usually means that the job needs more memory than a 
> single node holds. If this is your situation, you then recode the app 
> to use the nodes in parallel. This usually means using mpi or omp.
>
> If each job can be run in a single node, then you need a job farm 
> (lots of machines with a job dispatcher).
Ron,

I agree with Joe's take on your issue. You said, "Our processing happens 
in batch jobs and can easily be run on multiple servers at the same 
time." That sounds like an "embarrassingly parallel workload" 
(http://en.wikipedia.org/wiki/Embarrassingly_parallel), which is good news.
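Since we don't know the details of your batch jobs yet, here's a minimal sketch of what "embarrassingly parallel" looks like in practice, assuming each job is fully independent. The function `process_job` is a hypothetical stand-in for your real computation; the pool fans jobs out across local CPU cores, and the same pattern scales to multiple machines with a dispatcher in front:

```python
# Minimal embarrassingly-parallel sketch: each job is independent,
# so they can all run at once with no coordination between them.
from concurrent.futures import ProcessPoolExecutor

def process_job(job_id):
    # Hypothetical placeholder for your real batch computation.
    return job_id * job_id

if __name__ == "__main__":
    jobs = range(8)
    with ProcessPoolExecutor() as pool:
        # map() preserves input order even though jobs finish out of order.
        results = list(pool.map(process_job, jobs))
    print(results)
```

The key property is that no job reads or writes another job's data, which is exactly what makes the workload easy to spread across N nodes.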

There are probably hundreds of solutions to your problem. Until we have 
more details, I'll begin by pointing you to Amazon's EC2. It provides 
simple tools to quickly scale up the size of your cluster. No need to 
buy hardware. You only pay for the time you use: http://aws.amazon.com/ec2/
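Whether on EC2 or your own hardware, the "job farm" Joe describes can start out very simple. Here's a hedged sketch using xargs as a throwaway local dispatcher; on a real cluster you'd swap the echo for an ssh command (or a queueing tool) targeting your worker nodes:

```shell
# Hypothetical job-farm sketch: dispatch 8 independent jobs, at most
# 4 running concurrently. Replace the echo with your real job command,
# e.g. an ssh to a worker node.
seq 1 8 | xargs -P 4 -I{} sh -c 'echo "job {} done"'
```

Once the jobs outgrow one box, the same idea graduates to a proper dispatcher such as a batch queue, but the structure is identical: a list of jobs, a pool of workers, no shared state.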

When you say the project, "runs computational algorithms against some 
database data (sort of data mining)", it triggers a number of questions 
for me. What format is your data in? Is it already in a DBMS? How large 
is the data set? Can it be easily replicated between all the worker 
nodes? Do you need to update the data during calculations? Do other 
worker nodes also need to see those updates? Do you need features from 
the DBMS, such as indexes and aggregate functions, that would be a lot 
of work to replicate in external code? If so, how frequently do you need 
to use those features? Is your DBMS likely to become the bottleneck?

Best of luck with your project. Keep us posted.

Kind regards,
Justis
