[Bioclusters] Newbie question: simple low-admin non-threaded Debian-based cluster solution?

Thu Jan 20 18:47:07 EST 2005

Hello

If anyone can review the below and suggest a way to go, or even better
point out something I have gotten completely wrong, it would be much
appreciated!

Thanks
John

Hardware:

Ten HP Proliant nodes, one DL380 and nine DL140.  Each node has two
3.2 GHz Xeon processors.  They do not have a dedicated switch; the
infrastructure folks say they want to implement this using a VLAN.  We
have some performance concerns here but have agreed to give it a try.

User characteristics:

The users are biostatisticians who typically program in R; they often
use plug-in R modules like Bioconductor.  They always want the newest
version of R right away.  They may also write programs in C or
Fortran.  Data files are usually small.  Nothing fancy like BLAST, etc.

User concerns:

Users require a Linux clustering environment which enables them to
interact with the cluster as though it were a single system (via ssh or
X) but which will distribute compute-intensive jobs across nodes.  As
the code is by and large not multithreaded, it is expected that each job
will be farmed out to an idle compute node and probably stay there until
it is done.   That's fine.  In other words, to use all twenty CPUs we
will need twenty concurrent jobs.

Administration concerns:

The cluster must require the absolute minimum of configuration and
maintenance, because I've got to do it and I'm hardly ever around these
days.

Other concerns:

Users and administrators alike have a preference for Debian Linux over
other distributions.  Users also have an aversion to non-free software.
Either or both of these considerations could be overridden if the
reasons were pressing.

Cluster software requirements:

(1)    The cluster must have a means of deploying Linux to the nodes and
keeping their configurations (including updates to the operating system
and applications, lists of users, printers, etc.) in synchronization.
(2)    The cluster must have a means of transparently distributing jobs
to idle CPUs.  It's not necessary to actively rebalance once a job has
started - it's okay if, once tied to a node, it stays there.
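
To make requirement (2) concrete, here is a minimal Python sketch of the
placement behaviour we have in mind.  It is only a toy simulation, not the
behaviour of any of the packages discussed below; the function name
dispatch, the node names and the job durations are all made up for
illustration.  Each job goes to whichever CPU slot becomes idle first and
stays there until it finishes.

```python
from collections import deque

def dispatch(jobs, nodes, cpus_per_node=2):
    """Assign each job to an idle CPU slot; a placed job never migrates.

    jobs: list of (name, duration) tuples, durations in arbitrary time units.
    Returns a list of (job name, node, start time, end time) tuples.
    """
    # One entry per CPU slot: [time at which the slot becomes idle, node name].
    slots = [[0, node] for node in nodes for _ in range(cpus_per_node)]
    queue = deque(jobs)
    schedule = []
    while queue:
        name, duration = queue.popleft()
        # Pick the slot that becomes idle soonest (ties go to the first listed).
        slot = min(slots, key=lambda s: s[0])
        start = slot[0]
        slot[0] = start + duration  # the job occupies this slot until it ends
        schedule.append((name, slot[1], start, slot[0]))
    return schedule

# Hypothetical node names: one DL380 plus nine DL140s, two CPUs each.
nodes = ["dl380"] + [f"dl140-{i}" for i in range(1, 10)]
# Twenty-two equal-length jobs: two more than there are CPUs.
jobs = [(f"job{i}", 5) for i in range(1, 23)]
schedule = dispatch(jobs, nodes)
```

With twenty-two equal-length jobs, the first twenty start immediately, one
per CPU, and the remaining two wait until a slot frees up.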

Potential solutions:

We like the look of NPACI Rocks but its non-Debian-ness makes it a last
resort only.  What we would really like to try is a Debian version of
NPACI Rocks; in its absence we will probably have to use two separate
packages to fulfil the requirements of #1 and #2 above.

Sensible options for #1 seem to be:
(1)       SystemImager (http://www.systemimager.org/)
(2)       FAI (http://www.informatik.uni-koeln.de/fai/), maybe also
involving the use of cfengine2 (http://www.iu.hio.no/cfengine/)

SystemImager is the better-established product and looks to be simpler
to set up than FAI and/or cfengine2, in both of which the learning curve
looks steep.  However, FAI seems more elegant and more like the idea of
"NPACI Rocks Debian" that we're looking for, implying that once set up
FAI/cfengine2 will require less ongoing maintenance.

Sensible options for #2 seem to be:

(1)    OpenMosix
(2)    OpenPBS
(3)    Sun GridEngine N1

Note: all of the above have commercial versions; we'd be reluctant to
consider them unless it means big savings in administration time and
effort.  We get the impression OpenMosix (and, to a lesser extent,
OpenPBS) have question marks over how much time and resources the people
maintaining these products have, suggesting bugs, instability and not
keeping up with kernel/library updates, etc.  Sun GridEngine seems more
robust but does not seem to have a big Debian user base.

What do you all think we should try first?

Thanks!
John

John Speakman
Manager, Clinical Research Systems
Memorial Sloan-Kettering Cancer Center
307 East 63rd Street, New York NY 10021 USA
+1 646 735 8187 - SpeakmaJ at mskcc.org