[Bioclusters] Newbie question: simple low-admin non-threaded Debian-based cluster solution?
Tim Cutts
tjrc at sanger.ac.uk
Fri Jan 21 04:57:16 EST 2005
On 20 Jan 2005, at 11:47 pm, Speakman, John
H./Epidemiology-Biostatistics wrote:
> Ten HP ProLiant nodes, one DL380 and nine DL140. Each node has two
> 3.2 GHz Xeon processors. They do not have a dedicated switch; the
> infrastructure folks say they want to implement this using a VLAN. We
> have some performance concerns here but have agreed to give it a try.
You might well end up with performance problems unless the nodes have
plenty of local storage so that the jobs don't have to hammer NFS.
VLANs are great, but just wait until the infrastructure people start
whining that the backups are failing because the switch is full of your
cluster traffic. :-)
>
> User characteristics:
>
> The users are biostatisticians who typically program in R; they often
> use plug-in R modules like bioconductor. They always want the newest
> version of R right away. They may also write programs in C or
> Fortran. Data files are usually small. Nothing fancy like BLAST,
> etc.
I can see why that gives you a leaning towards Debian.
> User concerns:
>
> Users require a Linux clustering environment which enables them to
> interact with the cluster as though it were a single system (via ssh
> or X) but which will distribute compute-intensive jobs across nodes.
> As the code is by and large not multithreaded, it is expected that
> each job will be farmed out to an idle compute node and probably stay
> there until it is done. That’s fine. In other words, to use all
> twenty CPUs we will need twenty concurrent jobs.
Getting R to work seamlessly on a cluster is not trivial unless the
users run it in batch mode. Getting R to work interactively, with the
graphical parts still behaving correctly, is less easy: X
authentication starts to be a problem. It's unfortunate that R doesn't
have some sort of client-server architecture which would allow the hard
work to happen on a cluster node while the interactive side is handled
locally by a client.
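For batch use, incidentally, a trivial wrapper script along the
following lines (file names invented, obviously) is all a scheduler
needs to run, and no X display is involved:

  #!/bin/sh
  # Rough sketch of a batch R run; script and path names are purely
  # illustrative.  R CMD BATCH runs the script non-interactively and
  # writes a transcript to the .Rout file.
  cd /home/someuser/project
  R CMD BATCH --no-save analysis.R analysis.Rout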
> Administration concerns:
>
> The cluster must require the absolute minimum of configuration and
> maintenance, because I’ve got to do it and I’m hardly ever around
> these days.
cfengine is good for this, as you've already suggested. We use
cfengine for managing configuration information on our Linux desktops,
although not on our cluster nodes.
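To give a flavour of it, a cfagent.conf stanza to keep a file in sync
from a central host looks roughly like the sketch below. This is from
memory, so treat it as an illustration and check the exact syntax
against the cfengine documentation; the host name and paths are
invented.

  control:
     actionsequence = ( copy )

  copy:
     # Pull resolv.conf from a central configuration host and
     # replace the local copy if the checksum differs.
     /config/etc/resolv.conf  dest=/etc/resolv.conf
                              server=cfmaster
                              mode=644
                              owner=root
                              type=checksum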
> Other concerns:
>
> Users and administrators alike have a preference for Debian Linux over
> other distributions.
How nice to have such clued-up users. I envy you.
> Potential solutions:
>
> We like the look of NPACI Rocks but its non-Debian-ness makes it a
> last resort only. What we would really like to try is a Debian
> version of NPACI Rocks; in its absence we will probably have to use
> two separate packages to fulfil the requirements of #1 and #2 above.
>
> Sensible options for #1 seem to be:
> (1) SystemImager (www.systemimager.org)
> (2) FAI (http://www.informatik.uni-koeln.de/fai/), maybe also
> involving the use of cfengine2 (http://www.iu.hio.no/cfengine/)
>
> SystemImager is the better-established product and looks to be simpler
> to set up than FAI and/or cfengine2, in both of which the learning
> curve looks steep. However, FAI seems more elegant and more like the
> idea of “NPACI Rocks Debian” that we’re looking for, implying that
> once set up FAI/cfengine2 will require less ongoing maintenance.
We use cfengine to configure both Red Hat and Debian machines here. We
use FAI for Debian desktop installs (we're in the process of converting
all desktop Linux boxes from Red Hat 9 to Debian). Yes, it does have
some difficulties to start with, but it's *really* fast. A node about
the same speed as yours goes from bare tin to installed and running in
about 2 minutes.
We have a few Debian developers on site (I'm one of them), and one
approach that really helps here, and is easy to set up, is a local APT
repository for locally-built packages, including in-house packaging of
proprietary software, which makes it much easier to splat things out
across hundreds of machines. FAI can trivially be pointed at such a
repository as well as at our main Debian mirror.
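The nodes then just need an extra line in /etc/apt/sources.list
alongside the mirror entry, something like this (host names invented;
the second line is a flat repository built with dpkg-scanpackages):

  # Main Debian mirror plus the local in-house repository
  deb http://debian-mirror.internal/debian stable main contrib non-free
  deb http://apt.internal/local-debs ./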
>
> Sensible options for #2 seem to be:
>
> (1) OpenMosix
> (2) OpenPBS
> (3) Sun GridEngine N1
>
> Note: all of the above have commercial versions; we’d be reluctant to
> consider them unless it means big savings in administration time and
> effort. We get the impression OpenMosix (and, to a lesser extent,
> OpenPBS) have question marks over how much time and resources the
> people maintaining these products have, suggesting bugs, instability
> and not keeping up with kernel/library updates, etc. Sun GridEngine
> seems more robust but does not seem to have a big Debian user base.
GridEngine is pretty good. I think the lack of Debian support is
probably a licence issue - I'm not sure SGE is DFSG-compliant. As
others have said, MOSIX or OpenSSI will probably be easier for your
users to cope with, especially if they want to run R interactively. At
the end of the day, cluster users can't stay ignorant of how the
systems they use are implemented; if they do, they will do things which
(a) run very slowly and/or (b) bring the cluster or the network down.
Accidental NFS abuse is the most common culprit, in my experience.
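The usual cure is to get them to stage data onto the node's local disk
inside the job script rather than reading and writing everything over
NFS. Under Grid Engine, assuming the queues are set up with a local
tmpdir, something along these lines works (all the file names and
shared paths here are invented):

  #!/bin/sh
  #$ -cwd -N r_analysis
  # Copy the input onto node-local scratch ($TMPDIR is created on
  # local disk by Grid Engine), run there, and copy only the
  # results back to shared storage at the end.
  cp /shared/data/input.dat $TMPDIR/
  cd $TMPDIR
  R CMD BATCH --no-save /shared/scripts/analysis.R analysis.Rout
  cp analysis.Rout /shared/results/

Submit twenty of those with qsub and all twenty CPUs stay busy without
anyone hammering NFS mid-run.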
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233