[Bioclusters] Local blast server, beowulf vs mosix

Tue, 5 Mar 2002 00:22:09 -0500 (EST)

Mike Coleman wrote:
>
> I've been thinking about setting up a Beowulf and giving these issues a lot
> of thought.  The advantage of going "diskless" is primarily administrative,
> I think.  Basically, you need some way of keeping your clients in sync
> as you make changes over time.  There seem to be only two good ways to
> do this: share everything over NFS, or replicate changes.  SystemImager
> is an interesting take on the latter, and I like the core rsync idea,
> but it looks more complicated than I was hoping for, and so I'm leaning
> towards "diskless".
>
> (By "diskless", I mean not that the hosts have no disks, but that no (or
> almost no) state is stored there.  So, for example, the disks could be wiped
> and mounted on /tmp at each boot.)
>

Diskless in the true diskless sense is good for several reasons:

  - One less component that can fail and cause downtime
  - Less initial cost per system
  - You can set up one centralized image on your controlling node;
    keeping systems synced is as simple as keeping the image updated

There are of course situations where you might want the alternative
form of diskless that Mike describes, such as wanting to cache
application data locally, or to have scratch space for large data
output.  In either form, "diskless" makes the cluster easier to manage.
In most cases, though, disks in each node are just wasted space,
especially now that 40+GB drives come standard in these nodes.  I
suppose you could always buy machines with disks in them, and then take
the disks out and place them in a SAN array that hangs off the
controller node or whichever machine serves your NFS.

Getting back to Danny's original inquiry (whether to use Beowulf
or MOSIX), I think it's important to understand the differences between
the available clustering solutions and methodologies.  There are
numerous ways to implement Beowulf-style clusters, with MOSIX(1) as
one of them.  MOSIX falls into the Single System Image (SSI) model,
where processes start on a node (be it the controlling node or
a compute node) and are transparently migrated back and forth
between systems (migration is driven by metrics such as load
average, or by manual intervention).  Whether to use MOSIX depends on
your needs.  Is the code you're running parallel?  Threaded?  Serial?
In theory, MOSIX can handle all of these situations pretty well, but
if your needs fall into one specific category, there might be
solutions better optimized for you.
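
To make the load-based migration idea concrete, here is a minimal
sketch in Python.  It is only a toy model of the decision an SSI
system has to make, not MOSIX's actual algorithm; the node names, the
load values, and the min_gain threshold are all hypothetical.

```python
# Toy illustration of load-based process migration in an SSI-style cluster.
# NOT MOSIX's real algorithm: node names, loads, and threshold are made up.

def pick_migration_target(loads, current, min_gain=1.0):
    """Return the node a process on `current` should move to, or None.

    loads    -- dict mapping node name to its load average
    current  -- name of the node the process is running on now
    min_gain -- migrate only if the target is at least this much less
                loaded, so processes do not bounce between similar nodes
    """
    # The least-loaded node is the only candidate in this simple model.
    target = min(loads, key=loads.get)
    if target != current and loads[current] - loads[target] >= min_gain:
        return target
    return None


if __name__ == "__main__":
    loads = {"head": 3.2, "node01": 0.4, "node02": 1.1}
    print(pick_migration_target(loads, "head"))    # node01
    print(pick_migration_target(loads, "node01"))  # None
```

A real system would also weigh memory use, I/O, and the cost of the
migration itself; the threshold here only stands in for that trade-off.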

The Scyld Beowulf software(2) (from the guys that started this whole
beowulf trend) also falls into the SSI model of clustering.  The advantage
of using their software is that they tie in some of the other aspects that
most people need/want, such as batching and queuing, and provide a
web-based management interface for it all.

I've set up and managed several clusters over the last few years.
The most recent was a 32-node cluster I designed and implemented
for the Univ. of Chicago Geophysics department.  I chose Debian
as the Linux distribution, and used a package called Fully Automatic
Installation (FAI)(3) to deal with building the nodes.  Essentially, one
can add a new, fully functioning node to the cluster inside of 10
minutes.  A new node will netboot, and the FAI scripts take care
of partitioning, installing a set of packages based on cfengine
classes (you can base classes on just about anything, and it includes
hooks for setting up diskless clients), and copying over any custom
configs or other setup that is necessary.  This is similar to Kickstart
under Red Hat, or JumpStart under Solaris.  I don't think MOSIX provides
any means of building the nodes initially, but Scyld probably does (it
provides a system image, similar to the way SystemImager is set up, I
think).  Note, however, that a tool like this only handles one aspect of setting up a
cluster--the initial software installation.  It does nothing for the
longer term management aspects, and you're left figuring out how to deal
with that on your own.
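
The class-based package selection described above can be sketched
roughly as follows.  This is a toy model in Python, not FAI's real
configuration format; the class names, the hostname rules, and the
package lists are all hypothetical.

```python
# Toy illustration of class-based package selection for cluster nodes.
# NOT FAI's real config format: classes, hostnames, and packages are made up.

PACKAGES_BY_CLASS = {
    "DEFAULT": ["ssh", "rsync"],
    "COMPUTE": ["mpich", "blast2"],
    "DISKLESS": ["nfs-common"],
}


def classes_for(hostname):
    """Assign classes to a host; here, purely from its (hypothetical) name."""
    classes = ["DEFAULT"]
    if hostname.startswith("node"):
        classes.append("COMPUTE")
    if hostname.endswith("-dl"):
        classes.append("DISKLESS")
    return classes


def packages_for(hostname):
    """Return the sorted union of packages for every class the host is in."""
    selected = set()
    for cls in classes_for(hostname):
        selected.update(PACKAGES_BY_CLASS.get(cls, []))
    return sorted(selected)


if __name__ == "__main__":
    for host in ("head", "node07", "node08-dl"):
        print(host, packages_for(host))
```

The point of the design is that adding a node means giving it a name
(or some other attribute) that maps to existing classes, rather than
writing a per-node package list.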

This also says nothing about the actual hardware to pick, and the type
of interconnect (10/100/1000 base-T, Myrinet, Scali, etc.).

I could go on and on about why I picked Debian, or why to
use one distribution over the other, or which software to pick and
why, but that would require writing a book.  There are already some
really good books out there on the subject.  The best advice I can give
to anyone who is thinking about setting up a cluster is to do a lot
of research and first gain an understanding of what's available to you.
Don't rush.  Define your present and potentially future needs, and pick
a solution that best solves your problems.  Clusters can't solve
everything though, so make sure it's a problem that warrants the time
and effort you'll spend in putting one together.  If you're not the
one doing the actual implementation, be nice to the sysadmin(s) that
are (buy them beer when they have it working). :)

(1) http://www.mosix.org/
(2) http://www.scyld.com/
(3) http://www.informatik.uni-koeln.de/fai/

Some other general links:

http://www.linuxnetworx.com/  (I'd recommend this solution for anyone
                               who's serious about a cluster.  These guys
                               are extremely professional.)

http://foundries.sourceforge.net/clusters/   (lots of links & software)

-phillip

------------------------------------
Phillip Smith
UNIX Systems Administrator
Center for Genomics & Bioinformatics
Jordan Hall 153
Indiana University, Bloomington

 Phone: 812-856-5081
E-mail: psmith@bio.indiana.edu
------------------------------------