[Bio-Linux] condor cluster

Fri Aug 25 05:28:06 EDT 2006

On 8/24/06, Jose Andres <jaa53 at cornell.edu> wrote:
> Hi all,
>
> I'd like to use Biolinux to establish a local network of computers
> (most of them running Windows or Mac OS)  that will allow us to use
> some applications  (e.g. omegaMap, Paup) on a "grid mode" or as in a
> serial node cluster. If I understood some of the documentation I've
> been reading you can do it using the version of condor implemented in
> our bio-linux machines. Is that right?   Does anyone have experience
> on this type of networks?  Are they difficult to set up?  I am quite
> naive with this issues and I'd really  appreciate any comments/
> suggestions you might have.

Hi Jose,

You can certainly use your Bio-Linux workstation as the central
manager for a Condor pool - the necessary software is already
installed.  I run a (large) mixed Condor pool at Newcastle with a
Linux central manager.

Sending jobs to your Windows and Mac OS machines from the Bio-Linux
machine is not a problem: you can specify which resources a job
needs (OS, architecture, installed programs) using Condor
requirements expressions.  By and large the only jobs you are
going to send to the Windows machines are Java jobs, or Windows
executables.  Similarly the only things you are going to send to Linux
machines are Linux executables, Java jobs or shell/Perl scripts.  The
same goes for OS X - OS X executables, Java jobs or shell/Perl
scripts.
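
As a rough illustration of the requirements expressions mentioned above,
here is a minimal Python sketch that writes out the text of a Condor
submit description file for one target platform. The executable name
(run_analysis.sh) and the OpSys/Arch values are assumptions for the
example only; the exact strings your machines advertise depend on the
Condor version, so check them with condor_status before relying on them.

```python
def requirements(opsys, arch=None):
    """Build a requirements expression that matches one platform."""
    clauses = ['(OpSys == "%s")' % opsys]
    if arch is not None:
        clauses.append('(Arch == "%s")' % arch)
    return " && ".join(clauses)


def submit_description(executable, opsys, arch=None, arguments=""):
    """Return the text of a vanilla-universe submit description file."""
    lines = [
        "universe     = vanilla",
        "executable   = " + executable,
        "arguments    = " + arguments,
        # Only machines whose ClassAd satisfies this expression will match.
        "requirements = " + requirements(opsys, arch),
        # No shared filesystem is assumed, so let Condor move the files.
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        "output       = job.$(Cluster).$(Process).out",
        "error        = job.$(Cluster).$(Process).err",
        "log          = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: a shell script aimed at Linux execute nodes.
    # "LINUX" and "INTEL" are assumed values; confirm with condor_status.
    print(submit_description("run_analysis.sh", "LINUX", "INTEL"))
```

Saving the printed text to a file and passing that file to
condor_submit would queue the job; a Windows or OS X job would need its
own executable and a different OpSys value.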

Unless you have Windows, UNIX and OS X versions of PAUP you're most
likely to be limited to running it on a single OS.  Most likely OS X
if the researchers here are anything to go by ;) omegaMap appears to
be Windows only anyway.   However, you're best sticking with
applications that can be scripted or run from the command line.
Whilst it's possible to launch GUI applications under Condor, you're
effectively unable to interact with them.

Condor is relatively easy to set up, but can occasionally be
frustrating to troubleshoot because the errors it generates seem
rather uninformative at first.  Once you have the central manager machine
running, it is very simple to set up clients to join the pool.  If you
have a lot of Windows machines managed by Active Directory, you'll
probably want an MSI build of Condor - your local IT people should be
able to help you with this.
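
To give a feel for how little differs between the central manager and an
execute node, here is a minimal Python sketch that generates the two
local configuration fragments. CONDOR_HOST and DAEMON_LIST are standard
Condor configuration macros; the hostname biolinux.example.org is a
placeholder, and a real pool will also need access-control settings
(which hosts may read from and write to the pool) that are not shown
here.

```python
def local_config(role, central_manager):
    """Return a local Condor config fragment for the given role.

    role is "manager" (the central manager, which here also submits
    and runs jobs) or "execute" (a machine that only runs jobs).
    """
    if role == "manager":
        daemons = ["MASTER", "COLLECTOR", "NEGOTIATOR", "SCHEDD", "STARTD"]
    elif role == "execute":
        daemons = ["MASTER", "STARTD"]
    else:
        raise ValueError("unknown role: %r" % (role,))
    lines = [
        # Every machine in the pool points at the same central manager.
        "CONDOR_HOST = " + central_manager,
        "DAEMON_LIST = " + ", ".join(daemons),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    host = "biolinux.example.org"  # placeholder hostname
    print("# central manager")
    print(local_config("manager", host))
    print("# execute node")
    print(local_config("execute", host))
```

The only difference between the two fragments is the list of daemons,
which is why adding clients is quick once the central manager works.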

Condor is backed by an excellent community mailing list
(https://lists.cs.wisc.edu/mailman/listinfo/condor-users).

Currently the OS X and Windows builds do not support 'checkpointing'
(stopping a job and restarting it part of the way through the
computation), so a job that is interrupted on those machines has to
start again from the beginning, which may reduce your job throughput.

The best advice I can give is to configure the central manager on your
Bio-Linux machine (which shouldn't require much effort) and then get a
'sacrificial' Windows and Mac machine, and set them up as 'execute'
nodes.  For the purposes of testing, it's best that these machines are
infrequently used, otherwise you will be sat waiting for your
submitted jobs to complete until people stop using the machines. In
practice as your pool becomes larger this becomes less of a problem -
but you'll definitely want to do your initial testing on a small,
completely idle, pool of machines.

If I can help you any further, please get in touch, I spend a fair p

-- 
Senior Research Associate, Bioinformatics Support Unit,
Institute for Cell and Molecular Biosciences,
Faculty of Medical Sciences, Framlington Place,
University of Newcastle upon Tyne,
Newcastle, NE2 4HH
Tel: +44 (0)191 222 7253  (Leech offices: Rooms M.2046/M.2046A)
Tel: +44 (0)191  246 4833 (Devonshire offices: Rooms G.25/G.26)
Website: http://bioinf.ncl.ac.uk/support/