[Bioclusters] Clusters for bioinformatics... Some numbers or statistics?

jfreeman jfreeman@variagenics.com
Thu, 30 Aug 2001 14:00:20 -0400

Hi Ivo,

Ivo Grosse wrote:
> Hi Jim,
> > Possibly killed, (possibly run out of /tmp space or disk space in
> > general, benign DOS attack, etc.), and it depends on what kind of risk
> > you are willing to take with your clients' data while the batch process
> > is sharing the machine with them and vice versa.
> Which risk?

See above for some examples: all the problems that come with someone
else's code running on your machine.
> > way, who in your group is willing to have a batch process run in the
> > background on their systems' idle cycles while they are working on their
> > own projects?
> Most people are willing.  I mean, "working on our projects" has two
> components:
> 1. type code, papers, emails, ...; compile code; test-run small code;
> simple data analysis or visualization (more, less, gnuplot, xmgrace,
> ...); netscape.
> 2. run (longer) code.
> Usually, we run type-1 jobs on our local machines and submit type-2
> jobs to the queue.  While reading your (and typing this) email, there
> is a job running in the background on my machine.
> Do you think that is not safe?  What could be the problems?

Does the dean of your department, PI, Provost, or their administrators
run these background jobs?

In a corporate environment you have the money to purchase a cluster, and
you have users who need their computers to be secure and predictable
during day-to-day operation.  A cluster costs little compared to the
potential cost of disrupting the operation of your business.  With all
that said, a system like Condor (thank you, Jeremy) may address most of
these technical issues and could be a good approach.  The accounting
department probably won't want jobs running on their systems regardless,
and rightly so.

If you have two classes of users, those who mind having this done to
their systems and those who do not, great.  If you have a computer lab
for students with bursty usage and a small CPU footprint (type 1), this
would be a good place for your system.  I would love to know how it
splits up in the academic universe.
> > If you have racks of unused old Suns and SGIs and want
> > to make a new cluster without clients, I think this is a different
> > problem.
> Well, we also have those piles of old machines, but my question is the
> opposite.  We have dozens of above-750 MHz P3s that are "abused" as
> typewriters for code, papers, emails, ..., and one option would be to
> upgrade their RAM and use them as a small network of workstations.
> An alternative option is to build a small, say, 16-node beowulf cluster
> (with one master and 15 slave nodes and a simple 10/100 ethernet
> connection and a 16-port switch) with Mosix (or whatever) installed.
> My question is: what would be the advantage of having the beowulf
> cluster?
> Ivo

In the universe you describe, where your background jobs consistently
behave well, your users don't mind (or don't know) that this is
happening, and you already have the computers, it is worth a try, and
Condor sounds like a good approach.  The cost of the new computers (easy
to define) has to be balanced against the risk to your clients' data,
processes, and systems (harder to define).

Good luck,


> _______________________________________________
> Bioclusters maillist  -  Bioclusters@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/bioclusters