[Bioclusters] [long-ish] Advice on getting started with clustering, LSF, Xserve?

Simon Twigger bioclusters@bioinformatics.org
Wed, 6 Nov 2002 12:21:03 -0600

Hi there,

I stumbled across this mailing list today while searching through some  
BioPerl archives, and I'm hoping that someone out there can point me in  
the right direction to get myself up to speed on bioclusters, on both  
the hardware and software side. I'm more from the bio side of  
bioinformatics, and I'm trying to understand more of the nitty-gritty  
informatics/computer part!

We've been writing bioinformatics software in Perl/Java for a while,  
we've got Oracle and MySQL databases, and we run all the usual  
genome/sequence analysis packages (BLAST, BLAT, etc.) plus some of our  
own annotation pipelines. Historically we've been running these on  
multiple machines, but not as a real cluster: we have no robust load  
management software, and we haven't significantly changed how we write  
our code so that it scales in a multiprocessor environment. I'm  
trying to find better ways to use our Sun, Compaq and (probably)  
Mac OS machines, how to get them all working together to handle both  
genomic and proteomic analyses, and how to modify our existing and new  
code to work in this environment.

I'd love to find some sort of 'bioclustering for dummies' that outlines  
the usual solutions and approaches, and, on the software side, something  
that describes the fundamentals of writing Perl and Java to exploit  
clusters, maybe even some simple examples/test packages that I could play  
with to get my feet wet.
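For a sense of what "exploiting a cluster" often means for tools like BLAST, one common pattern is to split the input into independent chunks and run one ordinary job per chunk, leaving placement to the load-management software. The sketch below is a minimal illustration under stated assumptions, not an established package: it assumes the FASTA input has already been read into memory as a list of lines, and the class name `FastaChunker` and method `chunk` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (illustration only): distributes FASTA records
 * round-robin into n chunks so each chunk can be submitted as an
 * independent job, e.g. one BLAST run per chunk.
 */
public class FastaChunker {

    public static List<List<String>> chunk(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            chunks.add(new ArrayList<>());
        }
        int record = -1; // index of the current record; -1 until the first header
        for (String line : lines) {
            if (line.startsWith(">")) {
                record++; // a '>' header line starts a new record
            }
            if (record < 0) {
                continue; // ignore anything before the first header
            }
            // the header and all its sequence lines go to the same chunk
            chunks.get(record % n).add(line);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> fasta = List.of(">seq1", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA");
        List<List<String>> chunks = chunk(fasta, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("chunk " + i + ": " + chunks.get(i));
        }
    }
}
```

Round-robin assignment balances chunks by record count rather than by total sequence length, which is usually good enough when records are of similar size; each chunk would then be written to its own file and submitted as a separate job.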

A few specific things that I'm thinking about; perhaps people can  
comment on my rationale:
We have a variety of platforms, and it would be great to make them all  
play together. LSF appears to be a good solution for load  
balancing across a heterogeneous set of servers (we have Sun and Compaq  
machines and will probably add Xserves into the mix). From my reading, the  
downside is the price ($400 per server was a price I saw quoted on the list).  
Ease of administration seems to be another pro for LSF, which is a big  
thing, as we just want it to work; we don't really want to babysit this  
stuff. What sort of sysadmin commitment is needed to make this work?

I'm personally interested in trying the Xserve: the storage capacity,  
speed, price, etc. all make it attractive as an alternative to our  
traditional options. Oracle is coming out for OS X (and the developer  
release is running on my PowerBook as we speak), so that's another good  
thing. I'm doing all my development on a G4 with 10.2 and it's great. Any  
thoughts/experiences with using Xserve in the mix with other platforms,  
and Xserve vs. Intel solutions?

Many thanks for any help anyone can give a newbie in the field!


Simon Twigger, Ph.D.
Assistant Professor, Bioinformatics Research Center

Medical College of Wisconsin
8701 Watertown Plank Road,
Milwaukee, WI, 53226
tel. 414-456-8802, fax 414-456-6595