[Bioclusters] Newbies

Mon, 13 May 2002 07:12:44 -0700

> xpscreens wrote:
> 
> Hi, I just read about this in Wired and I am quite interested.  Could
> you share any information to get a newbie started on setting up a
> cluster for playing around with this, or give me a link to something
> useful?  Thanks

I have not yet seen the Wired article, but I think I was among those who
plugged the bioclusters list. If this is the first of many general
interest questions, then perhaps a short bioclusters FAQ and/or a
"clustering BLAST at home mini-how-to" are in order. It is, however,
important to point out that most hobbyist needs can be met by either
using publicly available services or running free software on a single
workstation (see http://bioinformatics.org/software/index.php3 and
http://www.cvbig.org/tools/). Depending on your Linux skills and
familiarity with bioinformatics, you may want to start with the O'Reilly
book "Developing Bioinformatics Computer Skills" by Gibas and Jambeck.
Clustering itself draws on a specialized set of skills: hardware, system
administration, programming, and familiarity with the biological goals.

I am a biologist by training and have relatively little experience with
high performance computing as compared with others on this list. That
said, I've built out one small cluster at work (currently 15 nodes) and
three tiny clusters (4-8 nodes) at my own and other people's homes. These
were all of the "embarrassingly parallel" variety for batching NCBI
BLAST and/or InterPro. The home systems turned out to be decent hobbyist
tools and are as simple as they come:

private 100 Mb network
Master node: two NICs, NFS, DHCP, NCBI BLAST, Perl wrappers
Slaves: open to rexec, NCBI BLAST, Perl wrappers

The reference databases are divided equally among the nodes, but the
queries and results are stored on the master node (either as flat files
or in a database). Each node runs an independent instance of BLAST
against each query and parses the output locally to keep network traffic
down (your needs may vary).
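
As a rough illustration of dividing a reference database among the
nodes, here is a minimal sketch in Python (not the Perl wrappers
mentioned above). It assumes the database starts out as a single FASTA
file and that "equally" means roughly equal residue counts per node. The
function names and the output file naming are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], n_nodes: int) -> List[List[Record]]:
    """Greedy split: each record goes to the shard with the fewest residues."""
    shards: List[List[Record]] = [[] for _ in range(n_nodes)]
    heap = [(0, i) for i in range(n_nodes)]  # (residues so far, shard index)
    heapq.heapify(heap)
    # Placing the longest sequences first keeps the shards closer in size.
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        size, i = heapq.heappop(heap)
        shards[i].append((header, seq))
        heapq.heappush(heap, (size + len(seq), i))
    return shards


def write_shards(shards: List[List[Record]], prefix: str) -> List[str]:
    """Write one FASTA file per shard (hypothetical naming: prefix.NN.fasta)."""
    paths = []
    for i, shard in enumerate(shards):
        path = f"{prefix}.{i:02d}.fasta"
        with open(path, "w") as out:
            for header, seq in shard:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each shard would still need to be copied to its node and indexed there
(for example with NCBI's formatdb) before BLAST can search it.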

I like using PostgreSQL as a relational database management system, but
many hobbyists will be satisfied with flat files. I've also used Apache
with PHP and Samba to allow for browser access and Windows file system
access, respectively, but prefer custom Perl scripts and the command
line for batching and parsing my own projects.
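
For the flat-file route, combining the per-node results on the master
might look something like the sketch below. It is written in Python
rather than Perl, and it assumes a hypothetical format in which each
node's parser emits tab-separated lines of query ID, subject ID, and bit
score. Ranking by bit score is a deliberate choice here: E-values depend
on the size of the database searched, so values computed against
separate pieces of a split database are not directly comparable.

```python
import csv
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[float, str, str]  # (bit score, subject ID, node name)


def merge_hits(node_lines: Dict[str, Iterable[str]], top_k: int = 5) -> Dict[str, List[Hit]]:
    """Combine per-node hit lines and keep the top_k hits for each query."""
    per_query: Dict[str, List[Hit]] = defaultdict(list)
    for node, lines in node_lines.items():
        for row in csv.reader(lines, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            query, subject, bits = row[0], row[1], float(row[2])
            per_query[query].append((bits, subject, node))
    # nlargest returns hits sorted from highest to lowest bit score.
    return {q: heapq.nlargest(top_k, hits) for q, hits in per_query.items()}
```

The values passed in can be open file handles, one per node, since any
iterable of lines will do.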

--
Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com