[Bioclusters] Newbies

13 May 2002 09:33:26 -0600

Hi, this is a bit of an unrelated question.

I am a computer science major who developed a multi-threaded version of
clustalw 1.82. It runs on any multi-CPU machine with a POSIX-compliant Unix.
The source code can be downloaded from
http://bioinfo.pbi.nrc.ca/clustalw-smp/

It would not be difficult at all to make it cluster-based. A colleague
of mine did it within a matter of a few days for play and testing (not in
releasable form, though). However, to this day I haven't got a clue as to
how widely clustalw is used in the bio/bioinformatics community. Would it
be of _any_ interest to make it cluster-based? If so, I just might
decide to spend that extra time and go that extra mile... :)

Thanks,
Ognen

On Mon, 2002-05-13 at 08:12, Eric Engelhard wrote:
> I have not yet seen the Wired article, but I think I was among those who
> plugged the bioclusters list. If this is the first of many general
> interest questions, then perhaps a short bioclusters FAQ and/or a
> "clustering BLAST at home mini-how-to" are in order. It is, however,
> important to point out that most hobbyist needs can be met by either
> using publicly available services or running free software on a single
> workstation (see http://bioinformatics.org/software/index.php3 and
> http://www.cvbig.org/tools/). Depending on your Linux skills and
> familiarity with bioinformatics, you may want to start with the O'Reilly
> book "Developing Bioinformatics Computer Skills" by Gibas and Jambeck.
> Clustering itself is a specialized subset of skills including hardware,
> system administration, programming and familiarity with biological
> goals.
> 
> I am a biologist by training and have relatively little experience with
> high performance computing as compared with others on this list. That
> said, I've built out one small cluster at work (currently 15 nodes) and
> three tiny clusters (4-8 nodes) at my own and other people's homes. These
> were all of the "embarrassingly parallel" variety for batching NCBI
> BLAST and/or InterPro. The home systems turned out to be decent hobbyist
> tools and are as simple as they come:
> 
> private 100 Mb network
> Master node: two NICs, NFS, DHCP, NCBI BLAST, Perl wrappers
> Slaves: open to rexec, NCBI BLAST, Perl wrappers
> 
> The reference databases are equally divided among the nodes, but the
> queries and results are stored on the master node (either flat or in a
> database). Each node runs an independent instance of BLAST against each
> query and parses locally to keep net traffic down (your needs may vary).
> 
> I like using PostgreSQL as a relational database management system, but
> many hobbyists will be satisfied with flat files. I've also used Apache
> with PHP and Samba to allow for browser access and Windows file system
> access, respectively, but prefer custom Perl scripts and the command
> line for batching and parsing my own projects.
> 
> --
> Eric Engelhard - www.cvbig.org - www.sagresdiscovery.com
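
The split-database layout described in the quoted message (reference
database divided among the nodes, every node searching each query against
its own slice, results collected on the master) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
Perl wrappers mentioned above: the function names (read_fasta,
shard_records, merge_hits) are hypothetical, "equally divided" is taken to
mean balanced by total residue count, and each node's parsed output is
assumed to be a list of (subject_id, e_value) pairs.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]   # (header without '>', sequence)
Hit = Tuple[str, float]    # (subject id, e-value), assumed parser output


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def shard_records(records: Iterable[Record], n_nodes: int) -> List[List[Record]]:
    """Split records into n_nodes shards of roughly equal total length.

    Greedy heuristic: take the longest sequences first and always hand the
    next one to the shard that currently holds the fewest residues.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    shards: List[List[Record]] = [[] for _ in range(n_nodes)]
    heap = [(0, i) for i in range(n_nodes)]  # (residues so far, shard index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        size, idx = heapq.heappop(heap)
        shards[idx].append(rec)
        heapq.heappush(heap, (size + len(rec[1]), idx))
    return shards


def merge_hits(per_node_hits: Iterable[List[Hit]], top_n: int = 10) -> List[Hit]:
    """Combine each node's hits for one query; keep the top_n lowest e-values."""
    all_hits = [hit for hits in per_node_hits for hit in hits]
    return sorted(all_hits, key=lambda h: h[1])[:top_n]
```

In a setup like this, each shard would be written back out as its own FASTA
file and formatted as a BLAST database on one node, and the master would
call something like merge_hits once per query after the nodes report back.
Note that e-values computed against a slice of the database are not
directly comparable to those from a search of the whole database, so the
merged ranking is only a rough ordering unless that is corrected for.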