[Bioclusters] BLAST / PBS / Grid Engine

Jeff Layton bioclusters@bioinformatics.org
Fri, 17 May 2002 20:58:41 -0400


Chris Dagdigian wrote:

> Ok,
>

(snip)

>
>
>
> <topic change>
>
> I'm not partial to the current blade servers because many of them (like
> the RLX system I played with 6 months ago) end up using cheezy 4200 RPM
> laptop disk drives as the main OS disk. Cheezy laptop drives are simply
> not fast enough to deal with the large number of IO-bound applications
> that bio people typically run. In particular I'd hate to run BLAST on
> one of them.

However, one of the good things RLX does is provide two cheezy
disk drives. You can run RAID-0 across them to gain back some
performance. We were doing some tests on an RLX chassis and
were going to test this. However, the box had to ship to its true
customer, so my play time was over.
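
As a rough illustration of why striping helps, here is a toy model of
how RAID-0 lays logical blocks out across the disks. This is only a
sketch: the two-disk count comes from the RLX blade described above,
but the 16-block stripe unit and the function names are assumptions
made up for the example, not anything RLX-specific.

```python
# Toy model of RAID-0 striping (illustrative only).
# The stripe unit size (16 blocks) is an assumed value for this sketch.

def raid0_locate(logical_block, n_disks=2, stripe_blocks=16):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe = logical_block // stripe_blocks   # stripe unit number across the array
    within = logical_block % stripe_blocks    # position inside that stripe unit
    disk = stripe % n_disks                   # stripe units rotate across the disks
    disk_stripe = stripe // n_disks           # stripe unit number on that one disk
    return disk, disk_stripe * stripe_blocks + within


def blocks_per_disk(first_block, count, n_disks=2, stripe_blocks=16):
    """Count how many blocks of a sequential read land on each disk."""
    tally = [0] * n_disks
    for block in range(first_block, first_block + count):
        disk, _ = raid0_locate(block, n_disks, stripe_blocks)
        tally[disk] += 1
    return tally


if __name__ == "__main__":
    # A 64-block sequential read starting at block 0.
    print(blocks_per_disk(0, 64))
```

In this model a long sequential read splits evenly between the two
drives, so both spindles can be transferring at the same time. The
seek time of each individual drive is unchanged.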


Jeff


>
>
> I do know that James Cuff over at the Ensembl.org project (operators of
> a badass genome annotation cluster) has been running a serious "blade
> server bake-off" and he has promised to reveal the results of his tests.
>
> James and the folks over at the Sanger Centre / Wellcome Trust Campus
> have a _seriously hardcore_ IT infrastructure and I'm really interested
> to hear what they think of the blade tests that they have been running.
>
> -Chris
>
> Steve Pittard wrote:
> > First the question:
> >
> > Does someone know of certain combinations of load management software and
> > OS (e.g. PBS on Scyld or LSF on Red Hat) which are particularly good
> > at helping one manage web-based Blast submissions?
> >
> > Now the context:
> >
> > I've been offering a web-based blast service
> > to my local user community. It's a small emulation of what one finds
> > at the NCBI Blast site. We currently have an NCBI-ish web front end
> > for some Perl scripts which perform the blast and return the results. All
> > in all pretty usable stuff except that demand has driven up the load
> > averages on my server (a dual-CPU Dell PowerEdge with Red Hat 7.2). Several
> > searches of "nr" can slow things down quite rapidly.
> >
> > So I've begun experimenting with OpenPBS to smooth the load
> > on the server and keep it running well. So far so good, but since
> > I don't have a cluster yet, I haven't experimented with passing
> > off jobs to other nodes.
> >
> > Knowing that Blast (as distributed by NCBI)
> > is not parallel, I think that the best
> > I can do for the web-based queries is to let PBS assign
> > the blast jobs to less busy PBS nodes to avoid the logjam.
> > I'm fairly certain that no load management software (PBS, Grid Engine,
> > LSF) can take Blast (or more generally any non-parallel app)
> > and spread out its CPU needs amongst the cluster. Is this
> > assessment correct?
> >
> > I realize that for batch blasting many people "chop
> > up" the database over the nodes, formatdb the chunks, and
> > blast the queries against these chunks. Perl scripts
> > like disperse.pl also segment the larger Blast into more
> > manageable pieces. But this isn't scalable for web queries
> > that might occur several times a minute. So in my situation
> > I have the DBs (e.g. nr, swissprot, plant, etc.) "formatdbed"
> > on a server disk with the ultimate intention of having them
> > on cluster nodes, perhaps with NFS over gigabit.
> >
> > RLX Technologies sells an LSF-based
> > "Blast server" which is aimed squarely at the "I want to blast
> > thousands of sequences at once" batch blast market, though,
> > again, what I'm doing is not really that since my blast requests
> > come in over the web on a frequent basis. But I've been working
> > with them a bit on my particular situation.
> >
> > Anyway I have been looking at other "proper" cluster systems
> > and have been wondering which setup would best benefit
> > the type of Blasting that I'm interested in. Strongly
> > related to this question is the type of load management
> > software to use and on what platform. I've been using PBS
> > on Red Hat and so far so good but have heard good things
> > about LSF and Grid Engine.
> >
>
> --
> Chris Dagdigian, <dag@sonsorol.org>
> Independent life science IT & research computing consulting
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> Work: http://BioTeam.net PGP KeyID: 83D4310E  Yahoo IM: craffi