[Bioclusters] BLAST job time estimates

Micha Bayer bioclusters@bioinformatics.org
08 Jun 2004 15:01:18 +0100


On Tue, 2004-06-08 at 14:17, Joe Landman wrote:
> On Tue, 2004-06-08 at 07:12, Micha Bayer wrote:
> 
> [...]
> 

> > 
> > To me the number of page faults suggests clearly that the db is not in
> > memory. Does that mean I cannot ever get the db into memory and on Linux
> > all BLAST searches will take a huge performance hit because of this?
> 
> No, this means that your job is too short for meaningful measurement. 
> Are you sure it is not failing?  0.34s total execution time makes me
> quite suspicious.
> 
> try
> 
> 	strace blastall -p blastn -d ecoli.nt -i test.txt -o test.out
> 
> and see if it generates an error.  Also, look in test.out to make sure
> it worked.
> 

No, the job works fine. The output looks correct and strace does not
report any errors.

I get lots of page faults when I run queries against nr and nt.
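
In case anyone wants to reproduce the page fault counts, here is a
minimal sketch of one way to collect them. It assumes a Unix system
with Python 3 and uses the standard resource module; the helper name
count_page_faults is made up for illustration. Major faults are the
ones that needed disk I/O, so they are the more telling number for
whether the db was already in memory.

```python
import resource
import subprocess


def count_page_faults(cmd):
    """Run cmd (a list of arguments) and return (minor, major) page faults.

    The counts come from the rusage totals for waited-for child
    processes, sampled before and after the run, so they cover only
    the command started here.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)  # raises if the command exits non-zero
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    minor = after.ru_minflt - before.ru_minflt  # satisfied without disk I/O
    major = after.ru_majflt - before.ru_majflt  # required disk I/O
    return minor, major


# Example call, reusing the command line quoted earlier in this thread:
# count_page_faults(["blastall", "-p", "blastn", "-d", "ecoli.nt",
#                    "-i", "test.txt", "-o", "test.out"])
```

Running the same query twice and comparing the major fault counts
should give a rough idea of how much of the db stayed cached between
runs.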


> > > See above.  How large are your databases?
> > 
> > I plan to run the queries against the standard nr and nt databases and
> > perhaps whole chromosome dbs as well. nt is currently about 2.6 GB, nr
> > about 600 MB.
> 

What do you count as the database size? Do you include the index files?

The nr database from ftp.ncbi.nlm.nih.gov/blast/db/ is currently 588348k
compressed (2nd June version). This uncompresses into a 1.4 GB tar file,
which untars into 7 index files of about 1.6 GB altogether.


> nr, as I last downloaded it on May 20th, is 906.8 x 10**6 bytes (~907 MB). 
> When you uncompress nt, it is much larger.  If you have 1 GB ram, you
> want to target about 1/3 to 1/2 GB for the index size.  For nr and nt,
> try using 
> 
> 	-v 300 
> 
> on the formatdb command line.  Should give you 3 nr segments, and many
> nt segments.

I must try this. How do you refer to the segments when you call BLAST?
Does the syntax change or do you simply do separate runs against each
segment in turn?
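
As a rough sanity check on how many segments to expect, the arithmetic
can be sketched as below. This assumes that -v is given in millions of
letters, which is my reading of the suggestion above and not something
I have checked against the formatdb documentation. The function name
and the letter count in the example are made up for illustration; the
real letter count of nr will be smaller than its file size, since the
file also contains the definition lines.

```python
import math


def estimate_segments(total_letters, v_millions):
    """Estimate the number of volumes for a given -v value.

    Assumption: -v is in millions of letters, so each volume holds at
    most v_millions * 1,000,000 letters of sequence.
    """
    if v_millions <= 0:
        raise ValueError("v_millions must be positive")
    letters_per_volume = v_millions * 1_000_000
    return max(1, math.ceil(total_letters / letters_per_volume))


# Hypothetical letter count, not a measured value for nr.
print(estimate_segments(850_000_000, 300))
```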

cheers
Micha



-- 
--------------------------------------------------
Dr Micha M Bayer
Grid Developer, BRIDGES Project
National e-Science Centre, Glasgow Hub
246c Kelvin Building
University of Glasgow
Glasgow G12 8QQ
Scotland, UK
Email: michab@dcs.gla.ac.uk
Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
Tel.: +44 (0)141 330 2958