> Even then, after a good fit, there are still multiple factors that
> would influence run time. The biggest factor would be the index
> database size as compared to the available memory size. If you
> overflow local RAM, the mmap function will flush pages, and you will
> introduce disk I/O for your indices, which could be a significant
> performance inhibitor, depending upon how much I/O is needed.
>
> To alleviate this, use the "-v N" switch on formatdb, where N is a
> size in MB. This fragments the database index into segments of
> approximately N megabytes each. This gives you an effect similar to
> the optimization technique known as "blocking".

This issue is still a source of great confusion for me. I started a thread about it earlier on this list and have managed to confuse myself even more since. The BLAST manual says that databases can be loaded into memory, but there does not seem to be any way of forcing this; it appears to be up to the OS to decide whether it loads the database into memory or not. On my machine here (Linux RH9) it does not seem to load the database into memory, regardless of the database's size.
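For what it's worth, here is how I read the "-v N" advice above as an actual command line (a sketch only; the flag meanings are from the legacy NCBI formatdb, and the 1000 MB volume size is just an example value, not a recommendation):

```shell
# Sketch, assuming the legacy NCBI formatdb:
#   -i nt     input FASTA file (here: the nt database)
#   -p F      nucleotide database (F = not protein)
#   -o T      parse SeqIds and build indices
#   -v 1000   split the database into ~1000 MB volumes
FORMATDB_CMD="formatdb -i nt -p F -o T -v 1000"

# Guarded so the snippet is harmless on machines without formatdb installed.
if command -v formatdb >/dev/null 2>&1; then
    $FORMATDB_CMD
else
    echo "formatdb not installed; would run: $FORMATDB_CMD"
fi
```

blastall then searches the resulting volumes (nt.00, nt.01, ...) one after another, so each segment can be resident in RAM while it is being scanned.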
I have recently tried the time command with my BLAST runs, which conveniently also records page faults, and I get the following output when I run a query against ecoli.nt (which is pathetically small, a few MB at most, and should easily fit into my 1 GB of memory):

>/usr/bin/time -v -- blastall -p blastn -d ecoli.nt -i test.txt -o test.out

    Command being timed: "blastall -p blastn -d ecoli.nt -i test.txt -o test.out"
    User time (seconds): 0.01
    System time (seconds): 0.02
    Percent of CPU this job got: 8%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.34
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 792
    Minor (reclaiming a frame) page faults: 621
    Voluntary context switches: 0
    Involuntary context switches: 0
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

To me, the number of major page faults clearly suggests that the database is not in memory. Does that mean I can never get the database into memory, and that on Linux all BLAST searches will take a huge performance hit because of this? And where does that leave something like mpiBLAST, which gets its performance increase precisely from the database fitting into memory? Maybe someone can shed some light on this...

> See above. How large are your databases?

I plan to run the queries against the standard nr and nt databases, and perhaps whole-chromosome databases as well. nt is currently about 2.6 GB, nr about 600 MB.
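On the question of forcing the database into memory: since the indices are mmap'ed, one blunt workaround (an untested sketch on my part, using a placeholder path and a dummy stand-in file so it can be run anywhere) is simply to read the database files once, so the OS pulls their pages into the page cache before blastall touches them:

```shell
# Sketch: pre-warm the OS page cache for a BLAST database.
# $BLASTDB is a placeholder path; here we create it with a dummy 2 MB
# stand-in for a real index file such as ecoli.nt.nin (demo only).
BLASTDB=/tmp/blastdb-demo
mkdir -p "$BLASTDB"
dd if=/dev/zero of="$BLASTDB/ecoli.nt.nin" bs=1024 count=2048 2>/dev/null

# Reading the files pulls their pages into the page cache. On a real
# system you would then re-run blastall and compare the "Major
# (requiring I/O) page faults" line from /usr/bin/time -v: the second
# run should show far fewer major faults.
cat "$BLASTDB"/ecoli.nt.* > /dev/null
echo "warmed $(du -sk "$BLASTDB" | cut -f1) KB of database files"
```

This does not guarantee residency, of course: the kernel can still evict the pages under memory pressure (only something like mlock would truly pin them), but on a lightly loaded 1 GB machine a few-MB database read this way should stay cached between runs.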
Micha

--
--------------------------------------------------
Dr Micha M Bayer
Grid Developer, BRIDGES Project
National e-Science Centre, Glasgow Hub
246c Kelvin Building
University of Glasgow
Glasgow G12 8QQ
Scotland, UK
Email: michab@dcs.gla.ac.uk
Project home page: http://www.brc.dcs.gla.ac.uk/projects/bridges/
Personal Homepage: http://www.brc.dcs.gla.ac.uk/~michab/
Tel.: +44 (0)141 330 2958