[Bioclusters] Average Pairwise Scores for reciprocal Blast hits

Tim Cutts tjrc at sanger.ac.uk
Thu Jun 2 09:45:07 EDT 2005


On 2 Jun 2005, at 1:58 pm, Jason Stajich wrote:

> To avoid mapping everything in memory try using DB_File to tie the  
> hash to a file - you still use the hash as normal but all the fetch/ 
> store calls are done through BerkeleyDB (or equivalent) on the  
> flatfile. But you'll have to adjust your strategy of Hashes-of- 
> Hashes since you can only do key-value storage and not HoH.

You can if you tie the hash-of-hashes to MLDBM.  I use this on our  
cluster to cache version information on each node in the cluster;  
there's a central MySQL database which keeps track of which nodes  
have what version of which blastable database.  To have all 1000+  
farm nodes hammering that to death all the time was clearly silly, so  
the users have a perl module which keeps a cached copy of the data  
from the MySQL database in a gdbm file, using MLDBM to store a small  
data structure for each file.  If the blastable file has been  
modified more recently than the gdbm file, the module contacts the  
MySQL database for updated version information, and stores it.
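
A minimal sketch of that caching pattern, for illustration only and not the actual Sanger module. The file paths, the record fields, and the helper names cache_is_stale and fetch_version_from_master are all hypothetical; the latter is a stub standing in for the real MySQL query. It assumes the MLDBM, GDBM_File and Storable modules are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use GDBM_File;                      # provides the GDBM_* open-mode constants
use MLDBM qw(GDBM_File Storable);   # gdbm for storage, Storable to serialise values

# Hypothetical paths, for illustration only.
my $blast_file = '/data/blastdb/example_db';
my $cache_file = "/tmp/blastdb_versions_$$.gdbm";

# Stub standing in for the query against the central MySQL database.
sub fetch_version_from_master {
    my ($file) = @_;
    return { version => '1', checked => time };
}

# The cache is stale if it does not exist yet, or if the blastable
# file has been modified more recently than the gdbm file.
sub cache_is_stale {
    my ($blast, $cache) = @_;
    return 1 unless -e $cache;
    return 0 unless -e $blast;
    my $blast_mtime = (stat $blast)[9];
    my $cache_mtime = (stat $cache)[9];
    return $blast_mtime > $cache_mtime ? 1 : 0;
}

if (cache_is_stale($blast_file, $cache_file)) {
    tie my %versions, 'MLDBM', $cache_file, GDBM_WRCREAT(), 0644
        or die "Cannot tie $cache_file: $!";
    # Each value is a whole nested structure, serialised on store.
    $versions{$blast_file} = fetch_version_from_master($blast_file);
    untie %versions;
}

tie my %versions, 'MLDBM', $cache_file, GDBM_WRCREAT(), 0644
    or die "Cannot tie $cache_file: $!";

# Reading nested data works like an ordinary hash-of-hashes.
print "version: $versions{$blast_file}{version}\n";

# Writing does not: MLDBM cannot see changes made to a nested element,
# so fetch the record, modify the copy, and store it back whole.
my $record = $versions{$blast_file};
$record->{version} = '2';
$versions{$blast_file} = $record;

untie %versions;
```

The fetch, modify, store-back step is the main limitation to keep in mind: an assignment such as $versions{$key}{version} = '2' changes only a temporary copy and is never written to the file.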

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5  860B 3CDD 3F56 E313 4233


