[Bioclusters] mpiformatdb problem

Susan Chacko bioclusters@bioinformatics.org
Thu, 4 Mar 2004 14:18:23 -0500

Has anyone successfully built the human genome db with mpiformatdb? Is  
there some special gotcha because there are very few, very large  
sequences (25 sequences in 3 Gb)?
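
Before formatting, it may be worth confirming that the input really holds the expected number of sequences. The following is only a sketch: the helper name fasta_lengths is hypothetical, and it assumes a plain FASTA file in which every record starts with a ">" header line.

```python
def fasta_lengths(lines):
    """Yield (header, residue_count) for each record in a FASTA stream.

    Hypothetical helper; assumes each record begins with a '>' line and
    that all other non-empty lines are sequence data.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file object.
    demo = [">seq1", "ACGT", "ACG", ">seq2", "GG"]
    for header, n in fasta_lengths(demo):
        print(header, n)
```

Because it reads line by line, passing an open file handle (for example open("chr_all.fa")) should work without loading a multi-gigabyte file into memory.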

I'm using mpiBLAST 1.2.1 with the latest NCBI Toolkit (4 Feb 2004).
Other nucleotide dbs build OK with mpiformatdb, but when I try to build
the genome in 25 pieces (one for each of the 25 sequences), I
consistently don't get the 00 piece, i.e. the directory contains #.nsq,
#.nin and #.nhr for every piece except 00, where I only see
chr_all.fa.00.nin
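
The check described above (which fragments lack which files) can be sketched as follows. This is an illustration, not part of mpiBLAST: the function name missing_fragment_files is hypothetical, and it assumes fragment files are named base.NN.ext with a two-digit fragment number, as in chr_all.fa.00.nin.

```python
def missing_fragment_files(filenames, base, n_fragments,
                           exts=(".nsq", ".nin", ".nhr")):
    """Map fragment number -> list of extensions that are absent.

    Assumes files are named '<base>.<NN><ext>' with a two-digit
    fragment number (an assumption based on chr_all.fa.00.nin).
    """
    present = set(filenames)
    missing = {}
    for i in range(n_fragments):
        absent = [ext for ext in exts
                  if f"{base}.{i:02d}{ext}" not in present]
        if absent:
            missing[i] = absent
    return missing


if __name__ == "__main__":
    # In-memory example: fragment 00 has only its .nin file.
    names = ["chr_all.fa.00.nin",
             "chr_all.fa.01.nsq", "chr_all.fa.01.nin", "chr_all.fa.01.nhr"]
    print(missing_fragment_files(names, "chr_all.fa", 2))
```

For a real run, the filenames argument could be os.listdir() of the shared storage directory.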

I've tried:
- applying the patch (patch-NCBIToolbox_Nov14_2003), just in case,
though the docs imply that it only matters for > 100 fragments, so
I can't see why it would help in this situation. Only two hunks of the
patch applied. Still didn't get the missing 00 files.
- using an older version of the NCBI Toolkit (Oct 2003).

mpiformatdb command:
mpiformatdb -f ~/mpiblast.conf -N 25 -i chr_all.fa -p F

The formatdb.log says:
Version 2.2.8 [Jan-05-2004]
Started database file "/fdb/genome/human-aug2003/chr_all.fa"
Closing volume /data/susanc/mpiblast//chr_all.fa with 0 sequences, 0 letters(.nsq file = 61580340 bytes; .nhr file = 0 bytes)
FDBFinish: Empty nucleotide database...
Version 2.2.8 [Jan-05-2004]
Started database file "/fdb/genome/human-aug2003/chr_all.fa"
Closing volume /data/susanc/mpiblast//chr_all.fa.01 with 1 sequences, 246,127,941 letters(.nsq file = 122496350 bytes; .nhr file = 67 bytes)
Formatted 1 sequences in volume 1
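
To spot empty volumes like the first one above without reading the whole log by eye, something along these lines might help. It is a sketch only: the function names are hypothetical, and the regular expression assumes the "Closing volume ... with N sequences, M letters" wording seen in this formatdb.log, which may differ in other toolkit versions.

```python
import re

# Pattern based on the "Closing volume" lines quoted in this post;
# \s+ also tolerates a line wrap between the fields.
CLOSING = re.compile(
    r"Closing volume (\S+) with (\d+) sequences,\s+([\d,]+)\s+letters")


def parse_volumes(log_text):
    """Return a list of (volume_path, n_sequences, n_letters)."""
    return [(m.group(1), int(m.group(2)), int(m.group(3).replace(",", "")))
            for m in CLOSING.finditer(log_text)]


def empty_volumes(log_text):
    """Return the paths of volumes that were closed with 0 sequences."""
    return [name for name, n_seq, _ in parse_volumes(log_text) if n_seq == 0]


if __name__ == "__main__":
    sample = (
        "Closing volume /data/susanc/mpiblast//chr_all.fa with 0 sequences, "
        "0 letters(.nsq file = 61580340 bytes; .nhr file = 0 bytes)\n"
        "Closing volume /data/susanc/mpiblast//chr_all.fa.01 with 1 sequences, "
        "246,127,941 letters(.nsq file = 122496350 bytes; .nhr file = 67 bytes)\n"
    )
    print(empty_volumes(sample))
```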

We're new to mpiblast (testing it out by user request), so all  
suggestions appreciated.

Susan Chacko.
Helix Systems
12B/2N207
National Institutes of Health
Bethesda, MD 20814