[Biodevelopers] Blast Database format specification

Chris Dunnigan dunnigan14 at gmail.com
Fri Feb 29 17:25:36 EST 2008


Hello everyone,

I was wondering if anyone here knows of, or could point me to, a document
that describes the database format produced by formatdb. The only reference
I have found so far is
http://blast.wustl.edu/blast/ncbi20ntfmt.html . I have written some code for
reading the binary BLAST databases based on the pseudocode provided there,
but I do not think that description is exactly correct. First, I cannot see
a way to determine where a sequence ends. The sequence file offsets give the
starting point of every sequence, but there can be unused 2-bit pairs at the
end of a sequence, and I do not see how to tell whether those are adenine
bases or just padding. The page linked above says there should be a "magic"
byte that marks the end of the sequence, but I have not found this to be the
case.
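
The ambiguity can be illustrated with a minimal Python sketch. This is not
formatdb's actual code: the helper name pack_2bit is hypothetical, and the
code assignment (A=0, C=1, G=2, T=3, four bases per byte, first base in the
high bits, unused bits zero-filled) is an assumption based on the usual
2-bit nucleotide convention.

```python
# Assumed 2-bit codes (the common NCBI2na-style convention): A=0, C=1, G=2, T=3.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq):
    """Pack bases four per byte, first base in the high bits.

    Unused positions in the final byte are zero-filled, which is the same
    bit pattern as the code assumed for adenine.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Shift a partial chunk up so the padding lands in the low bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


short = pack_2bit("ACGTC")      # 5 bases, 3 padded positions
padded = pack_2bit("ACGTCAAA")  # 8 bases, the last 3 are real adenines
print(short.hex(), padded.hex())
print(short == padded)
```

Under these assumptions both sequences pack to the same two bytes, so the
start offsets alone cannot recover the true length; some extra length or
remainder information has to be stored somewhere.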

Could anyone give a better description of the binary format used, or point
me to a resource that might be helpful?

Thank you!!

Chris

