[Biodevelopers] Blast Database format specification

Dan Bolser dan.bolser at gmail.com
Sat Mar 1 06:55:54 EST 2008


On 29/02/2008, Chris Dunnigan <dunnigan14 at gmail.com> wrote:
> Hello everyone,
>
>  I was wondering if anyone here knew, or could point me to a location which
>  describes the database format produced by formatdb. The only reference I
>  seem to have found is
>  http://blast.wustl.edu/blast/ncbi20ntfmt.html . While I have based some code
>  on reading the binary BLAST databases using the pseudocode
>  provided there, I feel it is not exactly correct. First, I cannot see a way
>  to determine the end of a sequence. The sequence file offsets provide
>  the starting points of all the sequences,
>  but there can be empty 2-bit pairs at the end, and how does one
>  determine whether they are adenine bases or just empty?
>  The above link says that there should be a "magic" byte to signify the
>  end of the sequence, but I have not found this to be the case.
>
>  Anyway I was wondering
>  if anyone could give a better description of the binary format used,
>  or point me to a place that might be helpful.
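
To make the ambiguity described above concrete, here is a minimal sketch.
It assumes a hypothetical packing (A=0, C=1, G=2, T=3, four bases per byte,
first base in the high bits, zero padding); the names pack and unpack are
made up for illustration, and the real formatdb layout may well differ.

```python
# Hypothetical 2-bit encoding, for illustration only.
# This is NOT taken from the formatdb source; the real layout may differ.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack bases four per byte, first base in the high bits, zero padded."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Unused trailing pairs are left as 00, which is also the code for 'A'.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length=None):
    """Unpack every 2-bit pair; trim to `length` if the caller knows it."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    seq = "".join(bases)
    return seq if length is None else seq[:length]


# Two different sequences produce identical bytes under this packing.
print(pack("ACGTA") == pack("ACGTAAAA"))
# Without a stored length, the padding decodes as extra adenines.
print(unpack(pack("ACGTA")))
# With the length stored somewhere else, the sequence is recovered.
print(unpack(pack("ACGTA"), 5))
```

Under these assumptions the packed bytes alone cannot say where a sequence
ends, so the length (or a count of valid bases in the last byte) has to be
recorded somewhere. Where the real format records it is exactly the open
question here.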

You could try emailing the NCBI help addresses below. Unfortunately the
posts and replies there are not publicly archived (perhaps that is why you
couldn't find any information).

I asked them to make some public mailing lists, but they refused.

info at ncbi.nlm.nih.gov
blast-help at ncbi.nlm.nih.gov


Other than that, can you read the source code?


Dan.


--
Chat to the experts irc://irc.freenode.net/#bioinformatics


