[Biodevelopers] Blast Database format specification
Dan Bolser
dan.bolser at gmail.com
Sat Mar 1 06:55:54 EST 2008
On 29/02/2008, Chris Dunnigan <dunnigan14 at gmail.com> wrote:
> Hello everyone,
>
> I was wondering if anyone here knew, or could point me to a location which
> describes the database format produced by formatdb. The only reference I
> seem to have found is
> http://blast.wustl.edu/blast/ncbi20ntfmt.html . While I have based some code
> on reading the binary BLAST databases using the pseudocode
> provided there, I feel it is not exactly correct. First, I cannot see a way
> to determine the end of a sequence. The sequence file offsets provide
> the starting points of all the sequences,
> but there can be empty 2-bit pairs at the end, and how does one
> determine whether they are adenine bases or just padding?
> The above link says that there should be a "magic" byte to signify the
> end of the sequence, but I have not found this to be the case.
>
> Anyway I was wondering
> if anyone could give a better description of the binary format used,
> or point me to a place that might be helpful.
You could try emailing the NCBI mailing list - unfortunately the
posts/replies to that list are not publicly archived (perhaps why you
couldn't find any information?).
I asked them to make some public mailing lists, but they refused.
info at ncbi.nlm.nih.gov
blast-help at ncbi.nlm.nih.gov
Other than that, can you read the source code?
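For what it's worth, here is a minimal sketch of how the "adenine or
padding?" ambiguity can be resolved in a 2-bit packed sequence. It assumes
(this is an assumption to check against the NCBI source, not something
confirmed in this thread) that the final byte of each sequence always exists
and that its lowest two bits hold the number of valid bases (0-3) in that
byte, with A=0, C=1, G=2, T=3 packed from the high bits down. The function
names are made up for illustration.

```python
BASES = "ACGT"  # assumed code order: A=0, C=1, G=2, T=3


def unpack_2bit(packed: bytes) -> str:
    """Decode a 2-bit packed sequence.

    ASSUMPTION: the last byte is always present and its low two bits
    store how many bases (0-3) in that byte are real; the rest is padding.
    """
    if not packed:
        return ""
    out = []
    # Every byte before the last one holds four full bases.
    for byte in packed[:-1]:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 3])
    # The last byte holds 0-3 bases plus the remainder count.
    last = packed[-1]
    remainder = last & 3
    for i in range(remainder):
        out.append(BASES[(last >> (6 - 2 * i)) & 3])
    return "".join(out)


def pack_2bit(seq: str) -> bytes:
    """Inverse of unpack_2bit, under the same assumed layout."""
    codes = [BASES.index(base) for base in seq]
    full = len(codes) - len(codes) % 4
    out = bytearray()
    for i in range(0, full, 4):
        a, b, c, d = codes[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    # Always emit a final byte, even when the length is a multiple of 4,
    # so the remainder count is never missing.
    tail = codes[full:]
    last = 0
    for j, code in enumerate(tail):
        last |= code << (6 - 2 * j)
    out.append(last | len(tail))
    return bytes(out)
```

Under that layout trailing adenines are unambiguous: pack_2bit("AAA") and
pack_2bit("AAAA") produce different byte strings, and both round-trip
through unpack_2bit. If the real files do not decode sensibly this way, the
assumption is wrong and the source is the place to look.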
Dan.
--
Chat to the experts irc://irc.freenode.net/#bioinformatics