[BiO BB] Re: [ssml] Parsing taxonomy from blast output

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Fri Apr 1 12:40:15 EST 2005


On Fri, 1 Apr 2005, Goel, Manisha wrote:

>Hi All,
>
>I need to parse the BLAST output to get the taxonomy information.
>If I could get the taxonomy nodes associated with each gi number, that
>would also work.

Yeah, this data is here...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/

See...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid.readme

"The gi_taxid_prot.dmp is about 17 MB and contains two columns:  the
protein's gi  and taxid."
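
A rough sketch of that lookup in Python, in case it helps. It assumes
your BLAST hit identifiers look like 'gi|12345|ref|NP_000001.1|' and
that gi_taxid_prot.dmp has been downloaded and unpacked locally. The
function names extract_gi and load_gi_taxids are made up for
illustration and are not part of any NCBI tool.

```python
import re

# Matches the numeric gi in identifiers such as 'gi|12345|ref|NP_000001.1|'.
GI_PATTERN = re.compile(r"gi\|(\d+)")


def extract_gi(seq_id):
    """Return the gi number embedded in a sequence identifier, or None."""
    match = GI_PATTERN.search(seq_id)
    return int(match.group(1)) if match else None


def load_gi_taxids(lines, wanted):
    """Scan two-column 'gi taxid' lines, keeping only gi numbers in `wanted`.

    `lines` can be an open file handle, so the whole file never has to
    sit in memory; only the gi numbers you ask for are stored.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = int(fields[0]), int(fields[1])
        if gi in wanted:
            mapping[gi] = taxid
    return mapping
```

To use it, collect the gi numbers from your BLAST hits into a set with
extract_gi, then pass an open handle on gi_taxid_prot.dmp together with
that set to load_gi_taxids. Filtering while reading keeps the memory
footprint proportional to your hit list rather than to the whole file.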

You can then use the 'taxdump' archive from the same directory to get
names.dmp (the taxon names) and nodes.dmp (the structure of the
taxonomic tree), if you need them.

See...

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump_readme.txt
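
Once you have a taxid, walking up nodes.dmp gives you the lineage. A
minimal sketch in Python, assuming the layout described in the taxdump
readme: fields separated by tab-pipe-tab, each row ending in tab-pipe,
nodes.dmp starting with tax_id and parent tax_id, and names.dmp holding
tax_id, name, unique name and name class. All function names here are
hypothetical.

```python
def split_dmp(line):
    """Split one taxdump row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return line.split("\t|\t")


def load_parents(lines):
    """Map each tax_id to its parent tax_id (columns 1 and 2 of nodes.dmp)."""
    parents = {}
    for line in lines:
        fields = split_dmp(line)
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def load_scientific_names(lines):
    """Map each tax_id to its scientific name, ignoring synonyms and common names."""
    names = {}
    for line in lines:
        fields = split_dmp(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def lineage(taxid, parents):
    """Return the path of tax_ids from the root down to `taxid`."""
    path = []
    seen = set()
    # The root node is its own parent, so stop once a tax_id repeats.
    while taxid in parents and taxid not in seen:
        path.append(taxid)
        seen.add(taxid)
        taxid = parents[taxid]
    return path[::-1]
```

Pass open handles on nodes.dmp and names.dmp to the two loaders, then
call lineage(taxid, parents) and look each tax_id up in the names
dictionary. An unknown taxid simply gives an empty list.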

All the best,
Dan.


>I have been trying SEALS taxonomy commands but somehow quite a few
>sequences turn up "not_retrieved", although we have tried updating the
>database etc.
>I do not want to use the BLAST web server because I have too many files
>to run.
>Please suggest any program/script that might be useful.
>
>Thanks,
>-Manisha
>



