[Bioclusters] [Fwd: Correction: [blast-announce] #032 - Deletion of BLAST FASTA
DB files]
Joseph Landman
bioclusters@bioinformatics.org
05 Mar 2003 09:54:28 -0500
Folks:
In case you haven't seen this, this represents a large change to the
way the databases will be distributed.
Punchline: No more FASTA db's directly available. Now you get
preformatted db's, and you need to use fastacmd to recover the FASTA
files.
Joe
-----Forwarded Message-----
From: Scott McGinnis <mcginnis@ncbi.nlm.nih.gov>
Subject: Correction: [blast-announce] #032 - Deletion of BLAST FASTA DB files
Date: 05 Mar 2003 08:47:12 -0500
[The title of this announce should have been #032]
[blast-announce] #032 - Deletion of BLAST FASTA DB files
The new BLAST version 2.2.2 introduced version 4 of the BLAST
databases with enhanced functionality. With this new version
of BLAST NCBI has begun creating pre-formatted databases.
These are located in the FTP directory
ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/. It is no
longer necessary to download the BLAST FASTA database files
and format them for Standalone BLAST.
Therefore, NCBI will begin phasing out the FASTA versions of the BLAST
database files. The BLAST FASTA Database files will be removed 60 days
from 03/05/03 (05/05/03). At that time the FTP directory
ftp://ftp.ncbi.nih.gov/blast/db/ will contain only the BLAST
pre-formatted database files and the subdirectory
/blast/db/FormattedDatabases/ will be removed.
The pre-formatted BLAST database may be incompatible with
third party programs which require FASTA sequences only.
However, FASTA sequences can be parsed from the pre-formatted
database files with the "fastacmd" program. This program
comes with the Standalone BLAST binaries
(ftp://ftp.ncbi.nih.gov/blast/executables/). For example:
fastacmd -d <database_name> -o <output_file> -D T -c T
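For sites that script their database updates, the command above could be wrapped roughly as follows. This is only a sketch, assuming fastacmd is on the PATH; the function names and the "nr" / "nr.fasta" arguments are hypothetical, and only the flags shown in the announcement are used.

```python
import subprocess


def build_fastacmd_dump(database, output_file, fastacmd="fastacmd"):
    """Build the argument list for dumping a whole BLAST database as FASTA.

    Mirrors the command in the announcement:
      -d  database name
      -o  output file
      -D T  dump the entire database
      -c T  passed through exactly as in the announcement's example
    """
    return [fastacmd, "-d", database, "-o", output_file, "-D", "T", "-c", "T"]


def dump_fasta(database, output_file, fastacmd="fastacmd"):
    """Run fastacmd; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_fastacmd_dump(database, output_file, fastacmd), check=True)


if __name__ == "__main__":
    # Hypothetical names; print the command rather than running it.
    print(" ".join(build_fastacmd_dump("nr", "nr.fasta")))
```

Passing the arguments as a list avoids shell quoting problems when database paths contain spaces.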
Even though the NCBI FASTA database files will be phased out,
the "formatdb" program will still be able to convert custom
FASTA database files into BLAST formatted databases, as long
as the files follow the correct syntax (See:
ftp://ftp.ncbi.nih.gov/blast/db/README).
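As an illustration of the custom-database route, a formatdb invocation might be assembled like this. It is a sketch, not NCBI's recommended procedure: the function name and the file name are hypothetical, and it assumes the usual formatdb flags -i (input FASTA), -p (T for protein, F for nucleotide) and -o (T to parse sequence identifiers, which is where the defline syntax matters).

```python
def build_formatdb_command(fasta_file, protein=True, parse_seqids=False,
                           formatdb="formatdb"):
    """Build the argument list for formatting a custom FASTA file."""
    def flag(value):
        # formatdb takes booleans as the letters T and F
        return "T" if value else "F"

    return [formatdb, "-i", fasta_file,
            "-p", flag(protein),
            "-o", flag(parse_seqids)]


if __name__ == "__main__":
    # Hypothetical nucleotide database with identifier parsing enabled.
    print(" ".join(build_formatdb_command("my_seqs.fasta", protein=False,
                                          parse_seqids=True)))
```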
If you have any questions please contact blast-help@ncbi.nlm.nih.gov
PLEASE NOTE: Older versions of the BLAST executable may not
be compatible with this new database format. Please upgrade
your BLAST standalone executables before using the new
pre-formatted databases.
Notes on Version 4 of the BLAST databases
-----------------------------------------
Version 4 of the BLAST databases addresses some important
shortcomings of the current (version 3) databases:
1.) Version 3 does not handle ambiguity characters correctly
if a database sequence is longer than about 16 million bases
which may lead to incorrect results. The new version does.
2.) Version 3 only allows one volume of a BLAST database to
contain at most about 4 billion bases. The new databases
allow that to be much larger.
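To see whether a local FASTA file is large enough for either version 3 limit to matter, something like the following could be used. It is a rough sketch: the announcement only says "about 16 million" and "about 4 billion", so the exact thresholds (2**24 and 2**32) are assumptions, as are the function names.

```python
import io

# Assumed thresholds; the announcement gives only approximate figures.
V3_SEQ_LIMIT = 2 ** 24      # "about 16 million bases" per sequence
V3_VOLUME_LIMIT = 2 ** 32   # "about 4 billion bases" per volume


def scan_fasta(handle):
    """Return (total_bases, {sequence_id: length}) for a FASTA stream.

    The id is the first whitespace-delimited token of each defline;
    a repeated id would overwrite the earlier entry.
    """
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return sum(lengths.values()), lengths


def v3_concerns(handle, seq_limit=V3_SEQ_LIMIT, volume_limit=V3_VOLUME_LIMIT):
    """Return (ids of sequences over seq_limit, whether the total exceeds volume_limit)."""
    total, lengths = scan_fasta(handle)
    long_ids = [seq_id for seq_id, n in lengths.items() if n > seq_limit]
    return long_ids, total > volume_limit


if __name__ == "__main__":
    demo = io.StringIO(">a\nACGT\nAC\n>b\nAC\n")
    # Tiny limits so the toy input trips both checks.
    print(v3_concerns(demo, seq_limit=5, volume_limit=7))
```

A long sequence is only affected by the first shortcoming if it also contains ambiguity characters, so the first list is a set of candidates rather than a list of known problems.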
The new databases keep the sequence descriptors in a structured format
(ASN.1) and some new information has been put into those
fields. The new information is:
1.) taxid. This integer specifies the taxonomy of the
sequence and will allow greater flexibility in how taxonomic
information is presented in a future version of BLAST.
2.) link bits. These specify whether LinkOut information
about the database sequence is available and permits the
addition of a gif with a link to the relevant page in a
future version of BLAST.
3.) membership bits. These specify that a given gi in a
database also belongs to a subset database. An example of
this relationship is the EST database. est contains all
ESTs, but also comprises the subsets est_human, est_mouse and
est_others; with the new membership bits it will be possible
to search any of the subset est databases with only the main
est database and two other small files (an alias file and an
"oidlist"). This can reduce the amount of disk-space and
memory needed by half in this case.
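Conceptually, the membership bits behave like a per-sequence bitmask. The sketch below is purely illustrative: the bit positions, the names and the in-memory list are hypothetical, and it says nothing about how the bits are actually encoded in the ASN.1 descriptors or in the "oidlist" file.

```python
# Hypothetical bit assignments, one bit per subset database.
EST_HUMAN = 1 << 0
EST_MOUSE = 1 << 1
EST_OTHERS = 1 << 2


def subset_oids(membership, mask):
    """Return the ordinal positions of sequences whose bits include `mask`.

    `membership` holds one integer of membership bits per sequence,
    in database order.
    """
    return [oid for oid, bits in enumerate(membership) if bits & mask]


if __name__ == "__main__":
    # Four toy sequences in the main est database.
    membership = [EST_HUMAN, EST_MOUSE, EST_OTHERS, EST_HUMAN]
    print(subset_oids(membership, EST_HUMAN))
```

The point is that the sequences are stored once in the main database and each subset is only a selection over it, which is where the disk and memory saving comes from.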
------------- End Forwarded Message -------------
--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615