[Bioclusters] [Fwd: Correction: [blast-announce] #032 - Deletion of BLAST FASTA DB files]

Joseph Landman bioclusters@bioinformatics.org
05 Mar 2003 09:54:28 -0500


Folks:

  In case you haven't seen this: it represents a large change to the
way the databases will be distributed.

  Punchline:  No more FASTA dbs directly available.  Now you get
preformatted dbs, and you need to use fastacmd to recover the FASTA
files.

Joe

-----Forwarded Message-----

From: Scott McGinnis <mcginnis@ncbi.nlm.nih.gov>
Subject: Correction: [blast-announce] #032 - Deletion of BLAST FASTA DB files
Date: 05 Mar 2003 08:47:12 -0500

[The title of this announcement should have been #032]

[blast-announce] #032 - Deletion of BLAST FASTA DB files

The new BLAST version 2.2.2 introduced version 4 of the BLAST
databases, with enhanced functionality. With this new version
of BLAST, NCBI has begun creating pre-formatted databases.
These are located in the FTP directory 
ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/. It is no 
longer necessary to download the BLAST FASTA database files 
and format them for Standalone BLAST.

Therefore, NCBI will begin phasing out the FASTA versions of the BLAST
database files. The BLAST FASTA Database files will be removed 60 days
from 03/05/03 (05/05/03). At that time the FTP directory
ftp://ftp.ncbi.nih.gov/blast/db/ will contain only the BLAST
pre-formatted database files and the subdirectory
/blast/db/FormattedDatabases/ will be removed.

The pre-formatted BLAST databases may be incompatible with
third-party programs that require FASTA sequences only.
However, FASTA sequences can be parsed from the pre-formatted 
database files with the "fastacmd" program. This program 
comes with the Standalone BLAST binaries 
(ftp://ftp.ncbi.nih.gov/blast/executables/). For example:

	fastacmd -d <database_name> -o <output_file> -D T -c T
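For scripted use, the command above can be wrapped in a small program. The following is a minimal Python sketch, assuming fastacmd from the Standalone BLAST binaries is installed and on PATH. The function names fastacmd_dump_args and dump_to_fasta are hypothetical, and the flags are copied verbatim from the example above; check the help output of your installed fastacmd before relying on them.

```python
import shutil
import subprocess
from typing import List


def fastacmd_dump_args(database: str, output_file: str) -> List[str]:
    """Build the argument list for the announcement's fastacmd example.

    Flags are copied verbatim from the example:
      -d <database_name>  database to read
      -o <output_file>    where to write the FASTA output
      -D T                dump the entire database (our reading of the
                          2.2.x-era help text; verify on your install)
      -c T                as given in the example; see the fastacmd help
    """
    return ["fastacmd", "-d", database, "-o", output_file, "-D", "T", "-c", "T"]


def dump_to_fasta(database: str, output_file: str) -> None:
    """Run fastacmd, assuming it is installed and on PATH."""
    if shutil.which("fastacmd") is None:
        raise FileNotFoundError("fastacmd not found on PATH")
    # check=True raises CalledProcessError if fastacmd exits non-zero.
    subprocess.run(fastacmd_dump_args(database, output_file), check=True)


if __name__ == "__main__":
    # Hypothetical database and output names, for illustration only.
    print(" ".join(fastacmd_dump_args("nr", "nr.fasta")))
```

Keeping the argument list separate from the call to subprocess makes it easy to log or inspect the exact command before running it against a large database.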

Even though the NCBI FASTA database files will be phased out, 
the "formatdb" program will still be able to convert custom 
FASTA database files into BLAST formatted databases, as long 
as the files follow the correct syntax (See: 
ftp://ftp.ncbi.nih.gov/blast/db/README).
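The README cited above is the authority on the syntax formatdb expects. As a rough pre-flight check before running formatdb on a custom file, the Python sketch below flags a few common problems in FASTA text: sequence lines before the first '>' header, records with no sequence, and sequence lines containing characters outside an assumed alphabet of letters, '*' and '-'. The function name check_fasta and the accepted alphabet are assumptions for illustration and are not taken from the README.

```python
import re
from typing import List

# Assumed sequence alphabet: ASCII letters (covering IUPAC nucleotide and
# protein codes) plus '*' (stop) and '-' (gap). This is an illustrative
# approximation, not the rule set from the BLAST README.
_SEQ_LINE = re.compile(r"[A-Za-z*\-]+")


def check_fasta(text: str) -> List[str]:
    """Return human-readable problems found in FASTA text (empty if none)."""
    problems: List[str] = []
    seen_header = False
    header_line = 0  # line number of the current record's header
    residues = 0     # sequence characters seen in the current record

    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if seen_header and residues == 0:
                problems.append(f"line {header_line}: record has no sequence")
            seen_header, header_line, residues = True, lineno, 0
        elif not seen_header:
            problems.append(f"line {lineno}: sequence data before first '>' header")
        else:
            residues += len(line)
            if not _SEQ_LINE.fullmatch(line):
                problems.append(f"line {lineno}: unexpected characters in sequence")

    # Close the final record.
    if seen_header and residues == 0:
        problems.append(f"line {header_line}: record has no sequence")
    if not seen_header:
        problems.append("no '>' header found")
    return problems
```

An empty result only means these particular checks passed; formatdb itself remains the final judge of whether a file is acceptable.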

If you have any questions, please contact blast-help@ncbi.nlm.nih.gov.

PLEASE NOTE: Older versions of the BLAST executable may not 
be compatible with this new database format.  Please upgrade 
your BLAST standalone executables before using the new 
pre-formatted databases.

Notes on Version 4 of the BLAST databases
-----------------------------------------

Version 4 of the BLAST databases addresses some important
shortcomings of the current (version 3) databases:

1.) Version 3 does not handle ambiguity characters correctly
if a database sequence is longer than about 16 million bases,
which may lead to incorrect results.  The new version does.

2.) Version 3 only allows one volume of a BLAST database to
contain at most about 4 billion bases.  The new databases
allow volumes to be much larger.
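The two limits quoted above are close to powers of two: 2^24 = 16,777,216 and 2^32 = 4,294,967,296. The announcement does not say the limits come from 24-bit and 32-bit quantities, so treat that as an assumption. Under that assumption, the Python sketch below computes how many version 3 volumes a database of a given size would need, and whether a single sequence is over the ambiguity-handling limit. The constant and function names are hypothetical.

```python
# Assumption: "about 4 billion bases" per version 3 volume is modelled as
# 2**32 and "about 16 million bases" as 2**24. The announcement gives only
# the approximate figures, not their origin.
V3_MAX_BASES_PER_VOLUME = 2**32  # 4,294,967,296
V3_SAFE_SEQUENCE_LENGTH = 2**24  # 16,777,216


def volumes_needed(total_bases: int, cap: int = V3_MAX_BASES_PER_VOLUME) -> int:
    """Minimum number of volumes, each holding at most `cap` bases."""
    if total_bases < 0 or cap <= 0:
        raise ValueError("total_bases must be >= 0 and cap must be > 0")
    return -(-total_bases // cap)  # integer ceiling division


def exceeds_v3_ambiguity_limit(sequence_length: int) -> bool:
    """True if a sequence is longer than the assumed version 3 limit."""
    return sequence_length > V3_SAFE_SEQUENCE_LENGTH


if __name__ == "__main__":
    # A hypothetical 10-billion-base nucleotide database.
    print(volumes_needed(10_000_000_000))  # 3
```

Ceiling division is done on integers rather than with floating point so that exact multiples of the cap (for example, exactly 2^32 bases) land in one volume, not two.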

The new databases keep the sequence descriptors in a structured format
(ASN.1) and some new information has been put into those 
fields.  The new information is:

1.) taxid.  This integer specifies the taxonomy of the 
sequence and will allow greater flexibility in how taxonomic 
information is presented in a future version of BLAST.

2.) link bits.  These specify whether LinkOut information
about the database sequence is available, and will permit the
addition of a GIF with a link to the relevant page in a
future version of BLAST.

3.) membership bits.  These specify that a given gi in a
database also belongs to a subset database.  An example of
this relationship is the EST database.  The est database
contains all ESTs but also comprises est_human, est_mouse and
est_others; with the new membership bits it will be possible
to search any of the subset EST databases with only the main
est database and two other small files (an alias file and an
"oidlist").  In this case, that can halve the disk space and
memory needed.
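The membership-bit idea can be illustrated with a toy model. In the Python sketch below, each sequence in the main est database is identified by its ordinal id (OID) and carries a bitmask; a subset such as est_human is then just the list of OIDs whose bit is set, which is the role the announcement gives to the "oidlist". The bit values, the function name build_oidlist, and the in-memory list representation are hypothetical; the announcement does not describe the actual on-disk formats.

```python
from typing import Dict, List

# Hypothetical bit assignments, for illustration only; the real values
# are defined by NCBI, not by this sketch.
MEMBERSHIP_BITS: Dict[str, int] = {
    "est_human": 1 << 0,
    "est_mouse": 1 << 1,
    "est_others": 1 << 2,
}


def build_oidlist(memberships: List[int], subset: str) -> List[int]:
    """Return the OIDs of main-database sequences that belong to `subset`.

    memberships[oid] is the membership bitmask stored for sequence `oid`
    in the main database (e.g. est).
    """
    bit = MEMBERSHIP_BITS[subset]
    return [oid for oid, mask in enumerate(memberships) if mask & bit]


if __name__ == "__main__":
    # Toy main "est" database with five sequences (hypothetical data).
    est = [
        MEMBERSHIP_BITS["est_human"],
        MEMBERSHIP_BITS["est_mouse"],
        MEMBERSHIP_BITS["est_human"],
        MEMBERSHIP_BITS["est_others"],
        MEMBERSHIP_BITS["est_mouse"],
    ]
    print(build_oidlist(est, "est_human"))  # [0, 2]
```

The saving comes from storing the sequences once: each subset costs only a list of OIDs plus an alias file, rather than a second copy of the sequence data.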


------------- End Forwarded Message -------------
-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615