Folks:

In case you haven't seen this, it represents a large change to the way the databases will be distributed.

Punchline: No more FASTA db's directly available. Now you get preformatted db's, and you need to use fastacmd to recover the FASTA files.

Joe

-----Forwarded Message-----

From: Scott McGinnis <mcginnis@ncbi.nlm.nih.gov>
Subject: Correction: [blast-announce] #032 - Deletion of BLAST FASTA DB files
Date: 05 Mar 2003 08:47:12 -0500

[The title of this announcement should have been #032]

[blast-announce] #032 - Deletion of BLAST FASTA DB files

The new BLAST version 2.2.2 introduced version 4 of the BLAST databases with enhanced functionality. With this new version of BLAST, NCBI has begun creating pre-formatted databases. These are located in the FTP directory ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/. It is no longer necessary to download the BLAST FASTA database files and format them for Standalone BLAST. Therefore, NCBI will begin phasing out the FASTA versions of the BLAST database files.

The BLAST FASTA database files will be removed 60 days from 03/05/03 (05/05/03). At that time the FTP directory ftp://ftp.ncbi.nih.gov/blast/db/ will contain only the BLAST pre-formatted database files, and the subdirectory /blast/db/FormattedDatabases/ will be removed.

The pre-formatted BLAST databases may be incompatible with third-party programs that require FASTA sequences only. However, FASTA sequences can be extracted from the pre-formatted database files with the "fastacmd" program. This program comes with the Standalone BLAST binaries (ftp://ftp.ncbi.nih.gov/blast/executables/). For example:

    fastacmd -d <database_name> -o <output_file> -D T -c T

Even though the NCBI FASTA database files will be phased out, the "formatdb" program will still be able to convert custom FASTA database files into BLAST formatted databases, as long as the files follow the correct syntax (see ftp://ftp.ncbi.nih.gov/blast/db/README).
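If many databases need converting back to FASTA, the fastacmd example can be scripted. The following is a minimal Python sketch, assuming fastacmd from the Standalone BLAST binaries is on PATH. The function names and the database name `mydb` are hypothetical, and the wrapper passes through exactly the flags shown in the example without interpreting them.

```python
import shutil
import subprocess
from typing import List


def fastacmd_dump_argv(database: str, output_file: str) -> List[str]:
    """Build the fastacmd command line from the announcement's example.

    -d names the pre-formatted database and -o the FASTA output file;
    "-D T -c T" are copied verbatim from the example.
    """
    return ["fastacmd", "-d", database, "-o", output_file, "-D", "T", "-c", "T"]


def dump_fasta(database: str, output_file: str) -> None:
    """Run fastacmd to recover FASTA sequences (assumes fastacmd is on PATH)."""
    if shutil.which("fastacmd") is None:
        raise FileNotFoundError("fastacmd not found; install the Standalone BLAST binaries")
    # check=True raises CalledProcessError if fastacmd exits with a non-zero status
    subprocess.run(fastacmd_dump_argv(database, output_file), check=True)


if __name__ == "__main__":
    # Hypothetical names, for illustration only; prints the command without running it.
    print(" ".join(fastacmd_dump_argv("mydb", "mydb.fasta")))
```

Building the argument list separately from running it makes the command easy to inspect or log before anything touches the database files.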
If you have any questions, please contact blast-help@ncbi.nlm.nih.gov.

PLEASE NOTE: Older versions of the BLAST executables may not be compatible with this new database format. Please upgrade your Standalone BLAST executables before using the new pre-formatted databases.

Notes on Version 4 of the BLAST databases
-----------------------------------------

Version 4 of the BLAST databases addresses some important shortcomings of the current (version 3) databases:

1.) Version 3 does not handle ambiguity characters correctly if a database sequence is longer than about 16 million bases, which may lead to incorrect results. The new version does.

2.) Version 3 allows a single volume of a BLAST database to contain at most about 4 billion bases. The new databases allow volumes to be much larger.

The new databases keep the sequence descriptors in a structured format (ASN.1), and some new information has been put into those fields. The new information is:

1.) taxid. This integer specifies the taxonomy of the sequence and will allow greater flexibility in how taxonomic information is presented in a future version of BLAST.

2.) link bits. These specify whether LinkOut information about the database sequence is available, and will permit the addition of a GIF with a link to the relevant page in a future version of BLAST.

3.) membership bits. These specify that a given gi in a database also belongs to a subset database. An example of this relationship is the EST database: est contains all ESTs, but also comprises est_human, est_mouse and est_others. With the new membership bits it will be possible to search any of the subset EST databases with only the main est database and two other small files (an alias file and an "oidlist"). In this case, that can cut the disk space and memory needed by half.
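To make the membership-bit idea concrete, here is a minimal Python sketch of how an oidlist for a subset database could be derived from per-sequence membership bits. The bit values, the toy OID table and the function names are hypothetical; NCBI's actual on-disk encoding is not described in this announcement. The second function spells out the "by half" arithmetic, assuming est_human, est_mouse and est_others together hold the same sequences as est.

```python
from typing import Dict, List

# Hypothetical bit assignments, for illustration only (not NCBI's real values).
EST_HUMAN = 1 << 0
EST_MOUSE = 1 << 1
EST_OTHERS = 1 << 2


def build_oidlist(membership: Dict[int, int], subset_bit: int) -> List[int]:
    """Return the sorted ordinal ids (OIDs) whose membership bits include subset_bit."""
    return sorted(oid for oid, bits in membership.items() if bits & subset_bit)


def relative_disk_use(est_bytes: float, oidlist_bytes: float) -> float:
    """Ratio of new to old storage.

    Assumes the three subsets together hold the same sequences as est,
    so storing them as separate databases doubles the data.
    """
    old = est_bytes + est_bytes       # est plus full copies of its subsets
    new = est_bytes + oidlist_bytes   # est plus small alias/oidlist files
    return new / old


# Toy "est" database: OID -> membership bits (illustrative data only).
est_membership = {0: EST_HUMAN, 1: EST_MOUSE, 2: EST_HUMAN, 3: EST_OTHERS, 4: EST_MOUSE}

if __name__ == "__main__":
    print(build_oidlist(est_membership, EST_HUMAN))   # [0, 2]
    print(build_oidlist(est_membership, EST_MOUSE))   # [1, 4]
    print(build_oidlist(est_membership, EST_OTHERS))  # [3]
    print(relative_disk_use(1000.0, 0.0))             # 0.5
```

As the oidlist overhead shrinks toward zero, the ratio approaches 0.5, which is the halving the announcement describes.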
------------- End Forwarded Message -------------

--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615