Folks:

  The FASTA files will continue to be offered, but they will be moved.
You will likely have to adjust your download scripts.

Joe

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615

-----Forwarded Message-----
From: Scott McGinnis <mcginnis@ncbi.nlm.nih.gov>
To: blast-announce@ncbi.nlm.nih.gov
Subject: [blast-announce] [blast-announce #033] Relocation of BLAST database files on FTP server
Date: 26 Mar 2003 13:17:36 -0500

Moving of BLAST FASTA Database files.

Based upon input from the user community we will continue to offer
FASTA files. However, we will be reorganizing our FTP site in order to
allow easier access to the preformatted BLAST databases that users of
NCBI BLAST should be using.

For users of standalone BLAST, the NCBI already offers preformatted
BLAST databases for download, so there is no need to download FASTA
files (from ftp://ftp.ncbi.nih.gov/blast/db/) and run formatdb on them.
This offers several advantages to users who mostly need these files to
produce BLAST databases:

1.) no need to have disk space for both FASTA files and BLAST databases
    at the same time.

2.) no need to use CPU cycles to uncompress the FASTA files and run
    formatdb on them.

3.) the original FASTA file, individual sequences, or even parts of
    individual sequences within the FASTA file can be recovered using
    the utility fastacmd that is packaged with the NCBI BLAST executable
    archives (see below for details).

4.) somewhat less bandwidth is needed for the FTP downloads, so they
    complete faster.

5.) taxonomic and related source information (for individual entries in
    the database) is embedded in the BLAST databases (this is not
    available in the FASTA files). Some of this information may be
    useful for formatting, some can be recovered by fastacmd (see
    below).

As most users need only the BLAST databases, they will be moved up one
level, from their current location of
ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/, and the FASTA files
will be moved down a level to ftp://ftp.ncbi.nih.gov/blast/db/FASTA.
The new FASTA directory, containing the files, will appear by March 31,
2003. The FASTA files will be removed from the "db" directory on April
8, 2003. At that point BLAST databases will start appearing in the "db"
directory.

Note that the procedure at the NCBI is to produce the BLAST databases
directly from our relational databases, then produce the FASTA files
from the BLAST databases (using fastacmd). This means that the FASTA
files may not be available on our FTP site until up to three hours after
the BLAST databases have appeared.

Notes on fastacmd:
------------------

1.) fastacmd can print a summary of database statistics:

  ncbifastacmd -d nt -I
  Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0,1 or 2 HTGS sequences)
  1,655,079 sequences; 7,754,000,938 total letters
  File name: /usr/ncbi/db/blast/nt
  Date: Jan 14, 2003 2:55 AM
  Version: 4
  Longest sequence: 27,890,790 bp

2.) fastacmd can dump a FASTA file from a BLAST database using the -D
    option:

  ncbifastacmd -d nt -D nt.fsa

3.) fastacmd can dump out only part of a sequence (handy for very long
    sequences):

  ncbifastacmd -d nt -s 555 -L0,32
  gi|555:1-32 B.taurus microsatellite DNA (624bp)
  ACCTCCACTAGCTTTGTTTGTAGTGATGCTCT

4.) fastacmd can print taxonomic information for a given sequence if
    that BLAST database came from
    ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/ (this
    information is not in a FASTA file, so formatdb cannot add it):

  ncbifastacmd -d nt -s 555 -T
  NCBI sequence id: gi|555|emb|X65215.1|BTMISATN
  NCBI taxonomy id: 9913
  Common name: cow
  Scientific name: Bos taurus

------------- End Forwarded Message -------------
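
For anyone adjusting download scripts, the relocation described in the
forwarded announcement comes down to a path rewrite. Below is a minimal
Python sketch of that rewrite, not a tested mirror tool. Only the three
directory paths are taken from the announcement; the function name
relocate and the file names nt.Z and nt.tar.gz are hypothetical, chosen
for illustration. It assumes its input is a pre-move URL: once the move
is complete, a file sitting directly under "db" is a preformatted
database, so the function should not be run on URLs that have already
been updated.

```python
# Sketch only: map pre-move NCBI BLAST FTP URLs to their announced new homes.
# The directory paths come from the announcement; the function name and the
# example file names are illustrative assumptions.

OLD_DB = "ftp://ftp.ncbi.nih.gov/blast/db/"
OLD_FORMATTED = OLD_DB + "FormattedDatabases/"
NEW_FASTA = OLD_DB + "FASTA/"


def relocate(url: str) -> str:
    """Return the post-move URL for a pre-move BLAST FTP URL."""
    if url.startswith(OLD_FORMATTED):
        # Preformatted BLAST databases move up one level, into db/.
        return OLD_DB + url[len(OLD_FORMATTED):]
    if url.startswith(NEW_FASTA):
        # Already points into the new FASTA directory; leave it alone.
        return url
    if url.startswith(OLD_DB):
        rest = url[len(OLD_DB):]
        if rest and "/" not in rest:
            # A file directly under db/ in the old layout is a FASTA file,
            # which moves down one level into db/FASTA/.
            return NEW_FASTA + rest
    # Anything else is outside the reorganized tree; return it unchanged.
    return url


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise both branches.
    for old in (OLD_DB + "nt.Z", OLD_FORMATTED + "nt.tar.gz"):
        print(old, "->", relocate(old))
```

Keeping the mapping in one pure function means an existing script can
call it on each URL it already fetches, rather than having paths edited
by hand in several places.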