[BiO BB] Using blast to compare genomes: a warning

Eitan Rubin lsrubin at wicc.weizmann.ac.il
Sun Sep 7 20:19:57 EDT 2003


Hi,

  I used BLAST to compare genomes some time ago and got lots of short, 
poor-quality, yet statistically significant similarities. I then met Altschul 
at a conference and asked him how to tell real similarities from chance ones. 
He said: "Don't use BLAST for DNA. We did a poor job on the statistics for 
DNA - who thought people would actually use it?" When I asked what to do, he 
said: "Use FASTA - Bill did a better job with DNA."
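For reference, BLAST reports significance through the Karlin-Altschul expectation E = m * n * 2^(-S'). Here S' is the bit score, and m and n are the (effective) query and database lengths. S' is derived from the raw alignment score using parameters (lambda and K) fitted for the scoring system, so poorly calibrated nucleotide parameters would show up directly in the E-values. The Python sketch below shows only that relationship. It assumes raw sequence lengths stand in for effective lengths, because real BLAST applies an edge-effect correction that is omitted here. The function names are hypothetical.

```python
import math

def blast_e_value(bit_score: float, m: float, n: float) -> float:
    """Expected number of chance hits scoring at least bit_score: E = m * n * 2**(-S').

    Assumption: m and n are used as-is; BLAST's effective-length correction is omitted.
    """
    return m * n * 2.0 ** (-bit_score)

def bit_score_for_e(m: float, n: float, e_threshold: float) -> float:
    """Smallest bit score S' for which m * n * 2**(-S') <= e_threshold."""
    return math.log2(m * n / e_threshold)
```

Take two hypothetical 5 Mb genomes (m = n = 5e6). For them, `bit_score_for_e(5e6, 5e6, 1e-3)` comes to roughly 54.5 bits, and any hit above that is reported as significant at E <= 0.001. Whether such hits are real therefore depends entirely on how well lambda and K are calibrated for the nucleotide scoring scheme.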

  Eitan

>Message: 3
>Date: Thu, 4 Sep 2003 17:12:39 -0400 (EDT)
>From: "Tristan Fiedler" <tfiedler at rsmas.miami.edu>
>To: bio_bulletin_board at bioinformatics.org
>Cc: bio_bulletin_board at bioinformatics.org
>Subject: [BiO BB] Unigene FormatDB error on Mac OS X
>Reply-To: bio_bulletin_board at bioinformatics.org
>
>Thanks all for the hardware tips!
>
>I am currently setting up a BLAST run of marine sequences against Ciona
>intestinalis from UniGene. I have run into a couple of problems that
>someone may already have overcome:
>
>1.  In the file I downloaded, 'Cin.seq.uniq.Z', the entries in this
>FASTA-format file have headers such as:
>
>>gnl|UG|Cin#S6667694 Ciona intestinalis cDNA, clone:cits020m24, full insert sequence. /gb=AK117037 /gi=23589844 /ug=Cin.3 /len=1214
>ATCAGATTAAAACATCGTCCATCGTTAGAGTTTATAATTTACATGTTTGAAAAAGTTTAA
>AATGCCTTCAAATAAACCAATTGTTAAGGATATCCCAAGAAAATGTGGCGTTCCTAGAGA
>A
>
>
>How can I get a more descriptive, functional definition of the various
>UniGene clusters?
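One way to get from these headers to more descriptive annotation is to extract the /gb= accession and the /ug= cluster ID. You can then join them against whatever annotation source you use; which source that is remains an assumption, because the thread does not name one. The Python sketch below does that extraction. `parse_unigene_header` is a hypothetical name, and the field layout is inferred from the single header shown above.

```python
import re

# Matches the "/key=value" annotations at the end of a UniGene header line,
# e.g. "/gb=AK117037" -> ("gb", "AK117037").
FIELD_RE = re.compile(r"/(\w+)=(\S+)")

def parse_unigene_header(line: str) -> dict:
    """Split a UniGene FASTA header into its sequence ID, free-text title and /key=value fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    body = line[1:].strip()
    seq_id, _, rest = body.partition(" ")
    first_field = FIELD_RE.search(rest)
    # The title is everything before the first "/key=" annotation.
    title = rest[: first_field.start()].strip() if first_field else rest.strip()
    return {"id": seq_id, "title": title, "fields": dict(FIELD_RE.findall(rest))}
```

Applied to the header above (shortened here to omit /len=), `parse_unigene_header(...)["fields"]["gb"]` gives `"AK117037"` and `["fields"]["ug"]` gives `"Cin.3"`. Those are the keys one would look up elsewhere.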
>
>2.  I used the command 'formatdb -i Cin.seq.uniq -p F -o T -n unigene_ciona'.
>Is this correct? And how can the following errors in the formatdb log
>file be corrected (if necessary)?
>
>/Users/tfiedler/Desktop/blast.darwin/UNIGENE_DOWNLOAD% more formatdb.log
>
>========================[ Sep 4, 2003  4:53 PM ]========================
>Version 2.2.6 [Apr-09-2003]
>Started database file "Cin.seq.uniq"
>NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/Library/Preferences/formatdb.cnf","r") failed
>NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/Desktop/blast.darwin/Resources/formatdb.cnf","r") failed
>NOTE: CoreLib [002.003] FileOpen(".formatdbrc","r") failed
>NOTE: CoreLib [002.003] FileOpen("/Users/tfiedler/.formatdbrc","r") failed
>NOTE: [000.000] No number of link bits used found in config  file. Ignoring
>NOTE: [000.000] No number of membership bits used found in config file. Ignoring
>Formatted 13699 sequences in volume 0
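The NOTE lines in that log appear to be informational. They record formatdb looking for optional configuration files (formatdb.cnf, .formatdbrc) that are not present. The closing "Formatted 13699 sequences in volume 0" line indicates the run finished. The command itself reads as intended: -p F for nucleotide input, -o T to parse the SeqIds, and -n for the database base name.

As an extra sanity check, one could compare that count with the number of records in the input file. Below is a minimal Python sketch of that check. The function names are hypothetical and are not part of the BLAST distribution. Summing every "Formatted N sequences" line is an assumption meant to cover multi-volume runs.

```python
import re

# formatdb reports what it wrote as "Formatted N sequences in volume V".
FORMATTED_RE = re.compile(r"^Formatted (\d+) sequences", re.MULTILINE)

def count_fasta_records(fasta_text: str) -> int:
    """Number of FASTA records, i.e. header lines starting with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))

def formatted_count(log_text: str) -> int:
    """Total sequences formatdb reports, summed over every 'Formatted N sequences' line."""
    return sum(int(n) for n in FORMATTED_RE.findall(log_text))

def database_complete(fasta_text: str, log_text: str) -> bool:
    """True if formatdb formatted as many sequences as the input FASTA file contains."""
    return formatted_count(log_text) == count_fasta_records(fasta_text)
```

It would be called as `database_complete(open("Cin.seq.uniq").read(), open("formatdb.log").read())`. If formatdb.log has accumulated several runs (each starts with a dated ==== line), pass only the latest section.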
>
>Thank you all very much for the assistance!!!
>
>Cheers,
>
>Tristan Fiedler
>
>--__--__--
>
>_______________________________________________
>BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>
>
>End of BiO_Bulletin_Board Digest




