[BiO BB] Re: All-against-all protein sequence comparison (Iddo Friedberg)

Hongyu Zhang forward at hongyu.org
Fri Dec 17 13:24:06 EST 2004


>
> I wouldn't go with the strategy of having one genome as a database, and
> another as a query pool, because that would skew your BLAST statistics
> to give you false-positive hits. I would go with the all-vs-all pairwise
> BLAST.
>

The problem with all-vs-all pairwise comparison is that it will be
slower than the strategy of using one genome as a database and the
other as the query. The statistics issue, I think, only arises when you
do reciprocal BLASTs, i.e., BLAST genome A against B and then genome B
against A. You will then probably get two slightly different E-values
for the same pair of sequences, because the E-value scales with the
size of the database searched. The problem, however, can be mostly
circumvented by setting the database size to the same value in both
BLAST directions (parameter "-z" in NCBI-BLAST and "Z=" in WU-BLAST).
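
To make the database-size point concrete, here is a rough sketch in
Python. It assumes the standard Karlin-Altschul relation
E = m * n * 2^(-S'), where S' is the bit score, m the query length and
n the database length, and it ignores the edge-effect length
corrections that BLAST applies in practice. The function name and all
the numbers are made up for illustration only.

```python
def expected_hits(bit_score, query_len, db_len):
    """E-value from the relation E = m * n * 2**(-S') (no length correction)."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Hypothetical values, chosen only to illustrate the effect.
bit_score = 50.0           # same alignment, so same bit score both ways
len_a = 300                # length of the protein from genome A
len_b = 320                # length of its partner from genome B
genome_a_len = 2_000_000   # total residues in genome A's protein set
genome_b_len = 8_000_000   # total residues in genome B's protein set

# Reciprocal searches using the real database sizes.
e_a_vs_b = expected_hits(bit_score, len_a, genome_b_len)
e_b_vs_a = expected_hits(bit_score, len_b, genome_a_len)

# Same searches with one fixed database size in both directions,
# which is what -z (NCBI-BLAST) or Z= (WU-BLAST) is meant to achieve.
fixed_len = 8_000_000
e_fixed_a_vs_b = expected_hits(bit_score, len_a, fixed_len)
e_fixed_b_vs_a = expected_hits(bit_score, len_b, fixed_len)

print("ratio with real sizes: ", e_a_vs_b / e_b_vs_a)
print("ratio with fixed size: ", e_fixed_a_vs_b / e_fixed_b_vs_a)
```

Under these assumptions the two E-values still differ a little after
fixing the database size, because the two query lengths are not the
same. That is why I say the problem is only mostly circumvented.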

--
Hongyu Zhang, Ph.D.
Computational biologist
Ceres Inc.





