[Biodevelopers] Re: BLAST asymmetrical

Fri Jan 19 09:59:18 EST 2007

> I want all homologies.
>
>> Or even Smith-Waterman which will take a while to run.
>
> Do you know of a program that can calculate SW on a pair of genomes?

This may be a semantic confusion on my part, but here's my answer to  
that specific question:

If you really want the single best *global* alignment between two
multi-megabase sequences, then yes, full dynamic programming is the
way to go (strictly speaking that's Needleman-Wunsch; SW is the
*local* alignment variant), and yes, it will take a really long
time.  On the other hand, I've never met anyone who really,
seriously cares about monolithic, global alignments of chromosomes.
Go down that road, and the next question will be "why can't we just
run clustalw on whole chromosomes?"  Yes, of course you could ...
but it'll be really slow and not very useful.
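
To put a rough number on "a really long time", here is a minimal
sketch of the Smith-Waterman (local alignment) recurrence in plain
Python.  The scoring values (match=2, mismatch=-1, gap=-2) and the
function name are arbitrary choices for illustration, and it uses a
simple linear gap penalty rather than anything a real tool would
ship with.  The point is the nested loop: one cell per pair of
positions, so two 1-megabase sequences mean roughly 10^12 cell
updates.  The global version fills a table of the same size.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative only: linear gap penalty, score only (no traceback).
    Time is O(len(a) * len(b)); memory is O(len(b)) because only the
    previous row of the table is kept.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local:
            # a bad prefix is dropped instead of dragging the score down.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```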

Note:  This is not an invitation to the accelerator people in the  
audience to offer me a *faster* clustalw or SW.  I'm trying to steer  
people toward *better* uses of the tools.  You might as well work on  
multi-gigabyte cut-and-paste buffers so that I can stuff whole  
genomes into the NCBI web interface.

On the other hand, if you want the best gene-sized (a few kilobases)
matches from within that pair of megabase sequences, it's a different
question.  You're going to wind up chopping each sequence into
overlapping chunks and running an all-against-all search of some
sort.  The chunk size will be determined by how large you think the
introns and exons in your genes are.  An even more clever approach
might involve doing preliminary gene calls with a gene finding
program like Glimmer (or a eukaryotic gene finder, if your genes
have introns), and then starting the all-against-all search from
those hits.
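
A minimal sketch of the chunking step described above, in Python.
The function names, the chunk size and the overlap are all
assumptions for illustration; pick sizes to suit your own genes.
Each chunk carries its start offset so that a hit can be mapped back
to chromosome coordinates afterwards.

```python
from itertools import product


def overlapping_chunks(seq, chunk_size, overlap):
    """Yield (start, chunk) pairs covering seq with the given overlap.

    Consecutive chunks share `overlap` characters, so a feature shorter
    than the overlap is never split across every chunk that touches it.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # this chunk already reaches the end of the sequence


def chunk_pairs(seq_a, seq_b, chunk_size, overlap):
    """Yield every (chunk of A, chunk of B) pair: the all-against-all set."""
    chunks_a = list(overlapping_chunks(seq_a, chunk_size, overlap))
    chunks_b = list(overlapping_chunks(seq_b, chunk_size, overlap))
    return product(chunks_a, chunks_b)


if __name__ == "__main__":
    for start, chunk in overlapping_chunks("ABCDEFGHIJ", 4, 2):
        print(start, chunk)
```

Each pair from chunk_pairs would then go to whatever search or
alignment program you prefer.  Note that the number of pairs grows
as the product of the two chunk counts, which is why starting from
gene calls instead of blind chunks can save a lot of work.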

Chromosome vs. chromosome BLAST answers the question "is there a
decent hit to any part of this chromosome in that other one?"  The
answer, broadly speaking, will be "yes, there is a statistically
significant match there."

If you want homologous genes, you're going to have to do a bit more  
work than just running a single program to get The Answers.

-Chris Dwan