[Bioclusters] distributed blasting of genomes and WASHU blast

Joe Landman bioclusters@bioinformatics.org
Thu, 13 Feb 2003 11:10:20 -0500 (EST)


As usual, Chris is more eloquent than I am.  In a nutshell, you want to 
specialize the analysis to the specific 
scientific/biological/chemical/physical problem you are thinking of.

Joe

On Thu, 13 Feb 2003, Chris Dwan (CCGB) wrote:

> 
> > ...
> > 2)  If I have two large genomes that need a lengthy blast, how can I
> >     split that up?
> > ...
> > Even a valid hit can have some repeat in it ...
> > ... 
> > However, I'm after a generalized solution that doesn't require special
> > knowledge of the sequences. 
> > ...
> 
> Disclaimer first:  I don't know if this comment applies to your
> particular situation.  
> 
> So much for apologies.
> 
> I've had several mid-sized script-n-hack projects start with exactly
> this question:  "How do I BLAST one genome against another?"  When we
> got to the root of it, the biological questions of interest demanded a
> variety of approaches.  Here are two examples:
> 
> 1) Find me putative orthologs between these two chromosomes.
>    ----------------------------------------------------------
>    - This broke down into:
>      a) Find the genes.
>      b) Find the orthologs.
> 
>    - In this case it makes a lot of sense to filter out 
>      low-complexity sequence up front, then hit each chromosome
>      with a suite of gene-finders, including a blastx vs. 
>      a well-annotated protein dataset like Swiss-Prot.  From 
>      that, we get a set of possible genes in each chromosome.
>      Now the problem is more recognizable as a job that BLAST
>      might be good at.
> 
> 2) Show me the large scale genomic events that provide evidence 
>    for evolutionary relation between these two specific chromosomes.
>    -----------------------------------------------------------------
>    - Here, we do NOT want to get rid of low complexity or repetitive
>      elements.  A straight-ahead "overlapping chunks -> blastn -> 
>      dot-plot" approach gives what is wanted.
> 
> 3) Show me the paralogs (duplicated genes within a single genome)
>    and...
> 
> You get the idea.
> 
> -Chris Dwan
>  Center for Computational Genomics and Bioinformatics
>  University of Minnesota
>
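
For the "overlapping chunks -> blastn -> dot-plot" approach in Chris's
second example, the chunking step might look roughly like the sketch
below.  This is only an illustration, not anything from the thread: the
function names (overlapping_chunks, to_chromosome_coords) are
hypothetical, and the chunk size and overlap are assumptions you would
tune, with the overlap presumably at least as long as the longest
alignment you do not want cut at a chunk boundary.

```python
from typing import Iterator, Tuple


def overlapping_chunks(seq: str, size: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs that cover seq, each chunk sharing
    `overlap` bases with the next.  offset is the 0-based start of the
    chunk within seq."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    start = 0
    while start < len(seq):
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # this chunk already reaches the end of the sequence
        start += step


def to_chromosome_coords(offset: int, start: int, end: int) -> Tuple[int, int]:
    """Map 1-based chunk-local hit coordinates (as BLAST reports them)
    back to 1-based coordinates on the full chromosome, given the
    chunk's 0-based offset."""
    return offset + start, offset + end


if __name__ == "__main__":
    # Toy sequence; real chunks would be tens or hundreds of kilobases.
    for offset, chunk in overlapping_chunks("ACGTACGTAC", size=4, overlap=2):
        # Encoding the offset in the FASTA header lets hits be mapped back later.
        print(f">chr_demo_{offset}\n{chunk}")
```

Each chunk can then go out as its own blastn job on a cluster node, and
the offset stored in the header is what lets the hits be placed back on
chromosome coordinates for the dot-plot.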