[Biodevelopers] Looking for repeat motifs - ideas?

Mike Marchywka marchywka at hotmail.com
Wed Feb 27 16:04:21 EST 2008



If you can find my archived posts on the bioinformatics list, I have a tool like
that, similar to rnamotif as it turns out. I'm away from my own computer, or
I could run some examples, but essentially it is a regex search optimized for
the situations you describe. It can find short exact matches and reverse-complement
matches, which I used for CRISPRs and pseudoknots. Adapting it to near matches
can be accomplished with two variable-length sequences allowing for one or more
mismatches.
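
For illustration only (this is not the tool described above, just a minimal Python sketch of the general idea), a brute-force scan can report pairs of short units that sit a bounded distance apart and differ in at most a few positions. The function names and the parameters unit_len, min_gap, max_gap and max_mismatch are assumptions made up for this example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_spaced_repeats(seq, unit_len=6, min_gap=0, max_gap=10, max_mismatch=1):
    """Yield (start1, start2, unit1, unit2, mismatches) for each pair of
    unit_len-mers separated by min_gap..max_gap bases that differ in at
    most max_mismatch positions. The repeat sequence is not known in
    advance: every window is tried as a candidate first unit."""
    n = len(seq)
    for i in range(n - unit_len + 1):
        first = seq[i:i + unit_len]
        for gap in range(min_gap, max_gap + 1):
            j = i + unit_len + gap
            if j + unit_len > n:
                break  # the second unit would run off the end
            second = seq[j:j + unit_len]
            d = hamming(first, second)
            if d <= max_mismatch:
                yield i, j, first, second, d


# Hypothetical test sequence built around the TTTAAG GCGC TTTAAG example.
example = "ACGTTTAAGGCGCTTTAAGACGT"
hits = list(find_spaced_repeats(example, unit_len=6, min_gap=4,
                                max_gap=4, max_mismatch=0))
```

This runs in time proportional to sequence length times the gap range times the unit length, which is fine for testing on short regions but would need indexing or seeding for a whole genome.
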
Regex purists will probably question my syntax, but I'm using patterns like
[\1]{10,20}.{10,20}[\2]{10,20}.{10,20}[\1]{10,20}[\2]{10,20}
to find pseudoknots with a spacing of 10-20 bases between reverse-complement regions.
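
Standard regex engines such as Python's re module have no backreference that means "reverse complement of an earlier group", so the pattern above cannot be pasted into them directly. As a rough sketch under that assumption, and not a reimplementation of the tool or its syntax, the following scans for a single pair of reverse-complement arms with a bounded spacer; a pseudoknot search would look for two such pairs interleaved. The names find_inverted_repeats, arm_len, min_gap and max_gap are invented for this example.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(s):
    """Return the reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len=10, min_gap=10, max_gap=20):
    """Yield (arm_start, partner_start, arm) where the partner is an exact
    reverse complement of the arm and min_gap..max_gap bases lie between
    the end of the arm and the start of the partner."""
    n = len(seq)
    for i in range(n - arm_len + 1):
        arm = seq[i:i + arm_len]
        target = reverse_complement(arm)
        lo = i + arm_len + min_gap
        hi = min(i + arm_len + max_gap, n - arm_len)
        for j in range(lo, hi + 1):
            if seq[j:j + arm_len] == target:
                yield i, j, arm


# Hypothetical toy input: GGAC, a 5-base spacer, then GTCC (revcomp of GGAC).
toy_hits = list(find_inverted_repeats("GGACAAAAAGTCC", arm_len=4,
                                      min_gap=5, max_gap=5))
```

Exact matching is used here for brevity; tolerating mismatches would mean comparing each candidate partner to the target with a Hamming-distance threshold instead of ==.
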
I'm building a rule base from the literature and playing with things like DSCAM and
the E. coli genome for testing; I'd be happy to find something useful to do with this.

AFAIK, performance is pretty good on my Dell Dim 4100 under Cygwin;
it would probably do better on a Linux server.



Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

> From: nuhn at rhrk.uni-kl.de
> To: biodevelopers at bioinformatics.org
> Date: Tue, 26 Feb 2008 10:31:50 +0100
> Subject: [Biodevelopers] Looking for repeat motifs - ideas?
>
> Hello, Everyone!
>
> I am looking for a program that can find short repeats with some mismatches
> and a certain distance between them. Something like this:
>
> TTTAAG GCGC TTTAAG
>
> where the actual sequence of the repeat (TTTAAG) is unknown in advance and
> may have mismatches to the repeat.
>
> Repeat finders I know about are optimized for searching for large repeats
> anywhere on the entire sequence. That would not be useful to me since it
> would create an abundance of matches like this: TTTAAG [ca. 300 000 bases
> here and then the second] TTTAAG. Others only look for tandem repeats, so
> they would find TTTAAG TTTAAG but not TTTAAG GCGC TTTAAG
>
> The closest thing I could find is the program rnabob. It is really cool, but
> rnabob seems only to be able to find inverted repeats, not normal repeats.
>
> Does anyone know a program that can solve my problem? Help would be greatly
> appreciated.
>
> Thanks in advance,
> Michael.
>
> --
> -----------------------------------------------------------
> Dipl.-Inform. Michael Nuhn
> Bioinformatik
> Zentrum für Nanostrukturtechnologie und
> Molekularbiologische Technologie
>
> +49 (0)631 - 205 4334
> nuhn at rhrk.uni-kl.de
> http://nbc3.biologie.uni-kl.de/
> -----------------------------------------------------------
>
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biodevelopers


