[Biodevelopers] Looking for repeat motifs - ideas?

Michael Nuhn nuhn at rhrk.uni-kl.de
Thu Feb 28 08:06:22 EST 2008


Hi, Mike!

I took a look at rnamot. It looks very similar to rnabob and, just like
rnabob, it does not look for direct repeats. :-(

I did not find your program on the mailing list. Your idea of using regular
expressions is interesting. Constructing them automatically from a specified
number of mismatches and then filtering through them automatically is
tricky.

Your way of reducing the problem to a different one brought me to another
idea. Since rnabob can already find inverted repeats, every search for a
normal repeat could perhaps be reduced to a search for an inverted repeat,
like so:

Given the sequence: AAA GC AAA
The repeat search should find the repetition of AAA

The reduction goes like this:

1. Find a sequence that does not appear in the original sequence. This will
be the separator (here: XX).
2. Reverse complement the original sequence and join the two with the
separator. This would be

AAA GC AAA XX TTT GC TTT

3. Now instead of searching for

- Repeat,
- 2 spaces,
- Repeat

I search for

- Repeat,
- 2 + length of sequence + length of separator spaces,
- INVERTED_Repeat

This would find an inverted repeat in the constructed sequence if and only
if there is a normal repeat in the original sequence. Additionally, all of
the nice functions of rnabob would be preserved.
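
A minimal sketch of steps 1 and 2 in Python, for checking the idea on the toy
example. The function names are hypothetical and nothing here is part of
rnabob. It also computes where the reverse complement of a given repeat copy
ends up in the constructed sequence. In the toy example, the gap between the
first AAA and the reverse complement of the second AAA comes out as 7, and it
shifts with the position of the repeat, so the spacer in step 3 probably needs
to be a range rather than a single value.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_search_sequence(seq, separator="XX"):
    """Steps 1 and 2: original + separator + reverse complement."""
    if separator in seq:
        raise ValueError("separator must not occur in the sequence")
    return seq + separator + reverse_complement(seq)


def mirrored_start(seq, separator, start, length):
    """Index in the constructed sequence where the reverse complement of
    seq[start:start + length] begins."""
    return len(seq) + len(separator) + len(seq) - start - length


seq = "AAAGCAAA"
combined = build_search_sequence(seq)
print(combined)

# The second copy of AAA starts at index 5 of the original sequence.
pos = mirrored_start(seq, "XX", start=5, length=3)
gap = pos - 3  # the first AAA ends just before index 3
print(pos, gap)
```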

Since this is a bit complicated, I will have to sleep on it a few times.
;-) Even if it works in theory, rnabob might run into memory problems once
the sequences get large. I don't know how big the motifs can be, but I'm
fairly certain rnabob was not designed for something like this.

- @Osnofian and RepeatMasker: I am still going to look into the program. I
did not get to it yet, but at first glance it looks like it searches for a
different kind of repeat.

- @Mark and http://sourceforge.net/projects/pars/ : There is no way of
downloading your project. :-( I get the message "This project has not yet
created any file release packages." when I go to download.

- @all: Thanks for sharing your ideas so far.

Cheers,
Michael.

----- Original Message ----- 
From: "Mike Marchywka" <marchywka at hotmail.com>
To: "Development in Bioinformatics" <biodevelopers at bioinformatics.org>
Sent: Wednesday, February 27, 2008 11:58 PM
Subject: Re: [Biodevelopers] Looking for repeat motifs - ideas?



> Using RegExes, how would you handle limited mismatches within the repeated
> motif, esp. when its position is unknown?
>

Well, first, I was looking for exact things and going with the idea that
equality is easier than a metric in a high-dimensional space. But if you are
looking for short things, and are willing to limit yourself to 1 or 2
mismatches, then you could split up an exact group into a pair of groups.
For example, instead of looking for a thing 10 long with up to 1 mismatch,
you could look for a pair of "things", each 1-10 long, with a "match
anything" field of length 0-1 between them:
[\-1]{1,10}.{0,1}[\-2]{1,10} etc
and then take things that total to the desired length. This particular
example may generate a lot of 1-0-1 hits (two identical bases separated by
0 or 1 "X"s), but depending on what you are doing, you could filter the
output with awk, or I could add a total length requirement, etc.
(The "-" is for forward match, NOT the reverse complement that I do by
default.)
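
A rough sketch of the same split-group idea using Python's standard `re`
module rather than the made-up syntax above. It is a simplification: it looks
for forward (direct) repeats of a fixed length where both copies share the
same exact flanks around one "match anything" position, so it covers a single
substitution but no insertions or deletions. The function name and parameters
are assumptions for illustration only.

```python
import re


def repeats_with_one_mismatch(seq, length, min_gap, max_gap):
    """Return sorted (start, copy1, copy2) tuples for direct repeats of
    `length` bases whose two copies differ in at most one position."""
    hits = set()
    for left in range(length):        # bases before the wildcard position
        right = length - 1 - left     # bases after the wildcard position
        # The lookahead lets matches overlap; \1 and \3 are forward
        # backreferences to the two exact flanks of the first copy.
        pattern = re.compile(
            r"(?=([ACGT]{%d})(.)([ACGT]{%d})[ACGT]{%d,%d}\1(.)\3)"
            % (left, right, min_gap, max_gap)
        )
        for m in pattern.finditer(seq):
            copy1 = m.group(1) + m.group(2) + m.group(3)
            copy2 = m.group(1) + m.group(4) + m.group(3)
            hits.add((m.start(), copy1, copy2))
    return sorted(hits)


# ACGTA and ACCTA differ at one position and are separated by GG.
print(repeats_with_one_mismatch("ACGTAGGACCTA", 5, 2, 2))
```

Because the lookahead stops at the first spacer length that works, each start
position and wildcard position reports at most one partner copy. An exact
repeat matches for every wildcard position, which is why the hits are
collected in a set.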

I'd actually have to look. It may even be easier to code an allowed
"mismatch" parameter if you are going to do this a lot.

I'd have to give this some thought, and maybe someone on the boost list
could explain how to do this with a "real" perl regex (I have a made-up
syntax and set of limitations to meet my needs with the best easily
achievable performance).







Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

> Date: Wed, 27 Feb 2008 16:49:28 -0500
> From: jeff at bioinformatics.org
> To: biodevelopers at bioinformatics.org
> Subject: Re: [Biodevelopers] Looking for repeat motifs - ideas?
>
> Mike,
>
> Using RegExes, how would you handle limited mismatches within the repeated
> motif, esp. when its position is unknown?
>
> Jeff
>
> Mike Marchywka wrote:
>>
>> The regex people probably question my syntax but I'm using things like
>> [\1]{10,20}.{10,20}[\2]{10,20}.{10,20}[\1]{10,20}[\2]{10,20}
>> to find pseudo knots with distance of 10-20 between reverse-complement
>> regions.
>
> --
> J.W. Bizzaro
> Bioinformatics Organization, Inc. (Bioinformatics.Org)
> E-mail: jeff at bioinformatics.org
> Phone: +1 508 890 8600
> --
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biodevelopers
