[Biodevelopers] Looking for repeat motifs - ideas?

Mike Marchywka marchywka at hotmail.com
Wed Feb 27 17:58:58 EST 2008


> Using RegExes, how would you handle limited mismatches within the repeated motif, esp. when its position is unknown?
>

Well, first, I was looking for exact matches, on the idea that testing equality is
easier than computing a metric in a high-dimensional space. But if you are looking for
short motifs and are willing to limit yourself to 1 or 2 mismatches, you could
split an exact group into a pair of groups. For example, instead of looking for
a motif 10 long with up to 1 mismatch, you could look for a pair of exact groups, each
1-10 long, separated by a "match anything" field of length 0-1:
[\-1]{1,10}.{0,1}[\-2]{1,10} etc
and then keep the hits whose total length is the one you want. This particular example
may generate a lot of 1-0-1 hits (two identical bases separated by 0 or 1 "X"s),
but, depending on what you are doing, you could filter the output with awk, or
I could add a total-length requirement, etc.
(The "-" requests a forward match, NOT the reverse complement that I match by default.)
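
For comparison, here is a minimal sketch of the same split-group idea using
Python's standard re module rather than the made-up syntax above. The function
name one_mismatch_repeats and the max_gap parameter are hypothetical, and the
sketch assumes an uppercase ACGT sequence and substitutions only (no indels).
For each possible mismatch position it builds "exact group, one wildcard base,
exact group" and asks for the same two groups again later, so the total length
is fixed up front instead of being filtered afterwards.

```python
import re


def one_mismatch_repeats(seq, length, max_gap=50):
    """Find pairs of length-`length` windows that differ in at most one base.

    Returns a sorted list of (start1, start2, copy1, copy2). Only the nearest
    second copy (smallest gap) is reported for each start and split position.
    """
    hits = set()
    for k in range(length):          # k = position of the tolerated mismatch
        left, right = k, length - k - 1
        # Lookahead so that overlapping start positions are all examined.
        # Group 1 and 2 are the exact halves, group 3 the gap, group 4 the
        # second copy (same halves, any base at the mismatch position).
        pattern = re.compile(
            r"(?=([ACGT]{%d})[ACGT]([ACGT]{%d})([ACGT]{0,%d}?)(\1[ACGT]\2))"
            % (left, right, max_gap)
        )
        for m in pattern.finditer(seq):
            start1 = m.start()
            start2 = start1 + length + len(m.group(3))
            copy1 = seq[start1:start1 + length]
            hits.add((start1, start2, copy1, m.group(4)))
    return sorted(hits)
```

Exact repeats are reported too, since the wildcard position is allowed to
agree. Using a set keeps them from being listed once per split position.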

I'd actually have to look; it may even be easier to code an allowed "mismatch"
parameter if you are going to do this a lot.
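
As an illustration of what an explicit mismatch parameter could look like
outside a regex engine, here is a brute-force sketch in Python. It is not the
implementation of the tool discussed here; the names hamming and
approximate_repeats are hypothetical, and it assumes substitutions only and
non-overlapping copies. It compares every pair of windows, so it is only
practical for short sequences.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_repeats(seq, length, max_mismatches=1):
    """Return (start1, start2, mismatches) for every pair of non-overlapping
    windows of the given length that differ in at most max_mismatches bases."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * length + 1):
        first = seq[i:i + length]
        # The second copy starts at or after the end of the first copy.
        for j in range(i + length, n - length + 1):
            d = hamming(first, seq[j:j + length])
            if d <= max_mismatches:
                hits.append((i, j, d))
    return hits
```

Setting max_mismatches=0 reduces this to an exact repeat search, which makes
it easy to compare against the exact-match results.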


I'd have to give this some thought, and maybe someone on the Boost list
could explain how to do this with a "real" Perl regex (I use a made-up syntax
and set of limitations that meet my needs with the best easily achievable performance).







Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: Hotmail is blocking my mom's entire
ISP claiming it is to reduce spam but probably
to force users to use hotmail. Please DON'T
assume I am ignoring you and try
me on marchywka at yahoo.com if no reply
here. Thanks.

> Date: Wed, 27 Feb 2008 16:49:28 -0500
> From: jeff at bioinformatics.org
> To: biodevelopers at bioinformatics.org
> Subject: Re: [Biodevelopers] Looking for repeat motifs - ideas?
>
> Mike,
>
> Using RegExes, how would you handle limited mismatches within the repeated motif, esp. when its position is unknown?
>
> Jeff
>
> Mike Marchywka wrote:
>>
>> The regex people probably question my syntax but I'm using things like
>> [\1]{10,20}.{10,20}[\2]{10,20}.{10,20}[\1]{10,20}[\2]{10,20}
>> to find pseudo knots with distance of 10-20 between reverse-complement regions.
>
> --
> J.W. Bizzaro
> Bioinformatics Organization, Inc. (Bioinformatics.Org)
> E-mail: jeff at bioinformatics.org
> Phone: +1 508 890 8600
> --
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/biodevelopers


