[Biococoa-dev] Digest tool
Alexander Griekspoor
mek at mekentosj.com
Sun Feb 27 08:40:13 EST 2005
>> Some potential pitfalls we have to foresee in the method:
>> - DNA has two strands, and enzymes can generate both blunt ends and
>> overhangs (cutting at different positions in the forward and reverse
>> strands; see the picture of the enzyme sequence in EnzymeX).
>
> We should then first think about creating a double stranded BCSequence
> object in BioCocoa. Once that is in place, we can think about
> digesting those.
Perhaps yes... But for a first version we can simply ignore the overhang
and return the segments based on the forward sequence only.
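
A minimal sketch of that forward-strand-only approach, written in Python for brevity rather than Objective-C. The function name digest_forward and the cut_offset parameter (the number of bases of the recognition site that lie left of the cut) are illustrative assumptions, not BioCocoa or EnzymeX API, and the site is assumed to be unambiguous:

```python
def digest_forward(sequence, site, cut_offset):
    """Split `sequence` at every forward-strand occurrence of `site`.

    `cut_offset` is the number of bases of the site left of the cut,
    e.g. 1 for EcoRI (G^AATTC). Overhangs are ignored entirely.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        # Continue one base further so overlapping sites are found too.
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


fragments = digest_forward("AAGAATTCTTGAATTCGG", "GAATTC", 1)
```

The fragments always concatenate back to the original sequence, which makes for an easy sanity check.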
>> - Far more than is the case in peptide cleavage, ambiguity plays a
>> major role in DNA restriction enzymes: e.g. enzymes like EaeI, which
>> recognises: y^GGCCr
>
> Right now I am using an NSScanner to identify the cleavage sites in a
> sequence string. Then I use those sites to create subsequences. I
> assume that NSScanner can be used for such ambiguous cases. What
> approach are you guys using in EnzymeX to find the cleavage sites? BTW
> is that supposed to be a regular expression?
I think at some point we will still need an enumerator and/or scanner
for our native sequence objects as well...
That aside, yes: in the upcoming EnzymeX (with primitive cutting
capabilities) I first use the by now famous enzyme objects, and then
indeed the open-source AGRegex framework. I have an NSString extension
that returns a conventional regex string generated from its ambiguous
recognition string:
- (NSString *)convertSequenceToPattern {
    // Translate an ambiguous (IUPAC) recognition string into a regex
    // pattern. Spaces and any other unrecognised characters are skipped.
    NSString *ustring = [self uppercaseString];
    NSMutableString *pattern =
        [NSMutableString stringWithCapacity:[self length] * 2];
    NSUInteger i;
    for (i = 0; i < [ustring length]; i++) {
        switch ([ustring characterAtIndex:i]) {
            // Unambiguous bases map to themselves
            case 'A': [pattern appendString:@"A"]; break;
            case 'C': [pattern appendString:@"C"]; break;
            case 'G': [pattern appendString:@"G"]; break;
            case 'T': [pattern appendString:@"T"]; break;
            // Two-base ambiguity codes
            case 'W': [pattern appendString:@"[AT]"]; break;
            case 'S': [pattern appendString:@"[CG]"]; break;
            case 'M': [pattern appendString:@"[AC]"]; break;
            case 'K': [pattern appendString:@"[GT]"]; break;
            case 'R': [pattern appendString:@"[AG]"]; break;
            case 'Y': [pattern appendString:@"[CT]"]; break;
            // Three-base ambiguity codes
            case 'H': [pattern appendString:@"[ACT]"]; break;
            case 'V': [pattern appendString:@"[ACG]"]; break;
            case 'D': [pattern appendString:@"[AGT]"]; break;
            case 'B': [pattern appendString:@"[CGT]"]; break;
            // N matches any base
            case 'N': [pattern appendString:@"."]; break;
            default: break;
        }
    }
    return pattern;
}
But I know John's implementation did not use regex, so perhaps you had
better start with that one...
By the way, the problems listed above are from my own list of things to
solve before release of the new enzymex ;-)
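
For comparison, the same ambiguity-to-regex conversion can be sketched in Python with the standard re module (not AGRegex). The names IUPAC_TO_REGEX, sequence_to_pattern and find_cut_sites are made up for this illustration. The sketch also assumes that the "^" in a recognition string such as y^GGCCr marks the forward-strand cut and that the string contains no spaces in front of it:

```python
import re

# IUPAC nucleotide codes mapped to regex fragments, mirroring the
# Objective-C method above (N is translated to "." there as well).
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "N": ".",
    "H": "[ACT]", "V": "[ACG]", "D": "[AGT]", "B": "[CGT]",
}


def sequence_to_pattern(recognition):
    """Build a regex pattern; unknown characters (spaces, '^') are skipped."""
    return "".join(IUPAC_TO_REGEX.get(ch, "") for ch in recognition.upper())


def find_cut_sites(sequence, recognition):
    """Return forward-strand cut positions for `recognition` in `sequence`."""
    # Assumption: '^' marks the cut; without it we cut in front of the site.
    cut_offset = recognition.index("^") if "^" in recognition else 0
    # A zero-width lookahead lets finditer report overlapping sites too.
    regex = re.compile("(?=" + sequence_to_pattern(recognition) + ")",
                       re.IGNORECASE)
    return [m.start() + cut_offset for m in regex.finditer(sequence)]


pattern = sequence_to_pattern("y^GGCCr")
sites = find_cut_sites("AACGGCCGTT", "y^GGCCr")
```

The returned positions are indices into the forward strand, so they can be used directly to slice the sequence into fragments.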
>
>> - DNA can be circular, meaning that one has to account for potential
>> cuts in the connecting segment if circularity is the case
>
> That's a tricky one, but as with the double stranded sequence, until
> we have circular sequences in BioCocoa probably not so urgent.
I'm not sure we need a circular sequence object for this, though: you
only have to fuse the last and first segments, and also make sure you
don't miss any recognition patterns that span the begin/end boundary.
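
The circular case described above could be sketched as follows, again in Python and for the forward strand only. The name digest_circular and the cut_offset parameter are assumptions for the example; the site is taken to be unambiguous and shorter than the sequence:

```python
def digest_circular(sequence, site, cut_offset):
    """Digest a circular sequence on the forward strand only.

    Sites spanning the begin/end boundary are found by searching a copy
    of the sequence extended with its first len(site) - 1 bases.
    """
    n = len(sequence)
    extended = sequence + sequence[:len(site) - 1]
    cuts = set()
    pos = extended.find(site)
    while pos != -1:
        # Wrap cut positions that fall past the end back onto the circle.
        cuts.add((pos + cut_offset) % n)
        pos = extended.find(site, pos + 1)
    cuts = sorted(cuts)
    if not cuts:
        return [sequence]  # uncut: the molecule stays circular
    doubled = sequence + sequence
    fragments = []
    for i, start in enumerate(cuts):
        # The last fragment runs through the origin to the first cut,
        # which fuses the last and first linear segments.
        end = cuts[i + 1] if i + 1 < len(cuts) else cuts[0] + n
        fragments.append(doubled[start:end])
    return fragments


boundary = digest_circular("TTCAAGGAA", "GAATTC", 1)
two_cuts = digest_circular("AAGAATTCTTGAATTCGG", "GAATTC", 1)
```

In the first call the only GAATTC site straddles the origin, so a purely linear search would report no cut at all; in the second, the tail "AATTCGG" and the head "AAG" come back as one fused fragment.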
Alex
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
Windows vs Mac
65 million years ago, there were more
dinosaurs than humans.
Where are the dinosaurs now?
*********************************************************