[Biococoa-dev] Digest tool

Alexander Griekspoor mek at mekentosj.com
Sun Feb 27 08:40:13 EST 2005


>> Some potential pitfalls we have to foresee in the method:
>> - DNA has two strands, and enzymes can generate both blunt ends and 
>> overhangs (cutting at different positions in the forward and reverse 
>> strands; see the picture of the enzyme sequence in EnzymeX).
>
> We should then first think about creating a double stranded BCSequence 
> object in BioCocoa. Once that is in place, we can think about 
> digesting those.
Perhaps, yes. But at first we can simply ignore the overhang and 
return the segments based on the forward sequence only.
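
As a rough sketch of that forward-strand-only approach, written in plain C so it can be called from Objective-C: the function names, the cut_offset parameter and the fixed-size output arrays below are assumptions made for illustration, not BioCocoa or EnzymeX API. It handles only unambiguous recognition sites and linear sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: records where a forward-strand digest would cut.
 * site       - unambiguous recognition sequence, e.g. "GAATTC"
 * cut_offset - cut position inside the site on the forward strand
 *              (1 for G^AATTC)
 * Writes up to max_cuts positions into cuts[] and returns how many were
 * found. A position p means the break lies between seq[p-1] and seq[p]. */
size_t find_forward_cuts(const char *seq, const char *site,
                         size_t cut_offset, size_t *cuts, size_t max_cuts)
{
    assert(seq != NULL && site != NULL && cuts != NULL);
    size_t site_len = strlen(site);
    assert(site_len > 0 && cut_offset <= site_len);

    size_t n = 0;
    const char *p = seq;
    while (n < max_cuts && (p = strstr(p, site)) != NULL) {
        cuts[n++] = (size_t)(p - seq) + cut_offset;
        p += 1;  /* step one base so overlapping sites are not skipped */
    }
    return n;
}

/* Turns ascending cut positions into fragment lengths for a linear
 * sequence. lengths[] needs room for n_cuts + 1 entries; the return
 * value is the number of fragments. */
size_t fragment_lengths(size_t seq_len, const size_t *cuts, size_t n_cuts,
                        size_t *lengths)
{
    assert(cuts != NULL && lengths != NULL);
    size_t start = 0, n = 0;
    for (size_t i = 0; i < n_cuts; i++) {
        lengths[n++] = cuts[i] - start;
        start = cuts[i];
    }
    lengths[n++] = seq_len - start;  /* trailing fragment after the last cut */
    return n;
}
```

For "AAGAATTCTTGAATTCGG" digested at G^AATTC this gives cuts at 3 and 11, hence fragments of 3, 8 and 7 bases. The reverse strand and any overhang are deliberately ignored.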

>> - Far more than is the case in peptide cleavage, ambiguity plays a 
>> major role in DNA restriction enzymes, e.g. enzymes like EaeI, which 
>> recognises: y^GGCCr
>
> Right now I am using an NSScanner to identify the cleavage sites in a 
> sequence string. Then I use those sites to create subsequences. I 
> assume that NSScanner can be used for such ambiguous cases. What 
> approach are you guys using in EnzymeX to find the cleavage sites? BTW 
> is that supposed to be a regular expression?
I think we will still need, at some point, to come up with an 
enumerator and/or scanner for our native sequence objects as well... 
That aside, yes: in the upcoming EnzymeX (with primitive cutting 
capabilities) I first use the by now famous enzyme objects, and indeed 
the open-source AGRegex framework. I have an NSString extension that 
returns a conventional regex string generated from its ambiguous 
recognition string:

// Category method on NSString: converts an IUPAC-ambiguous recognition
// sequence (e.g. "YGGCCR") into a regex pattern (e.g. "[CT]GGCC[AG]").
- (NSString *)convertSequenceToPattern {
     NSString *ustring = [self uppercaseString];
     NSUInteger length = [ustring length];
     NSMutableString *pattern = [NSMutableString stringWithCapacity: length * 2];
     NSUInteger i;

     for (i = 0; i < length; i++) {
         switch ([ustring characterAtIndex: i]) {
             // unambiguous bases map to themselves
             case 'A': [pattern appendString: @"A"]; break;
             case 'C': [pattern appendString: @"C"]; break;
             case 'G': [pattern appendString: @"G"]; break;
             case 'T': [pattern appendString: @"T"]; break;

             // two-base ambiguity codes
             case 'W': [pattern appendString: @"[AT]"]; break;
             case 'S': [pattern appendString: @"[CG]"]; break;
             case 'M': [pattern appendString: @"[AC]"]; break;
             case 'K': [pattern appendString: @"[GT]"]; break;
             case 'R': [pattern appendString: @"[AG]"]; break;
             case 'Y': [pattern appendString: @"[CT]"]; break;

             // N stands for any base; "." matches any single character
             case 'N': [pattern appendString: @"."]; break;

             // three-base ambiguity codes
             case 'H': [pattern appendString: @"[ACT]"]; break;
             case 'V': [pattern appendString: @"[ACG]"]; break;
             case 'D': [pattern appendString: @"[AGT]"]; break;
             case 'B': [pattern appendString: @"[CGT]"]; break;

             // spaces and any other characters are dropped
             default: break;
         }
     }
     return pattern;
}

But I know John's implementation did not use regex, so perhaps you 
had better start with that one...
By the way, the problems listed above come from my own list of things 
to solve before the release of the new EnzymeX ;-)
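
To illustrate how a pattern such as "[CT]GGCC[AG]" (the regex form of EaeI's YGGCCR) could then be used to locate sites, here is a minimal sketch in plain C. It swaps in the POSIX regex functions (regcomp/regexec) for AGRegex; the find_sites name and its signature are hypothetical, and the sequence is assumed to be uppercase.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds every start offset at which `pattern`
 * (a POSIX extended regex such as "[CT]GGCC[AG]") matches `seq`.
 * Stores up to max_hits offsets in hits[] and returns the number
 * stored, or 0 if the pattern does not compile. */
size_t find_sites(const char *seq, const char *pattern,
                  size_t *hits, size_t max_hits)
{
    assert(seq != NULL && pattern != NULL && hits != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    size_t n = 0, offset = 0;
    regmatch_t m;
    /* Search again from one base past each hit, so that overlapping
     * recognition sites are reported as well. */
    while (n < max_hits && seq[offset] != '\0' &&
           regexec(&re, seq + offset, 1, &m, 0) == 0) {
        size_t start = offset + (size_t)m.rm_so;
        hits[n++] = start;
        offset = start + 1;
    }
    regfree(&re);
    return n;
}
```

On "AATGGCCATTCGGCCGAA" the pattern matches at offsets 2 and 10. Since EaeI cuts after the first base (y^GGCCr), the forward-strand cut positions would be those offsets plus one.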

>
>> - DNA can be circular, meaning that one has to account for potential 
>> cuts in the connecting segment if circularity is the case
>
> That's a tricky one, but as with the double stranded sequence, until 
> we have circular sequences in BioCocoa probably not so urgent.
I'm not sure we need a circular sequence object for this, though: you 
only have to fuse the last and first segments, and make sure you don't 
miss any recognition patterns that span the begin/end boundary.
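
One way this could be done without a circular sequence object, sketched in plain C under stated assumptions: the site is unambiguous, and the function names and signatures are invented for illustration. The sequence is extended by its own first (site length - 1) bases so that sites spanning the boundary are found, and the tail and head segments are then fused into one fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: cut positions (ascending, in 0..len-1) for a
 * circular sequence stored as a linear string. */
size_t circular_cuts(const char *seq, const char *site, size_t cut_offset,
                     size_t *cuts, size_t max_cuts)
{
    assert(seq != NULL && site != NULL && cuts != NULL);
    size_t len = strlen(seq), site_len = strlen(site);
    if (len == 0 || site_len == 0 || site_len > len)
        return 0;

    /* Append the first site_len - 1 bases so that sites spanning the
     * end/start junction become visible to a linear search. */
    char *ext = malloc(len + site_len);
    if (ext == NULL)
        return 0;
    memcpy(ext, seq, len);
    memcpy(ext + len, seq, site_len - 1);
    ext[len + site_len - 1] = '\0';

    size_t n = 0;
    const char *p = ext;
    while (n < max_cuts && (p = strstr(p, site)) != NULL) {
        /* wrap cut positions that fall past the end back to the start */
        cuts[n++] = ((size_t)(p - ext) + cut_offset) % len;
        p += 1;
    }
    free(ext);
    qsort(cuts, n, sizeof cuts[0], cmp_size);
    return n;
}

/* Fragment lengths for a circle of `len` bases. With no cuts the circle
 * stays intact (one entry); otherwise there are n_cuts fragments, the
 * last being the fused tail + head segment. lengths[] needs room for
 * at least one entry and at least n_cuts entries. */
size_t circular_fragment_lengths(size_t len, const size_t *cuts,
                                 size_t n_cuts, size_t *lengths)
{
    assert(lengths != NULL);
    if (n_cuts == 0) {
        lengths[0] = len;
        return 1;
    }
    for (size_t i = 0; i + 1 < n_cuts; i++)
        lengths[i] = cuts[i + 1] - cuts[i];
    lengths[n_cuts - 1] = len - cuts[n_cuts - 1] + cuts[0];
    return n_cuts;
}
```

For the 18-base circle "TTCAAAGAATTCAAAGAA" and G^AATTC, a purely linear search sees only the site at offset 6, whereas this finds the second one spanning the boundary (cuts at 7 and 16), giving two fragments of 9 bases each.
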
Alex

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   AIM: mekentosj at mac.com
                   E-mail: a.griekspoor at nki.nl
               Web: http://www.mekentosj.com

                           Windows vs Mac
	65 million years ago, there were more
                      dinosaurs than humans.
	     Where are the dinosaurs now?

*********************************************************