[Biococoa-dev] peptides and proteins

Thu Sep 9 07:19:56 EDT 2004

>>> the reason for seeing things as digests instead of proteases would 
>>> be to allow cleavage with multiple enzymes, like is commonly the 
>>> case with restriction enzymes. Therefore, the enzymes should be an 
>>> array which you can add and remove enzymes from.
>> this line would then become:
>> [digest addEnzyme: protease];
>
>
> Good idea. I will see how that fits in my code.  I hope we can make a 
> general BCDigest class, without subclassing. Although I am not sure 
> yet how to implement multiple enzymes. Should they be handled one by 
> one, or all at the same time (by 'summing their cleavage sites')?

The way I did that in EnzymeX was to have a custom class, EXMapCut 
(never mind the stupid name), that stores the position and the enzyme, 
and which I instantiated for each position an enzyme cuts. When you do 
this for each enzyme and store all of the cut objects in one array, the 
rest is simply to sort the array on position, iterate over it, and 
create the fragment objects accordingly. I could imagine that you could 
do this with a dictionary or set as well, to avoid the need for a 
custom object.
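
A minimal sketch of that approach, assuming ARC and modern Objective-C syntax. `CutSite` and `FragmentsFromCuts` are hypothetical names (not EnzymeX or BioCocoa API), and a plain NSString stands in for the sequence and the enzyme purely for illustration. The cut objects of all enzymes are pooled, sorted on position, and then walked once to slice a linear sequence:

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for EXMapCut: one cut position plus the enzyme that made it.
@interface CutSite : NSObject
@property (nonatomic, readonly) NSUInteger position;        // index of the first residue after the cut
@property (nonatomic, readonly, copy) NSString *enzymeName;
- (instancetype)initWithPosition:(NSUInteger)position enzymeName:(NSString *)name;
@end

@implementation CutSite
- (instancetype)initWithPosition:(NSUInteger)position enzymeName:(NSString *)name {
    if ((self = [super init])) {
        _position = position;
        _enzymeName = [name copy];
    }
    return self;
}
@end

// Sorts the pooled cut sites on position and slices a linear sequence into fragments.
static NSArray<NSString *> *FragmentsFromCuts(NSString *sequence, NSArray<CutSite *> *cuts) {
    NSSortDescriptor *byPosition = [NSSortDescriptor sortDescriptorWithKey:@"position" ascending:YES];
    NSArray<CutSite *> *sorted = [cuts sortedArrayUsingDescriptors:@[byPosition]];

    NSMutableArray<NSString *> *fragments = [NSMutableArray array];
    NSUInteger start = 0;
    for (CutSite *cut in sorted) {
        // Skip duplicate positions (two enzymes cutting at the same place) and out-of-range cuts.
        if (cut.position <= start || cut.position >= sequence.length) continue;
        [fragments addObject:[sequence substringWithRange:NSMakeRange(start, cut.position - start)]];
        start = cut.position;
    }
    // Whatever remains after the last cut is the final fragment.
    [fragments addObject:[sequence substringFromIndex:start]];
    return fragments;
}
```

Because every cut keeps its enzyme, the same loop could also record which enzyme produced each fragment end.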
>
>> Something we have to watch out for is that the sequence object 
>> contained in the object is a mutable one, so potentially can be 
>> changed underneath us. Unless we do not store a pointer, but would 
>> copy it. This however might be expensive.
>
> If we just store the sequenceString, which makes the use of an 
> NSScanner very easy, then we can store it as an NSString:
>
> [snippet]

I'm afraid this leads again to discussions we had before, but I'm not 
in favour of this approach for two reasons. First, you could just as 
well copy the BCSequence you are handed and keep your own copy that 
can't be edited. Second, we should use strings only within 
implementations and not as variables. As it stands, if I ask the digest 
for its sequence, it first has to be recreated from the sequence string 
(losing all features, for example)!

>> So perhaps this is one of the examples where it would be handy to 
>> have both mutable and immutable variants of the BCSequence class. 
>> Unless anyone of you can shed more light on the issue.
>
> See snippet above.
I think we should still consider this option, although it's not a 
high priority right now.

>>> NSArray	*thePeptides = [digest digestResult];
>> That would be the idea. This means that the result is cached by the 
>> digest object right?
> What?
>
What I meant is that when you call [digest performDigest], the digest 
object performs the digest and stores the fragments in an array, thus 
caching the result (you can ask for the results many times without the 
need for recalculation).
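
A minimal sketch of such a caching digest, assuming ARC. `CachingDigest`, its initializer, and `computeCount` are hypothetical names for illustration, not the BCDigest API; the cut positions are passed in as NSNumbers and an NSString stands in for the sequence. The initializer copies the sequence it is handed, so a mutable original cannot change underneath the digest:

```objc
#import <Foundation/Foundation.h>

@interface CachingDigest : NSObject
@property (nonatomic, readonly) NSUInteger computeCount;   // how often the digest was actually computed
- (instancetype)initWithSequence:(NSString *)sequence cutPositions:(NSArray<NSNumber *> *)positions;
- (void)performDigest;
- (NSArray<NSString *> *)digestResult;
@end

@implementation CachingDigest {
    NSString *_sequence;
    NSArray<NSNumber *> *_positions;
    NSArray<NSString *> *_fragments;   // nil until the digest has been performed
}

- (instancetype)initWithSequence:(NSString *)sequence cutPositions:(NSArray<NSNumber *> *)positions {
    if ((self = [super init])) {
        _sequence = [sequence copy];   // immutable snapshot, even if an NSMutableString was passed
        _positions = [positions sortedArrayUsingSelector:@selector(compare:)];
    }
    return self;
}

- (void)performDigest {
    NSMutableArray<NSString *> *fragments = [NSMutableArray array];
    NSUInteger start = 0;
    for (NSNumber *number in _positions) {
        NSUInteger position = number.unsignedIntegerValue;
        if (position <= start || position >= _sequence.length) continue;
        [fragments addObject:[_sequence substringWithRange:NSMakeRange(start, position - start)]];
        start = position;
    }
    [fragments addObject:[_sequence substringFromIndex:start]];
    _fragments = [fragments copy];     // store the result so later requests are free
    _computeCount += 1;
}

- (NSArray<NSString *> *)digestResult {
    if (_fragments == nil) [self performDigest];   // compute lazily on first request
    return _fragments;
}
@end
```

If enzymes can be added or removed after the digest has run, the cached array would have to be cleared at that point; the sketch leaves that out.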

>>> Yes, that's taken care of in the plist using the CleaveDirection 
>>> key. We have to add some code like:
>>>
>>>      [newPeptide setCleavedAt: cleavedAtN];		// or 5' or 3' or cleavedAtC
>>
>> Well that's not exactly what I meant. When you cut  vector DNA with 
>> for instance EcoRI and BamHI, you would get for example:
>> Fragment 1: 	EcoRI---------------BamHI
>> Fragment 2: 	BamHI--------------EcoRI
>>
>> So what I thought was to store in a new BCSequenceDNA subclass, 
>> called BCFragmentDNA two variables like
>> // ecori and bamhi are of class BCEnzyme, or BCRestrictionEnzyme to be more precise
>> [fragment1 set5EndEnzyme: ecori];
>> [fragment1 set3EndEnzyme: bamhi];
>> indeed set by the digest object.
>>
>> for peptides that would be
>> [peptide setCarboxyEnzyme: nil];
>> [peptide setAminoEnzyme: trypsin];
>>
>> Although I already hate the name set5EndEnzyme, so if anyone can come 
>> up with a better one, ideally spanning all sequence types 
>> (DNA/RNA/Protein).....
>>
>> Finally, besides the enzymes, the fragment class also needs to store 
>> the position it represents within the uncut sequence (see below)
>>
>>
>>>
>>>> Therefore, I proposed the BCFragment class, which could be a 
>>>> subclass of BCSequence that stores these additional BCEnzyme 
>>>> variables (which can also be nil by default if the end is 
>>>> untreated).
>
>
> The fragments are just sequences, and once created they know nothing 
> about where they originate from (just as in a petri dish).

That is why BCFragments should be subclasses of BCSequence that 
store the enzymes and the range within the uncut sequence.

> Why not keep that data in the BCDigest class that did the actual 
> cutting? But I am open to more discussion, because below I suggested a 
> BCPeptide class :)
Exactly ;-) We both want the BCFragment subclass so it seems ;-)

> Or we make a BCDigest return a dictionary that looks something like:
>
> 	<key>fragment1</key>
> 	<dict>
> 		<key>sequence</key>
> 		<string>GATATAGATCGAT</string>
> 		<key>start</key>
> 		<int>23</int>
> 		<key>end</key>
> 		<int>32</int>
> 		<key>startEnzyme</key>
> 		<string>bamhi</string>
> 		<key>endEnzyme</key>
> 		<string>ecori</string>
> 	</dict>
>
> There - all info stored together :)

That's not a bad idea either, although I would prefer to store not a 
string but a BCSequence object under the "sequence" key in the 
dictionary, and the enzymes as objects as well. The start and end 
could just be a single NSRange variable. Hey, wait a minute, there we 
have our BCFragment class ;-) But again, this might be a good 
alternative. The real advantage of a BCFragment class is that you could 
easily add logic for sorting, for example: how would you sort this 
dictionary on cut position, or worse, on enzymes?
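
A minimal sketch of how such a fragment class could carry its own sorting logic, assuming ARC. `SketchFragment` and its method names are hypothetical, and NSString stands in for the BCSequence and BCEnzyme objects, whose real interfaces are not shown in this thread:

```objc
#import <Foundation/Foundation.h>

// Hypothetical fragment record: the sequence, its range in the uncut sequence, and both end enzymes.
@interface SketchFragment : NSObject
@property (nonatomic, readonly, copy) NSString *sequence;
@property (nonatomic, readonly) NSRange range;               // location and length within the uncut sequence
@property (nonatomic, readonly, copy) NSString *startEnzyme; // nil when that end is untreated
@property (nonatomic, readonly, copy) NSString *endEnzyme;   // nil when that end is untreated
- (instancetype)initWithSequence:(NSString *)sequence
                           range:(NSRange)range
                     startEnzyme:(NSString *)startEnzyme
                       endEnzyme:(NSString *)endEnzyme;
- (NSComparisonResult)compareByPosition:(SketchFragment *)other;
- (NSComparisonResult)compareByStartEnzyme:(SketchFragment *)other;
@end

@implementation SketchFragment
- (instancetype)initWithSequence:(NSString *)sequence
                           range:(NSRange)range
                     startEnzyme:(NSString *)startEnzyme
                       endEnzyme:(NSString *)endEnzyme {
    if ((self = [super init])) {
        _sequence = [sequence copy];
        _range = range;
        _startEnzyme = [startEnzyme copy];
        _endEnzyme = [endEnzyme copy];
    }
    return self;
}

// Orders fragments by where they start in the uncut sequence.
- (NSComparisonResult)compareByPosition:(SketchFragment *)other {
    if (self.range.location < other.range.location) return NSOrderedAscending;
    if (self.range.location > other.range.location) return NSOrderedDescending;
    return NSOrderedSame;
}

// Orders fragments by start enzyme name; untreated ends (nil) sort first.
- (NSComparisonResult)compareByStartEnzyme:(SketchFragment *)other {
    NSString *mine = self.startEnzyme ?: @"";
    NSString *theirs = other.startEnzyme ?: @"";
    return [mine caseInsensitiveCompare:theirs];
}
@end
```

With comparison methods like these, sorting an array of fragments is a single call such as `[fragments sortedArrayUsingSelector:@selector(compareByPosition:)]`, which a dictionary of dictionaries does not offer directly.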

>> Like a BCDigest, you could think in the direction of a very analogous 
>> BCMap which would return instead of an array of fragments, an array 
>> of positions. You would feed BCMap, a single sequence, enzyme(s), and 
>> it would return all cut positions.
>
> This is already how I code my digest class. First create an array of 
> cutpositions using the NSScanner, then feed those numbers to the 
> actual digest, which returns the fragments.
Yep, exactly the plan. This brings up another thought I had: perhaps it 
would be nice to actually create an NSScanner equivalent for our 
BCSequences. I know the Omni frameworks have constructed their own 
scanner as well, so we might look through their code for hints on how 
to do it. The big advantage would be that within the implementations we 
could stay native in BCSequences instead of converting everything to 
strings all the time.
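
A minimal sketch of the first step of that plan, collecting cut positions for one enzyme, assuming ARC. It works on an NSString and uses `-rangeOfString:options:range:` rather than the NSScanner mentioned above, so it illustrates the idea and is not the digest code under discussion; `CutPositions` and its `cutOffset` parameter (the distance from the start of the recognition site to the cut) are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Returns every cut position of one recognition site in a linear sequence, in ascending order.
static NSArray<NSNumber *> *CutPositions(NSString *sequence, NSString *site, NSUInteger cutOffset) {
    NSMutableArray<NSNumber *> *positions = [NSMutableArray array];
    if (site.length == 0) return positions;

    NSRange searchRange = NSMakeRange(0, sequence.length);
    while (searchRange.length >= site.length) {
        NSRange hit = [sequence rangeOfString:site options:NSCaseInsensitiveSearch range:searchRange];
        if (hit.location == NSNotFound) break;
        [positions addObject:@(hit.location + cutOffset)];
        // Advance by a single residue so that overlapping sites are found as well.
        NSUInteger next = hit.location + 1;
        searchRange = NSMakeRange(next, sequence.length - next);
    }
    return positions;
}
```

For EcoRI, which cuts G^AATTC, the call would be `CutPositions(sequence, @"GAATTC", 1)`; the resulting positions of all enzymes can then be pooled and handed to the digest step. A scanner that works on BCSequence objects directly would replace the string search here.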

Cheers,
Alex
*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   AIM: mekentosj at mac.com
                   E-mail: a.griekspoor at nki.nl
               Web: http://www.mekentosj.com
*********************************************************