[Biococoa-dev] peptides and proteins

Thu Sep 9 20:36:44 EDT 2004

On Sep 9, 2004, at 7:19 AM, Alexander Griekspoor wrote:

> The way I did that in EnzymeX was to have a custom class EXMapCut 
> (never mind the stupid name) that stores the position and the 
> enzyme, which I instantiated for each position an enzyme cuts. When 
> you do this for each enzyme and store all of the cut objects in an 
> array, the rest is simply to sort the array on position, iterate 
> over it, and create the fragment objects accordingly. I could 
> imagine that you could do this with a dictionary or set as well to 
> prevent the need for a custom object.
>>
>>> Something we have to watch out for is that the sequence object 
>>> contained in the object is a mutable one, so potentially can be 
>>> changed underneath us. Unless we do not store a pointer, but would 
>>> copy it. This however might be expensive.
>>
>> If we just store the sequenceString, which makes the use of an 
>> NSScanner very easy, then we can store it as an NSString:
>>
>> [snippet]
>
> I'm afraid this leads again to discussions we had before, but I'm not 
> in favour of this approach for two reasons. First, you could just as 
> well then copy the handed BCSequence and have your own copy that can't 
> be edited. Second, we should use strings only within implementations 
> and not as variables. Now if I want to ask the digest for its 
> sequence, it first has to be recreated from the sequenceString 
> (losing all features, for example)!

I think it is fine to use a BCSequence. I was under the impression that 
you thought it would be expensive, which is why I suggested using the 
sequenceString. But using a BCSequence is more logical.
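
To make the copy-versus-pointer point concrete, here is a minimal sketch in Python rather than Objective-C. The Sequence and Digest classes are hypothetical stand-ins for a mutable BCSequence and the BCDigest discussed here, not the actual BioCocoa classes. The digest takes its own copy at creation, so later edits to the caller's sequence cannot change it underneath us, and the features survive.

```python
import copy


class Sequence:
    """Hypothetical stand-in for a mutable BCSequence with features."""

    def __init__(self, symbols, features=None):
        self.symbols = list(symbols)
        self.features = dict(features or {})


class Digest:
    """Hypothetical stand-in for BCDigest."""

    def __init__(self, sequence):
        # Keep a private deep copy instead of a reference, so the caller
        # can keep editing its own sequence without affecting the digest.
        self.sequence = copy.deepcopy(sequence)


original = Sequence("GAATTCGG", {"name": "demo"})
digest = Digest(original)

# The caller mutates its sequence after handing it over.
original.symbols.append("A")
original.features["name"] = "changed"
```

The cost is one copy per digest, which is the expense mentioned earlier in the thread. Whether that matters depends on sequence length and how often digests are created.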

>> Why not keep that data in the BCDigest class that did the actual 
>> cutting? But I am open to more discussion, because below I suggested 
>> a BCPeptide class :)
> Exactly ;-) We both want the BCFragment subclass so it seems ;-)

I guess so ;-) So, if I understand correctly, we have a general 
BCDigest object that is fed a BCSequence. Then after the digest, the 
BCDigest object spits out an array of BCFragments, containing all the 
needed information. And the BCFragment class is a subclass of 
BCSequence. But wait, then you lose the specific BCSequenceProtein and 
BCSequenceDNA features. So maybe we should have a BCFragmentProtein (or 
BCPeptide) and a BCFragmentDNA (or BCOligoNucleotide).
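
As a rough sketch of that flow (collect the cuts, sort them on position, then walk the sorted list to build the fragments), something like the following could work. It is written in Python for brevity. Cut, Fragment, find_cuts and digest are made-up names for illustration, plain strings stand in for BCSequence, and the recognition sites and offsets are only example inputs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cut:
    """One position where one enzyme cuts (the role EXMapCut plays)."""
    position: int
    enzyme: str


@dataclass(frozen=True)
class Fragment:
    """A piece of the digested sequence; start/length act like an NSRange."""
    start: int
    length: int
    symbols: str
    left_enzyme: Optional[str]
    right_enzyme: Optional[str]


def find_cuts(sequence: str, site: str, offset: int, enzyme: str) -> List[Cut]:
    """Return a Cut for every (possibly overlapping) occurrence of site."""
    cuts = []
    index = sequence.find(site)
    while index != -1:
        cuts.append(Cut(index + offset, enzyme))
        index = sequence.find(site, index + 1)
    return cuts


def digest(sequence: str, cuts: List[Cut]) -> List[Fragment]:
    """Sort the cuts on position and build the fragments between them."""
    fragments = []
    start, left = 0, None
    for cut in sorted(cuts, key=lambda c: c.position):
        # Skip duplicate positions and cuts at the very ends.
        if cut.position <= start or cut.position >= len(sequence):
            continue
        fragments.append(Fragment(start, cut.position - start,
                                  sequence[start:cut.position],
                                  left, cut.enzyme))
        start, left = cut.position, cut.enzyme
    # Whatever remains after the last cut is the final fragment.
    fragments.append(Fragment(start, len(sequence) - start,
                              sequence[start:], left, None))
    return fragments


seq = "AAGAATTCTTGGATCCAA"
all_cuts = (find_cuts(seq, "GGATCC", 1, "BamHI")
            + find_cuts(seq, "GAATTC", 1, "EcoRI"))
fragments = digest(seq, all_cuts)
```

In this sketch each fragment remembers which enzyme produced each of its ends. A protein- or DNA-specific fragment class would carry the same range and enzyme data on top of the matching sequence type.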

> The start and stop could just be a single NSRange variable. Hey, wait 
> a minute, there we have our BCFragment class ;-) But again, this might 
> be a good alternative. The real advantage of a BCFragment class is 
> that you could easily add logic for sorting: how do you sort this 
> dictionary on cut position, for example, or worse, on enzymes?

You've convinced me here.
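
The sorting argument is easy to illustrate. In this small Python sketch, the Fragment class and its fields are hypothetical, chosen only to mirror the range and enzyme data discussed above. Once each fragment is an object, sorting on position or on enzyme is a matter of picking a key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment: a range plus the enzyme that produced it."""
    start: int
    length: int
    enzyme: str


fragments = [
    Fragment(11, 7, "BamHI"),
    Fragment(0, 3, "EcoRI"),
    Fragment(3, 8, "EcoRI"),
]

# Sort on position in the parent sequence.
by_position = sorted(fragments, key=lambda f: f.start)

# Sort on enzyme name first, then on position within each enzyme.
by_enzyme = sorted(fragments, key=lambda f: (f.enzyme, f.start))
```

In Cocoa the same idea would presumably be a compare method on the fragment class handed to the array's sort call, which a plain dictionary cannot offer.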

>> This is already how I code my digest class. First create an array of 
>> cutpositions using the NSScanner, then feed those numbers to the 
>> actual digest, which returns the fragments.
> Yep, exactly the plan. This brings up another thought I had: perhaps 
> it would be nice to create an NSScanner equivalent for our 
> BCSequences. I know the Omni frameworks have constructed their own 
> scanner as well, so we might look through their code for hints on how 
> to do it. The big advantage would be that in the implementations we 
> could stay native in BCSequences instead of converting everything to 
> strings all the time.

Sounds fine - you've just assigned yourself some homework for the 
weekend!
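
To get the homework started, here is a very rough sketch of what such a scanner might look like, again in Python and purely as an illustration. SequenceScanner, its methods and cut_positions are invented names. The methods loosely imitate the scanString and scanUpToString behaviour of NSScanner, but work on a list of symbols instead of a string.

```python
class SequenceScanner:
    """Hypothetical scanner that walks a list of symbols, not a string."""

    def __init__(self, symbols):
        self.symbols = list(symbols)
        self.location = 0

    def is_at_end(self):
        return self.location >= len(self.symbols)

    def scan(self, pattern):
        """Consume pattern if it starts at the current location."""
        pattern = list(pattern)
        end = self.location + len(pattern)
        if pattern and self.symbols[self.location:end] == pattern:
            self.location = end
            return True
        return False

    def scan_up_to(self, pattern):
        """Advance to the next occurrence of pattern (or to the end)
        and return the symbols that were skipped."""
        pattern = list(pattern)
        start = self.location
        n = len(pattern)
        while not self.is_at_end():
            if n and self.symbols[self.location:self.location + n] == pattern:
                break
            self.location += 1
        return self.symbols[start:self.location]


def cut_positions(symbols, site, offset):
    """Collect the cut position for every occurrence of site."""
    scanner = SequenceScanner(symbols)
    positions = []
    while True:
        scanner.scan_up_to(site)
        if scanner.is_at_end():
            break
        positions.append(scanner.location + offset)
        # Step one symbol past the match start so overlapping sites are found.
        scanner.location += 1
    return positions


positions = cut_positions(list("AAGAATTCTTGAATTC"), "GAATTC", 1)
```

A real version would compare symbol objects rather than characters, so that ambiguous symbols could match as well.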

- Koen.