[Biococoa-dev] peptides and proteins

Wed Sep 8 02:23:35 EDT 2004

Koen,

> I never thought of keeping the digest object around, but if the 
> developer wants to keep it, why not?
Indeed, that would be the idea. I was still thinking of having a digest 
object fed to a digester "controller", more along the lines of an MVC 
model, but you're right that there's no reason to add the additional 
layer. I like the way you implemented the mass calculator; this one 
could be very similar. I like the example as well:

> BCProtease	*protease	= [[BCProtease alloc] initWithSequence: aSeq]
Why BCProtease? This should be a BCDigest subclass 
(BCDigestDNA/RNA/Protein), right?
Thus:
BCDigest *digest = [[BCDigest alloc] initWithSequence: aSeq];

Then the next step would be to instantiate the enzyme:
> BCProtease	*protease	= [BCProtease enzymeWithName: @"trypsin"];
In principle, BCProtease would be a subclass of BCEnzyme, just like 
BCRestrictionEnzyme. All of these would have class methods that return 
predefined enzymes (from a singleton dictionary), and methods to 
instantiate new ones from scratch or from a plist (like the BCSymbol 
subclasses).
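
As a rough sketch of the singleton-dictionary idea (not the actual 
BioCocoa code): the class below is a simplified, hypothetical BCEnzyme, 
the recognitionSite property is an assumption, and an inline dictionary 
stands in for the plist. It assumes modern Objective-C with ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical, simplified enzyme class; the real BCEnzyme may differ.
@interface BCEnzyme : NSObject
@property (nonatomic, copy, readonly) NSString *name;
@property (nonatomic, copy, readonly) NSString *recognitionSite;
// Returns a predefined enzyme from the shared table, or nil if unknown.
+ (instancetype)enzymeWithName:(NSString *)name;
// Creates a new enzyme from scratch.
- (instancetype)initWithName:(NSString *)name
             recognitionSite:(NSString *)site;
@end

@implementation BCEnzyme

// Builds the shared table once and reuses it on every later call.
+ (NSDictionary<NSString *, BCEnzyme *> *)sharedEnzymes {
    static NSDictionary<NSString *, BCEnzyme *> *shared = nil;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        // Stand-in for definitions that would be read from a plist.
        NSDictionary<NSString *, NSString *> *definitions =
            @{ @"EcoRI": @"GAATTC", @"BamHI": @"GGATCC" };
        NSMutableDictionary<NSString *, BCEnzyme *> *table =
            [NSMutableDictionary dictionary];
        for (NSString *key in definitions) {
            table[key] = [[BCEnzyme alloc] initWithName:key
                                        recognitionSite:definitions[key]];
        }
        shared = [table copy];
    });
    return shared;
}

+ (instancetype)enzymeWithName:(NSString *)name {
    return [self sharedEnzymes][name];
}

- (instancetype)initWithName:(NSString *)name
             recognitionSite:(NSString *)site {
    if ((self = [super init])) {
        _name = [name copy];
        _recognitionSite = [site copy];
    }
    return self;
}

@end
```

Asking twice for the same name returns the same object, so looking up 
every predefined enzyme costs one instantiation per enzyme in total.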

> [protease setEnzyme: @trypsin];	
The reason for thinking in terms of digests instead of proteases is to 
allow cleavage with multiple enzymes, as is commonly the case with 
restriction enzymes. Therefore, the enzymes should be kept in an array 
to which you can add, and from which you can remove, enzymes.
This line would then become:
[digest addEnzyme: protease];

> // or based on a popup menu,
Well, that's an AppKit matter; it doesn't really matter here how the 
developer implements that.

> [protease digest]
[digest performDigestion];
would be a convenient way to start the digestion on cue, but we can 
also let the internal methods give the cue automatically when you ask 
for the digest results. In addition, if the object is kept around, 
adding or removing enzymes while a previous result is present should 
trigger a redigest. Something we have to watch out for is that the 
sequence object held by the digest is a mutable one, so it could 
potentially be changed underneath us, unless we copy it instead of 
storing a pointer. Copying, however, might be expensive.
So perhaps this is one of the cases where it would be handy to have 
both mutable and immutable variants of the BCSequence class. Unless 
any of you can shed more light on the issue.
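
To illustrate the pointer-versus-copy concern, here is a small sketch 
that uses Foundation's NSString/NSMutableString pair as a stand-in for 
immutable and mutable BCSequence variants (which do not exist yet). 
SequenceHolder is a hypothetical name, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical holder that keeps its own snapshot of the sequence.
@interface SequenceHolder : NSObject
// "copy" makes the setter store an immutable snapshot, so later edits
// to a mutable original do not change what the holder sees.
@property (nonatomic, copy) NSString *sequence;
@end

@implementation SequenceHolder
@end
```

If a holder is given an NSMutableString containing "MKR" and the 
original is then extended, the holder still reports "MKR". For input 
that is already immutable, copy typically hands back the same object 
rather than duplicating the data, which is one argument for having an 
immutable variant: the copy is only paid for when it is actually 
needed.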

> NSArray	*thePeptides = [digest digestResult];
That would be the idea. This means that the result is cached by the 
digest object, right?

> [protease release];	  // optional
That would be the way to discard/keep things around. Very nice.

>>
>> The other thing I would like to add to a BCSequence is the info of 
>> which enzyme produced the 5' end and which the 3' end.
>
> Yes, that's taken care of in the plist using the CleaveDirection key. 
> We have to add some code like:
>
>      [newPeptide setCleavedAt: cleavedAtN];		// or 5' or 3' or 
> cleavedAtC

Well, that's not exactly what I meant. When you cut vector DNA with, 
for instance, EcoRI and BamHI, you would get for example:
Fragment 1: 	EcoRI---------------BamHI
Fragment 2: 	BamHI--------------EcoRI

So what I thought was to store, in a new BCSequenceDNA subclass called 
BCFragmentDNA, two variables like
// ecori and bamhi are of class BCEnzyme, or BCRestrictionEnzyme to be 
// more precise
[fragment1 set5EndEnzyme: ecori];
[fragment1 set3EndEnzyme: bamhi];
which would indeed be set by the digest object.

For peptides that would be
[peptide setCarboxyEnzyme: nil];
[peptide setAminoEnzyme: trypsin];

I already hate the name set5EndEnzyme, though, so if anyone can come up 
with a better one, ideally spanning all sequence types 
(DNA/RNA/Protein)...

Finally, besides the enzymes, the fragment class also needs to store 
the position it represents within the uncut sequence (see below).

>
>> Therefore, I proposed the BCFragment class, which could be a subclass 
>> of BCSequence that stores these additional BCEnzyme variables (which 
>> can also be nil by default if the end is untreated).
>
> The BCFragment would be similar to a separate BCPeptide class, 
> correct? Then let's create both classes if that makes it easier.
If you mean that you like BCPeptide better than BCFragmentProtein, yes. 
But BCPeptide would at least be a subclass of a general BCFragment 
class, right?
>
>>  The nice thing is that for the BCDigest story above, nothing changes 
>> still get an array of BCSequences returned, but as a convenience the 
>> digest object fills in which enzyme produced the ends.
>
>> Very nice indeed, the plists are definitely the way to go. Still, for 
>> restriction enzymes instantiating 600 enzymes each time would be too 
>> expensive I think, so that's where it would be nice to instantiate 
>> once from the plist and keep the objects around in a static 
>> dictionary.
>
> I'm not familiar with restriction enzymes, can you explain why you 
> have to instantiate 600 enzymes if you're just using one?
Not specifically for digests indeed (unless one feels the need to mix 
the contents of his freezer and see how many bands he can produce, of 
course ;-), but I was thinking more in the direction of mapping, that 
is: where are all the restriction enzyme sites inside my vector?
Like a BCDigest, you could think of a very analogous BCMap which would 
return an array of positions instead of an array of fragments. You 
would feed BCMap a single sequence and one or more enzymes, and it 
would return all cut positions. Again, you can keep the object around 
if you want to "cache" the results. To check a vector for the positions 
of all 600 commercially available restriction enzymes, one would 
otherwise have to instantiate all of them each time. Therefore I would 
like to have a predefined set available in a singleton dictionary, just 
like the proteases.

The question is whether mapping and digestion require two separate 
classes. I think we can fuse them into one, as long as we provide 
sufficient methods to let it act in both ways. What we could do is 
have the class work as follows:

BCDigest
	stores:
		enzymes  array
		sequence
		results array

	mechanism:
		set sequence and enzymes
		call digest (can be triggered automatically if results are asked for)
		creates fragments in the results array, each of which also contains 
		an NSRange variable with its location in the original sequence

	results:
		(NSArray *)fragments; 	returns the array of fragments
		(NSArray *)cutpositions; 	returns the array of cut positions by 
		enumeration over the fragments' ranges

This would create a very flexible class that can be used for mapping, 
determining cut positions, but also doing single and multiple enzyme 
digests.
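
The outline above could be sketched roughly as follows. This is only an 
illustration under several assumptions: all class names carry a Sketch 
prefix and are hypothetical, the sequence is a plain NSString treated 
as linear and searched on one strand only, each enzyme is reduced to a 
recognition site plus a cut offset within that site, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// All classes here are hypothetical stand-ins, not the BioCocoa API.
@interface SketchEnzyme : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, copy) NSString *site;   // recognition sequence
@property (nonatomic) NSUInteger cutOffset;   // cut position inside the site
@end
@implementation SketchEnzyme
@end

@interface SketchFragment : NSObject
@property (nonatomic) NSRange range;                      // location in the uncut sequence
@property (nonatomic, strong) SketchEnzyme *startEnzyme;  // nil if this end is untreated
@property (nonatomic, strong) SketchEnzyme *endEnzyme;    // nil if this end is untreated
@end
@implementation SketchFragment
@end

@interface SketchDigest : NSObject
- (instancetype)initWithSequence:(NSString *)sequence;
- (void)addEnzyme:(SketchEnzyme *)enzyme;
- (void)performDigestion;
- (NSArray<SketchFragment *> *)fragments;
- (NSArray<NSNumber *> *)cutPositions;
@end

@implementation SketchDigest {
    NSString *_sequence;
    NSMutableArray<SketchEnzyme *> *_enzymes;
    NSArray<SketchFragment *> *_results;   // nil means "needs (re)digestion"
}

- (instancetype)initWithSequence:(NSString *)sequence {
    if ((self = [super init])) {
        _sequence = [sequence copy];       // snapshot, safe against later edits
        _enzymes = [NSMutableArray array];
    }
    return self;
}

- (void)addEnzyme:(SketchEnzyme *)enzyme {
    [_enzymes addObject:enzyme];
    _results = nil;                        // cached result is now stale
}

- (void)performDigestion {
    // Collect every internal cut as position -> enzyme.
    NSMutableDictionary<NSNumber *, SketchEnzyme *> *cuts = [NSMutableDictionary dictionary];
    NSUInteger length = _sequence.length;
    for (SketchEnzyme *enzyme in _enzymes) {
        NSRange search = NSMakeRange(0, length);
        while (search.length > 0) {
            NSRange hit = [_sequence rangeOfString:enzyme.site options:0 range:search];
            if (hit.location == NSNotFound) break;
            NSUInteger position = hit.location + enzyme.cutOffset;
            if (position > 0 && position < length) cuts[@(position)] = enzyme;
            search.location = hit.location + 1;
            search.length = length - search.location;
        }
    }
    // Walk the sorted cuts and emit one fragment per interval.
    NSArray<NSNumber *> *sorted = [cuts.allKeys sortedArrayUsingSelector:@selector(compare:)];
    NSMutableArray<SketchFragment *> *fragments = [NSMutableArray array];
    NSUInteger start = 0;
    SketchEnzyme *previous = nil;
    for (NSUInteger i = 0; i <= sorted.count; i++) {
        BOOL last = (i == sorted.count);
        NSUInteger end = last ? length : sorted[i].unsignedIntegerValue;
        SketchFragment *fragment = [[SketchFragment alloc] init];
        fragment.range = NSMakeRange(start, end - start);
        fragment.startEnzyme = previous;
        fragment.endEnzyme = last ? nil : cuts[sorted[i]];
        [fragments addObject:fragment];
        start = end;
        previous = fragment.endEnzyme;
    }
    _results = [fragments copy];
}

- (NSArray<SketchFragment *> *)fragments {
    if (_results == nil) [self performDigestion];   // digest lazily on request
    return _results;
}

- (NSArray<NSNumber *> *)cutPositions {
    // Derived from the fragments' ranges: every fragment start except 0 is a cut.
    NSMutableArray<NSNumber *> *positions = [NSMutableArray array];
    for (SketchFragment *fragment in [self fragments]) {
        if (fragment.range.location > 0) [positions addObject:@(fragment.range.location)];
    }
    return positions;
}

@end
```

With the sequence AAGAATTCAAGGATCCAA, EcoRI (site GAATTC, offset 1) and 
BamHI (site GGATCC, offset 1), this sketch yields cut positions 3 and 
11 and three fragments, the middle one starting at an EcoRI end and 
finishing at a BamHI end. Adding an enzyme clears the cached result, so 
the next request triggers a redigest.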

Cheers,		
Alex

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   E-mail: a.griekspoor at nki.nl
	        AIM: mekentosj at mac.com
               Web: http://www.mekentosj.com

                  EnzymeX - To cut or not to cut
              http://www.mekentosj.com/enzymex

*********************************************************
