[Biococoa-dev] peptides and proteins
Alexander Griekspoor
mek at mekentosj.com
Wed Sep 8 02:23:35 EDT 2004
Koen,
> I never thought of keeping the digest object around, but if the
> developer wants to keep it, why not?
Indeed, that would be the idea. I was still thinking of feeding a digest
object to a digester "controller", more along the lines of an MVC model,
but you're right that there's no reason to add the additional layer. I
like the way you implemented the mass calculator; this one could be very
similar. I like the example as well:
> BCProtease *protease = [[BCProtease alloc] initWithSequence: aSeq]
Why BCProtease? This should be a BCDigest subclass
(BCDigestDNA/RNA/Protein), right?
Thus:
BCDigest *digest = [[BCDigest alloc] initWithSequence: aSeq];
Then the next step would be to instantiate the enzyme:
> BCProtease *protease = [BCProtease enzymeWithName: @"trypsin"];
In principle BCProtease would be a subclass of BCEnzyme, just like
BCRestrictionEnzyme. All of these would have class methods that return
predefined enzymes (from a singleton dictionary), and methods to
instantiate new ones from scratch/plist (like the BCSymbol subclasses).
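A minimal sketch of the "predefined enzymes from a singleton dictionary" idea, assuming ARC and a modern Foundation. DemoEnzyme, its site property and the hard-coded two-entry table are hypothetical stand-ins for illustration, not the BioCocoa API; a real version would presumably fill the table from a plist.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the proposed BCEnzyme; not the BioCocoa API.
@interface DemoEnzyme : NSObject
@property (nonatomic, copy, readonly) NSString *name;
@property (nonatomic, copy, readonly) NSString *site;   // recognition site
- (instancetype)initWithName:(NSString *)name site:(NSString *)site;
+ (DemoEnzyme *)enzymeWithName:(NSString *)name;        // nil if unknown
@end

@implementation DemoEnzyme

- (instancetype)initWithName:(NSString *)name site:(NSString *)site {
    if ((self = [super init])) {
        _name = [name copy];
        _site = [site copy];
    }
    return self;
}

// Builds the shared table exactly once. The table is hard-coded here;
// reading it from a plist is the assumed real-world source.
+ (NSDictionary *)sharedEnzymes {
    static NSDictionary *table = nil;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        NSDictionary *sites = @{ @"EcoRI": @"GAATTC", @"BamHI": @"GGATCC" };
        NSMutableDictionary *built = [NSMutableDictionary dictionary];
        for (NSString *key in sites) {
            built[key] = [[DemoEnzyme alloc] initWithName:key site:sites[key]];
        }
        table = [built copy];
    });
    return table;
}

// Hands out the shared instance, so repeated lookups allocate nothing.
+ (DemoEnzyme *)enzymeWithName:(NSString *)name {
    return [self sharedEnzymes][name];
}

@end
```

Because every lookup returns the same shared object, asking for [DemoEnzyme enzymeWithName: @"EcoRI"] many times costs one dictionary lookup each and no new allocations.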
> [protease setEnzyme: @trypsin];
The reason for seeing things as digests instead of proteases would be
to allow cleavage with multiple enzymes, as is commonly the case with
restriction enzymes. Therefore, the enzymes should be stored in an array
which you can add enzymes to and remove them from.
This line would then become:
[digest addEnzyme: protease];
> // or based on a popup menu,
Well, that's an AppKit matter; it doesn't really matter here how the
developer implements that.
> [protease digest]
[digest performDigestion];
would be a convenient way to start the digestion on cue, but we can
also let the internal methods trigger it automatically when you ask for
the digest results. In addition, if the object is kept around, adding
or removing enzymes while a previous result is present should trigger
a redigest. Something we have to watch out for is that the sequence
object held by the digest is a mutable one, so it can potentially be
changed underneath us, unless we copy it instead of storing a pointer.
Copying, however, might be expensive.
So perhaps this is one of the cases where it would be handy to have
both mutable and immutable variants of the BCSequence class, unless
any of you can shed more light on the issue.
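The caching behaviour described above can be sketched roughly as follows, assuming ARC. DemoDigest is hypothetical and not BioCocoa code: the sequence is a plain NSString, an "enzyme" is reduced to a string of residues to cut after (@"KR" as a much simplified trypsin), and digestionCount exists only to make the caching visible.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sketch, not BioCocoa code.
@interface DemoDigest : NSObject
@property (nonatomic, readonly) NSUInteger digestionCount; // demo only
- (instancetype)initWithSequence:(NSString *)sequence;
- (void)addEnzyme:(NSString *)residues;     // residues to cut after
- (void)removeEnzyme:(NSString *)residues;
- (NSArray *)digestResult;                  // digests on demand, then caches
@end

@implementation DemoDigest {
    NSString *_sequence;
    NSMutableArray *_enzymes;
    NSArray *_cachedResult;                 // nil means "needs a (re)digest"
}

- (instancetype)initWithSequence:(NSString *)sequence {
    if ((self = [super init])) {
        // Copy rather than keep a pointer, so a mutable original
        // cannot change underneath us.
        _sequence = [sequence copy];
        _enzymes = [NSMutableArray array];
    }
    return self;
}

- (void)addEnzyme:(NSString *)residues {
    [_enzymes addObject:[residues copy]];
    _cachedResult = nil;                    // invalidate the old result
}

- (void)removeEnzyme:(NSString *)residues {
    [_enzymes removeObject:residues];
    _cachedResult = nil;                    // invalidate the old result
}

- (void)performDigestion {
    NSString *cutAfter = [_enzymes componentsJoinedByString:@""];
    NSCharacterSet *sites =
        [NSCharacterSet characterSetWithCharactersInString:cutAfter];
    NSMutableArray *peptides = [NSMutableArray array];
    NSUInteger start = 0;
    for (NSUInteger i = 0; i < _sequence.length; i++) {
        if ([sites characterIsMember:[_sequence characterAtIndex:i]]) {
            NSRange r = NSMakeRange(start, i + 1 - start);
            [peptides addObject:[_sequence substringWithRange:r]];
            start = i + 1;
        }
    }
    if (start < _sequence.length) {         // trailing piece without a cut
        [peptides addObject:[_sequence substringFromIndex:start]];
    }
    _cachedResult = [peptides copy];
    _digestionCount++;
}

- (NSArray *)digestResult {
    if (_cachedResult == nil) {
        [self performDigestion];
    }
    return _cachedResult;
}

@end
```

On the cost of copying: for an immutable NSString, copy typically just returns the same object, so the expensive case is limited to mutable originals. Whether a BCSequence could offer the same shortcut depends on it having an immutable variant, which is the open question above.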
> NSArray *thePeptides = [digest digestResult];
That would be the idea. This means that the result is cached by the
digest object, right?
> [protease release]; // optional
That would be the way to discard/keep things around. Very nice.
>>
>> The other thing I would like to add to a BCSequence is the info of
>> which enzyme produced the 5' end and which the 3' end.
>
> Yes, that's taken care of in the plist using the CleaveDirection key.
> We have to add some code like:
>
> [newPeptide setCleavedAt: cleavedAtN]; // or 5' or 3' or
> cleavedAtC
Well, that's not exactly what I meant. When you cut vector DNA with, for
instance, EcoRI and BamHI, you would get for example:
Fragment 1: EcoRI---------------BamHI
Fragment 2: BamHI--------------EcoRI
So what I thought was to store two variables in a new BCSequenceDNA
subclass called BCFragmentDNA, like:
[fragment1 set5EndEnzyme: ecori]; // ecori and bamhi are of class
                                  // BCEnzyme, or BCRestrictionEnzyme
                                  // to be more precise
[fragment1 set3EndEnzyme: bamhi];
These would indeed be set by the digest object.
For peptides that would be:
[peptide setCarboxyEnzyme: nil];
[peptide setAminoEnzyme: trypsin];
I already hate the name set5EndEnzyme though, so if anyone can come up
with a better one, ideally spanning all sequence types
(DNA/RNA/Protein), please do.
Finally, besides the enzymes, the fragment class also needs to store
the position it represents within the uncut sequence (see below).
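As a rough sketch of what such a fragment object might hold, assuming ARC: DemoFragment and all of its property names are hypothetical. The leading/trailing naming is only one possible answer to the naming question (leading = 5' or amino end, trailing = 3' or carboxy end), and enzymes are reduced to their names to keep the example short.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sketch; names are illustrative, not an agreed BioCocoa API.
@interface DemoFragment : NSObject
@property (nonatomic, copy) NSString *residues;          // the fragment's own sequence
@property (nonatomic) NSRange parentRange;               // location in the uncut sequence
@property (nonatomic, copy) NSString *leadingEndEnzyme;  // 5' / amino end, nil if untreated
@property (nonatomic, copy) NSString *trailingEndEnzyme; // 3' / carboxy end, nil if untreated
@end

@implementation DemoFragment

// Renders the fragment in the "EcoRI---BamHI" style used above,
// followed by its range in the uncut sequence.
- (NSString *)description {
    return [NSString stringWithFormat:@"%@---%@ %@",
            self.leadingEndEnzyme ?: @"(uncut)",
            self.trailingEndEnzyme ?: @"(uncut)",
            NSStringFromRange(self.parentRange)];
}

@end
```

The digest object would fill in these properties while it builds its result array; a fragment whose end was never cut simply keeps nil for that end.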
>
>> Therefore, I proposed the BCFragment class, which could be a subclass
>> of BCSequence that stores these additional BCEnzyme variables (which
>> can also be nil by default if the end is untreated).
>
> The BCFragment would be similar to a separate BCPeptide class,
> correct? Then let's create both classes if that makes it easier.
If you mean you like BCPeptide better than BCFragmentProtein, yes. But
BCPeptide would at least be a subclass of a general BCFragment class,
right?
>
>> The nice thing is that for the BCDigest story above, nothing changes:
>> you still get an array of BCSequences returned, but as a convenience
>> the digest object fills in which enzyme produced the ends.
>
>> Very nice indeed, the plists are definitely the way to go. Still, for
>> restriction enzymes instantiating 600 enzymes each time would be too
>> expensive I think, so that's where it would be nice to instantiate
>> once from the plist and keep the objects around in a static
>> dictionary.
>
> I'm not familiar with restriction enzymes, can you explain why you
> have to instantiate 600 enzymes if you're just using one?
Not so much for digests indeed (unless one feels the need to mix the
contents of his freezer and see how many bands he can produce, of
course ;-), but I was thinking more in the direction of mapping: where
are all the restriction enzyme sites inside my vector?
Like BCDigest, you could think of a very analogous BCMap which would
return an array of positions instead of an array of fragments. You
would feed BCMap a single sequence and one or more enzymes, and it
would return all cut positions. Again, you can keep the object around
if you want to "cache" the results. To check a vector for the positions
of all 600 commercially available restriction enzymes, one would have
to instantiate all of them each time. Therefore I would like to have a
predefined set available in a singleton dictionary, just like the
proteases.
The question is whether mapping and digestion require two separate
classes. I think we can fuse them into one, as long as we provide
sufficient methods to let it act in both ways. What we could do is
have the class work as follows:

BCDigest
  stores:
    enzymes array
    sequence
    results array
  mechanism:
    set sequence and enzymes
    call digest (can be triggered automatically if results are asked for)
    creates fragments in the results array, each of which also contains
      an NSRange variable with its location in the original sequence
  results:
    (NSArray *)fragments;     returns the array of fragments
    (NSArray *)cutpositions;  returns the array of cut positions by
                              enumeration over the fragments' ranges

This would create a very flexible class that can be used for mapping,
determining cut positions, but also doing single and multiple enzyme
digests.
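A minimal sketch of the fused class outlined above, assuming ARC. DemoMapDigest is hypothetical and not BioCocoa code, and it makes several simplifying assumptions: the sequence is a linear NSString (a real vector is circular), enzymes are reduced to recognition-site strings, the cut is placed at the start of each site rather than at the enzyme-specific offset, and a fragment is just an NSValue-wrapped NSRange.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sketch, not BioCocoa code.
@interface DemoMapDigest : NSObject
- (instancetype)initWithSequence:(NSString *)sequence sites:(NSArray *)sites;
- (NSArray *)fragments;     // NSValue-wrapped NSRange, in sequence order
- (NSArray *)cutPositions;  // NSNumber indexes derived from the fragments
@end

@implementation DemoMapDigest {
    NSString *_sequence;
    NSArray *_sites;
    NSArray *_fragments;    // cache; nil until first requested
}

- (instancetype)initWithSequence:(NSString *)sequence sites:(NSArray *)sites {
    if ((self = [super init])) {
        _sequence = [sequence copy];
        _sites = [sites copy];
    }
    return self;
}

- (NSArray *)fragments {
    if (_fragments) return _fragments;

    // Collect every position where any site starts (sorted, no duplicates).
    NSMutableIndexSet *cuts = [NSMutableIndexSet indexSet];
    for (NSString *site in _sites) {
        NSRange search = NSMakeRange(0, _sequence.length);
        while (search.length > 0) {
            NSRange hit = [_sequence rangeOfString:site options:0 range:search];
            if (hit.location == NSNotFound) break;
            if (hit.location > 0) [cuts addIndex:hit.location];
            search.location = hit.location + 1;
            search.length = _sequence.length - search.location;
        }
    }

    // Turn the cut positions into consecutive fragment ranges.
    NSMutableArray *result = [NSMutableArray array];
    __block NSUInteger start = 0;
    [cuts enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
        NSRange r = NSMakeRange(start, idx - start);
        [result addObject:[NSValue valueWithRange:r]];
        start = idx;
    }];
    NSRange last = NSMakeRange(start, _sequence.length - start);
    [result addObject:[NSValue valueWithRange:last]];

    _fragments = [result copy];
    return _fragments;
}

// Every fragment after the first starts at a cut position.
- (NSArray *)cutPositions {
    NSMutableArray *positions = [NSMutableArray array];
    for (NSValue *value in [self fragments]) {
        NSUInteger location = value.rangeValue.location;
        if (location > 0) [positions addObject:@(location)];
    }
    return positions;
}

@end
```

Mapping and digestion then differ only in which accessor is called: fragments for a digest, cutPositions for a map. Both are served from the same cached result.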
Cheers,
Alex
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
E-mail: a.griekspoor at nki.nl
AIM: mekentosj at mac.com
Web: http://www.mekentosj.com
EnzymeX - To cut or not to cut
http://www.mekentosj.com/enzymex
*********************************************************