[Biococoa-dev] peptides and proteins
Koen van der Drift
kvddrift at earthlink.net
Wed Sep 8 23:38:49 EDT 2004
On Sep 8, 2004, at 2:23 AM, Alexander Griekspoor wrote:
>> BCProtease *protease = [[BCProtease alloc] initWithSequence: aSeq];
> Why BCProtease? This should be a BCDigest subclass
> (BCDigestDNA/RNA/Protein) right?
> Thus:
> BCDigest *digest = [[BCDigest alloc] initWithSequence: aSeq];
Right.
>
> Then the next step would be to instantiate the enzyme:
>> BCProtease *protease = [BCProtease enzymeWithName: @"trypsin"];
> In principle BCProtease would be a subclass of BCEnzyme, just as
> BCRestrictionEnzyme, which all would have class methods to call for
> predefined enzymes (from a singleton dictionary), and methods to
> instantiate new ones from scratch/plist (like the BCSymbol
> subclasses).
>
>> [protease setEnzyme: @"trypsin"];
> The reason for seeing things as digests instead of proteases would be
> to allow cleavage with multiple enzymes, as is commonly the case
> with restriction enzymes. Therefore, the enzymes should be an array
> which you can add enzymes to and remove enzymes from.
> This line would then become:
> [digest addEnzyme: protease];
Good idea. I will see how that fits in my code. I hope we can make a
general BCDigest class, without subclassing. Although I am not sure yet
how to implement multiple enzymes. Should they be handled one by one,
or all at the same time (by 'summing their cleavage sites')?
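
The "summing" option could look roughly like the sketch below. It assumes each enzyme's cut positions have already been computed as an NSIndexSet and that one enzyme's cuts do not affect where another enzyme cuts; CombinedCutSites and the example positions are hypothetical and not part of BioCocoa.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: merges the per-enzyme cut positions into one sorted,
// duplicate-free set ("summing the cleavage sites").
static NSIndexSet *CombinedCutSites(NSArray *cutSitesPerEnzyme)
{
    NSMutableIndexSet *combined = [NSMutableIndexSet indexSet];
    for (NSIndexSet *sites in cutSitesPerEnzyme) {
        [combined addIndexes:sites];
    }
    return combined;
}

int main(void)
{
    @autoreleasepool {
        // Made-up cut positions for two enzymes on the same sequence.
        NSMutableIndexSet *enzymeA = [NSMutableIndexSet indexSet];
        [enzymeA addIndex:5];
        [enzymeA addIndex:12];
        NSMutableIndexSet *enzymeB = [NSMutableIndexSet indexSet];
        [enzymeB addIndex:12];
        [enzymeB addIndex:20];

        NSArray *perEnzyme = [NSArray arrayWithObjects:enzymeA, enzymeB, nil];
        NSIndexSet *all = CombinedCutSites(perEnzyme);
        // A position shared by both enzymes is counted only once.
        NSLog(@"%lu distinct cut sites", (unsigned long)[all count]);
    }
    return 0;
}
```

With a set, the order in which enzymes are added does not matter, and a single pass over the merged positions can then produce the fragments.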
>> [protease digest];
> [digest performDigestion];
> would be a convenient way to start the digestion on cue, but we can
> also let the internal methods give the cue automatically if you ask
> for the digest results. In addition, if the object is kept around,
> adding and removing enzymes while a previous result is present should
> trigger a redigest.
Sounds like a good plan.
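
A minimal sketch of that plan, as I read it: the result is computed lazily on the first request, cached, and thrown away whenever the enzyme list changes. DigestSketch is a hypothetical stand-in for BCDigest; representing an enzyme as a plain string of "cleave after" residues and the digestionCount counter are assumptions made only for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the proposed BCDigest class.
@interface DigestSketch : NSObject
@property (nonatomic, copy) NSString *sequenceString;
@property (nonatomic, strong) NSMutableArray *enzymes;
@property (nonatomic, strong) NSArray *cachedResult;     // nil means "stale"
@property (nonatomic, assign) NSUInteger digestionCount; // illustration only
- (id)initWithSequenceString:(NSString *)s;
- (void)addEnzyme:(NSString *)residues;
- (void)removeEnzyme:(NSString *)residues;
- (NSArray *)digestResult;
@end

@implementation DigestSketch

- (id)initWithSequenceString:(NSString *)s
{
    if ((self = [super init])) {
        self.sequenceString = s;
        self.enzymes = [NSMutableArray array];
    }
    return self;
}

// Changing the enzyme list invalidates any previous result.
- (void)addEnzyme:(NSString *)residues
{
    [self.enzymes addObject:residues];
    self.cachedResult = nil;
}

- (void)removeEnzyme:(NSString *)residues
{
    [self.enzymes removeObject:residues];
    self.cachedResult = nil;
}

// Cuts after every residue that any of the enzymes recognises.
- (void)performDigestion
{
    NSMutableString *sites = [NSMutableString string];
    for (NSString *residues in self.enzymes) {
        [sites appendString:residues];
    }
    NSCharacterSet *cleaveAfter = [NSCharacterSet characterSetWithCharactersInString:sites];

    NSMutableArray *fragments = [NSMutableArray array];
    NSUInteger start = 0;
    NSUInteger length = [self.sequenceString length];
    for (NSUInteger i = 0; i < length; i++) {
        unichar residue = [self.sequenceString characterAtIndex:i];
        // No cut after the last residue, to avoid an empty trailing fragment.
        if ([cleaveAfter characterIsMember:residue] && i + 1 < length) {
            NSRange range = NSMakeRange(start, i + 1 - start);
            [fragments addObject:[self.sequenceString substringWithRange:range]];
            start = i + 1;
        }
    }
    [fragments addObject:[self.sequenceString substringFromIndex:start]];
    self.cachedResult = fragments;
    self.digestionCount += 1;
}

// Digests on demand and reuses the cached result until it is invalidated.
- (NSArray *)digestResult
{
    if (self.cachedResult == nil) {
        [self performDigestion];
    }
    return self.cachedResult;
}

@end

int main(void)
{
    @autoreleasepool {
        DigestSketch *digest = [[DigestSketch alloc] initWithSequenceString:@"AKDGRA"];
        [digest addEnzyme:@"KR"];
        NSLog(@"%@", [digest digestResult]);   // first request triggers the digestion
        NSLog(@"%@", [digest digestResult]);   // second request reuses the cache
        [digest addEnzyme:@"D"];
        NSLog(@"%@", [digest digestResult]);   // enzyme list changed, so it redigests
    }
    return 0;
}
```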
> Something we have to watch out for is that the sequence object
> contained in the object is a mutable one, so potentially can be
> changed underneath us. Unless we do not store a pointer, but would
> copy it. This however might be expensive.
If we just store the sequenceString, which makes the use of an
NSScanner very easy, then we can store it as an NSString:
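@implementation BCDigest

- (id)initWithSequence:(BCSequence *)seq
{
    if ((self = [super init]))
    {
        [self setSequenceString: [seq sequenceString]];
        // .... blah
    }
    return self;
}

- (void)setSequenceString:(NSString *)s
{
    // Copy instead of retain, so that a mutable string cannot be changed
    // underneath us; for an immutable NSString a copy is typically just a retain.
    NSString *newString = [s copy];
    [sequenceString release];
    sequenceString = newString;
}

@end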
> So perhaps this is one of the examples where it would be handy to have
> both mutable and immutable variants of the BCSequence class. Unless
> any of you can shed more light on the issue.
See snippet above.
>
>> NSArray *thePeptides = [digest digestResult];
> That would be the idea. This means that the result is cached by the
> digest object right?
What?
>> Yes, that's taken care of in the plist using the CleaveDirection key.
>> We have to add some code like:
>>
>> [newPeptide setCleavedAt: cleavedAtN]; // or 5' or 3' or cleavedAtC
>
> Well that's not exactly what I meant. When you cut vector DNA with
> for instance EcoRI and BamHI, you would get for example:
> Fragment 1: EcoRI---------------BamHI
> Fragment 2: BamHI--------------EcoRI
>
> So what I thought was to store in a new BCSequenceDNA subclass, called
> BCFragmentDNA two variables like
> [fragment1 set5EndEnzyme: ecori]; // ecori and bamhi are of class
> // BCEnzyme, or BCRestrictionEnzyme to be more precise
> [fragment1 set3EndEnzyme: bamhi];
> indeed set by the digest object.
>
> for peptides that would be
> [peptide setCarboxyEnzyme: nil];
> [peptide setAminoEnzyme: trypsin];
>
> Although I hate the set5EndEnzyme already so if anyone could come up
> with a better name, ideally spanning all sequence types
> (DNA/RNA/Protein).....
>
> Finally, besides the enzymes, the fragment class also needs to store
> the position it represents within the uncut sequence (see below)
>
>
>>
>>> Therefore, I proposed the BCFragment class, which could be a
>>> subclass of BCSequence that stores these additional BCEnzyme
>>> variables (which can also be nil by default if the end is
>>> untreated).
The fragments are just sequences, and once created they know nothing
about where they originate from (just as in a Petri dish). Why not keep
that data in the BCDigest class that did the actual cutting? But I am
open to more discussion, because below I suggested a BCPeptide class :)
Or we make BCDigest return a dictionary that looks something like:
<key>fragment1</key>
<dict>
    <key>sequence</key>
    <string>GATATAGATCGAT</string>
    <key>start</key>
    <integer>23</integer>
    <key>end</key>
    <integer>32</integer>
    <key>startEnzyme</key>
    <string>bamhi</string>
    <key>endEnzyme</key>
    <string>ecori</string>
</dict>
There - all info stored together :)
> Analogous to BCDigest, you could think of a BCMap, which would return
> an array of positions instead of an array of fragments. You would feed
> BCMap a single sequence and enzyme(s), and it would return all cut
> positions.
This is already how I code my digest class. First create an array of
cut positions using the NSScanner, then feed those numbers to the actual
digest, which returns the fragments.
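
Roughly, the two steps could be sketched as follows. This is not the actual class: CutPositions and FragmentsFromCuts are hypothetical helpers, and the enzyme is simplified to "cleave after any residue in a character set" (a trypsin-like K/R rule that ignores exceptions such as a following proline).

```objc
#import <Foundation/Foundation.h>

// Step 1 (hypothetical helper): collect the cut positions with an NSScanner.
// A cut at index i separates residue i-1 from residue i.
static NSArray *CutPositions(NSString *sequence, NSCharacterSet *cleaveAfter)
{
    NSMutableArray *positions = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:sequence];
    [scanner setCharactersToBeSkipped:nil];   // do not silently skip whitespace

    while (![scanner isAtEnd]) {
        // Advance to the next cleavable residue, if there is one.
        [scanner scanUpToCharactersFromSet:cleaveAfter intoString:NULL];
        if ([scanner isAtEnd]) {
            break;
        }
        // Step over that single residue; the cut lies just after it.
        [scanner setScanLocation:[scanner scanLocation] + 1];
        // No cut after the last residue, to avoid an empty trailing fragment.
        if ([scanner scanLocation] < [sequence length]) {
            [positions addObject:
                [NSNumber numberWithUnsignedInteger:[scanner scanLocation]]];
        }
    }
    return positions;
}

// Step 2 (hypothetical helper): turn ascending cut positions into fragments.
static NSArray *FragmentsFromCuts(NSString *sequence, NSArray *positions)
{
    NSMutableArray *fragments = [NSMutableArray array];
    NSUInteger start = 0;
    for (NSNumber *cut in positions) {
        NSUInteger end = [cut unsignedIntegerValue];
        NSRange range = NSMakeRange(start, end - start);
        [fragments addObject:[sequence substringWithRange:range]];
        start = end;
    }
    // Whatever remains after the last cut is the final fragment.
    [fragments addObject:[sequence substringFromIndex:start]];
    return fragments;
}

int main(void)
{
    @autoreleasepool {
        NSString *sequence = @"AKRGK";
        NSCharacterSet *sites =
            [NSCharacterSet characterSetWithCharactersInString:@"KR"];
        NSArray *cuts = CutPositions(sequence, sites);
        NSArray *peptides = FragmentsFromCuts(sequence, cuts);
        NSLog(@"cuts: %@ fragments: %@", cuts, peptides);
    }
    return 0;
}
```

Keeping the position step separate is what would let one class serve both as a map (return the positions) and as a digest (return the fragments).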
> The question is whether mapping and digestion require two
> separate classes. I think we can fuse them into one, as long as we
> provide sufficient methods to let it act in both ways.
Yes - see comment above.
- Koen.