[Biococoa-dev] peptides and proteins

Wed Sep 8 23:38:49 EDT 2004

On Sep 8, 2004, at 2:23 AM, Alexander Griekspoor wrote:

>> BCProtease	*protease	= [[BCProtease alloc] initWithSequence: aSeq];
> Why BCProtease? This should be a BCDigest subclass 
> (BCDigestDNA/RNA/Protein) right?
> Thus:
> BCDigest *digest = [[BCDigest alloc] initWithSequence: aSeq];

Right.

>
> Then the next step would be to instantiate the enzyme:
>> BCProtease	*protease	= [BCProtease enzymeWithName: @"trypsin"];
> In principle BCProtease would be a subclass of BCEnzyme, just as 
> BCRestrictionEnzyme, which all would have class methods to call for 
> predefined enzymes (from a singleton dictionary), and methods to 
> instantiate new ones from scratch/plist (like the BCSymbol 
> subclasses).
>
>> [protease setEnzyme: @"trypsin"];
> The reason for seeing things as digests instead of proteases would be 
> to allow cleavage with multiple enzymes, as is commonly the case 
> with restriction enzymes. Therefore, the enzymes should be an array 
> which you can add enzymes to and remove enzymes from.
> This line would then become:
> [digest addEnzyme: protease];

Good idea. I will see how that fits in my code. I hope we can make a 
general BCDigest class, without subclassing, although I am not sure yet 
how to implement multiple enzymes. Should they be handled one by one, 
or all at the same time (by 'summing their cleavage sites')?
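
To make the 'summing' option a bit more concrete, here is a minimal 
sketch. It assumes each enzyme can hand us its cut positions as an 
NSArray of NSNumbers; the helper name BCMergedCutPositions is made up 
for this mail and is not existing BioCocoa code. The index set keeps the 
positions sorted and drops duplicates for us.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of BioCocoa): merges the cut positions
// of several enzymes into one sorted, duplicate-free set.
// cutLists is an NSArray of NSArrays of NSNumbers, one inner array per enzyme.
static NSIndexSet *BCMergedCutPositions(NSArray *cutLists)
{
    NSMutableIndexSet *merged = [NSMutableIndexSet indexSet];

    for (NSArray *cuts in cutLists)
    {
        for (NSNumber *position in cuts)
        {
            // Adding an index that is already present is a no-op
            [merged addIndex: [position unsignedIntegerValue]];
        }
    }
    return merged;
}
```

As long as one enzyme's cut does not destroy another enzyme's 
recognition site, I would expect this to give the same fragments as 
handling the enzymes one by one, but that is an assumption we should 
check.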

>> [protease digest];
> [digest performDigestion];
> would be a convenient way to start the digestion on cue, but we can 
> also let the internal methods give the cue automatically if you ask 
> for the digest results. In addition, if the object is kept around, 
> adding and removing enzymes while a previous result is present should 
> trigger a redigest.

Sounds like a good plan.
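
A rough sketch of how that could work, just to have something to shoot 
at. Everything here is hypothetical: the class is called SketchDigest to 
make clear it is not the real BCDigest, enzymes are plain NSStrings, and 
the 'digestion' is only a placeholder. It uses manual retain/release 
like the rest of our code. The point is only that the result is computed 
on demand and thrown away whenever the enzyme list changes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch, not the actual BCDigest.
@interface SketchDigest : NSObject
{
    NSMutableArray *enzymes;
    NSArray        *cachedResult;   // nil means "needs (re)digestion"
    unsigned        digestCount;    // how often the real work was done
}
- (void) addEnzyme: (NSString *)enzyme;
- (void) removeEnzyme: (NSString *)enzyme;
- (NSArray *) digestResult;
- (unsigned) digestCount;
@end

@implementation SketchDigest

- (id) init
{
    if ((self = [super init]))
        enzymes = [[NSMutableArray alloc] init];
    return self;
}

- (void) dealloc
{
    [enzymes release];
    [cachedResult release];
    [super dealloc];
}

// Throw away the cached result so the next request triggers a redigest
- (void) invalidate
{
    [cachedResult release];
    cachedResult = nil;
}

- (void) addEnzyme: (NSString *)enzyme
{
    [enzymes addObject: enzyme];
    [self invalidate];
}

- (void) removeEnzyme: (NSString *)enzyme
{
    [enzymes removeObject: enzyme];
    [self invalidate];
}

- (NSArray *) digestResult
{
    if (cachedResult == nil)
    {
        digestCount++;
        cachedResult = [enzymes copy];   // placeholder for the real digestion
    }
    // retain/autorelease so a later invalidate cannot pull it from under the caller
    return [[cachedResult retain] autorelease];
}

- (unsigned) digestCount
{
    return digestCount;
}

@end
```

With this, asking for digestResult twice in a row does the work once, 
and adding or removing an enzyme in between forces a new digestion.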

> Something we have to watch out for is that the sequence object 
> contained in the object is a mutable one, so potentially can be 
> changed underneath us. Unless we do not store a pointer, but would 
> copy it. This however might be expensive.

If we just store the sequenceString, which makes the use of an 
NSScanner very easy, then we can store it as an NSString:

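@implementation BCDigest

- (id) initWithSequence:(BCSequence *)seq
{
     if ((self = [super init]))
     {
          [self setSequenceString: [seq sequenceString]];

          // .... blah
     }
     return self;
}

// Copy instead of retain: if we are handed an NSMutableString, a merely
// retained pointer could still change underneath us. Copying an
// immutable NSString is cheap (it typically just retains).
- (void) setSequenceString:(NSString *)s
{
     NSString *copy = [s copy];
     [sequenceString release];
     sequenceString = copy;
}

@end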

> So perhaps this is one of the examples where it would be handy to have 
> both mutable and immutable variants of the BCSequence class. Unless 
> any of you can shed more light on the issue.

See snippet above.

>
>> NSArray	*thePeptides = [digest digestResult];
> That would be the idea. This means that the result is cached by the 
> digest object right?

What?

>> Yes, that's taken care of in the plist using the CleaveDirection key. 
>> We have to add some code like:
>>
>>      [newPeptide setCleavedAt: cleavedAtN];	// or 5' or 3' or cleavedAtC
>
> Well, that's not exactly what I meant. When you cut vector DNA with, 
> for instance, EcoRI and BamHI, you would get, for example:
> Fragment 1: 	EcoRI---------------BamHI
> Fragment 2: 	BamHI--------------EcoRI
>
> So what I thought was to store two variables in a new BCSequenceDNA 
> subclass, called BCFragmentDNA, like
> // ecori and bamhi are of class BCEnzyme, or BCRestrictionEnzyme to be 
> // more precise
> [fragment1 set5EndEnzyme: ecori];
> [fragment1 set3EndEnzyme: bamhi];
> indeed set by the digest object.
>
> for peptides that would be
> [peptide setCarboxyEnzyme: nil];
> [peptide setAminoEnzyme: trypsin];
>
> Although I hate the set5EndEnzyme already so if anyone could come up 
> with a better name, ideally spanning all sequence types 
> (DNA/RNA/Protein).....
>
> Finally, besides the enzymes, the fragment class also needs to store 
> the position it represents within the uncut sequence (see below)
>
>
>>
>>> Therefore, I proposed the BCFragment class, which could be a 
>>> subclass of BCSequence that stores these additional BCEnzyme 
>>> variables (which can also be nil by default if the end is 
>>> untreated).

The fragments are just sequences, and once created they know nothing 
about where they originate from (just as in a Petri dish). Why not keep 
that data in the BCDigest class that did the actual cutting? But I am 
open to more discussion, because below I suggested a BCPeptide class :)

Or we make a BCDigest return a dictionary that looks something like:

	<key>fragment1</key>
	<dict>
		<key>sequence</key>
		<string>GATATAGATCGAT</string>
		<key>start</key>
		<integer>23</integer>
		<key>end</key>
		<integer>32</integer>
		<key>startEnzyme</key>
		<string>bamhi</string>
		<key>endEnzyme</key>
		<string>ecori</string>
	</dict>

There - all info stored together :)
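
In code, building one such entry could look roughly like the sketch 
below. The function name BCFragmentInfo is made up for illustration, and 
I am assuming the enzymes are stored by name. An untreated end is passed 
as nil and its key is simply left out, since a dictionary cannot hold 
nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of BioCocoa): bundles one fragment with
// its position in the uncut sequence and the enzymes that made its ends.
// startEnzyme / endEnzyme may be nil for an untreated end.
static NSDictionary *BCFragmentInfo(NSString *sequence,
                                    NSUInteger start, NSUInteger end,
                                    NSString *startEnzyme, NSString *endEnzyme)
{
    NSMutableDictionary *info = [NSMutableDictionary dictionary];

    [info setObject: sequence forKey: @"sequence"];
    [info setObject: [NSNumber numberWithUnsignedInteger: start] forKey: @"start"];
    [info setObject: [NSNumber numberWithUnsignedInteger: end] forKey: @"end"];

    // Only record an enzyme if that end was actually cut
    if (startEnzyme != nil)
        [info setObject: startEnzyme forKey: @"startEnzyme"];
    if (endEnzyme != nil)
        [info setObject: endEnzyme forKey: @"endEnzyme"];

    return info;
}
```

The digest could then return an array (or dictionary) of these, and the 
fragments themselves stay plain sequences.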

> Analogous to BCDigest, you could think in the direction of a BCMap 
> which would return an array of positions instead of an array of 
> fragments. You would feed BCMap a single sequence and enzyme(s), and 
> it would return all cut positions.

This is already how I code my digest class. First create an array of 
cut positions using the NSScanner, then feed those numbers to the actual 
digest, which returns the fragments.
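
Stripped down, the two steps look something like the sketch below. This 
is not my actual code: the function names are made up for this mail, and 
the cleavage rule is a simplified trypsin-like one (cut after every K or 
R, ignoring the proline exception). The second function assumes the 
positions arrive in ascending order.

```objc
#import <Foundation/Foundation.h>

// Step 1 (hypothetical helper): collect the positions just behind every
// residue in cutAfter. A position p means "cut between p-1 and p".
static NSArray *BCCutPositions(NSString *sequence, NSCharacterSet *cutAfter)
{
    NSMutableArray *positions = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString: sequence];
    [scanner setCharactersToBeSkipped: nil];

    while (![scanner isAtEnd])
    {
        // Move to the next residue that is in the set, if there is one
        [scanner scanUpToCharactersFromSet: cutAfter intoString: NULL];
        if ([scanner isAtEnd])
            break;

        // Step over that residue; the cut lies directly behind it
        [scanner setScanLocation: [scanner scanLocation] + 1];
        [positions addObject:
            [NSNumber numberWithUnsignedInteger: [scanner scanLocation]]];
    }
    return positions;
}

// Step 2 (hypothetical helper): split the sequence at the given
// positions, which must be in ascending order.
static NSArray *BCFragments(NSString *sequence, NSArray *positions)
{
    NSMutableArray *fragments = [NSMutableArray array];
    NSUInteger start = 0;

    for (NSNumber *position in positions)
    {
        NSUInteger cut = [position unsignedIntegerValue];
        if (cut > start && cut <= [sequence length])
        {
            NSRange range = NSMakeRange(start, cut - start);
            [fragments addObject: [sequence substringWithRange: range]];
            start = cut;
        }
    }
    // Whatever is left behind the last cut is the final fragment
    if (start < [sequence length])
        [fragments addObject: [sequence substringFromIndex: start]];

    return fragments;
}
```

Keeping the two steps separate means the first one can serve as the map 
and the second one as the digest, which fits the idea of one class that 
acts in both ways.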

> The question is whether the mapping and digestion require two 
> separate classes. I think we can fuse them into one, as long as we 
> provide sufficient methods to let them act in both ways.

Yes - see comment above.

- Koen.