[Biococoa-dev] Factories

Mon Aug 30 08:46:24 EDT 2004

> Oops, classical mistake from us DNA people :-) Indeed, a stop amino
> acid doesn't exist, only a stop codon. No tRNA carrying an amino acid
> matches it, therefore the ribosome falls off the mRNA and
> translation is terminated. So conceptually adding a stop amino acid is
> not right here.
> I think we should implement the intermediate layer here, which was a
> good thing to do anyway: the BCCodon and BCAlphabet objects.
> BCCodons are objects containing a BCSequenceDNA three BCSymbols long
> (or perhaps RNA to be more precise actually), which can also act as
> their identifier, and a BCSymbol of type aminoacid. A group of BCCodons
> forms a species-specific BCAlphabet which contains the species name,
> and serves as the central point to pass around in translation methods,
> and can be generated by an AlphabetManager. The AlphabetManager allows
> manipulation of the BCAlphabet objects and also facilitates the
> creation of predefined commonly used alphabets (from a plist).

I'm going to check in code and a .plist later today as my first stab at
a translation object.  There may be a way that I'm missing, but as I set out
to design things, I couldn't come up with a codon-based way to translate
that's easier or clearer to code than going straight through the DNA
itself.  Codons would be easier if we refused to translate sequences with
ambiguous bases, but I don't like that idea.

Basically, with codons it turned into one giant lookup table where every
potential codon had to be represented.  Using a tree-like structure that
branches on one base at a time, all the triplets where the wobble base
doesn't matter can be represented by a single entry, most of the rest need
only a purine/pyrimidine entry in the wobble position, and ambiguous bases
are easy to handle.  The downside is that, if you initially lump the DNA
sequence into codons, you have to decompose them into individual bases
again to use this layout.
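
To make the tree idea concrete, here is a rough Python sketch (not the
BioCocoa code, which is Objective-C and not yet checked in).  It assumes the
standard genetic code and IUPAC ambiguity codes; the names TREE,
translate_triplet and translate are made up for illustration.  The first two
bases pick a branch, and the wobble position is matched against a short list
of base groups, so "GCN" needs one entry instead of four.

```python
# IUPAC nucleotide codes, expanded to the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

ANY, PUR, PYR = "ACGT", "AG", "CT"

# Standard genetic code as a tree:
# first base -> second base -> list of (allowed wobble bases, symbol).
TREE = {
    "T": {"T": [(PYR, "F"), (PUR, "L")], "C": [(ANY, "S")],
          "A": [(PYR, "Y"), (PUR, "*")],
          "G": [(PYR, "C"), ("A", "*"), ("G", "W")]},
    "C": {"T": [(ANY, "L")], "C": [(ANY, "P")],
          "A": [(PYR, "H"), (PUR, "Q")], "G": [(ANY, "R")]},
    "A": {"T": [("ACT", "I"), ("G", "M")], "C": [(ANY, "T")],
          "A": [(PYR, "N"), (PUR, "K")], "G": [(PYR, "S"), (PUR, "R")]},
    "G": {"T": [(ANY, "V")], "C": [(ANY, "A")],
          "A": [(PYR, "D"), (PUR, "E")], "G": [(ANY, "G")]},
}


def translate_triplet(triplet):
    """Return the one-letter symbol for a 3-base string, '*' for a stop,
    or 'X' when the ambiguity cannot be resolved to a single symbol."""
    firsts, seconds, thirds = (IUPAC.get(base) for base in triplet.upper())
    if not (firsts and seconds and thirds):
        return "X"  # unknown character such as a gap
    symbols = set()
    for first in firsts:
        for second in seconds:
            for wobble_bases, symbol in TREE[first][second]:
                # Keep this entry if any possible third base falls in it.
                if any(third in wobble_bases for third in thirds):
                    symbols.add(symbol)
    return symbols.pop() if len(symbols) == 1 else "X"


def translate(dna):
    """Translate frame 1 of a DNA string, ignoring a trailing partial triplet."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_triplet(dna[i:i + 3]) for i in range(0, usable, 3))
```

With this layout translate("ATGGCNTTYTAA") gives "MAF*", and an ambiguous
triplet such as "TTN" (Phe or Leu) falls back to "X".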

Again, I very well may be missing something, but it'll take me committing
the code for you to get a better sense of that, I'd imagine ;).

> I'm just thinking a bit out loud here about the following. In 4Peaks I
> get a nucleotide sequence derived from the trace file, which I
> "translate" to a protein sequence. But commonly this indeed contains a
> lot of stops: ACTW*GGH*LAK etc. By definition this cannot be a
> protein as Koen nicely mentioned. Perhaps we can make BCCodon a
> subclass of BCSymbol as well (I think that makes sense) and add a
> BCSequence subclass called BCSequenceCodons. I think this can greatly
> help in implementing translations and also in things like ORF finding.
> The nice thing here is that we can model the Protein Sequence as a real
> protein in which we don't have to think about what to do with stops in
> calculations like pI.

Okay, we do seem to have a problem.  Stops don't belong in a protein, and
would screw up calculations on the protein (how do you compute the molecular
weight of something discontiguous?), but as you saw, there are many
imaginable cases where you're going to need the full stretch of amino acid
symbols, stops included (I'm going to want a bunch when I do the ORF
methods in BCSequenceDNA).

A potential solution:  have a BCSequenceAminoAcid that may contain stops.
BCSequenceProtein can be a subclass of that, or a separate class entirely.
BCSequenceProtein would (just maybe?) need to validate its sequences to
ensure that they contain no stops.
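
A rough Python sketch of the subclass variant, purely as illustration (the
real classes would be Objective-C, and the names AminoAcidSequence,
ProteinSequence and stop_free_segments are invented here).  It assumes "*"
marks a stop, as in the 4Peaks example quoted above.

```python
STOP = "*"  # assumed stop marker, as in "ACTW*GGH*LAK"


class AminoAcidSequence:
    """Raw translation output; may contain stop symbols."""

    def __init__(self, symbols):
        self.symbols = str(symbols).upper()

    def stop_free_segments(self):
        """Split at stops and return each non-empty stretch as a protein."""
        return [ProteinSequence(part)
                for part in self.symbols.split(STOP) if part]


class ProteinSequence(AminoAcidSequence):
    """Amino acid sequence that is validated to contain no stops."""

    def __init__(self, symbols):
        super().__init__(symbols)
        if STOP in self.symbols:
            raise ValueError("a protein sequence cannot contain stop symbols")
```

Here AminoAcidSequence("ACTW*GGH*LAK").stop_free_segments() yields three
ProteinSequence objects (ACTW, GGH, LAK), while ProteinSequence("AC*T")
raises ValueError, so calculations like molecular weight or pI only ever see
contiguous sequences.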

Cheers,

John
