[Biococoa-dev] Factories

Mon Aug 30 11:32:35 EDT 2004

> 
>> Okay, we do seem to have a problem.  Stop codons don't belong in a protein,
>> and would screw up calculations on the protein (how do you do a molecular
>> weight of something discontiguous?) but as you saw, there are many cases
>> imaginable where you're going to need the full stretch of amino acid
>> symbols that includes stop codons (I'm going to want a bunch when I do the
>> ORF methods in BCSequenceDNA).
> And that's exactly where the BCSequenceCodons comes in! This is the
> intermediate you are looking for. If you enumerate over each codon and
> ask for its representing amino acid you get the AVTV*KLATC list you
> want, including your stop codons. This sequence can also be passed to
> your ORF finder object that can generate the open reading frame (the DNA
> sequence can be easily extracted as well, as each codon also has its
> characteristic three-nucleotide sequence as a variable).
> 
> Like in real life the BCSequenceCodons acts as the intermediate between
> DNA/RNA and a real protein...
> I'll have a detailed look at the translation problem later today....
> Alex

I want to start out by saying that I like the idea of codons, and I think
they're great in theory.  The issue I have is that I can't figure out how to
make them work in practice.

The problem I have is that basically a codon is a cluster of 3 nucleotides.
Its meaning depends on the genetic code, its derivation depends on the
reading frame, etc. - the codons themselves are essentially devoid of
information unless they're provided with a lot of context.  I'm just not
seeing an easy way to provide all of that context within a codon itself
without having way too many codon items to manage, or generating every
single codon uniquely, on the fly.  They also seem a bit wasteful - making
codons would involve composing them from combinations of bases, but they'd
have to be decomposed into individual bases again to handle translation
easily. 
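To make the context dependence concrete, here is a minimal Python sketch (not
BioCocoa code; the names make_code and translate are made up for
illustration).  It builds two genetic codes from the compact NCBI
translation-table strings and translates the same bases under different
codes and reading frames.  The example DNA is one I picked so that frame 0
under the standard code gives the AVTV*KLATC string mentioned above.

```python
from itertools import product

# NCBI translation tables, one letter per codon, with the codons ordered
# TTT, TTC, TTA, TTG, TCT, ... (each position cycles through T, C, A, G).
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def make_code(aas):
    """Map each three-base codon to its one-letter symbol ('*' = stop)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, aas))


def translate(dna, code, frame=0):
    """Translate dna starting at offset frame; a trailing partial codon is dropped."""
    return "".join(code[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


STANDARD = make_code(STANDARD_AAS)
VERT_MITO = make_code(VERT_MITO_AAS)

# Illustrative sequence, chosen to encode AVTV*KLATC in frame 0.
DNA = "GCTGTTACTGTATGAAAACTTGCAACATGT"

print(translate(DNA, STANDARD, frame=0))   # AVTV*KLATC
print(translate(DNA, VERT_MITO, frame=0))  # AVTVWKLATC (TGA reads as W here)
print(translate(DNA, STANDARD, frame=1))   # LLLYENLQH
```

The same thirty bases give three different symbol strings, and nothing in an
individual triplet says which one is meant.  That is the context a codon
object would somehow have to carry around.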

What I've been thinking of during my commute in was a BCSequenceTranslation,
which would contain that sort of context:
  - A reference to the original sequence it was translated from.
  - A reading frame indication and/or range of translation.
  - A genetic code reference.
  - The ability to derive BCSequenceProtein objects from it.
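
A rough sketch of what such an object might hold, written in Python rather
than Objective-C just to keep it short.  Everything here is hypothetical:
the class name SequenceTranslation, its fields, and the choice to derive
"proteins" by simply splitting the symbol string at stop codons (no start
codon search) are assumptions for illustration, not a worked-out design.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List

# Standard genetic code built from the compact NCBI table string
# (codon order TTT, TTC, TTA, TTG, TCT, ...; '*' marks a stop codon).
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip(("".join(t) for t in product("TCAG", repeat=3)), _AAS))


@dataclass
class SequenceTranslation:
    """Hypothetical stand-in for a BCSequenceTranslation."""
    source: str            # reference to the sequence that was translated
    frame: int             # reading frame offset: 0, 1 or 2
    code: Dict[str, str]   # genetic code reference (codon -> symbol)

    def symbols(self) -> str:
        """Full symbol string, stop codons included."""
        s = self.source
        return "".join(self.code[s[i:i + 3]]
                       for i in range(self.frame, len(s) - 2, 3))

    def proteins(self) -> List[str]:
        """Stop-free stretches; each could back a protein object."""
        return [piece for piece in self.symbols().split("*") if piece]


t = SequenceTranslation("GCTGTTACTGTATGAAAACTTGCAACATGT", 0, STANDARD_CODE)
print(t.symbols())    # AVTV*KLATC
print(t.proteins())   # ['AVTV', 'KLATC']
```

The full string with stops stays available for ORF-style work, while
anything that needs a contiguous protein (molecular weight, say) only ever
sees the stop-free pieces.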

This isn't ideal either - the DNA sequence can be edited after the
translation is created, which would leave the translation out of date - so
I'm not entirely happy with it.  It's just that I'm not happy with any other
options at this point, either.
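
One possible way to at least detect that problem, sketched in Python under
my own assumptions (nothing like this has been proposed for BioCocoa, and
the names EditableSequence and TranslationRecord are invented): the sequence
keeps an edit counter, and the translation remembers the counter value and a
copy of the bases from when it was made.

```python
class EditableSequence:
    """Hypothetical mutable sequence that counts its edits."""

    def __init__(self, bases):
        self._bases = list(bases)
        self.revision = 0          # bumped on every edit

    def replace_base(self, index, base):
        self._bases[index] = base
        self.revision += 1

    def __str__(self):
        return "".join(self._bases)


class TranslationRecord:
    """Hypothetical translation context that can tell when it is outdated."""

    def __init__(self, source, frame=0):
        self.source = source               # live reference to the sequence
        self.frame = frame
        self.snapshot = str(source)        # bases as they were at creation
        self.revision = source.revision    # edit count at creation

    def is_stale(self):
        """True if the source has been edited since this record was made."""
        return self.source.revision != self.revision


seq = EditableSequence("GCTTGA")
rec = TranslationRecord(seq)
print(rec.is_stale())    # False
seq.replace_base(0, "A")
print(rec.is_stale())    # True
print(rec.snapshot)      # GCTTGA (the bases that were actually translated)
```

This only reports staleness; whether to retranslate, keep the snapshot, or
refuse edits while translations exist is a separate decision.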

I had a nice weekend, too, so I don't think it's just that I'm generally
unhappy ;).

Cheers,

John

_______________________________________________
This mind intentionally left blank