[Biococoa-dev] Factories

Mon Aug 30 18:55:47 EDT 2004

> I want to start out by saying that I like the idea of a codon, and I think
> they're a great idea in theory.  The issue I have is that I can't figure out
> how to make them work in practice.
>
> The problem I have is that basically a codon is a cluster of 3 nucleotides.
> Its meaning depends on the genetic code, its derivation depends on the
> reading frame, etc. - the codons themselves are essentially devoid of
> information unless they're provided with a lot of context.
True.
>  I'm just not
> seeing an easy way to provide all of that context within a codon itself
> without having way too many codon items to manage, or generating every
> single codon uniquely, on the fly.
Ok, here's a poor man's overview of what I had in mind.

BCCodon{
	BCSequenceDNA  "ATG"
	BCAminoAcid "Methionine"
}

BCSequenceCodon{
	NSArray codons -> BCCodon 1, BCCodon 2, etc
	(Species "Homo Sapiens" / BCAlphabet "Homo Sapiens")
	(Frame "+1")

	Methods to output a BCSequenceDNA object (iterate over the codons 
and read the sequence back from the BCCodons).
	Methods to output a BCSequenceProtein object (iterate over the 
codons and read the amino acids from the BCCodons).
	The latter might need an ORF finder, or parameters that define what 
to do at stop codons (return the longest protein, the first protein, 
all proteins, all proteins longer than a given length, etc.).
}
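
To make the outline above a bit more concrete, here is a minimal sketch, written in Python rather than Objective-C purely for brevity. The class names loosely mirror the outline, but the fields, the method names and the `stop_policy` / `min_length` parameters are assumptions made up for illustration, not existing BioCocoa API. It simply splits the translation at stop codons and does not search for start codons, so it is not a real ORF finder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Codon:
    triplet: str     # DNA triplet, e.g. "ATG"
    amino_acid: str  # one-letter amino acid code, "*" for a stop codon


class CodonSequence:
    """Hypothetical stand-in for BCSequenceCodon: an ordered list of codons."""

    def __init__(self, codons: List[Codon], frame: int = 1):
        self.codons = list(codons)
        self.frame = frame  # stored as context only; not used below

    def dna(self) -> str:
        # Read the DNA back by concatenating the triplet of every codon.
        return "".join(codon.triplet for codon in self.codons)

    def proteins(self, stop_policy: str = "all", min_length: int = 1) -> List[str]:
        # Collect amino acids, starting a new peptide at every stop codon.
        peptides, current = [], []
        for codon in self.codons:
            if codon.amino_acid == "*":
                peptides.append("".join(current))
                current = []
            else:
                current.append(codon.amino_acid)
        peptides.append("".join(current))

        # Drop peptides shorter than the requested minimum length.
        peptides = [p for p in peptides if len(p) >= min_length]
        if stop_policy == "first":
            return peptides[:1]
        if stop_policy == "longest":
            return [max(peptides, key=len)] if peptides else []
        return peptides  # "all"
```

With codons for ATG (M), GCC (A), TAA (stop) and AAA (K), a sequence ATG GCC TAA ATG AAA GCC would read back as the original DNA string and yield two peptides; the policy argument then picks the first, the longest, or all of them.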

The BCCodons are indeed species-specific. They are instantiated by the 
AlphabetManager on a per-alphabet basis, and are BCSymbol (subclass) 
singletons in a static dictionary. The most commonly used alphabets are 
predefined and can be instantiated directly using class methods.

BCAlphabet{
	NSArray codons or dictionary with DNA triplet as key.
	Species "Homo Sapiens"
}

Again, I think much of the code used for nucleotides and amino acids can 
be reused for the BCCodons, as they are BCSymbol subclasses as well. Each 
alphabet has 64 possible triplets; if these are encoded in a plist, they 
should be easy to load into a static dictionary.
I most certainly agree that there are problems with this approach as 
well, some of which you mention below. But what I understood from your 
methods is that you create translation dictionaries too, for 
instance...
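
As a rough illustration of the "64 triplets in a static dictionary" idea, the following Python sketch builds one shared codon table per alphabet and hands out the same codon object every time a triplet is looked up. Everything here is an assumption for illustration: the `Alphabet` and `Codon` classes, the `standard()` class method, and the use of a string constant instead of a plist. The amino acid string is the standard genetic code with bases ordered T, C, A, G; a species-specific alphabet would supply its own 64-letter string.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per triplet in TTT, TTC, TTA, TTG, TCT, ... order.
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


class Codon:
    def __init__(self, triplet, amino_acid):
        self.triplet = triplet
        self.amino_acid = amino_acid  # "*" marks a stop codon


class Alphabet:
    _shared = {}  # name -> Alphabet; plays the role of the static dictionary

    def __init__(self, name, amino_acids):
        self.name = name
        triplets = ("".join(t) for t in product(BASES, repeat=3))
        # Exactly one Codon object per triplet; lookups always return that object.
        self.codons = {t: Codon(t, aa) for t, aa in zip(triplets, amino_acids)}

    @classmethod
    def standard(cls):
        # Create the predefined alphabet on first use, then keep reusing it.
        if "standard" not in cls._shared:
            cls._shared["standard"] = cls("standard", STANDARD_AMINO_ACIDS)
        return cls._shared["standard"]

    def codon(self, triplet):
        return self.codons[triplet.upper()]
```

Because every lookup returns the same object, a codon sequence built this way holds 64 shared codons at most per alphabet, however long the sequence is.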

> They also seem a bit wasteful - making
> codons would involve composing them from combinations of bases, but they'd
> have to be decomposed into individual bases again to handle translation
> easily.
Well, could be. Composing them from combinations of bases wouldn't be 
necessary if you just store a BCSequenceDNA object for the triplet. You 
can then use the sequence comparison methods of BCSequenceDNA to 
check for equality. But I agree that this could involve decomposing 
as well; perhaps there's a way to optimize this.

>
> What I've been thinking of during my commute in was a BCSequenceTranslation,
> which would contain that sort of context -
> A reference to the original sequence it was translated from.
I don't think that's wise, for the syncing reasons you already mentioned. 
In addition, it shouldn't be a problem to iterate back over the codons to 
get your DNA sequence back; it's just a matter of appending the triplet 
of every codon to a sequence (is there an appendSequence method in 
BCSequenceDNA already?).

> A reading frame indication and/or range of translation
That could be a variable in BCSequenceCodon

> A genetic code reference.
Same here.
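
For what it's worth, once the frame is stored as a variable, cutting a DNA string into triplets is a small step. The Python sketch below is only an illustration under stated assumptions: the function name is hypothetical, only the forward frames +1, +2 and +3 are handled, and trailing bases that do not fill a complete triplet are dropped.

```python
def split_into_triplets(dna: str, frame: int = 1) -> list:
    """Split dna into complete triplets, starting at forward frame 1, 2 or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("only forward frames +1, +2 and +3 are handled here")
    start = frame - 1
    # Cut off trailing bases that do not form a complete triplet.
    usable = len(dna) - (len(dna) - start) % 3
    return [dna[i:i + 3] for i in range(start, usable, 3)]
```

Each triplet could then be looked up in whatever genetic code the codon sequence refers to.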

> The ability to derive BCSequenceProtein objects from it.
See above. It's similar to the DNA sequences: just iterate over the codons 
and append the amino acid each one represents. I mentioned a few example 
methods above already.
>
> This isn't ideal either - the DNA sequence can be edited after it's created
> - so I'm not entirely happy with it.  It's just that I'm not happy with any
> other options at this point, either.
Right, perhaps we still have to think of a clever way to have some 
super object that can contain all kinds of info and keep things in sync 
when you edit one of its parts. Ideally in a way that only updates the 
sequence locally instead of recalculating the whole thing. 
No clue how to do this, however.
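
One possible direction, sketched in Python under heavy assumptions: keep the bases and the translated amino acids side by side in one object, and on a single-base substitution re-translate only the codon that contains the edited position. The class, its method names and the `retranslated` counter are hypothetical, and the sketch covers substitutions only; insertions and deletions shift the frame, so everything downstream of the edit would still have to be recalculated.

```python
class SyncedTranslation:
    """Hypothetical holder that keeps a DNA sequence and its translation in sync."""

    def __init__(self, dna, codon_table, frame=1):
        self.bases = list(dna)
        self.codon_table = codon_table  # dict: triplet -> amino acid letter
        self.offset = frame - 1         # forward frames +1, +2, +3 only
        self.retranslated = 0           # counts codon lookups, for demonstration
        count = (len(self.bases) - self.offset) // 3
        self.amino_acids = [self._translate(i) for i in range(count)]

    def _translate(self, codon_index):
        start = self.offset + 3 * codon_index
        self.retranslated += 1
        return self.codon_table["".join(self.bases[start:start + 3])]

    def substitute(self, position, base):
        # Replace one base, then re-translate only the codon that contains it.
        self.bases[position] = base
        codon_index = (position - self.offset) // 3
        if position >= self.offset and codon_index < len(self.amino_acids):
            self.amino_acids[codon_index] = self._translate(codon_index)
```

Translating ATGGCCTAA takes three lookups; changing the base at index 4 afterwards costs one more lookup rather than three.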

>
> I had a nice weekend, too, so I don't think it's just that I'm generally
> unhappy ;).

Nope, I do believe that, John; this is quite a complex matter...
*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                     AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

*********************************************************