[Biococoa-dev] More on BCSymbolSets

John Timmer jtimmer at bellatlantic.net
Fri Mar 4 10:50:31 EST 2005

>> Renaming it's fine, but we HAVE to keep it.  There are going to be literally
>> dozens of symbol sets, plus the potential for user-generated sets, and I
>> would not want to loop through all the possibilities just to find out
>> whether a sequence could be translated or complemented.
> I am not sure if I follow this. As soon as a sequence is created, the
> symbolset is defined. So there is no need to iterate over all
> symbolsets to find out if a certain operation is possible. For
> convenience, we could extend BCSymbolSet with a method
> "containsNucleotides" that will return YES if the objects in the set
> are of that type.

I agree that the symbol set's defined, but you'd still need some way of
recognizing which type of symbol set it is.  I can't see how to do that
without iteration.

For example, let's say we provide all combinations of symbol sets using only
the single bases (ATCG), those plus N, those plus N and gap, those plus N,
gap, and undefined, etc.  You're easily up to about a dozen symbol sets for
DNA alone.  Then you add RNA, and protein, and you're probably in the area
of 25.

Now, you need to do a restriction digest.  That only works with DNA, so you
need to know if you have a DNA sequence.  There's no easy way to do this
with just a symbol set.  You'd have to either iterate through all its
symbols and determine whether they're all DNA nucleotides, or iterate
through all the DNA symbol set singletons and test for equality to the set
that the sequence is using.  Translation's even worse, since it works with
DNA and RNA.  
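
To make the two iteration strategies concrete, here is a minimal sketch in
Python rather than Objective-C. Everything in it is a hypothetical stand-in:
symbols are plain characters instead of symbol objects, symbol sets are
frozensets, and the "-" and "?" characters for gap and undefined are assumed.

```python
# Hypothetical stand-ins: a symbol is one character, a symbol set is a frozenset.
DNA_BASES = frozenset("ATCG")
DNA_EXTRAS = frozenset({"N", "-", "?"})  # assumed: any base, gap, undefined

# A few of the predefined DNA symbol sets of the kind described above.
KNOWN_DNA_SETS = [
    DNA_BASES,
    DNA_BASES | {"N"},
    DNA_BASES | {"N", "-"},
    DNA_BASES | {"N", "-", "?"},
]


def is_dna_by_symbols(symbol_set):
    """Strategy 1: walk every symbol and check that each is a DNA symbol."""
    allowed = DNA_BASES | DNA_EXTRAS
    return all(symbol in allowed for symbol in symbol_set)


def is_dna_by_known_sets(symbol_set):
    """Strategy 2: walk the known DNA sets and test each for equality."""
    return any(symbol_set == known for known in KNOWN_DNA_SETS)


rna_set = frozenset("AUCG")
user_set = DNA_BASES | {"-"}  # a user-generated set that is not predefined

print(is_dna_by_symbols(rna_set), is_dna_by_known_sets(rna_set))
print(is_dna_by_symbols(user_set), is_dna_by_known_sets(user_set))
```

In this sketch the second strategy misses the user-generated set because it
is not in the predefined list, while the first accepts it at the cost of
touching every symbol.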

I don't see how you can avoid iteration, but you feel you can, so maybe I'm
missing something.  Your alternative, "containsNucleotides", is fine, but we
already have the other system in place.  It's simple, and it works, so I
don't see the need to redo it.
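
As an illustration of a check that needs no iteration, here is a minimal
Python sketch. The message does not describe how the existing system works,
so the `kind` tag below is purely an assumption, as are the names
`SequenceKind`, `SymbolSet`, `can_digest` and `can_translate`; only
"containsNucleotides" comes from the discussion itself.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SequenceKind(Enum):
    DNA = auto()
    RNA = auto()
    PROTEIN = auto()


@dataclass(frozen=True)
class SymbolSet:
    symbols: frozenset
    kind: SequenceKind  # assumed: recorded once, when the set is created

    def contains_nucleotides(self):
        # Answered from the stored tag, without looking at the symbols.
        return self.kind in (SequenceKind.DNA, SequenceKind.RNA)


def can_digest(symbol_set):
    """Restriction digests only make sense for DNA."""
    return symbol_set.kind is SequenceKind.DNA


def can_translate(symbol_set):
    """Translation works with both DNA and RNA."""
    return symbol_set.contains_nucleotides()


dna_with_gap = SymbolSet(frozenset("ATCG-"), SequenceKind.DNA)
rna = SymbolSet(frozenset("AUCG"), SequenceKind.RNA)
protein = SymbolSet(frozenset("ACDEFGHIKLMNPQRSTVWY"), SequenceKind.PROTEIN)

print(can_digest(dna_with_gap), can_digest(rna))
print(can_translate(rna), can_translate(protein))
```

With a tag like this, a user-generated set answers the same questions as a
predefined one, provided its creator supplies the kind.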

Anyway, as an aside, I've been thinking that the symbol set structure would
allow for a nice encapsulation of a genetic code.  The problem is that
codons aren't symbols (since they have both amino acid and nucleotide
information).  Any suggestions on how to adapt things?
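
To show the shape of the problem, here is a rough Python sketch of a codon as
an object carrying both kinds of information. It is one possible
illustration, not a proposal for the actual classes: `Codon`,
`CODE_FRAGMENT` and `translate` are hypothetical names, and only four
entries of the standard genetic code are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codon:
    triplet: str     # three nucleotide symbols, e.g. "ATG"
    amino_acid: str  # one-letter amino acid code, "*" for a stop codon


# A small fragment of the standard genetic code, held as a set of codons.
CODE_FRAGMENT = frozenset({
    Codon("ATG", "M"),
    Codon("GCT", "A"),
    Codon("TGG", "W"),
    Codon("TAA", "*"),
})


def translate(dna, code):
    """Translate complete triplets of dna using the codons in code."""
    lookup = {codon.triplet: codon.amino_acid for codon in code}
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(lookup[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGGCTTGGTAA", CODE_FRAGMENT))
```

The set of `Codon` objects plays the role a symbol set would, but each member
spans two alphabets at once, which is exactly the mismatch described above.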


This mind intentionally left blank
