[Biococoa-dev] BCSequenceReader

Sat Nov 13 19:49:01 EST 2004

On Nov 13, 2004, at 9:44 AM, John Timmer wrote:
>

>
> All the sequence classes use [symbol undefined] of the appropriate subclass
> if they hit a character they can't recognize.  Koen also put the
> sequenceCountedSet code in.  Simply send the string to each of the three
> sequence classes, then use the counted set to determine the one which
> results in the fewest undefined symbols.  If the number turns out to be
> equal, use DNA > RNA > protein to decide which sequence to use so that we
> can stay within the central dogma.
>
> The code should be very clean and easy to follow, though it may not be as
> fast as I'd like, given there's three sequence objects created and looped
> through.
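
A rough sketch of the guessing step described in the quote above, written in Python rather than Objective-C to keep it short. The alphabets are deliberately simplified and the names `ALPHABETS`, `count_undefined` and `guess_sequence_type` are made up for illustration; none of this is the BioCocoa API.

```python
from collections import Counter

# Hypothetical, simplified alphabets. The real symbol sets are richer
# (ambiguity codes, gaps, and so on). List order encodes the tie-break
# DNA > RNA > protein.
ALPHABETS = [
    ("DNA", set("ACGT")),
    ("RNA", set("ACGU")),
    ("protein", set("ACDEFGHIKLMNPQRSTVWY")),
]


def count_undefined(text, alphabet):
    """Count the characters of text that fall outside the given alphabet."""
    counts = Counter(text.upper())  # plays the role of the counted set
    return sum(n for symbol, n in counts.items() if symbol not in alphabet)


def guess_sequence_type(text):
    """Pick the type with the fewest undefined symbols.

    min() returns the first minimal entry, so a tie is resolved by the
    order of ALPHABETS: DNA first, then RNA, then protein.
    """
    best = min(ALPHABETS, key=lambda entry: count_undefined(text, entry[1]))
    return best[0]
```

Note that a string made only of A, C, G and T is also a valid protein string under these alphabets, so the tie-break order is what makes it come out as DNA.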

That's a problem, I agree. But this situation is not going to come up 
that often, because in most cases the user probably knows which format 
is used. Still, we should be prepared for such cases. I suggest we 
use a sequence factory class that creates sequences in one centralized 
location, instead of having that logic scattered throughout the 
framework in classes that might encounter such situations. I will look 
into it this weekend to see if I can get it to work.
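
To make the factory idea concrete, here is a minimal sketch in Python (not Objective-C, and not actual BioCocoa code). It assumes a single entry point that either trusts a type supplied by the caller or falls back to guessing. `Sequence`, `SequenceFactory`, `create` and the simplified alphabets are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    """Stand-in for the three real sequence classes."""
    kind: str
    symbols: str


class SequenceFactory:
    """One central place that decides which kind of sequence to build."""

    # Simplified alphabets, assumed for illustration. Order is the
    # tie-break: DNA > RNA > protein.
    ALPHABETS = (
        ("DNA", frozenset("ACGT")),
        ("RNA", frozenset("ACGU")),
        ("protein", frozenset("ACDEFGHIKLMNPQRSTVWY")),
    )

    @classmethod
    def create(cls, text, kind=None):
        """Build a sequence; guess the kind only when the caller gives none."""
        cleaned = text.upper()
        known = [name for name, _ in cls.ALPHABETS]
        if kind is None:
            kind = cls._guess_kind(cleaned)
        elif kind not in known:
            raise ValueError(f"unknown sequence kind: {kind!r}")
        return Sequence(kind=kind, symbols=cleaned)

    @classmethod
    def _guess_kind(cls, text):
        """Return the kind whose alphabet leaves the fewest undefined symbols."""
        def undefined(alphabet):
            return sum(1 for ch in text if ch not in alphabet)

        return min(cls.ALPHABETS, key=lambda entry: undefined(entry[1]))[0]
```

A reader class would then call `SequenceFactory.create(text)` when the type is unknown and `SequenceFactory.create(text, kind="RNA")` when the file format already says what it is, so the slower guessing path only runs when it has to.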
>
>

>
> My previous thoughts follow - disregard them unless you think the above
> is a bad idea:

I don't know yet :)

> Where this is going to work poorly is very short sequences, like
> restriction sites - I think we should only enter this code if the
> sequence is over 10bp or so.  Maybe we should just treat anything under
> 10 characters as a protein?

I would call it a peptide then ;-)
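
A small Python sketch of why short strings are hard to classify, plus a length guard of the kind suggested in the quote. The alphabets are simplified, the cutoff of 10 is taken loosely from "10bp or so", and the function names are invented for the example.

```python
# Simplified alphabets, assumed for illustration only.
ALPHABETS = [
    ("DNA", set("ACGT")),
    ("RNA", set("ACGU")),
    ("protein", set("ACDEFGHIKLMNPQRSTVWY")),
]

# Hypothetical cutoff; the right value is a judgment call.
MIN_LENGTH_FOR_GUESSING = 10


def fully_matching_types(text):
    """Return every type whose alphabet covers all characters of text."""
    letters = set(text.upper())
    return [name for name, alphabet in ALPHABETS if letters <= alphabet]


def is_too_short_to_guess(text, min_length=MIN_LENGTH_FOR_GUESSING):
    """True when the string is below the cutoff and guessing should be skipped."""
    return len(text) < min_length
```

For a six-letter restriction site such as GAATTC (EcoRI), `fully_matching_types` reports both DNA and protein, because G, A, T and C are all valid one-letter amino acid codes as well. What to do once `is_too_short_to_guess` returns true (ask the caller, or default to one type) is left open here.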

>
> One other thought - I know the nucleotides have a non-base character, and
> you also have code for
......

Actually, proteins can have ambiguous symbols as well; I still need to 
update the BCSymbolAminoAcid class. I will post another message on this 
subject in a new thread.

>
>
> And that guy who answered your email was VERY optimistic in assuming
> there's an accession number in the comment field....

Yeah, that's not going to work.

- Koen.