[Biococoa-dev] more ramblings

Mon Nov 29 10:59:06 EST 2004

Okay, I'm cutting out a ton of quotations, because I was beginning to lose
track of the discussion (I blame my cold for lack of focus ;).  There are a
couple of sets of ramblings going on, which I'll try to summarize and include
my thoughts on -

The first is the issue of how to handle untyped sequence files.  Koen
suggests that the method for each untyped file goes through a factory object
that handles its lack of clarity, an idea which I like.

The question then becomes how to determine which type of sequence to return.
The way I would imagine is to have a flag to determine whether to ask for
user input - this could put up a standard dialog box.  If the flag is false,
the factory method could create each type of possible sequence, then use the
sequence counted set to look for undefined symbols.  Compare the results,
and take the one with the fewest undefined symbols.  In case of a tie,
default to DNA>RNA>protein.
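
A rough sketch of the non-interactive fallback described above, written in
Python only to keep it short (the framework itself is Cocoa).  The alphabets,
the function names and the use of Counter as a stand-in for the sequence
counted set are all assumptions made for illustration, not existing BioCocoa
API, and the user-input flag is left out.

```python
from collections import Counter

# Hypothetical, deliberately minimal alphabets; a real symbol set would
# also need ambiguity codes, gaps and so on.
# List order encodes the tie-break preference DNA > RNA > protein.
ALPHABETS = [
    ("DNA", frozenset("ACGT")),
    ("RNA", frozenset("ACGU")),
    ("protein", frozenset("ACDEFGHIKLMNPQRSTVWY")),
]


def count_undefined(symbols, alphabet):
    """Count the symbols (with repeats) that are not in the alphabet."""
    counts = Counter(symbols.upper())  # stand-in for a counted set
    return sum(n for sym, n in counts.items() if sym not in alphabet)


def guess_sequence_type(symbols):
    """Return the type name with the fewest undefined symbols.

    min() returns the first minimal entry, so a tie falls back to the
    order of ALPHABETS.
    """
    best = min(ALPHABETS, key=lambda entry: count_undefined(symbols, entry[1]))
    return best[0]
```

Note that with these alphabets a plain ACGT string has no undefined symbols
as DNA or as protein, so it is the tie-break that makes it come out as DNA.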

Ramble #2 is about the sequence wrapper/bundle, and how to implement that to
handle the multiple sequences in an alignment file.  I had envisioned the
wrapper as holding features, and a bundle as linking related sequences.  If
this is the way we go, we'd have to implement both in order to handle this
circumstance.  

A short summary of how I expected a bundle to work -
Each wrapper would have a unique bundle ID, and a reference to its bundle.
Features within the wrapper could include a bundle ID.  Basically, if code
wanted to look at a feature, it would check that the wrapper's bundle
reference was not nil - if the reference was valid, it would take the
feature's bundle ID, and ask the bundle for the sequence corresponding to
that ID.
Given that a feature should have an NSRange, this would allow the two
sequences to be aligned.

For an alignment, I guess we'd have to define a key sequence, which would be
the root level - all other sequences would have to be features of this
sequence.  Otherwise, it seems like coding it would be very complex - though
maybe someone else could see a better way.
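
To make the bundle lookup and the key-sequence idea above a bit more
concrete, here is a minimal sketch, again in Python for brevity.  Every class,
field and function name here is hypothetical; start/length simply stand in
for the NSRange a feature would carry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feature:
    name: str
    start: int                       # stands in for NSRange.location
    length: int                      # stands in for NSRange.length
    bundle_id: Optional[str] = None  # ID of a related sequence, if any


@dataclass
class Wrapper:
    bundle_id: str                   # unique within its bundle
    sequence: str
    features: list = field(default_factory=list)
    bundle: Optional["Bundle"] = None  # None plays the role of nil


class Bundle:
    """Links related wrappers by their bundle IDs."""

    def __init__(self):
        self._wrappers = {}

    def add(self, wrapper):
        self._wrappers[wrapper.bundle_id] = wrapper
        wrapper.bundle = self

    def sequence_for(self, bundle_id):
        wrapper = self._wrappers.get(bundle_id)
        return wrapper.sequence if wrapper is not None else None


def linked_sequence(wrapper, feature):
    """Return the sequence a feature points at, or None if there is no link."""
    if wrapper.bundle is None or feature.bundle_id is None:
        return None
    return wrapper.bundle.sequence_for(feature.bundle_id)
```

In an alignment, the key sequence would be one Wrapper, and every other
sequence would show up as a Feature on it whose bundle_id names the other
wrapper and whose range says where it lines up against the key.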

The last issue seems to be around the quote from Koen:
> I agree, but let's then focus on having these one-liners in BCSequence
> only, not in the subclasses.
I remember this quote as bothering me when I first read it, because there
are some one-liners that clearly belong in a specific sequence subclass (e.g.
finding the longest open reading frame should not be available to a
protein sequence, and finding the hydrophobicity should not be available to
nucleotides or codons).  I seem to remember that reading further alleviated
my concerns on this, but I can't remember how.  Since Alex and I share this
concern, could you clarify what you meant here, Koen?

I think that's everything.

Cheers,
JT

_______________________________________________
This mind intentionally left blank