[Biococoa-dev] Sequence Structure

John Timmer jtimmer at bellatlantic.net
Mon Jul 11 11:04:24 EDT 2005

> My understanding is that the main reason you prefer the typed
> sequences, is that you can avoid sending the wrong type of sequence to
> the wrong operation, is that correct?  Because this is unavoidable for
> untyped sequences, we should do our best to find a solution for this.
> Alex's suggestion, using the symbolset, seems a good step forward. On
> the other hand, untyped sequences make implementing the
> immutable/mutable classes much easier.

Well, it's not just that.  Untyped sequences make the sequences stupid -
they don't know what they are, they don't know what operations they can
perform, and they need defined responses to cope with requests for
operations they can't perform.  Essential methods get scattered across a ton
of other classes (mostly tools, but there are going to be a lot of tools in
the end) - you need to call through to a separate class just to figure out
what type of sequence you have, and something as simple as complementing a
nucleotide sequence requires creating a new object and may call through
about four different methods (which is what I hate about BioJava, as we've
discussed).  And yes, I hate having to test the sequence type every time I
think about doing anything with the sequence.  If I create a nucleotide
sequence, I want it to act like one.
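
To make the contrast concrete, here is a rough sketch of the two styles.
It is written in Python rather than Objective-C purely for brevity, and
every class and method name in it (UntypedSequence, SequenceTool,
NucleotideSequence) is hypothetical - none of this is taken from BioCocoa
or BioJava.

```python
# Hypothetical sketch; none of these names come from BioCocoa or BioJava.

DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class UntypedSequence:
    """Just holds symbols; knows nothing about what they mean."""

    def __init__(self, symbols):
        self.symbols = symbols


class SequenceTool:
    """Separate tool class the caller has to consult for everything."""

    @staticmethod
    def is_nucleotide(seq):
        # The caller must ask a different class what kind of sequence it has.
        return all(s in DNA_COMPLEMENT for s in seq.symbols)

    @staticmethod
    def complement(seq):
        # A defined response is needed for sequences that can't do this.
        if not SequenceTool.is_nucleotide(seq):
            raise ValueError("cannot complement a non-nucleotide sequence")
        return UntypedSequence("".join(DNA_COMPLEMENT[s] for s in seq.symbols))


class NucleotideSequence:
    """Typed: the operation lives on the object, so no type test is needed."""

    def __init__(self, symbols):
        self.symbols = symbols

    def complement(self):
        return NucleotideSequence(
            "".join(DNA_COMPLEMENT[s] for s in self.symbols)
        )
```

With the untyped version, every caller goes through SequenceTool and has
to be ready for the failure case; with the typed version, complement() is
only ever available on something that can actually be complemented.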

> John, I am sure I speak for the others too that I would hate to see you
> leave, so again, we should try all our efforts to come to a good way
> out of this.

I seem to be doing my best thinking on the subway these days - on the
commute in, I thought about how to possibly handle this, and here's a
potential solution:

We create a lightweight, high-performance sequence object that's untyped.
Basically, it acts as a specialized NSArray for sequences.  The tools focus
on working with this object, since they will be performing the
processor-intensive operations and it is designed for performance.  I rework
the existing sequence subclasses to be holders for it.  Convenience methods
that call through to the tools put a "smart" interface on the otherwise
stupid sequence object.
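
A minimal sketch of that layering, under some assumptions: it is in
Python rather than Objective-C, the names RawSequence, NucleotideTools and
DNASequence are made up for illustration, and a byte string stands in for
whatever storage the real untyped object would use.

```python
# Hypothetical sketch of the proposed layering; all names are illustrative.

class RawSequence:
    """Lightweight untyped storage, roughly a specialized array of symbols."""

    def __init__(self, symbols):
        self._data = bytes(symbols, "ascii")

    def __len__(self):
        return len(self._data)

    def to_string(self):
        return self._data.decode("ascii")


# Translation table built once; bytes.translate does the per-symbol work.
_DNA_COMPLEMENT = bytes.maketrans(b"ACGT", b"TGCA")


class NucleotideTools:
    """Tool class: works directly on the raw object for speed."""

    @staticmethod
    def complement(raw):
        complemented = raw._data.translate(_DNA_COMPLEMENT)
        return RawSequence(complemented.decode("ascii"))


class DNASequence:
    """Typed holder: wraps a RawSequence and forwards to the tools."""

    def __init__(self, symbols):
        if isinstance(symbols, RawSequence):
            self.raw = symbols
        else:
            self.raw = RawSequence(symbols)

    def complement(self):
        # Convenience call-through; the result is re-wrapped so it stays typed.
        return DNASequence(NucleotideTools.complement(self.raw))

    def __str__(self):
        return self.raw.to_string()
```

Code that wants speed can hand RawSequence objects straight to the tools,
while code that wants a sequence that "acts like one" uses the holder,
e.g. str(DNASequence("ACGT").complement()).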

This is not ideal, as it creates a lot more call-throughs to another class.
That's not such a problem, though, as most of those call-throughs would have
gone to NSArray or tool classes in the current structure anyway.  It also
raises design questions - when a file is read, does it create a sequence or
a typed sequence holder?  Should we create methods to do both?  What about
annotated sequences - should they hold one or both types of sequence
objects?  Fortunately, the option of creating the appropriate type of
sequence object on the fly should let us keep both around, as needed.

Regardless, in the end, I'm not threatening to stop contributing - my focus
would just change to whichever things I would find most useful.  The
sequence classes would no longer be useful to me, but I'm certain there are
still things in the framework that I'd like to use in future projects.



This mind intentionally left blank
