[Biococoa-dev] Sequence Structure
Charles Parnot
charles.parnot at gmail.com
Mon Jul 11 14:08:05 EDT 2005
> I seem to be doing my best thinking on the subway these days - on the
> commute in, I thought about how to possibly handle this, and here's a
> potential solution:
>
> We do create a lightweight, high performance sequence object that's
> untyped. Basically, it acts as a specialized NSArray for sequences.
> The tools focus on working with this object, since they will be
> performing the processor intensive operations, and this is designed
> for performance. I rework the existing sequence subclasses to be
> holders for this. Convenience calls through to the tools put a
> "smart" interface on the otherwise stupid sequence object.
>
> This is not ideal, as it creates a lot more call-throughs to another
> class. That's not such a problem, though, as most of those
> call-throughs would have gone to NSArray or tool classes in the
> current structure anyway. It also creates design decisions - when a
> file is read, does it create a sequence or a typed sequence holder?
> Should we create methods to do both? What about annotated sequences -
> should they hold one or both types of sequence objects? Fortunately,
> the option of creating the appropriate type of sequence object on the
> fly should let us keep both around, as needed.
Lately, the consensus was that we should not have both untyped and
typed sequence classes at the same time, because it is confusing for
the user and even for the developers of the framework. I personally
don't think it would be that confusing if things are clearly explained
and/or exposed at different levels. For instance, typed sequences
could be for "the experts". Kind of like CFArray and NSArray (which,
BTW, are toll-free bridged).
Then there are different ways to implement this. The structure that
we have now is one. What you are proposing is another, and might be
easier to understand, at least from the BioCocoa developer's point of
view. The important thing is that the two worlds (typed and untyped)
are separate from the user and compiler perspective, BUT without code
duplication in the implementation. This is a hard challenge. The
only way to do it is indeed to either wrap one of the objects inside
the other like you propose, or use the placeholder trick I set up in
the current design; in other words, one of the objects is the "real"
one, the "master" implementation, and the other is just using it and
putting a fake interface in front of it. So in the end, the public
interface looks like there are 2 different kinds of objects. But
internally, there is really only one, so that any change in the
implementation of the 'master' object is automatically used by the
other one.
To come back to your proposal, it is symmetric to the current
implementation. Currently, the typed classes are the "real" objects,
while BCSequence is just a placeholder that internally generates
these typed objects. You are proposing the opposite: the one-for-all
BCSequence class is where the implementation lives, and the typed
classes would just be wrappers around instances of it.
I do think the concept would be easier to understand than the way it
is now. Maybe we could work it out to be like an "extension" not
included in the BCFoundation header, but just as an additional header
(the binary could still be part of BCFoundation, so the user would
not have to link against an additional framework, but simply
#import an additional header, only for the compiler's benefit). This
header would declare the following classes: BCTypedSequence (root,
inherits from NSObject), BCDNASequence, BCRNASequence,... It is
important that these classes are not subclasses of BCSequence,
because type-specific methods such as '-complement' are declared in
the BCSequence header and will be recognized as valid for all
subclasses. And you don't want the compiler to think that
BCProteinSequence can respond to the message.
This would work with the wrapper design you propose. I see 2
problems with the wrapper design, though:
* you add an additional layer to the call stack, which you mention;
in most cases, it should be OK and won't have much effect on
performance; but it is still there
* more problematic is that for every method of BCSequence, such as
'-complement', '-reverse', '-subsequence',... you need to write a
method for the wrapper that calls the BCSequence method. This is a
lot of code. One way around it is to use the -forward trick, but that
adds a lot of overhead and may not be that easy to set up (we could
certainly consider it, though).
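To make the two costs of the wrapper concrete, here is a rough sketch, not BioCocoa code. It assumes ARC and assumes that the "-forward trick" means NSObject's -forwardInvocation: mechanism. The BCSequence below is a hypothetical stand-in with only two methods, and the category name BCForwardedMethods is invented for illustration. '-sequenceString' shows a hand-written call-through; '-reverse' is only declared and reaches the wrapped object through forwarding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the untyped BCSequence; the real class differs.
@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)string;
- (BCSequence *)reverse;
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)string {
    if ((self = [super init])) {
        _sequenceString = [string copy];
    }
    return self;
}
- (BCSequence *)reverse {
    NSUInteger length = [_sequenceString length];
    NSMutableString *reversed = [NSMutableString stringWithCapacity:length];
    for (NSUInteger i = length; i > 0; i--) {
        [reversed appendFormat:@"%C", [_sequenceString characterAtIndex:i - 1]];
    }
    return [[BCSequence alloc] initWithString:reversed];
}
@end

// Typed wrapper: deliberately NOT a subclass of BCSequence.
@interface BCDNASequence : NSObject
- (instancetype)initWithString:(NSString *)string;
- (NSString *)sequenceString;   // hand-written call-through
@end

// Declared for the compiler only; there is no implementation in the
// wrapper, so these messages are resolved at run time by forwarding.
@interface BCDNASequence (BCForwardedMethods)
- (BCSequence *)reverse;
@end

@implementation BCDNASequence {
    BCSequence *_wrapped;
}
- (instancetype)initWithString:(NSString *)string {
    if ((self = [super init])) {
        _wrapped = [[BCSequence alloc] initWithString:string];
    }
    return self;
}
// One of these is needed per BCSequence method without forwarding.
- (NSString *)sequenceString {
    return [_wrapped sequenceString];
}
// Forwarding: ask the wrapped object for signatures we do not know...
- (NSMethodSignature *)methodSignatureForSelector:(SEL)selector {
    NSMethodSignature *signature = [super methodSignatureForSelector:selector];
    return signature ? signature : [_wrapped methodSignatureForSelector:selector];
}
// ...and re-send the message to it.
- (void)forwardInvocation:(NSInvocation *)invocation {
    if ([_wrapped respondsToSelector:[invocation selector]]) {
        [invocation invokeWithTarget:_wrapped];
    } else {
        [super forwardInvocation:invocation];
    }
}
@end
```

Note that in this sketch the forwarded '-reverse' hands back the bare BCSequence rather than a new wrapper, and -respondsToSelector: on the wrapper still answers NO for forwarded methods. Both would need extra code in a real wrapper, which is part of the setup cost.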
Rather than a wrapper, I propose we use the placeholder trick ;-) All
you have to write are the init methods, and return a BCSequence
object from these. All the code can be in the superclass
'BCTypedSequence', and the subclasses BCDNASequence,... are just
empty shells, only there for the headers (actually, they might just
need a trivial '-sequenceType' method that the superclass can call to
do the right init). So, the instance returned by the init methods
would in fact be a BCSequence object, ready to respond to all the
methods implemented there. And it would respond to any method we add
in the future without additional code (we would just have to keep the
header in sync). Of course, one thing you can't do this way is to
throw an exception when you call the wrong method on the wrong type
of sequence, like calling '-complement' on a protein. But you get a
compiler warning, which is the most important part. If you ignore it,
you only get what you deserve if your app has a weird behavior!!
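Here is a minimal sketch of what I mean by the placeholder trick, assuming ARC. Everything in it is illustrative: the BCSequence stand-in, the BCSequenceType enum, the '-initWithString:type:' initializer and the category names are hypothetical, not the actual BioCocoa API. The typed classes only exist for the headers; the init method throws the placeholder away and returns a BCSequence.

```objc
#import <Foundation/Foundation.h>

// Hypothetical type tag; the real framework may encode this differently.
typedef NS_ENUM(NSInteger, BCSequenceType) {
    BCSequenceTypeUntyped,
    BCSequenceTypeDNA,
    BCSequenceTypeProtein
};

// Hypothetical stand-in for the "master" BCSequence implementation.
@interface BCSequence : NSObject
@property (nonatomic, readonly) BCSequenceType sequenceType;
@property (nonatomic, copy, readonly) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)string type:(BCSequenceType)type;
- (BCSequence *)complement;
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)string type:(BCSequenceType)type {
    if ((self = [super init])) {
        _sequenceString = [string copy];
        _sequenceType = type;
    }
    return self;
}
// Toy complement: uppercase A/C/G/T only, no type check, no exception.
- (BCSequence *)complement {
    NSUInteger length = [_sequenceString length];
    NSMutableString *result = [NSMutableString stringWithCapacity:length];
    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [_sequenceString characterAtIndex:i];
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
        [result appendFormat:@"%C", c];
    }
    return [[BCSequence alloc] initWithString:result type:_sequenceType];
}
@end

// Root of the typed classes: inherits from NSObject, not BCSequence.
@interface BCTypedSequence : NSObject
- (id)initWithString:(NSString *)string;
- (BCSequenceType)sequenceType;   // trivial hook overridden by subclasses
@end

// Header-only declarations: no implementation exists in these classes.
@interface BCTypedSequence (BCSharedMethods)
- (NSString *)sequenceString;
@end

@interface BCDNASequence : BCTypedSequence
@end
@interface BCDNASequence (BCDNAMethods)
- (BCDNASequence *)complement;
@end

@interface BCProteinSequence : BCTypedSequence
@end   // no '-complement' declared, so the compiler warns if it is sent

@implementation BCTypedSequence
- (id)initWithString:(NSString *)string {
    // Discard the placeholder and return the real object instead.
    BCSequenceType type = [self sequenceType];
    return (id)[[BCSequence alloc] initWithString:string type:type];
}
- (BCSequenceType)sequenceType { return BCSequenceTypeUntyped; }
@end

@implementation BCDNASequence
- (BCSequenceType)sequenceType { return BCSequenceTypeDNA; }
@end

@implementation BCProteinSequence
- (BCSequenceType)sequenceType { return BCSequenceTypeProtein; }
@end
```

With this, a variable declared as BCDNASequence * really points to a BCSequence at run time, so '-complement' works with no forwarding and no call-through code. Sending '-complement' to a BCProteinSequence * variable should draw a compiler warning because the header does not declare it, but nothing stops it at run time.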
Let me add the mandatory OmniGraffle thingie:
http://cmgm.stanford.edu/~cparnot/temp/typed-sequences.png
What do you think?
charles
--
Xgrid-at-Stanford
Help science move fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford
Charles Parnot
charles.parnot at gmail.com