[Biococoa-dev] Should we choose?

Charles PARNOT charles.parnot at stanford.edu
Tue Jan 11 03:59:55 EST 2005


At 10:34 PM +0100 1/9/05, Alexander Griekspoor wrote:
>Peter sums up my feelings perfectly:
>
>>In other words, I'd love the class cluster approach with one, good-for-all BCSequence class if this were the only interface, just like with NSString or NSArray. But what would be the point of having the cluster around if we will need to deal with the subclasses (or related, more specific classes) anyway. Wouldn't that be an unnecessary duplication of the interface users will have to learn, or at least be confusing? The point I want to make is: shouldn't we choose one approach, in stead of offering two options to the users?
>
>Absolutely! I'm definitely in favor of this brilliant one-does-it-all BCSequence class, but the moment we're moving into the situation where you're using the subclasses in the end, there's no use to go in that direction IMHO. I think we should choose either way.

OK, here are my latest thoughts. We may not have to choose now, and maybe we should not, for several reasons:

* the way I proposed to implement BCSequenceGeneric in my Saturday email, it requires just a few lines of code in this new subclass, and it will automagically take advantage of the code in the other subclasses (see my other email today); so both designs can be developed at the same time at very little cost for the developers (BCSequenceGeneric just living like a small harmless parasite; OK, better, living in perfect symbiosis with the others ;-); if we ultimately want to choose, there will be almost no refactoring needed: we either ditch BCSequenceGeneric, or make the other subclasses private (and probably promote BCSequenceGeneric to the superclass); what I am saying here is that we can start coding now and think later (that should please Alex!!)

* because it can wait, proponents of both sides can live happily together (in symbiosis) within the BioCocoa team, without anyone feeling that the design is not right, which could hurt motivation; again, we developers will also be the main users for a while, and letting each of us do it the way we like will be a good motivation to keep contributing...

* as applications using the framework mature (mostly developed by the developers/users of the BioCocoa team), things might get clearer too; ultimately, it might make sense to have both BCSequenceGeneric and the typed subclasses around at the same time; they have very distinct roles and distinct potential uses; BCSequenceGeneric could be used in general-purpose programs, while the other subclasses could be used in more specialized programs (this is just an example); users may prefer one approach over the other for various other reasons; a given user would probably not want to mix both approaches, though.

This is the first part of my thoughts.


Following the third point, I just want to consider for a minute that we keep both designs around the way I proposed it. I already talked about the coding effort, which I see as really small. What about the documentation issue, for both the user and a new developer? This is a general concern about having two designs at the same time, and I agree it might ultimately prove confusing. But let's just imagine it is there for now. How could we still do it right?

* documenting BCSequence; the purpose here is mostly the way you introduce it; I will try something...

----
BCSequence is an abstract superclass for the different types of sequences handled by the BioCocoa framework. The concrete subclasses include:
- BCSequenceDNA that handles ....
- BCSequenceRNA that handles ....
- BCSequenceProtein that handles ....
- BCSequenceCodon that handles ....

In addition, BCSequence has another concrete subclass called BCSequenceGeneric. This subclass encompasses all the different types of sequences and can respond to all of the messages normally handled by only one, or a subset, of the other subclasses BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... Thus, BCSequenceGeneric is a general-purpose class that can handle all of the messages that any type of sequence could have to respond to.

It is the user's choice to use the weakly typed BCSequenceGeneric class or to use the set of typed subclasses. This choice will depend on the type of application being developed, and also on the user's personal taste. The framework has been designed to function with both approaches, though mixing the two might prove confusing and is not recommended. BCSequenceGeneric will appear more powerful and flexible for developing general-purpose applications. The typed sequence classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... allow more control over the details of the app behavior, and might be more appropriate for more specialized applications.

Note that BCSequenceGeneric is designed to automatically use the implementation of the typed sequence classes. Because of this, the behavior and the performance of the general purpose class are strictly equivalent to that of the corresponding typed sequence class, in any given situation.
-----


* documenting BCSequenceGeneric - introduction...
----
BCSequenceGeneric is a concrete subclass of BCSequence. As suggested by its name, BCSequenceGeneric provides a generic interface to all the sequence types (DNA, RNA, protein,...). In reality, BCSequenceGeneric is just a placeholder class. On initialization, it actually returns an instance of one of the typed subclasses BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,... Its functioning is very similar to the class cluster design. This is all transparent, so the user of the BioCocoa framework does not have to know about the details (and is better off ignoring them, actually). Importantly, this design results in behavior indistinguishable from that of the underlying typed sequence classes BCSequenceDNA, BCSequenceRNA, BCSequenceProtein,..., and has no performance cost over using those subclasses explicitly.

When a method is appropriately called on the right sequence type (like calling hydrophobicity for a protein), it automatically uses the appropriate implementation of the subclass. When the method is irrelevant for the sequence type (like calling hydrophobicity for a DNA sequence), the method still returns a value of the expected type, such as an empty sequence, an empty array, or a zero value. This way, the developer should be able to use BCSequenceGeneric in all situations without having to check the sequence type or fear runtime errors. By leaving the details for the framework to handle, the application requires less code and its behavior will be more general.

If more control is needed over the application behavior, or if different types of sequences are handled by separate parts of the application, the developer might consider explicitly using the other subclasses of BCSequence, namely BCSequenceDNA, BCSequenceRNA and BCSequenceProtein.
----
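
To make the placeholder idea a bit more concrete, here is a rough sketch. It is written in Python only as an analogy (the real framework is Objective-C, where the initializer would hand back a different object), and it is not actual BioCocoa code. Only the class names come from the text above; the type-guessing rule, the method names other than hydrophobicity, the toy hydrophobicity scale, and the idea of putting the neutral default return values in the superclass are all assumptions made for illustration.

```python
class BCSequence:
    """Superclass; holds neutral defaults for type-specific methods (assumption)."""

    def __init__(self, symbols):
        self.symbols = symbols

    def hydrophobicity(self):
        # Irrelevant for this sequence type: return a zero value of the expected type.
        return 0.0

    def complement(self):
        # Irrelevant for this sequence type: return an empty sequence.
        return type(self)("")


class BCSequenceDNA(BCSequence):
    _PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def complement(self):
        return BCSequenceDNA("".join(self._PAIRS[s] for s in self.symbols))


class BCSequenceProtein(BCSequence):
    # Toy scale with a few residues only, for illustration.
    _SCALE = {"A": 1.8, "L": 3.8, "K": -3.9}

    def hydrophobicity(self):
        return sum(self._SCALE.get(s, 0.0) for s in self.symbols)


class BCSequenceGeneric(BCSequence):
    """Placeholder: creating one actually returns a typed sequence object."""

    def __new__(cls, symbols):
        # Hypothetical guessing rule: only A, C, G, T means DNA, anything else protein.
        is_dna = set(symbols) <= set("ACGT")
        concrete = BCSequenceDNA if is_dna else BCSequenceProtein
        return concrete(symbols)


# The caller never checks the type and never gets a runtime error.
for text in ("ACGT", "ALK"):
    seq = BCSequenceGeneric(text)
    print(type(seq).__name__, seq.hydrophobicity(), seq.complement().symbols)
```

Because the object handed back is already an instance of a typed class, every call goes straight to the typed implementation, which is why the behavior and the performance should match those of the typed classes. A real implementation would need a much better way of deciding the sequence type than the naive rule above.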

* documenting the methods of BCSequenceGeneric: copy and paste of the headers from BCSequenceDNA/RNA/...

* explaining the design to a new developer. Reading the user docs will introduce the concept just as well. The class hierarchy itself makes sense. Once the purpose of BCSequenceGeneric is understood, the implementation is trivial. The concept of a placeholder class is either already known, or new, in which case the new developer will learn something. He can then forget about the details.


I may be missing some other details (or huge problems?), but it does not seem so difficult to explain, does it?


OK, I will stop here!
In conclusion, I believe we could keep the existing code, start coding again now, keep the two designs around, and choose the best design later. Or not even choose, in which case there are ways to present it to the user, the easiest path being to be plainly honest about the schizophrenic aspect of the framework.

good night,

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021


