[Biococoa-dev] Re: a new design to please everybody (am I pleased?)

Tue Jan 11 02:35:10 EST 2005

Hi all,

I have a cold too, what a coincidence! I am writing under the influence of Actifed (I feel like I am floating, very nice). In responding to John here, I just want to show (again) that both approaches have advantages and pitfalls. Because John is already defending typed sequences, I can only show their pitfalls, and I can only say good things about a generic BCSequenceGeneric class...

At 9:49 AM -0500 1/9/05, John Timmer wrote:
>Non-typed methods mean that the sequence type has to be checked every time
>the method is called, slowing the code down.

Actually, this is exactly the problem I could have with typed classes... and that you don't have with BCSequenceGeneric. The good thing with a generic sequence is that you don't have to check, you know it responds (how to respond to irrelevant methods is another topic, see below!).

Conversely, I can see where I would have some checking to do with typed classes. Let's say I am writing the new killer app, say... DNAStrander. It can load proteins, plasmids,... and export/import all sorts of formats (wow!). Of course, my document has a BCSequence ivar to hold the sequence (I have to use the superclass type because it could be any sequence). Imagine the user chooses complement in the menu. My code can't just send the message complement to my BCSequence ivar. I have to check the sequence type (to avoid irrelevant behavior that I don't want to handle) and cast my ivar to a BCSequenceDNA (to prevent compiler warnings) before sending the message.

As soon as you use a mix of different types of sequences in an app, you get into trouble, because you have to use the superclass to refer to them as a whole, and then identify their type and possibly cast them to get the messages right. Plus, if more sequence types are added in the future, you will have to go through all your code to add the corresponding case tests.
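To make the check-and-cast burden concrete, here is a minimal sketch. The class declarations are simplified stand-ins and not the real BioCocoa headers; the sequenceString property, the -initWithString: initializer and the helper ComplementIfDNA are names I made up for illustration, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Simplified stand-ins for the typed classes (hypothetical, not the real headers).
@interface BCSequence : NSObject
@property (nonatomic, copy) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)aString;
@end

@interface BCSequenceDNA : BCSequence
- (BCSequenceDNA *)complement;   // declared only where it makes biological sense
@end

@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)aString {
    self = [super init];
    if (self) {
        _sequenceString = [aString copy];
    }
    return self;
}
@end

@implementation BCSequenceDNA
- (BCSequenceDNA *)complement {
    NSString *source = self.sequenceString;
    NSMutableString *result = [NSMutableString stringWithCapacity:source.length];
    for (NSUInteger i = 0; i < source.length; i++) {
        unichar base = [source characterAtIndex:i];
        switch (base) {
            case 'A': base = 'T'; break;
            case 'T': base = 'A'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            default: break;      // leave any other symbol unchanged
        }
        [result appendFormat:@"%C", base];
    }
    return [[BCSequenceDNA alloc] initWithString:result];
}
@end

@implementation BCSequenceProtein
@end

// What the document code must do when its ivar is typed as the superclass:
// test the type first, then cast to silence the compiler.
static BCSequence *ComplementIfDNA(BCSequence *sequence) {
    if ([sequence isKindOfClass:[BCSequenceDNA class]]) {
        return [(BCSequenceDNA *)sequence complement];
    }
    return nil;                  // every other sequence type needs its own decision here
}
```

Every menu action that only applies to some sequence types would need a helper like this one, and each new sequence type would mean revisiting all of them.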

>Uncertain return values mean that careful developers will have to surround
>every method call with tests (did it return nil?  Was the returned sequence
>length 0?) that slow the code down and are very tedious to constantly
>implement.
>
>How are we going to define a sensible return value for a method call that
>makes no sense in the first place?  Is nil appropriate?  Throwing an
>exception?

If a header says a method is handled, it should not crash the app. So, at least, I don't think throwing an exception is appropriate in the case of a generic sequence. I would also ban nil as much as possible.

Here are examples of possible behaviors:
* complement of a protein --> self or empty sequence
* cutting a protein with an enzyme --> return an empty array or an array with just the protein
* hydrophobicity of DNA --> return 0
* align a DNA and prot --> align next to each other

I don't think it will crash the app as long as you get objects of the expected types. It may result in weird behavior in the final app, but only in cases where the final user does equally weird things.

>
>With typed classes, methods could actually be grouped with the data they
>could operate on, instead of in with data they may or may not operate on.

This is what would happen anyway in the design with a placeholder class BCSequenceGeneric. Methods specific to one type would be written in the corresponding subclass. To handle all other cases, the superclass would step in and return something consistent with the expected return type. I think it could work mostly with one-liners, like 'return self;' or 'return [NSArray array];'.
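As a rough sketch of those one-liners, under my own assumptions: the method names -cutWithEnzyme: and -hydrophobicity, the sequenceString property and the BCGenericProtein subclass are hypothetical, and the class layout is only one possible reading of the placeholder design (ARC assumed).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical sketch: the generic class declares every public method and
// answers each one with a harmless default of the expected return type.
@interface BCSequenceGeneric : NSObject
@property (nonatomic, copy) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)aString;
- (BCSequenceGeneric *)complement;
- (NSArray *)cutWithEnzyme:(NSString *)enzymeName;   // hypothetical method name
- (double)hydrophobicity;                            // hypothetical method name
@end

// A type-specific subclass only overrides what makes sense for its type.
// This one overrides nothing, so it inherits all the defaults.
@interface BCGenericProtein : BCSequenceGeneric
@end

@implementation BCSequenceGeneric
- (instancetype)initWithString:(NSString *)aString {
    self = [super init];
    if (self) {
        _sequenceString = [aString copy];
    }
    return self;
}

// Default for types that have no complement: hand back the sequence itself.
- (BCSequenceGeneric *)complement {
    return self;
}

// Default for types that cannot be cut: an empty array, never nil.
- (NSArray *)cutWithEnzyme:(NSString *)enzymeName {
    return [NSArray array];
}

// Default for types without a hydrophobicity: zero.
- (double)hydrophobicity {
    return 0.0;
}
@end

@implementation BCGenericProtein
@end
```

With this layout, sending -complement to a protein never needs a type test or a cast in the calling code; it just gets the protein back.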

About alignment, more tests may have to be done (which would anyway have to be done by the user otherwise). Alignment could involve passing an NSArray of sequences, so some sort of type checking will probably be needed no matter what.

>At 4 non-abstract classes to represent all sequences, I hadn't thought the
>structure was that bad.

I think it is actually good, and I am proposing to take advantage of that great structure to add another potentially useful class =)

>What is the advantage of allowing something that makes no biological sense
>(ie - complementing a protein)?

I gave some examples, and for me it is also a general sense that this could work for some types of apps (a gut feeling).

>I'm sure I could think of more were I not a bit foggy headed from cold
>medication.  It just feels like we're twisting the biology in order to
>achieve code elegance.

Yes, elegance, robustness, conciseness, backward-compatibility. Sorry I am just throwing words here and I am being a bit pedantic (am I?). I don't want to be too long and explain all over again why I think it could be true in a number of situations, and that summarizes my feelings.

>And I'm not even certain we're achieving that - it feels like the equivalent
>of getting rid of all of the UI control classes in order to achieve the code
>purity of only interacting with NSViews.  A button responds to different
>things and conveys different information than a slider does - they're
>different classes.  The same could be said for a protein and a DNA sequence
>- why treat them like they're the same class?
>Although the class cluster idea is very appealing on an intellectual level,
>it's going to take some extra work to implement it in such a way that users
>(again, meaning non-contributing developers) will be able to grasp it
>easily.  And the internal structure is probably going to be extremely
>confusing to anyone downloading the source for the first time.  Given that,
>I'm just wondering whether it's going to be more effort than it's worth.  As
>someone noted a few mails ago, it's going to require extensive
>documentation, and the assumption that developers are any better about
>reading the documentation than regular users are.

I am not fighting for class clusters per se. I am defending the existence of a good-for-all generic sequence class that will allow simple apps to be designed by the user with very little work.

The pseudo class cluster I proposed in my email was a way to minimize code writing (it should not be called a class cluster anymore, really, maybe simply a placeholder design). The idea is to have just two methods, '-initWithString:' and '+sequenceWithString:', in a placeholder class BCSequenceGeneric. Once initialized, the instances will rely completely on code in the typed sequence subclasses. Any change in those will be picked up automatically without doing anything special. So the additional code is very small: every time a new public method is added to one of the subclasses, we have to have code in the superclass that handles it no matter what (like I said above, at least something along the lines of 'return self'). In many cases, hopefully, it would be handled by the superclass anyway (like complement).
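Here is a minimal sketch of how such a placeholder could hand out typed instances. Everything beyond the two method names is my assumption: the private subclasses BCGenericDNA and BCGenericProtein, the sequenceString property, and above all the GuessSequenceClass heuristic, which is far too naive for real use (a peptide made only of A, C, G and T would be taken for DNA). ARC is assumed.

```objective-c
#import <Foundation/Foundation.h>

@interface BCSequenceGeneric : NSObject
@property (nonatomic, copy) NSString *sequenceString;
+ (instancetype)sequenceWithString:(NSString *)aString;
- (instancetype)initWithString:(NSString *)aString;
@end

// Hypothetical typed subclasses that would carry the type-specific code.
@interface BCGenericDNA : BCSequenceGeneric
@end

@interface BCGenericProtein : BCSequenceGeneric
@end

// Naive, hypothetical heuristic: only A, C, G, T means DNA, anything else protein.
static Class GuessSequenceClass(NSString *aString) {
    NSCharacterSet *dnaSymbols =
        [NSCharacterSet characterSetWithCharactersInString:@"ACGTacgt"];
    NSRange other = [aString rangeOfCharacterFromSet:[dnaSymbols invertedSet]];
    return (other.location == NSNotFound) ? [BCGenericDNA class]
                                          : [BCGenericProtein class];
}

@implementation BCSequenceGeneric
+ (instancetype)sequenceWithString:(NSString *)aString {
    return [[self alloc] initWithString:aString];
}

- (instancetype)initWithString:(NSString *)aString {
    // Called on the placeholder itself: substitute an instance of the
    // typed subclass that matches the content of the string.
    if ([self class] == [BCSequenceGeneric class]) {
        Class concrete = GuessSequenceClass(aString);
        return [[concrete alloc] initWithString:aString];
    }
    // Called on a typed subclass: ordinary initialization.
    self = [super init];
    if (self) {
        _sequenceString = [aString copy];
    }
    return self;
}
@end

@implementation BCGenericDNA
@end

@implementation BCGenericProtein
@end
```

The caller only ever names BCSequenceGeneric, and the object that comes back already behaves like the right typed sequence.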

Regarding documentation and how the whole thing is received by the user and by a potential new BioCocoa developer, I agree some extra work will be needed depending on the option we choose. However, I don't think it is 'extensive' (e.g., getting the headerdoc for BCSequenceGeneric is just a copy and paste of the headers from the typed BCSequence subclasses) and I don't think it would be so confusing for a potential developer. Because it is a general concern (mine included), I will try to write another email about this issue.

>
>As an aside, I was struck by the following quote from Charles:
>> 2. Oups, BCSequenceProtein.h does also import BCSequence.h, so the compiler
>> thinks that BCSequenceProtein can respond to '-complement'. Well, and all the
>> methods. So much for a strongly-typed sequence class!! What do I do??
>>
>> OK, I remove -complement from the BCSequence.h header, and only put it in
>> BCSequenceDNA.h, but not in BCSequenceProtein.h
>>
>> 3. Aïe, now BCSequence gets some compiler warnings when trying to use
>> -complement. How do I prevent that??
>Item 3 is the whole point - it's a good thing, not something that needs to
>be prevented ;).

Yes, this would be the most logical behavior in the current design if we were to have only typed sequences (and I would want it to be the behavior).
Interestingly, even in my last proposal, it would also be the behavior ;-) Only the additional subclass BCSequenceGeneric would be granted all the methods. The abstract BCSequence would only have the methods common to all subclasses.

In conclusion, I still think there is a case for a BCSequenceGeneric. I am starting to think that there might not be a need to choose now (or ever) one design over the other. I hope I can get to that in another email!

And of course, I wish you a quick recovery, John!

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021