[Biococoa-dev] BCSequence class cluster

Charles PARNOT charles.parnot at stanford.edu
Fri Jan 7 19:23:35 EST 2005


>Hi all -
>
>Let me add my happy new year wishes to everyone as well.  I've followed this
>discussion as carefully as I could, given a ton of scientific writing I've
>got to do (plus a side programming project related to the digital camera I
>got for christmas).  I think it might be useful for me to try to summarize
>what I think I understand and add a comment or two, and then let you guys
>tell me where I'm wrong -
>
>For the purposes of discussion, I'm going to use "users" to mean "developers
>using the framework, but not working on its implementation" and "developers"
>to mean us.

Yes, this is a very simple and useful convention!

Sorry, I will answer the email in a different order and take the final comment first (mostly to justify the length of my emails today).

>Now, for the important comment:  I think for major design issues like this,
>we could do worse than look at how Apple implements things.  I went to a
>Tiger Tech Talk recently, and it's clear that the folks working there put a
>tremendous amount of thought into design, and in some cases plan designs to
>account for technology that's over a year away from being implemented.
>
>Given that, it may be useful to look at where Apple uses class clusters.  As
>far as I can tell, they use them only in situations where there are multiple
>Core Foundation objects that they want to provide a single, simple object
>oriented wrapper that hides the implementations from ObjC users.  The
>primary advantage of this seems to be that it allows the class to swap the
>CF objects that hold the actual data as needed without the user being
>bothered with the implementation.
>
>I don't think that we have an analogous situation here, though I'm not
>positive about that.  Given that, I think we should proceed with caution,
>and perhaps ask Apple's Cocoa Dev list for their opinions on when to use a
>class cluster.

I totally agree that such a big design decision should be carefully questioned and then carefully planned, be it class clusters or not. It seems to me that with the core of BCSequence in place, and with the arrival of annotations and of BCTools, some design decisions have to be made anyway. I did raise an additional set of questions and pointed to some issues, coming after many of the discussions you had before. It seems you already had to make some decisions in the past, and more are coming! All these discussions are useful (!), even though right now it all seems theoretical and no decision has been made. In fact, one could get the feeling that such discussions could go on forever and stall the project. I agree with you that the current design decisions could be critical for the future, so one or two weeks of discussion are no big deal. This way, we can see where the questions are and get a sense of the priorities before coming up with a roadmap, and also with a set of yes/no design questions to decide on. And then we all vote, and then Peter makes the decision. I am just guessing at the current decision-making process ;-)

Like you say, we have to foresee all the reasonable possibilities. We have one great tool, which is to look at the implementations of the various other BioX efforts. It does not mean we have to copy the design (though part of it can probably be ripped off). The language is different, of course, and so are the possibilities it offers and the habits that come with it. But the usage is different too. Cocoa users are used to certain designs and have different expectations than Perl users. The latter write quick, flexible, small scripts that they may very well dump after 2 weeks. The Cocoa developer wants to write a long-lasting application, starting with a simple layout to which more details can be added later. What the other BioX efforts can show us is what kinds of tools can exist, what types of sequences, what kinds of annotations, and how they all play together. I personally don't have much knowledge of the other BioX efforts. How about you guys?

Regarding class cluster.
To quote Apple, 'The grouping of classes in this way simplifies the publicly visible architecture of an object-oriented framework without reducing its functional richness'.
I did not know much about the CF types; I had just read about class clusters as a way to provide optimized code adapted to the size of the object (or at least to have the possibility of doing so at some point). This is true of NSData, NSString, NSArray and NSDictionary, which may vary a lot in size.
NSNumber is quite a different story and is probably closer in spirit to what a BCSequence class cluster would be. So that could be a better reference.
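
To make the NSNumber comparison concrete, here is a minimal sketch. The helper name BCIsPrivateSubclassInstance is hypothetical (it is not part of Foundation or BioCocoa), and the exact private classes Foundation hands back are an implementation detail, so the code only checks that they differ from the public class.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation or BioCocoa API): reports whether an
// object is an instance of some subclass of the public class it was requested
// from, rather than a direct instance of that public class.
static BOOL BCIsPrivateSubclassInstance(id object, Class publicClass) {
    return [object isKindOfClass:publicClass] && [object class] != publicClass;
}

// Hypothetical helper: every NSNumber answers every accessor of the public
// interface, whatever it was created from.
static int BCIntFromAnyNumber(NSNumber *number) {
    return [number intValue];
}
```

On Apple's Foundation, BCIsPrivateSubclassInstance([NSNumber numberWithInt:3], [NSNumber class]) should be true, and BCIntFromAnyNumber works the same on a number created from a float as on one created from an int. That is the behavior a BCSequence cluster would mimic: one public interface, with the concrete type decided behind the scenes.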

Asking the cocoadev mailing list (or starting a discussion on cocoadev.com) is a very good idea. I thought about it at some point. Ultimately, because WE are the biologists, only we can decide whether DNA and protein sequences are as close to each other as, e.g., int and float (cf. NSNumber).


To go further, let's go very far (or not so far) into the future. What could BioCocoa be? To me, it seems it could do for sequences what WebView does for web pages. Thanks to the simplicity and power of WebView, with two lines of code (or even just a few connections in a nib), you get a web browser. Imagine the same with BioCocoa: a nib with a BCSequenceView in a window, and a few menu items like 'complement', 'reverse', etc. Then a few lines of code in a controller would allow the user to load a sequence from a file, choose complement in the menu and get a new window with the complement. The developer of that app (I can't call him the user anymore, sorry!) would not have to know which type of sequence the view is dealing with. So the app would just forward the 'complement' calls to the sequence in the view and pop up a new view with the returned sequence, no questions asked. What if the user of the app opens a protein sequence and chooses complement in the menu? What should the user of the app expect? Well, the user should not be surprised to get the same sequence back, or some empty window, or nothing. The developers of BioCocoa are not to blame, the developer of the app is not to blame; the user of the app can only blame himself for that, and if he does not understand what is happening, he should probably not use that app!! In the meantime, BioCocoa has made the life of the app developer very easy: it took less than an hour to build a good-looking app, and should more types of sequence be added to the framework, there is no need for any change in the code. I love that story :-)

*******
now the other stuff...


>A class cluster is a potentially good thing, in that it would hide some of
>the complexity of the implementation from users.  It doesn't necessarily
>make Koen happier, since we may well have all the current classes used
>internally.  As far as I can tell, though, nothing short of me getting
>around to implementing BCSequenceNucleotide would solve Koen's biggest
>gripe, the duplication of three methods in the DNA and RNA subclasses.

I agree that BCSequenceNucleotide could help at some point, no matter what design is chosen.

>Since only the headers are going to be clearly visible to users, could we
>make sure that there are comments in the header that indicate if a method is
>a convenience call through to a different class and, if so, what class to
>find the critical method in.

Yes, this is true for any class hierarchy you are building, but probably even more critical for a class cluster.





>None of us have ever done this, so we'd kindof be making it up as we go
>along.
>
>Is that about right?

yep, looks like it!

Actually, I have implemented a class cluster in some of my code. In my (limited) experience, it is a nice design and very simple to use once you get past the conceptual alloc/init trick.
It is also a great example of how flexible you can get with typing. I knew that the compiler's type checks were just syntactic sugar (I hope this is the appropriate expression), but the possible manipulations at runtime are endless. For instance, in the class cluster design, there is absolutely nothing that prevents you from returning instances that are NOT subclasses of the public class. The object returned could be from a completely unrelated class; at runtime, they are all ids. Of course, you can easily shoot yourself in the foot too...
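
For those who have not seen the alloc/init trick, here is a minimal sketch of it, written for ARC. Everything in it is an assumption for illustration: the private class names BCPrivateDNASequence and BCPrivateProteinSequence, the sequenceString property, and the naive "only A, C, G, T means DNA" guess. It is not the actual BioCocoa code.

```objc
#import <Foundation/Foundation.h>

// Public class of a hypothetical cluster; users only ever see this interface.
@interface BCSequence : NSObject
@property (nonatomic, copy, readonly) NSString *sequenceString;
- (instancetype)initWithString:(NSString *)aString;
- (BCSequence *)complement;
@end

// Private concrete subclasses (hypothetical names), hidden from users.
@interface BCPrivateDNASequence : BCSequence
@end
@interface BCPrivateProteinSequence : BCSequence
@end

@implementation BCSequence
- (instancetype)initWithString:(NSString *)aString {
    if ([self class] == [BCSequence class]) {
        // The trick: when called on the public class itself, hand back an
        // instance of a private subclass instead of the receiver.
        // Naive guess, for illustration only: uppercase A/C/G/T means DNA.
        NSCharacterSet *notDNA = [[NSCharacterSet
            characterSetWithCharactersInString:@"ACGT"] invertedSet];
        BOOL looksLikeDNA =
            [aString rangeOfCharacterFromSet:notDNA].location == NSNotFound;
        Class concrete = looksLikeDNA ? [BCPrivateDNASequence class]
                                      : [BCPrivateProteinSequence class];
        return [[concrete alloc] initWithString:aString];
    }
    // Reached when a concrete subclass calls [super initWithString:].
    self = [super init];
    if (self) {
        _sequenceString = [aString copy];
    }
    return self;
}
// Generic default: sequences with no meaningful complement return themselves.
- (BCSequence *)complement {
    return self;
}
@end

@implementation BCPrivateDNASequence
// Base-by-base complement (not reversed): A<->T, C<->G.
- (BCSequence *)complement {
    NSString *source = self.sequenceString;
    NSMutableString *result = [NSMutableString stringWithCapacity:source.length];
    for (NSUInteger i = 0; i < source.length; i++) {
        unichar c = [source characterAtIndex:i];
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
        [result appendFormat:@"%C", c];
    }
    return [[BCSequence alloc] initWithString:result];
}
@end

@implementation BCPrivateProteinSequence
@end
```

With this, [[BCSequence alloc] initWithString:@"ATGC"] gives back a private DNA instance whose complement holds "TACG", while a protein string gives an object whose complement is simply itself. The calling code never names a subclass.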





>We could still make Alex and I happy by defining protocols for sequence
>sub-types, and use them in the methods that act on specific sequence types,
>like complementation and hydrophobicity.  This way, errors can still be
>thrown at compile time, instead of taking the app down at runtime.

Protocols are definitely a possibility, as long as there are not too many cases where they are needed. What I had in mind when proposing protocols, to get strong typing without giving up the single interface, was something like:

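//forward declarations, so the protocols can be used before they are defined
@protocol BCSequenceDNA, BCSequenceProtein;

//public superclass of the class cluster
@interface BCSequence : NSObject {}
+(BCSequence *)sequenceWithString:(NSString *)aString;
-(BCSequence *)complement;
-(NSNumber *)hydrophobicity;
@end

//these methods should be implemented but not put in a public header
//these methods should be implemented ONLY in the relevant private subclasses
//so that a runtime error is generated when called on the wrong type
//(declared as a category, since a class can have only one main @interface;
//the category name is arbitrary)
@interface BCSequence (BCSequencePrivate)
-(BCSequence <BCSequenceDNA> *)dnaComplement;
-(NSNumber *)proteinHydrophobicity;
@end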

//a category to provide strong typing
//the instance returned will be BCSequence at runtime
// but the compiler does not know
@interface BCSequence (BCSequenceStrongTyping)
+(id <BCSequenceDNA>)dnaSequenceWithString:(NSString *)aString;
+(id <BCSequenceProtein>)proteinSequenceWithString:(NSString *)aString;
@end

//the method will actually return a BCSequence at runtime
//but the compiler does not know
@protocol BCSequenceDNA
-(id <BCSequenceDNA>)complement;
@end

@protocol BCSequenceProtein
-(NSNumber *)hydrophobicity;
@end

...

NSNumber *h;
BCSequence *seq;
id <BCSequenceDNA> dna;

//when it is typed as a BCSequence, a DNA sequence can use the generic methods
//and always get something back
seq=[BCSequence sequenceWithString:@"ATGCTAGACGAAT"];
seq=[seq complement];
//h is now nil, or [NSNull null], or NSNumber=0 (?)... or runtime error?
//and no compilation warning
h=[seq hydrophobicity];

//now strong typing
dna=[BCSequence dnaSequenceWithString:@"ATGCTAGACGAAT"];
dna=[dna complement];
//compilation warning
h=[dna hydrophobicity];

...

I tried several alternatives, and it was not easy to choose the right way of doing things. Is that what you were thinking of? It seems you are even suggesting removing 'complement' and 'hydrophobicity' from the header of BCSequence. I just realized that in that case there is no need to hide the subclasses and build the artificial protocol thing. So I suppose you want to keep the BCSequence header with all the methods. In that case, taking the app down is not a good answer, and the result should always be something not too stupid.

I just thought of an analogy to throw into the discussion. NSString has path methods like 'stringByAppendingPathExtension'. These will work on ANY string, even if it is not a path but, say, the contents of this email. However, it does not make sense. Do we get a compiler warning? No. Do we get a runtime error? No. Are we in trouble? Yes. What the f... is this string doing here when I should have a path??? This is clearly the fault of the user here, not of the guy who designed NSString. OK, this is really a much simpler situation than ours, but still.
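
A tiny sketch of that analogy, using only the real NSString path methods; the helper name BCTextFileName is hypothetical and exists only for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: treats any string as a path and returns its last
// component with a "txt" extension appended. Nothing here checks that the
// input really is a path.
static NSString *BCTextFileName(NSString *supposedPath) {
    return [[supposedPath lastPathComponent]
        stringByAppendingPathExtension:@"txt"];
}
```

BCTextFileName(@"/tmp/sequence") gives the sensible "sequence.txt". BCTextFileName(@"DNA and/or protein") compiles and runs just as quietly and gives "or protein.txt": no warning, no error, and a result that is exactly as meaningful as the input.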

Ok, enough blabla for today.

Have a good night, guys!

Charles


-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021
