[Biococoa-dev] Sequence factory

Charles PARNOT charles.parnot at stanford.edu
Sat Dec 25 18:56:16 EST 2004


I hope I will address all of the points you raised. The bottom line is that I am still not convinced that a separate BCSequenceFactory is needed, as you will see! But I am glad anyway that you want to have factory methods for the BCSequence object and make the life of the BioCocoa user easier.

1. More about BCSequenceFactory

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>That was exactly my problem with this approach as well. Yet, it has some clear advantages, code centralization being the most important, but also think about caching (my favorite one is restriction enzyme analysis, a (shared/factory) object could initialize 600 enzymes, which can simply be kept around as long as you need. In contrast bringing this into a sequence object for instance would mean that you have to reinitialize the enzyme plist again and again.

Whichever way you do it, you have to create a sequence object. Now if you need some instance-independent stuff, like a list of enzymes, it can be handled many different ways and does not have to be loaded every time you create an object; it can be cached in another class or within the class. Caching in another class could use the shared singleton pattern, so you could simply have a special class to hold the information that needs to be kept around, and that can still be created lazily (for instance, a separate class dealing with enzymes would certainly be useful).
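To make the 'shared singleton' option concrete, here is a minimal sketch. The class name BCEnzymeLibrary, its methods and the two hard-coded enzymes are only assumptions for illustration; a real version would read the enzyme plist once in -init.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class: holds data shared by all sequences (here, an enzyme list).
@interface BCEnzymeLibrary : NSObject
{
    NSDictionary *enzymes;   // enzyme name -> recognition site
}
+ (BCEnzymeLibrary *)sharedLibrary;
- (NSString *)recognitionSiteForEnzymeNamed:(NSString *)name;
@end

@implementation BCEnzymeLibrary

+ (BCEnzymeLibrary *)sharedLibrary
{
    // Created lazily on first use, then kept around for the life of the
    // program (so no -dealloc in this sketch). Not thread-safe as written.
    static BCEnzymeLibrary *shared = nil;
    if (shared == nil) {
        shared = [[BCEnzymeLibrary alloc] init];
    }
    return shared;
}

- (id)init
{
    self = [super init];
    if (self != nil) {
        // Stand-in for loading the enzyme plist; this runs only once.
        enzymes = [[NSDictionary alloc] initWithObjectsAndKeys:
            @"GAATTC", @"EcoRI",
            @"GGATCC", @"BamHI",
            nil];
    }
    return self;
}

- (NSString *)recognitionSiteForEnzymeNamed:(NSString *)name
{
    return [enzymes objectForKey:name];
}

@end
```

A sequence object would then simply ask [BCEnzymeLibrary sharedLibrary] when it needs the enzymes, and creating sequences stays cheap.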

Or the cached stuff could stay inside the class implementation, using the equivalent of 'class instance variables', which can be created with static variables private to the implementation file. Actually, a relevant case is that of enzymes. If the user tries to create an enzyme that has already been created, the factory method (a class method such as '+(BCEnzyme *)EcoRI') or even the 'init' method (an instance method like '-(id)initWithName:(NSString *)name') would return the cached BCEnzyme instance. The BCEnzyme.m implementation file would hold a static NSMutableDictionary of the instances already created.
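Here is a rough sketch of what I mean, not actual BioCocoa code. The +enzymeWithName: method and the 'name' instance variable are assumptions, and the code uses manual retain/release like the code quoted in this thread (it will not compile under automatic reference counting).

```objc
#import <Foundation/Foundation.h>

// Sketch of a BCEnzyme that hands out one shared instance per enzyme name.
@interface BCEnzyme : NSObject
{
    NSString *name;
}
+ (BCEnzyme *)enzymeWithName:(NSString *)aName;
+ (BCEnzyme *)EcoRI;
- (id)initWithName:(NSString *)aName;
- (NSString *)name;
@end

@implementation BCEnzyme

// Private to this implementation file: plays the role of a
// 'class instance variable'. Maps enzyme name -> BCEnzyme instance.
static NSMutableDictionary *cachedEnzymes = nil;

- (id)initWithName:(NSString *)aName
{
    if (cachedEnzymes == nil) {
        cachedEnzymes = [[NSMutableDictionary alloc] init];
    }
    BCEnzyme *cached = [cachedEnzymes objectForKey:aName];
    if (cached != nil) {
        // Already created: drop the fresh allocation, return the cached one.
        [self release];
        return [cached retain];
    }
    self = [super init];
    if (self != nil) {
        name = [aName copy];
        [cachedEnzymes setObject:self forKey:aName];
    }
    return self;
}

+ (BCEnzyme *)enzymeWithName:(NSString *)aName
{
    return [[[self alloc] initWithName:aName] autorelease];
}

+ (BCEnzyme *)EcoRI
{
    return [self enzymeWithName:@"EcoRI"];
}

- (NSString *)name
{
    return name;
}

- (void)dealloc
{
    // Only reached for the discarded allocations; cached instances stay alive.
    [name release];
    [super dealloc];
}

@end
```

With this, [BCEnzyme EcoRI] called twice returns the very same object, and the user never sees the cache.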


2. Again about BCSequenceFactory

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>You would simply call:
>	mySeq=[BCSequence sequenceWithString:@"AGTAGATTTGAGGT"];
>and behind the scene this (in this case class) method would invoke:
>	factory=[BCSequenceFactory sharedSequenceFactory];
>	mySeq=[factory sequenceWithString:@"AGTAGATTTGAGGT"];
>The best of both worlds. The User won't notice the difference, except that he now has the option to choose for simplicity or to optimize things if needed (for instance retaining the tool object).

Of course, this is one way to go and it keeps the existing pattern. It is not exactly the best of both worlds, though, because you now have a code dependency. Each of the BCSequence factory methods has to have a counterpart in BCSequenceFactory. If you change the name of one BCSequenceFactory method, you have to change the code in BCSequence too. Ah, ah! ;-)
OK, not such a big deal, but the more code, the more bugs...
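To show where the dependency sits, here is a stripped-down sketch of the forwarding. The 'sequenceString' instance variable and accessor are assumptions to keep the example self-contained, the real classes obviously do more, and the code uses manual retain/release.

```objc
#import <Foundation/Foundation.h>

@class BCSequence;

@interface BCSequenceFactory : NSObject
+ (BCSequenceFactory *)sharedSequenceFactory;
- (BCSequence *)sequenceWithString:(NSString *)aString;
@end

@interface BCSequence : NSObject
{
    NSString *sequenceString;
}
+ (BCSequence *)sequenceWithString:(NSString *)aString;
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
@end

@implementation BCSequenceFactory

+ (BCSequenceFactory *)sharedSequenceFactory
{
    static BCSequenceFactory *shared = nil;
    if (shared == nil) {
        shared = [[BCSequenceFactory alloc] init];
    }
    return shared;
}

- (BCSequence *)sequenceWithString:(NSString *)aString
{
    return [[[BCSequence alloc] initWithString:aString] autorelease];
}

@end

@implementation BCSequence

+ (BCSequence *)sequenceWithString:(NSString *)aString
{
    // Pure forwarding: this method exists only to call its counterpart.
    // Rename or change the factory method and this one has to follow.
    return [[BCSequenceFactory sharedSequenceFactory] sequenceWithString:aString];
}

- (id)initWithString:(NSString *)aString
{
    self = [super init];
    if (self != nil) {
        sequenceString = [aString copy];
    }
    return self;
}

- (NSString *)sequenceString
{
    return sequenceString;
}

- (void)dealloc
{
    [sequenceString release];
    [super dealloc];
}

@end
```

Every new factory method means one such pair, in two files.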


3. About BCSymbolListFactory

At 1:31 PM -0500 12/24/04, Koen van der Drift wrote:
>BTW, what is you guys opinion on adding a BCSymbolListFactory class as well?

Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or do you mean having two separate classes? Having two separate classes seems a bit too much, no? Having all the members of a class tree created in the same entity seems more appropriate to me.


4. An additional note about the factory methods of BCSequence, BCSequenceDNA, ... This idea is relevant whatever the chosen pattern is, BCSequenceFactory or not.

At 11:10 AM -0500 12/24/04, Koen van der Drift wrote:
>I thought a little bit more about how to implement this. Which class should actually contain this code? Right now the code
>
> + (BCSequenceDNA *) dnaSequenceWithString: (NSString *)entry skippingNonBases: (BOOL)skip {
>     BCSequenceDNA *theReturn = [[BCSequenceDNA alloc] initWithString: entry skippingNonBases: skip];
>     return [theReturn autorelease];
> }
>
>is already in place in BCSequenceDNA (and similar ones in other subclasses of BCSequence).

To avoid having factory methods spread out in the superclass and the subclasses, you could keep them all in the superclass (they could still return instances of the subclasses). Actually, maybe this would be a bit extreme, and that could confuse the user of the framework, but it is something to think about. That would actually be one step closer to a 'class cluster' pattern... (read below)


5. Now about class cluster and dynamic vs static typing. This discussion is relevant whatever the chosen pattern is, BCSequenceFactory or not.

At 20:10 -0500 12/23/04, Koen van der Drift wrote:
>The original reason to put in the factory class was to have a central object that figures out what type of sequence we're dealing with when reading files.

At 1:49 +0100 12/24/04, Alexander Griekspoor wrote:
>At 15:48 -0800 12/23/04, Charles PARNOT wrote:
>>In a way, BCSymbolList would look a little bit like a class cluster, except there is no need for a placeholder class (actually, this could also be implemented to automagically take care of -(id)initWithString when called on the superclass), and except some of the subclasses would be public (if somebody using BioCocoa wants to use more static typing and catch more problems at compile time).
>Could you comment a bit more on this Charles? It's not entirely clear to me what you mean.

I started talking about class cluster, because you have something looking like it going on. And I have now thought a little more about it. So here are my (very deep!) thoughts...

First, here is how I would define a class cluster. A class cluster looks like a single class and has a unique public interface, so you think there is only one class of object, but in fact, under the hood, there is one public abstract superclass and there are several private subclasses that handle the different cases, which helps with optimization. So you create an object, you think it is an instance of the superclass, but in fact, if you look at it, it is an instance of one of the private subclasses. Of course, it is all transparent, and it works seamlessly as if it were just one single class. For example, NSNumber has several private subclasses, each holding a 'value' instance variable of a different type (int, double, etc.). This way, when you ask for -stringValue or -intValue, the subclass knows its own storage type and can do the right cast. If the superclass had to do it, it would have to test for every possible type, which would be less efficient and would make it harder to add a new type of number.
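You can see this at runtime with a few lines of code. This is just a sketch: the two helper functions are made up for the illustration, and the names of the private classes that get logged are an implementation detail that differs between systems, so do not rely on them.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the object is an NSNumber whose concrete
// class is a subclass of NSNumber rather than NSNumber itself.
static BOOL IsInstanceOfPrivateSubclass(NSNumber *number)
{
    return [number isKindOfClass:[NSNumber class]]
        && [number class] != [NSNumber class];
}

// Hypothetical helper: logs the concrete class behind two numbers.
static void LogNumberClasses(void)
{
    NSNumber *anInt = [NSNumber numberWithInt:42];
    NSNumber *aDouble = [NSNumber numberWithDouble:3.5];

    // Both are used through the public NSNumber interface...
    NSLog(@"values: %@ and %@", [anInt stringValue], [aDouble stringValue]);

    // ...but the class reported at runtime is not NSNumber itself.
    NSLog(@"classes: %@ and %@",
          NSStringFromClass([anInt class]),
          NSStringFromClass([aDouble class]));
}
```

The caller only ever declares NSNumber variables; the subclass is chosen behind the scenes by the factory method.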

Sorry if you already know all of that and the concept of class cluster, I just want to make the rest clear.

Now, why does BCSequence look like a class cluster? This is mainly because of the method +(BCSequence *)sequenceWithString:(NSString *)sequence (it does not matter whether there is a BCSequenceFactory class hidden there). This method guesses the type of sequence. It returns an object statically typed to the superclass BCSequence. But IN FACT, it is really a BCSequenceDNA, or a BCSequenceProtein, etc... So the user could think it is a BCSequence, but it is actually an instance of a subclass.
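As a sketch of that behavior (not the actual BioCocoa code): the guessing rule below, 'only A, C, G, T or N means DNA, anything else means protein', is an assumption made up for the example, as is the 'sequenceString' instance variable. Memory management is manual retain/release.

```objc
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject
{
    NSString *sequenceString;
}
+ (BCSequence *)sequenceWithString:(NSString *)aString;
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
@end

@interface BCSequenceDNA : BCSequence
@end

@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence

+ (BCSequence *)sequenceWithString:(NSString *)aString
{
    // Assumed heuristic, for illustration only: a string made exclusively
    // of A, C, G, T or N is treated as DNA, anything else as protein.
    NSCharacterSet *nonBases =
        [[NSCharacterSet characterSetWithCharactersInString:@"ACGTNacgtn"] invertedSet];
    BOOL looksLikeDNA =
        ([aString rangeOfCharacterFromSet:nonBases].location == NSNotFound);
    Class concreteClass = looksLikeDNA ? [BCSequenceDNA class]
                                       : [BCSequenceProtein class];

    // Statically typed as BCSequence, dynamically an instance of a subclass.
    return [[[concreteClass alloc] initWithString:aString] autorelease];
}

- (id)initWithString:(NSString *)aString
{
    self = [super init];
    if (self != nil) {
        sequenceString = [aString copy];
    }
    return self;
}

- (NSString *)sequenceString
{
    return sequenceString;
}

- (void)dealloc
{
    [sequenceString release];
    [super dealloc];
}

@end

@implementation BCSequenceDNA
@end

@implementation BCSequenceProtein
@end
```

The caller writes 'BCSequence *mySeq = [BCSequence sequenceWithString:...]' and only finds out the real class by asking the object at runtime.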

At this point, the BCSequence family is not exactly a class cluster like Apple's NSString, NSNumber,... First, the superclass BCSequence is not abstract. It can be instantiated (well, I am not sure about that, but at least, BCSymbolList can, right?). Second, the subclasses are not private. For example, you can explicitly create an instance of BCSequenceDNA with the class method +(BCSequenceDNA *)dnaSequenceWithString:

However, there is an important issue with dynamic versus static typing, which also has to do with compile-time versus runtime errors. I have seen in the mailing list that you discussed it earlier. As soon as you allow the creation of a BCSequenceDNA object that is statically typed as a BCSequence, the compiler has no way to tell that you are now manipulating a BCSequenceDNA object. If you call '-complement' on it, and you have not declared it in BCSequence, you get a warning, even though at runtime, it will be fine. That object will really be a BCSequenceDNA and will indeed respond to '-complement'. To avoid the compiler warning, you have to declare the method at the superclass level. This also means that it could be called on a BCSequenceProtein without a compiler warning. Then you have to handle at runtime a call to '-complement' on a BCSequenceProtein.

I know you have discussed that issue already to some extent, but I am not sure if you have decided on something. One possibility is to have the BCSequence superclass accept all the methods of the subclasses. Another is to let the user deal with it and have her do some type checking and some casting. Basically, if she wants to use -complement, she has to use +dnaSequenceWithString:, or cast the result of +sequenceWithString: to a BCSequenceDNA. This is what I referred to in my previous email: the user will use strong typing when she needs it, and will thus be able to rely on compiler warnings. It looks like this is the pattern you have chosen: if the user ever wants to use '-complement', she cannot use it on an object created with +(BCSequence *)sequenceWithString:(NSString *)sequence (or else she has to ignore the compiler warning or cast the object to BCSequenceDNA).
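Here is a sketch of the 'let the user check and cast' option. Everything in it is simplified and partly assumed: the 'sequenceString' instance variable, the one-argument +dnaSequenceWithString: and the ComplementIfDNA helper are made up for the example, and -complement only swaps A/T and C/G. Memory management is manual retain/release.

```objc
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject
{
    NSString *sequenceString;
}
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
@end

@interface BCSequenceDNA : BCSequence
+ (BCSequenceDNA *)dnaSequenceWithString:(NSString *)aString;
- (BCSequenceDNA *)complement;   // declared only at the subclass level
@end

@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence
- (id)initWithString:(NSString *)aString
{
    self = [super init];
    if (self != nil) {
        sequenceString = [aString copy];
    }
    return self;
}
- (NSString *)sequenceString { return sequenceString; }
- (void)dealloc
{
    [sequenceString release];
    [super dealloc];
}
@end

@implementation BCSequenceDNA
+ (BCSequenceDNA *)dnaSequenceWithString:(NSString *)aString
{
    return [[[BCSequenceDNA alloc] initWithString:aString] autorelease];
}
- (BCSequenceDNA *)complement
{
    NSMutableString *result =
        [NSMutableString stringWithCapacity:[sequenceString length]];
    NSUInteger i;
    for (i = 0; i < [sequenceString length]; i++) {
        unichar base = [sequenceString characterAtIndex:i];
        switch (base) {
            case 'A': base = 'T'; break;
            case 'T': base = 'A'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            default: break;   // other symbols are left unchanged in this sketch
        }
        [result appendFormat:@"%C", base];
    }
    return [BCSequenceDNA dnaSequenceWithString:result];
}
@end

@implementation BCSequenceProtein
@end

// Hypothetical helper: takes any sequence, returns its complement if it is
// DNA and nil otherwise.
static BCSequence *ComplementIfDNA(BCSequence *sequence)
{
    // Writing [sequence complement] here would draw a compiler warning:
    // BCSequence does not declare -complement.
    if ([sequence isKindOfClass:[BCSequenceDNA class]]) {
        BCSequenceDNA *dna = (BCSequenceDNA *)sequence;   // checked cast
        return [dna complement];                          // no warning
    }
    return nil;
}
```

The price is the explicit check and cast; the benefit is that calling -complement on a variable typed BCSequenceProtein is flagged at compile time instead of failing at runtime.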

Why am I discussing the concept of class cluster? Well, I am not sure! I just want to bring up the idea, because that could have been one way to go, and you are halfway there...

OK, I will stop here. I hope this is not too confusing. And sorry if some of these questions have been already answered before, or if I am missing other aspects...

Merry Christmas!

Charles


--
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021