[Biococoa-dev] BCSequence class cluster

Wed Jan 5 15:12:40 EST 2005

>>About complaint (b)
>>I thought of enforcing immutability as a starting point, as this is 
>>easier on the developer side to deal with immutable objects. Giving 
>>the option of immutability to the user is anyway a good thing, as 
>>it allows a number of optimizations, that could really pay off in a 
>>real application with lots of copying, ref passing,...
>Yes, this is exactly how the mutable variants of NSData, NSString 
>etc are setup as I discovered in the devnote I mentioned above. 
>Indeed, it would be very nice to have a mutable and immutable 
>variant of BCSequence objects.

You will notice that NSNumber does not have a mutable version. Why? 
One reason is that creating a new instance is not too costly, since the 
data is small. Another reason may be that the implementation is a 
bit trickier: NSNumber resembles our BCSequence, with a 
large number of potential subclasses, which raises the question of how to 
implement mutability and immutability across all of them.

>>2. Implementing the class cluster
>>------------------------------
>>
>>The class cluster that I implement in the attached project looks 
>>very much like what you have already done. There is a superclass 
>>BCSequence, and then subclasses, BCSequenceDNA, 
>>BCSequenceRNA,...etc... plus a new special subclass 
>>BCSequenceFactory. Now the purpose of a class cluster is that the 
>>user just does everything using the public interface for 
>>BCSequence, and as far as the user is concerned, every object is an 
>>instance of BCSequence. But under the hood, you actually return 
>>instances of one of the subclasses so that some operations can be 
>>optimized for the particular type of sequence you are dealing with.
>In other words the subclasses are private, only BCSequence.h is public right?

Yes.

>>The problem for the developer of a class cluster is that you know 
>>which subclass to use only once you call one of the init methods, 
>>but you still have to do the 'alloc' before the init. There is no 
>>way BCSequence will know what subclass it should use at the time 
>>'alloc' is called. So the trick is to alloc a temporary instance of 
>>a particular subclass, a 'placeholder' class. Look at the 
>>implementation of 'alloc' in BCSequence.m.
>+ (id)alloc
>{
>	if (self==[BCKSequence class])  // Should this be [BCSequence class]?
>		return [BCKSequencePlaceholder alloc];  // So this would be [BCSequenceFactory alloc]?
>	else
>		return [super alloc];
>}

Arghh, a stupid typo in the most important piece of code!! OK, I 
corrected the code in the link. Download it again...
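
For reference, a minimal sketch of the placeholder pattern with the class names used in this thread (BCSequence, BCSequenceFactory, BCSequenceDNA). The initWithString: initializer, the sequenceString accessor and the always-DNA dispatch are assumptions made for illustration, not the code in the linked project. It is written for ARC; under manual reference counting the placeholder would need an explicit release.

```objc
#import <Foundation/Foundation.h>

// Public abstract class of the cluster.
@interface BCSequence : NSObject
- (id)initWithString:(NSString *)string;   // hypothetical initializer
- (NSString *)sequenceString;              // hypothetical accessor
@end

// Private placeholder: allocated first, replaced during init.
@interface BCSequenceFactory : BCSequence
@end

// Private concrete subclass (only DNA is shown).
@interface BCSequenceDNA : BCSequence {
    NSString *_string;
}
@end

@implementation BCSequence
+ (id)alloc
{
    // Only the public class hands out a placeholder;
    // the subclasses allocate themselves normally.
    if (self == [BCSequence class])
        return [BCSequenceFactory alloc];
    else
        return [super alloc];
}
- (id)initWithString:(NSString *)string
{
    self = [super init];
    return self;
}
- (NSString *)sequenceString { return nil; }
@end

@implementation BCSequenceFactory
- (id)initWithString:(NSString *)string
{
    // Assumption: a real factory would inspect the symbols and choose
    // among DNA, RNA, protein, ... Here DNA is the only choice.
    return [[BCSequenceDNA alloc] initWithString:string];
}
@end

@implementation BCSequenceDNA
- (id)initWithString:(NSString *)string
{
    self = [super initWithString:string];
    if (self) _string = [string copy];
    return self;
}
- (NSString *)sequenceString { return _string; }
@end
```

With this in place, [[BCSequence alloc] initWithString:@"ACGT"] hands back a BCSequenceDNA instance, although the caller only ever names BCSequence.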

>That's exactly my thought at the moment, indeed it fits nicely in 
>between the two opposite choices in the subclassing debate and 
>satisfies  most arguments. The only problem is that I don't have a 
>real oversight to see potential problems coming, but that's simply 
>because of my inexperience with programming. Perhaps we just have to 
>take the jump and see where it ends, at least it has proven very 
>effective in the cocoa framework (wow, that's a biased opinion ;-).

Yes, there is a really good foundation in the framework and plenty of 
good implementation ideas. You/we are probably at the point where 
we foresee all the potential developments and have a better sense of 
what the design can be.

>>They would have some init methods, but when the user uses these 
>>classes and alloc/init an instance, she would get in fact one of 
>>the BCSequence subclasses. The compiler would not know and would 
>>trust the headers to generate warning. For instance, the header for 
>>the BCSequenceProtein placeholder class would not define the 
>>methods 'complement' or 'cutWithRestrictionEnzyme:', and you would 
>>get a compiler warning even though the object would in fact respond 
>>to the methods at runtime (but would have to return some dummy 
>>values). So these headers would really define completely virtual 
>>classes. One of the problem is the names of these placeholder 
>>classes conflict with the names of the BCSequence private 
>>subclasses that are defined in the project I sent. We could rename 
>>the latter to BCSeqDNA/RNA/... for example, and keep the nice full 
>>names 'BCSequenceDNA/RNA/...' for the placeholder public classes.
>Seems feasible, although having separate names for internal vs 
>public representations might be troublesome.

In case it was not clear, and because I am not sure what you 
understood, I want to say again that they have to have different 
names. We cannot keep the same names for the private and public 
classes. It is true that it could be a little confusing for the 
developer, but we would probably almost never use the public classes 
internally, so the confusion should not be too bad. Also, using an 
abbreviation like BCSeq for something internal is a good mnemonic to 
remember that these names are really private.

>
>Perhaps you're right, but what I was thinking is to implement a way 
>to better return the reason why something doesn't work instead of a 
>simple nil. For instance, calling 
>cutInPiecesWithThisRestrictionEnzyme on a DNA would return the 
>pieces, while it would also work on proteins, but return nil right. 
>Of course you could also let the method return an exception, it will 
>then become the developers responsibility to call methods on the 
>right object. The downside is that this might lead too easily to 
>program halts/crashes if the developer doesn't pay attention. But 
>think in terms of NSArray objectAtIndex method, it returns nil if 
>you ask an object out of bounds, AND raises an Exception.
>I'm still wondering a bit how we're going to implement these kind of 
>methods, as we now have to start ALL methods with a test what the 
>sequence type is.

No, there is no test at the beginning of a method. It is simply coded 
in the subclass. For example, BCSequenceProtein could override the 
'complement' method to return an empty sequence. Actually, this is 
not such a great example as 'complement' can easily be taken care of 
by the superclass which would call 'complement' on the BCSymbol 
objects of the symbolArray.
Now I have an additional comment on what to do with strongly typed 
instances, when the user is purposely using a BCSequenceProtein, has 
a call to 'complement', ignores the compiler warning and runs the 
program. It would then be nice to have a runtime error (yeah, this is 
nice!) when calling such a method on a strongly typed instance. For this 
we could have an additional flag 'isTyped' and have the private 
BCSeqProtein check the value of the flag in the critical methods, and 
raise an exception if isTyped=YES or call super if isTyped=NO.
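
A minimal sketch of the 'isTyped' idea, under stated assumptions: the initTyped: initializer is hypothetical, sequences are reduced to plain NSString values, and the superclass 'complement' is a stand-in that just returns an empty result.

```objc
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject
- (NSString *)complement;
@end

@implementation BCSequence
// Stand-in for the generic implementation; here it yields an empty result.
- (NSString *)complement { return @""; }
@end

// Private protein subclass, using the abbreviated private name.
@interface BCSeqProtein : BCSequence {
    BOOL isTyped;   // YES when created through the public BCSequenceProtein
}
- (id)initTyped:(BOOL)typed;   // hypothetical initializer
@end

@implementation BCSeqProtein
- (id)initTyped:(BOOL)typed
{
    self = [super init];
    if (self) isTyped = typed;
    return self;
}

- (NSString *)complement
{
    // A strongly typed protein refuses the call at runtime;
    // an untyped one falls back on the superclass behaviour.
    if (isTyped)
        [NSException raise:NSInvalidArgumentException
                    format:@"-complement is not defined for a typed protein sequence"];
    return [super complement];
}
@end
```

Only the critical methods of the private subclass carry the check, so the other methods stay free of type tests.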

>>To implement mutable objects in the class cluster could be a bit 
>>tricky, because there are two conflicting subclass organizations 
>>here: mutable/immutable and dna/rna/protein/codon. To get all the 
>>combinations, it seems that we need 8 subclasses!!
>Oops, Koen won't like this, LOL ;-) On the other hand, look at the 
>number of NSNumber subclasses...

See my comment above about NSNumber. They did not bother to implement 
mutability, probably not worth it in the case of NSNumber. Now, they 
would be in trouble if they decided to implement a wrapper for C 
arrays of different types, ie vectors or matrices. You would have all 
the combinations mutable/double, immutable/double, mutable/float, 
immutable/float, mutable/int,...

>>I am not completely sure how to deal with it, or if we should deal 
>>with it or just give up and stick to mutable only. One possibility 
>>is to not have distinct subclasses for mutable/immutable. Instead, 
>>there could be simply a BOOL flag 'isMutable' as one of the 
>>instance variables. The object would then return different results 
>>in key methods such as 'copy' depending on the value of the flag.
>But then we could just as well do the subclasses right?

Yes, that may be true. On the other hand, most of the code could be 
in the superclass and use that flag. In fact, we should start 
thinking about where mutability makes a difference. What methods 
should be implemented? There are not so many: insertSequenceAtRange, 
removeSequenceAtRange, setSequence, appendSequence, addAnnotation(s), 
removeAnnotation(s). These would be declared in another placeholder 
class 'BCMutableSequence' (which would in fact return instances of the 
class cluster subclasses), so calling them on BCSequence objects would 
give compiler warnings. If they are called at runtime on a sequence for 
which isMutable=NO, they would generate a runtime error (so a test 
would be needed at the beginning of each of these methods). It seems 
they might be coded in the superclass. The same is true for 'copy', 
that may not even have to know if the copied instance is mutable. For 
example, it would do [symbolArray copy] which would return the same 
pointer if symbolArray is immutable, or a real copy if symbolArray is 
mutable. Note that 'copy' always returns an immutable instance by 
convention. Then 'mutableCopy' would apply the same tricks. The 
subclasses may have to deal with their own instance variables (they 
don't have any so far), and may have to check [self isMutable].
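
A minimal sketch of the flag-based approach, under stated assumptions: the initWithSymbols:isMutable: initializer, the 'symbols' accessor and 'appendSymbol:' (a stand-in for the mutating methods listed above) are hypothetical names, and symbols are plain objects in an NSArray.

```objc
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject <NSCopying, NSMutableCopying> {
    NSArray *symbolArray;   // an NSMutableArray when isMutable is YES
    BOOL isMutable;
}
- (id)initWithSymbols:(NSArray *)symbols isMutable:(BOOL)flag;
- (NSArray *)symbols;
- (BOOL)isMutable;
- (void)appendSymbol:(id)symbol;
@end

@implementation BCSequence
- (id)initWithSymbols:(NSArray *)symbols isMutable:(BOOL)flag
{
    self = [super init];
    if (self) {
        isMutable = flag;
        // -copy on an immutable array may hand back the same object;
        // -mutableCopy always creates a new array.
        symbolArray = flag ? [symbols mutableCopy] : [symbols copy];
    }
    return self;
}

- (NSArray *)symbols { return symbolArray; }
- (BOOL)isMutable { return isMutable; }

- (void)appendSymbol:(id)symbol
{
    // The runtime test needed at the top of every mutating method.
    if (!isMutable)
        [NSException raise:NSInternalInconsistencyException
                    format:@"Attempt to mutate an immutable BCSequence"];
    [(NSMutableArray *)symbolArray addObject:symbol];
}

// By convention -copy returns an immutable instance.
- (id)copyWithZone:(NSZone *)zone
{
    return [[[self class] allocWithZone:zone] initWithSymbols:symbolArray
                                                    isMutable:NO];
}

- (id)mutableCopyWithZone:(NSZone *)zone
{
    return [[[self class] allocWithZone:zone] initWithSymbols:symbolArray
                                                    isMutable:YES];
}
@end
```

All of this lives in the superclass; a subclass with its own instance variables would only need to extend the two copy methods.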

>Let's first decide if we all like the idea of the class cluster, and 
>then see how to implement it and the naming. Just one thing you 
>might have thought about as well Charles, how do you see the 
>annotations stuff fitting in this scheme? The nice thing is that it 
>applies to all subclasses, but can it still be implemented in the 
>superclass? Perhaps not, as the mutable vs immutable implementation 
>will be quite different. And that's where my major doubt is, as you 
>mentioned you have both a divergency in the direction of mutable vs 
>immutable, as well as in DNA/RNA/Protein. This automatically leads 
>to duplication of the code in one of the two directions I'm afraid...
>There's plenty to discuss ;-)

About annotations, I do not have a good grasp of the whole concept, but 
it certainly seems that if the concept of an annotation is 
sufficiently abstract, it could easily go in the superclass, the same 
way many methods can be handled in the superclass thanks to the 
BCSymbol abstraction you guys have designed.
In fact, if annotations are just one NSArray, it is not too costly in 
terms of memory, adding just one instance variable = a few bytes to 
the size of the object, and can be kept to nil if no annotations are 
present.

In conclusion, to discuss the class cluster possibility, it is maybe 
time to come up with a list of:
* methods that could be in the public BCSequence.h header; we should 
not be afraid to have many; they could be dispatched in categories 
for convenience; the doc for BCSequence would be big, but that would 
be quite normal!
* methods that could be in the public BCMutableSequence.h header
* sequence types that could be added in the future

And then see how well that fits with class cluster, and if 
mutable/immutable implementation is feasible.

OK, I'll stop there, hoping the Europeans will get this before going to bed.

Charles

-- 
Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021