[Biococoa-dev] BCSequence class cluster

Alexander Griekspoor mek at mekentosj.com
Wed Jan 5 15:35:05 EST 2005


> OK, I'll stop there, hoping the Europeans will get that before going 
> to bed.
Yep, still got it. Don't have the time to reply to all of it yet, but 
one comment that came to mind immediately:

> You will notice that NSNumber does not have a mutable version. Why? 
> One reason is that creating a new instance is not too costly, the data 
> is small. Another reason is maybe that the implementation is a bit 
> more tricky as the NSNumber resembles our BCSequence, with a large 
> number of potential subclasses, and then the question of how to 
> implement mutability and immutability.
You're right, but in our case we're not talking about small subclasses 
with only one variable (int, float, bool etc.); the BCSequences are way 
too big to return a new instance every time you call a method on them.

Therefore, I would propose to start with the mutable version, then 
later we can always generate the immutable versions in addition for 
optimization purposes.

About the warnings, I'm not much of a fan of adding flags like 
isMutable or isTyped. In the first case I would rather have a real 
immutable subclass, and in the second, can't we just generate runtime 
errors? Those will surface in 99% of cases during the development 
cycle, and the developer can then take countermeasures to prevent 
the end-user from doing stupid things.
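
To illustrate the runtime-error idea, here is a minimal sketch. It assumes 
a stripped-down BCSequence with a single 'complement' method; the helper 
BCCanComplement and the choice of NSInvalidArgumentException are made up 
for the example and are not part of the project.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical, stripped-down classes; only the names BCSequence,
// BCSequenceProtein and 'complement' come from the thread.
@interface BCSequence : NSObject
- (BCSequence *)complement;
@end

@interface BCSequenceProtein : BCSequence
@end

@implementation BCSequence
- (BCSequence *)complement
{
    // Stand-in: a real implementation would build the complement.
    return self;
}
@end

@implementation BCSequenceProtein
- (BCSequence *)complement
{
    // No isTyped flag: the protein subclass simply refuses at runtime.
    [NSException raise:NSInvalidArgumentException
                format:@"-complement is not defined for protein sequences"];
    return nil; // not reached
}
@end

// Hypothetical helper showing how a caller could guard against the error.
BOOL BCCanComplement(BCSequence *sequence)
{
    @try {
        [sequence complement];
        return YES;
    }
    @catch (NSException *exception) {
        NSLog(@"Caught %@: %@", [exception name], [exception reason]);
        return NO;
    }
}
```

A developer who calls 'complement' on a protein would hit the exception 
the first time that code path runs, which is the point of the argument 
above.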

Just my 2 cents,
Alex


Op 5-jan-05 om 21:12 heeft Charles PARNOT het volgende geschreven:

> About complaint (b)
> I thought of enforcing immutability as a starting point, as this is 
> easier on the developer side to deal with immutable objects. Giving 
> the option of immutability to the user is anyway a good thing, as it 
> allows a number of optimizations, that could really pay off in a real 
> application with lots of copying, ref passing,...
> Yes, this is exactly how the mutable variants of NSData, NSString etc. 
> are set up, as I discovered in the devnote I mentioned above. Indeed, it 
> would be very nice to have a mutable and immutable variant of 
> BCSequence objects.
>
>
>
>
>
> 2. Implementing the class cluster
> ------------------------------
>
> The class cluster that I implement in the attached project looks very 
> much like what you have already done. There is a superclass 
> BCSequence, and then subclasses, BCSequenceDNA, 
> BCSequenceRNA,...etc... plus a new special subclass BCSequenceFactory. 
> Now the purpose of a class cluster is that the user just does 
> everything using the public interface for BCSequence, and as far as 
> the user is concerned, every object is an instance of BCSequence. But 
> under the hood, you actually return instances of one of the 
> subclasses so that some operations can be optimized for the particular 
> type of sequence you are dealing with.
> In other words the subclasses are private, only BCSequence.h is public 
> right?
>
> Yes.
>
>
>
> The problem for the developer of a class cluster is that you know 
> which subclass to use only once you call one of the init methods, but 
> you still have to do the 'alloc' before the init. There is no way 
> BCSequence will know what subclass it should use at the time 'alloc' 
> is called. So the trick is to alloc a temporary instance of a 
> particular subclass, a 'placeholder' class. Look at the implementation 
> of 'alloc' in BCSequence.m.
> + (id)alloc
> {
>         if (self == [BCKSequence class])  // Should this be [BCSequence class]?
>                 return [BCKSequencePlaceholder alloc];  // So this would be [BCSequenceFactory alloc]?
>         else
>                 return [super alloc];
> }
>
> Arghh, a stupid typo in the most important piece of code!! OK, I 
> corrected the code in the link. Download it again...
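
For illustration, a minimal sketch of the placeholder trick with the 
corrected class names. It is not the code from the attached project: the 
initWithString: initializer, the sequenceString accessor and the "only 
A, C, G and T means DNA" rule are assumptions made up for the example, 
and it assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Sketch only. Assumes ARC; under manual retain/release the factory
// would also have to release the placeholder before returning.
@interface BCSequence : NSObject
{
    NSString *sequenceString;
}
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
@end

// Private subclasses (these would not appear in public headers).
@interface BCSequenceDNA : BCSequence
@end
@interface BCSequenceProtein : BCSequence
@end
// Placeholder class returned by +[BCSequence alloc].
@interface BCSequenceFactory : BCSequence
@end

@implementation BCSequence
+ (id)alloc
{
    // Only the abstract public class is redirected to the placeholder;
    // subclasses (including the factory itself) allocate normally.
    if (self == [BCSequence class])
        return [BCSequenceFactory alloc];
    return [super alloc];
}

- (id)initWithString:(NSString *)aString
{
    if ((self = [super init])) {
        sequenceString = [aString copy];
    }
    return self;
}

- (NSString *)sequenceString
{
    return sequenceString;
}
@end

@implementation BCSequenceDNA
@end

@implementation BCSequenceProtein
@end

@implementation BCSequenceFactory
- (id)initWithString:(NSString *)aString
{
    // Hypothetical heuristic: a string with only A, C, G and T is DNA.
    NSCharacterSet *notDNA = [[NSCharacterSet
        characterSetWithCharactersInString:@"ACGT"] invertedSet];
    BOOL isDNA =
        ([aString rangeOfCharacterFromSet:notDNA].location == NSNotFound);
    Class concrete = isDNA ? [BCSequenceDNA class]
                           : [BCSequenceProtein class];
    // The placeholder is discarded; the caller gets the concrete instance.
    id result = [(BCSequence *)[concrete alloc] initWithString:aString];
    return result;
}
@end
```

The caller only ever writes [[BCSequence alloc] initWithString:...] and 
gets back an instance of one of the private subclasses.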
>
>
>
>
>
> That's exactly my thought at the moment, indeed it fits nicely in 
> between the two opposite choices in the subclassing debate and 
> satisfies most arguments. The only problem is that I don't have a 
> real overview to see potential problems coming, but that's simply 
> because of my inexperience with programming. Perhaps we just have to 
> take the jump and see where it ends, at least it has proven very 
> effective in the cocoa framework (wow, that's a biased opinion ;-).
>
> Yes, there is a real good foundation in the framework and plenty of 
> good ideas of implementation. You/we are probably at the point where 
> we foresee all the potential developments and have a better sense of 
> what the design can be.
>
>
>
>
>
>
> They would have some init methods, but when the user uses these 
> classes and alloc/init an instance, she would get in fact one of the 
> BCSequence subclasses. The compiler would not know and would trust the 
> headers to generate warnings. For instance, the header for the 
> BCSequenceProtein placeholder class would not define the methods 
> 'complement' or 'cutWithRestrictionEnzyme:', and you would get a 
> compiler warning even though the object would in fact respond to the 
> methods at runtime (but would have to return some dummy values). So 
> these headers would really define completely virtual classes. One 
> problem is that the names of these placeholder classes conflict with 
> the names of the BCSequence private subclasses that are defined in the 
> project I sent. We could rename the latter to BCSeqDNA/RNA/... for 
> example, and keep the nice full names 'BCSequenceDNA/RNA/...' for the 
> placeholder public classes.
> Seems feasible, although having separate names for internal vs public 
> representations might be troublesome.
>
> In case it was not clear, and because I am not sure what you 
> understood, I want to say again that they have to have different 
> names. We cannot keep the same names for the private and public 
> classes. It is true that it could be a little confusing for the 
> developer, but we would probably almost never use the public classes 
> internally, so the confusion would not be too bad. Also, using an 
> abbreviation like BCSeq for something internal is a good mnemonic to 
> remember that these names are really private.
>
>
>
> Perhaps you're right, but what I was thinking is to implement a way to 
> better return the reason why something doesn't work instead of a simple 
> nil. For instance, calling cutInPiecesWithThisRestrictionEnzyme on a 
> DNA would return the pieces, while it would also work on proteins, but 
> return nil, right? Of course you could also let the method raise an 
> exception; it would then become the developer's responsibility to call 
> methods on the right object. The downside is that this might lead too 
> easily to program halts/crashes if the developer doesn't pay 
> attention. But think in terms of NSArray's objectAtIndex: method: it 
> does not quietly return nil if you ask for an object out of bounds, it 
> raises an exception.
> I'm still wondering a bit how we're going to implement these kinds of 
> methods, as we now have to start ALL methods with a test of what the 
> sequence type is.
>
> No, there is no test at the beginning of a method. It is simply coded 
> in the subclass. For example, BCSequenceProtein could override the 
> 'complement' method to return an empty sequence. Actually, this is not 
> such a great example as 'complement' can easily be taken care of by 
> the superclass which would call 'complement' on the BCSymbol objects 
> of the symbolArray.
> Now I have an additional comment on what to do with strongly typed 
> instances, when the user is purposely using a BCSequenceProtein, has 
> a call to 'complement', ignores the compiler warning and runs the 
> program. It would then be nice to have a runtime error (yeah, this is 
> nice!) when calling a method on a strongly typed instance. For this we 
> could have an additional flag 'isTyped' and have the private 
> BCSeqProtein check the value of the flag in the critical methods, and 
> raise an exception if isTyped=YES or call super if isTyped=NO.
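
As a rough sketch of the superclass handling 'complement' through the 
symbols, without any test on the sequence type: the BCSymbol below is a 
toy stand-in, not the real class, and its rule (A/T and C/G pair up, 
every other symbol has no complement and is skipped) is an assumption 
made for the example. Assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

// Toy stand-in for BCSymbol: one character per symbol.
@interface BCSymbol : NSObject
{
    unichar character;
}
- (id)initWithCharacter:(unichar)aCharacter;
- (unichar)character;
- (BCSymbol *)complement;   // nil when the symbol has no complement
@end

@implementation BCSymbol
- (id)initWithCharacter:(unichar)aCharacter
{
    if ((self = [super init])) {
        character = aCharacter;
    }
    return self;
}

- (unichar)character
{
    return character;
}

- (BCSymbol *)complement
{
    unichar partner;
    switch (character) {
        case 'A': partner = 'T'; break;
        case 'T': partner = 'A'; break;
        case 'C': partner = 'G'; break;
        case 'G': partner = 'C'; break;
        default:  return nil;   // assumed: no complement defined
    }
    return [[BCSymbol alloc] initWithCharacter:partner];
}
@end

@interface BCSequence : NSObject
{
    NSArray *symbolArray;
}
- (id)initWithSymbolArray:(NSArray *)symbols;
- (id)initWithString:(NSString *)aString;
- (NSString *)sequenceString;
- (BCSequence *)complement;
@end

@implementation BCSequence
- (id)initWithSymbolArray:(NSArray *)symbols
{
    if ((self = [super init])) {
        symbolArray = [symbols copy];
    }
    return self;
}

- (id)initWithString:(NSString *)aString
{
    NSMutableArray *symbols = [NSMutableArray array];
    for (NSUInteger i = 0; i < [aString length]; i++) {
        unichar c = [aString characterAtIndex:i];
        [symbols addObject:[[BCSymbol alloc] initWithCharacter:c]];
    }
    return [self initWithSymbolArray:symbols];
}

- (NSString *)sequenceString
{
    NSMutableString *result = [NSMutableString string];
    for (BCSymbol *symbol in symbolArray) {
        [result appendFormat:@"%C", [symbol character]];
    }
    return result;
}

// No check of the sequence type here: each symbol decides for itself.
- (BCSequence *)complement
{
    NSMutableArray *complements = [NSMutableArray array];
    for (BCSymbol *symbol in symbolArray) {
        BCSymbol *partner = [symbol complement];
        if (partner != nil) {
            [complements addObject:partner];
        }
    }
    return [[BCSequence alloc] initWithSymbolArray:complements];
}
@end
```

A real BCSymbol would need to distinguish nucleotide symbols from amino 
acid symbols that share the same letters; the toy version above does not.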
>
>
> To implement mutable objects in the class cluster could be a bit 
> tricky, because there are two conflicting subclass organizations here: 
> mutable/immutable and dna/rna/protein/codon. To get all the 
> combinations, it seems that we need 8 subclasses!!
> Oops, Koen won't like this, LOL ;-) On the other hand, look at the 
> number of NSNumber subclasses...
>
> See my comment above about NSNumber. They did not bother to implement 
> mutability, probably not worth it in the case of NSNumber. Now, they 
> would be in trouble if they decided to implement a wrapper for C 
> arrays of different types, ie vectors or matrices. You would have all 
> the combinations mutable/double, immutable/double, mutable/float, 
> immutable/float, mutable/int,...
>
>
>
> I am not completely sure how to deal with it, or if we should deal 
> with it or just give up and stick to mutable only. One possibility is 
> to not have distinct subclasses for mutable/immutable. Instead, there 
> could be simply a BOOL flag 'isMutable' as one of the instance 
> variables. The object would then return different results in key 
> methods such as 'copy' depending on the value of the flag.
> But then we could just as well do the subclasses right?
>
> Yes, that may be true. On the other hand, most of the code could be in 
> the superclass and use that flag. In fact, we should start thinking 
> about where mutability makes a difference. What methods should be 
> implemented. There are not so many: insertSequenceAtRange, 
> removeSequenceAtRange, setSequence, appendSequence, addAnnotation(s), 
> removeAnnotation(s). These would be defined in another placeholder 
> class 'BCMutableSequence' (which would in fact return subclasses of 
> the class cluster), which would give compiler warnings if called on 
> BCSequence objects. If they are called at runtime on a sequence for 
> which isMutable=NO, they would generate a runtime error (so a test 
> would be needed at the beginning of each of these methods). It seems 
> they might be coded in the superclass. The same is true for 'copy', 
> that may not even have to know if the copied instance is mutable. For 
> example, it would do [symbolArray copy] which would return the same 
> pointer if symbolArray is immutable, or a real copy if symbolArray is 
> mutable. Note that 'copy' always returns an immutable instance by 
> convention. Then 'mutableCopy' would apply the same tricks. The 
> subclasses may have to deal with their own instance variables (they 
> don't have any so far), and may have to check [self isMutable].
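
A minimal sketch of the isMutable flag living in the superclass, under a 
few assumptions: the initializer initWithSymbols:isMutable: and the 
'symbols' accessor are invented for the example, plain objects stand in 
for BCSymbol, the exception name is an arbitrary choice, and the code 
assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject <NSCopying, NSMutableCopying>
{
    NSArray *symbolArray;   // really an NSMutableArray when isMutable is YES
    BOOL isMutable;
}
- (id)initWithSymbols:(NSArray *)symbols isMutable:(BOOL)flag;
- (BOOL)isMutable;
- (NSArray *)symbols;
- (void)appendSequence:(BCSequence *)other;
@end

@implementation BCSequence
- (id)initWithSymbols:(NSArray *)symbols isMutable:(BOOL)flag
{
    if ((self = [super init])) {
        isMutable = flag;
        // -copy on an immutable array may simply return the same object,
        // so an immutable sequence does not have to duplicate its storage.
        if (flag) {
            symbolArray = [symbols mutableCopy];
        } else {
            symbolArray = [symbols copy];
        }
    }
    return self;
}

- (BOOL)isMutable
{
    return isMutable;
}

- (NSArray *)symbols
{
    return symbolArray;
}

// Mutating methods start with the test mentioned in the thread.
- (void)appendSequence:(BCSequence *)other
{
    if (!isMutable) {
        [NSException raise:NSInternalInconsistencyException
                    format:@"appendSequence: sent to an immutable sequence"];
    }
    [(NSMutableArray *)symbolArray addObjectsFromArray:[other symbols]];
}

// 'copy' does not look at the flag; by convention it returns an
// immutable instance.
- (id)copyWithZone:(NSZone *)zone
{
    return [[BCSequence alloc] initWithSymbols:symbolArray isMutable:NO];
}

- (id)mutableCopyWithZone:(NSZone *)zone
{
    return [[BCSequence alloc] initWithSymbols:symbolArray isMutable:YES];
}
@end
```

With this layout the DNA/RNA/protein subclasses would inherit everything 
above and would only need to check [self isMutable] for instance 
variables of their own.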
>
>
>
>
> Let's first decide if we all like the idea of the class cluster, and 
> then see how to implement it and the naming. Just one thing you might 
> have thought about as well Charles, how do you see the annotations 
> stuff fitting in this scheme? The nice thing is that it applies to all 
> subclasses, but can it still be implemented in the superclass? Perhaps 
> not, as the mutable vs immutable implementation will be quite 
> different. And that's where my major doubt is: as you mentioned, you 
> have a divergence both in the direction of mutable vs immutable and 
> in DNA/RNA/Protein. This automatically leads to duplication of 
> the code in one of the two directions I'm afraid...
> There's plenty to discuss ;-)
>
> About annotations, I do not have a good grasp of the whole concept, but 
> it certainly seems that if the concept of an annotation is 
> sufficiently abstract, it could easily go in the superclass, the same 
> way many methods can be handled in the superclass thanks to the 
> BCSymbol abstraction you guys have designed.
> In fact, if annotations are just one NSArray, it is not too costly in 
> terms of memory, adding just one instance variable = a few bytes to 
> the size of the object, and can be kept to nil if no annotations are 
> present.
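
A small sketch of the "one instance variable, nil until needed" idea for 
annotations. The method names addAnnotation: and annotations are 
assumptions, annotations are plain objects here because their real type 
has not been decided, and the code assumes ARC.

```objective-c
#import <Foundation/Foundation.h>

@interface BCSequence : NSObject
{
    NSMutableArray *annotations;   // stays nil until the first annotation
}
- (void)addAnnotation:(id)annotation;
- (NSArray *)annotations;
- (BOOL)hasAnnotationStorage;
@end

@implementation BCSequence
- (void)addAnnotation:(id)annotation
{
    // The array is only created when it is actually needed.
    if (annotations == nil) {
        annotations = [[NSMutableArray alloc] init];
    }
    [annotations addObject:annotation];
}

- (NSArray *)annotations
{
    // Callers always get an array, even when nothing has been stored.
    if (annotations == nil) {
        return [NSArray array];
    }
    return [annotations copy];
}

// Only here to make the lazy allocation visible in the example.
- (BOOL)hasAnnotationStorage
{
    return annotations != nil;
}
@end
```

A sequence without annotations then only pays for one pointer-sized 
instance variable.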
>
> In conclusion, to discuss the class cluster possibility, it is maybe 
> time to come up with a list of:
> * methods that could be in the public BCSequence.h header; we should 
> not be afraid to have many; they could be dispatched in categories for 
> convenience; the doc for BCSequence would be big, but that would be 
> quite normal!
> * methods that could be in the public BCMutableSequence.h header
> * sequence types that could be added in the future
>
> And then see how well that fits with class cluster, and if 
> mutable/immutable implementation is feasible.
>
>
*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                     AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-
bit company without 1 bit of sense.

*********************************************************
