<!doctype html public "-//W3C//DTD W3 HTML//EN">

<html><head><style type="text/css"><!--

blockquote, dl, ul, ol, li { padding-top: 0 ; padding-bottom: 0 }

 --></style><title>Re: [Biococoa-dev] BCSequence class

cluster</title></head><body>

<blockquote type="cite" cite>

<blockquote type="cite" cite>About complaint (b)</blockquote>

<blockquote type="cite" cite>I thought of enforcing immutability as a
starting point, since immutable objects are easier for the developer to
deal with. Giving the user the option of immutability is a good thing
anyway, as it allows a number of optimizations that could really pay
off in a real application with lots of copying, reference
passing,...</blockquote>

</blockquote>

<blockquote type="cite" cite>Yes, this is exactly how the mutable
variants of NSData, NSString etc. are set up, as I discovered in the
devnote I mentioned above. Indeed, it would be very nice to have a
mutable and an immutable variant of BCSequence objects.</blockquote>

<div><br></div>

<div>You will notice that NSNumber does not have a mutable
version. Why? One reason is that creating a new instance is not too
costly, since the data is small. Another reason may be that the
implementation is a bit trickier: NSNumber resembles our
BCSequence, with a large number of potential subclasses, which raises
the question of how to implement both mutability and immutability.</div>

<div><br>

<br>

</div>

<blockquote type="cite" cite>

<blockquote type="cite" cite>2. Implementing the class

cluster</blockquote>

<blockquote type="cite"

cite>------------------------------</blockquote>

<blockquote type="cite" cite><br></blockquote>

<blockquote type="cite" cite>The class cluster that I implement in the

attached project looks very much like what you have already done.

There is a superclass BCSequence, and then subclasses, BCSequenceDNA,

BCSequenceRNA,...etc... plus a new special subclass BCSequenceFactory.

Now the purpose of a class cluster is that the user just does

everything using the public interface for BCSequence, and as far as

the user is concerned, every object is an instance of BCSequence. But

under the hood, you actually return instances of one of the

subclasses so that some operations can be optimized for the particular

type of sequence you are dealing with.</blockquote>

</blockquote>

<blockquote type="cite" cite>In other words, the subclasses are
private and only BCSequence.h is public, right?</blockquote>

<div><br></div>

<div>Yes.</div>

<div><br>

<br>

</div>

<blockquote type="cite" cite>

<blockquote type="cite" cite>The problem for the developer of a class

cluster is that you know which subclass to use only once you call one

of the init methods, but you still have to do the 'alloc' before the

init. There is no way BCSequence will know what subclass it should use

at the time 'alloc' is called. So the trick is to alloc a temporary

instance of a particular subclass, a 'placeholder' class. Look at the

implementation of 'alloc' in BCSequence.m.</blockquote>

</blockquote>

<blockquote type="cite" cite><tt>+ (<font

color="#760F50">id</font>)alloc</tt></blockquote>

<blockquote type="cite" cite><tt>{</tt></blockquote>

<blockquote type="cite"

cite><tt><x-tab>       

</x-tab><font color="#760F50">if</font> (<font

color="#760F50">self</font>==[BCKSequence class])  // Should this

be [BCSequence class]?</tt></blockquote>

<blockquote type="cite"

cite><tt><x-tab>       

</x-tab><x-tab>       

</x-tab><font color="#760F50">return</font> [BCKSequencePlaceholder

alloc];  // So this would be [BCSequenceFactory

alloc]?</tt></blockquote>

<blockquote type="cite"

cite><tt><x-tab>       

</x-tab><font color="#760F50">else</font></tt></blockquote>

<blockquote type="cite"

cite><tt><x-tab>       

</x-tab><x-tab>       

</x-tab><font color="#760F50">return</font> [<font

color="#760F50">super</font> alloc];</tt></blockquote>

<blockquote type="cite" cite><tt>}</tt></blockquote>

<div><br></div>

<div>Arghh, a stupid typo in the most important piece of code!! OK, I

corrected the code in the link. Download it again...</div>
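<div><br></div>
<div>For anyone reading without the project at hand, here is a minimal
sketch of the corrected idea. It is not the actual project code: it
assumes manual retain/release, a hypothetical initWithString:
initializer, and a factory that always picks BCSequenceDNA, where real
code would inspect the string.</div>
<pre>
#import &lt;Foundation/Foundation.h&gt;

// Public abstract superclass: the only header the user sees.
@interface BCSequence : NSObject
- (id)initWithString:(NSString *)aString;   // hypothetical initializer
@end

// Private placeholder class returned by +alloc.
@interface BCSequenceFactory : BCSequence
@end

// Private concrete subclass.
@interface BCSequenceDNA : BCSequence
@end

@implementation BCSequence
+ (id)alloc
{
    // Only a direct [BCSequence alloc] gets the placeholder;
    // subclasses (including the factory itself) allocate normally.
    if (self == [BCSequence class])
        return [BCSequenceFactory alloc];
    else
        return [super alloc];
}

- (id)initWithString:(NSString *)aString
{
    return [super init];
}
@end

@implementation BCSequenceFactory
- (id)initWithString:(NSString *)aString
{
    // Assumption: always DNA; a real factory would look at aString.
    Class concreteClass = [BCSequenceDNA class];
    [self release];     // the placeholder is discarded (manual retain/release)
    return [[concreteClass alloc] initWithString:aString];
}
@end

@implementation BCSequenceDNA
@end
</pre>
<div>With this in place, [[BCSequence alloc] initWithString:@"ATGC"]
hands back an instance of BCSequenceDNA, although the user only ever
names BCSequence.</div>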

<div><br>

<br>

</div>

<div><br></div>

<div><br></div>

<blockquote type="cite" cite>That's exactly my thought at the moment;
indeed, it fits nicely in between the two opposite choices in the
subclassing debate and satisfies most arguments. The only
problem is that I don't have enough of an overview to see potential
problems coming, but that's simply because of my inexperience with
programming. Perhaps we just have to take the jump and see where it
ends; at least it has proven very effective in the Cocoa framework
(wow, that's a biased opinion ;-).</blockquote>

<div><br></div>

<div>Yes, there is a really good foundation in the framework and plenty
of good implementation ideas. You/we are probably at the point
where we can foresee all the potential developments and have a better
sense of what the design can be.</div>

<div><br></div>

<div><br></div>

<div><br></div>

<div><br>

<br>

</div>

<blockquote type="cite" cite>

<blockquote type="cite" cite>They would have some init methods, but
when the user uses these classes and alloc/inits an instance, she would
in fact get one of the BCSequence subclasses. The compiler would not
know and would trust the headers to generate warnings. For instance,
the header for the BCSequenceProtein placeholder class would not
define the methods 'complement' or 'cutWithRestrictionEnzyme:', and
you would get a compiler warning even though the object would in fact
respond to the methods at runtime (but would have to return some dummy
values). So these headers would really define completely virtual
classes. One problem is that the names of these placeholder classes
conflict with the names of the BCSequence private subclasses that are
defined in the project I sent. We could rename the latter to
BCSeqDNA/RNA/... for example, and keep the nice full names
'BCSequenceDNA/RNA/...' for the public placeholder
classes.</blockquote>

</blockquote>

<blockquote type="cite" cite>Seems feasible, although having separate

names for internal vs public representations might be

troublesome.</blockquote>

<div><br></div>

<div>In case it was not clear, and because I am not sure what you
understood, I want to say again that they must have different
names. We cannot keep the same names for the private and public
classes. It is true that it could be a little confusing for the
developer, but we would probably almost never use the public classes
internally, so the confusion should not be too bad. Also, using an
abbreviation like BCSeq for something internal is a good mnemonic to
remember that these names are really private.</div>

<div><br></div>

<div><br></div>

<blockquote type="cite" cite><br></blockquote>

<blockquote type="cite" cite>Perhaps you're right, but what I was
thinking of is a better way to report the reason why
something doesn't work than a simple nil. For instance, calling
cutInPiecesWithThisRestrictionEnzyme on a DNA would return the pieces,
while it would also work on proteins but return nil, right? Of course
you could also let the method raise an exception; it then becomes
the developer's responsibility to call methods on the right object. The
downside is that this might too easily lead to program halts/crashes if
the developer doesn't pay attention. But think of NSArray's
objectAtIndex: method: if you ask for an object out of
bounds, it does not quietly return nil, it raises an exception
(NSRangeException).</blockquote>

<blockquote type="cite" cite>I'm still wondering a bit how we're going
to implement these kinds of methods, as we would now have to start ALL
methods with a test of what the sequence type is.</blockquote>

<div><br></div>

<div>No, there is no test at the beginning of a method. It is simply

coded in the subclass. For example, BCSequenceProtein could override

the 'complement' method to return an empty sequence. Actually, this is

not such a great example as 'complement' can easily be taken care of

by the superclass which would call 'complement' on the BCSymbol

objects of the symbolArray.</div>

<div>Now I have an additional comment on what to do with strongly
typed instances, when the user is purposely using a
BCSequenceProtein, has a call to 'complement', ignores the compiler
warning and runs the program. It would then be nice to have a runtime
error (yeah, this is nice!) when calling such a method on a strongly
typed instance. For this we could have an additional flag 'isTyped' and
have the private BCSeqProtein check the value of the flag in the
critical methods, and raise an exception if isTyped=YES or call super
if isTyped=NO.</div>
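<div><br></div>
<div>A sketch of how both points could look, under stated assumptions:
the names BCSymbol, symbolArray, complement, isTyped and BCSeqProtein
come from this discussion, but the initializer and the instance
variable layout are made up for illustration, the BCSymbol below is
only a stand-in, and manual retain/release is assumed.</div>
<pre>
#import &lt;Foundation/Foundation.h&gt;

// Stand-in for the real BCSymbol; only 'complement' matters here.
@interface BCSymbol : NSObject
- (BCSymbol *)complement;
@end

@implementation BCSymbol
- (BCSymbol *)complement { return self; }   // dummy: real symbols know their complement
@end

@interface BCSequence : NSObject
{
    NSArray *symbolArray;   // array of BCSymbol objects
    BOOL isTyped;           // YES if created through a strongly typed public class
}
- (id)initWithSymbolArray:(NSArray *)symbols typed:(BOOL)typed;  // hypothetical
- (BCSequence *)complement;
@end

@interface BCSeqProtein : BCSequence
@end

@implementation BCSequence
- (id)initWithSymbolArray:(NSArray *)symbols typed:(BOOL)typed
{
    if ((self = [super init])) {
        symbolArray = [symbols copy];
        isTyped = typed;
    }
    return self;
}

- (void)dealloc
{
    [symbolArray release];
    [super dealloc];
}

// Generic version: no test on the sequence type, the symbols do the work.
- (BCSequence *)complement
{
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:[symbolArray count]];
    NSEnumerator *e = [symbolArray objectEnumerator];
    BCSymbol *symbol;
    while ((symbol = [e nextObject]))
        [result addObject:[symbol complement]];
    return [[[[self class] alloc] initWithSymbolArray:result typed:isTyped] autorelease];
}
@end

@implementation BCSeqProtein
- (BCSequence *)complement
{
    // Strongly typed protein: fail loudly instead of returning a dummy value.
    if (isTyped)
        [NSException raise:NSInvalidArgumentException
                    format:@"'complement' is not defined for a protein sequence"];
    return [super complement];
}
@end
</pre>
<div>The only test is on the flag, and only in the subclass that needs
it; the superclass method never asks what kind of sequence it is.</div>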

<div><br></div>

<div><br></div>

<blockquote type="cite" cite>

<blockquote type="cite" cite>To implement mutable objects in the class

cluster could be a bit tricky, because there are two conflicting

subclass organizations here: mutable/immutable and

dna/rna/protein/codon. To get all the combinations, it seems that we

need 8 subclasses!!</blockquote>

</blockquote>

<blockquote type="cite" cite>Oops, Koen won't like this, LOL ;-) On

the other hand, look at the number of NSNumber

subclasses...</blockquote>

<div><br></div>

<div>See my comment above about NSNumber. They did not bother to
implement mutability; it is probably not worth it in the case of
NSNumber. Now, they would be in trouble if they decided to implement a
wrapper for C arrays of different types, i.e. vectors or matrices. You
would have all the combinations: mutable/double, immutable/double,
mutable/float, immutable/float, mutable/int,...</div>

<div><br>

<br>

</div>

<blockquote type="cite" cite>

<blockquote type="cite" cite>I am not completely sure how to deal with

it, or if we should deal with it or just give up and stick to mutable

only. One possibility is to not have distinct subclasses for

mutable/immutable. Instead, there could be simply a BOOL flag

'isMutable' as one of the instance variables. The object would then

return different results in key methods such as 'copy' depending on

the value of the flag.</blockquote>

</blockquote>

<blockquote type="cite" cite>But then we could just as well do the

subclasses right?</blockquote>

<div><br></div>

<div>Yes, that may be true. On the other hand, most of the code could
be in the superclass and use that flag. In fact, we should start
thinking about where mutability makes a difference. What methods
should be implemented? There are not so many: insertSequenceAtRange,
removeSequenceAtRange, setSequence, appendSequence, addAnnotation(s),
removeAnnotation(s). These would be declared in another placeholder
class 'BCMutableSequence' (which would in fact return instances of
subclasses of the class cluster), so the compiler would give warnings
if they are called on BCSequence objects. If they are called at runtime
on a sequence for which isMutable=NO, they would generate a runtime
error (so a test would be needed at the beginning of each of these
methods). It seems they could be coded in the superclass. The same is
true for 'copy', which may not even have to know whether the copied
instance is mutable. For example, it would do [symbolArray copy], which
would return the same pointer if symbolArray is immutable, or a real
copy if symbolArray is mutable. Note that 'copy' always returns an
immutable instance by convention. Then 'mutableCopy' would apply the
same tricks. The subclasses may have to deal with their own instance
variables (they don't have any so far), and may have to check [self
isMutable].</div>
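<div><br></div>
<div>To make this concrete, here is a rough sketch of the flag-based
approach with everything in the superclass. It is only an illustration:
the initializer is hypothetical, appendSequence: is assumed to take a
BCSequence, manual retain/release is assumed, and the class cluster
machinery is left out.</div>
<pre>
#import &lt;Foundation/Foundation.h&gt;

@interface BCSequence : NSObject &lt;NSCopying, NSMutableCopying&gt;
{
    NSArray *symbolArray;   // an NSMutableArray when isMutable is YES
    BOOL isMutable;
}
- (id)initWithSymbolArray:(NSArray *)symbols isMutable:(BOOL)flag;  // hypothetical
- (BOOL)isMutable;
- (void)appendSequence:(BCSequence *)other;
@end

@implementation BCSequence
- (id)initWithSymbolArray:(NSArray *)symbols isMutable:(BOOL)flag
{
    if ((self = [super init])) {
        // [symbols copy] returns the same (retained) array when it is
        // already immutable, so immutable copies stay cheap.
        symbolArray = flag ? [symbols mutableCopy] : [symbols copy];
        isMutable = flag;
    }
    return self;
}

- (void)dealloc
{
    [symbolArray release];
    [super dealloc];
}

- (BOOL)isMutable
{
    return isMutable;
}

- (void)appendSequence:(BCSequence *)other
{
    // The test needed at the beginning of each mutating method.
    if (!isMutable)
        [NSException raise:NSInternalInconsistencyException
                    format:@"appendSequence: sent to an immutable sequence"];
    [(NSMutableArray *)symbolArray addObjectsFromArray:other-&gt;symbolArray];
}

// 'copy' always yields an immutable sequence, whatever the receiver is.
- (id)copyWithZone:(NSZone *)zone
{
    return [[[self class] allocWithZone:zone] initWithSymbolArray:symbolArray
                                                        isMutable:NO];
}

// 'mutableCopy' applies the same trick with the flag set.
- (id)mutableCopyWithZone:(NSZone *)zone
{
    return [[[self class] allocWithZone:zone] initWithSymbolArray:symbolArray
                                                        isMutable:YES];
}
@end
</pre>
<div>Because the copy methods go through [self class], a DNA sequence
would stay a DNA sequence when copied, without any code in the
subclass.</div>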

<div><br>

<br>

</div>

<div><br></div>

<blockquote type="cite" cite>Let's first decide if we all like the
idea of the class cluster, and then see how to implement it and the
naming. Just one thing you might have thought about as well, Charles:
how do you see the annotations stuff fitting into this scheme? The nice
thing is that it applies to all subclasses, but can it still be
implemented in the superclass? Perhaps not, as the mutable vs
immutable implementation will be quite different. And that's where my
major doubt is: as you mentioned, you have a divergence both in the
direction of mutable vs immutable and in DNA/RNA/Protein. This
automatically leads to duplication of code in one of the two
directions, I'm afraid...</blockquote>

<blockquote type="cite" cite>There's plenty to discuss

;-)</blockquote>

<div><br></div>

<div>About annotations, I do not have a good grasp of the whole concept,
but it certainly seems that if the concept of an annotation is
sufficiently abstract, it could easily go in the superclass, the same
way many methods can be handled in the superclass thanks to the
BCSymbol abstraction you guys have designed.</div>

<div>In fact, if annotations are just one NSArray, it is not too

costly in terms of memory, adding just one instance variable = a few

bytes to the size of the object, and can be kept to nil if no

annotations are present.</div>

<div><br></div>

<div>In conclusion, to discuss the class cluster possibility, it is

maybe time to come up with a list of:</div>

<div>* methods that could be in the public BCSequence.h header; we

should not be afraid to have many; they could be dispatched in

categories for convenience; the doc for BCSequence would be big, but

that would be quite normal!</div>

<div>* methods that could be in the public BCMutableSequence.h

header</div>

<div>* sequence types that could be added in the future</div>

<div><br></div>

<div>And then see how well that fits with class cluster, and if

mutable/immutable implementation is feasible.</div>

<div><br></div>

<div>OK, I'll stop there, hoping the Europeans will get this before
going to bed.</div>

<div><br></div>

<div>Charles</div>

<div><br></div>

<x-sigsep><pre>-- 

</pre></x-sigsep>

<div>Charles Parnot<br>

charles.parnot@stanford.edu<br>

<br>

Help science go fast forward:<br>

http://cmgm.stanford.edu/~cparnot/xgrid-stanford/<br>

<br>

Room  B157 in Beckman Center<br>

279, Campus Drive<br>

Stanford University<br>

Stanford, CA 94305 (USA)<br>

<br>

Tel +1 650 725 7754<br>

Fax +1 650 725 8021</div>

</body>

</html>