[Biococoa-dev] Sequence factory

Mon Dec 27 03:24:37 EST 2004

..continuing here:

On 27 Dec 2004, at 6:32, Charles PARNOT wrote:

>> On Dec 25, 2004, at 6:56 PM, Charles PARNOT wrote:
>>
>>> Or the cached stuff could stay inside the class implementation, 
>>> using the equivalent of 'class instance variables' that can be 
>>> created with static variables private to the implementation file. 
>>> Actually, a relevant case is that of enzymes. If the user tries to 
>>> create an enzyme that has already been created, the factory method 
>>> (a class method such as '+(BCEnzyme *)EcoRI') or even the 'init' 
>>> method (an instance method like '-(id)initWithName:(NSString 
>>> *)name') would return the cached BCEnzyme instance that has already 
>>> been created. Then the BCEnzyme.m implementation file would have a 
>>> static NSDictionary with the current instances already created.
>>
>> I guess this is what we already do for BCNucleotide and BCAminoAcid. 
>> Only in those cases all possible instances are created at once. For 
>> enzymes though, I would keep all info in a plist, and not hard-code 
>> any name in BioCocoa. Just because there are many more enzymes than 
>> nucleotides and amino acids.
>
> Yes, BCNucleotide is a much better example... I am not familiar enough 
> with the framework, I should be ashamed of myself... and I am a little 
> bit! At least, it is true that you can cache stuff outside of 
> instances, at the level of a class, which was my point... Pfiou!

No worries! Indeed the enzyme stuff was just an example, but this goes 
back to a discussion we once had about doing digests etc. If you say 
"cut my plasmid with all enzymes available and give me the fragments", 
you certainly want to instantiate them all at once without having to 
call "give me EcoRI", "give me HindIII", etc. But let's first focus on the 
sequences before destroying them with enzymes ;-)
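
To make the caching idea concrete, here is a minimal sketch of a class-level cache combined with an "instantiate them all" call. It is written in Python only to keep it short and is not BioCocoa code: the `Enzyme` class, `with_name`, `all_enzymes` and the in-memory `_known_sites` table (standing in for the plist) are all hypothetical names chosen for illustration.

```python
class Enzyme:
    """Hypothetical stand-in for BCEnzyme; not the BioCocoa API."""

    # Class-level cache, playing the role of a static NSDictionary
    # kept private to the BCEnzyme.m implementation file.
    _cache = {}

    # Assumption: stands in for enzyme data that would be read from a plist.
    _known_sites = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

    def __init__(self, name, site):
        self.name = name
        self.site = site

    @classmethod
    def with_name(cls, name):
        """Return the cached instance for `name`, creating it on first use."""
        if name not in cls._cache:
            cls._cache[name] = cls(name, cls._known_sites[name])
        return cls._cache[name]

    @classmethod
    def all_enzymes(cls):
        """Instantiate (or fetch from the cache) every known enzyme at once."""
        return [cls.with_name(name) for name in sorted(cls._known_sites)]
```

Asking twice for the same name returns the same object, and `all_enzymes()` covers the "cut with everything" case without one call per enzyme.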

>>> Of course, this is a way to go and keep the existing pattern. It is 
>>> not exactly the best of the two worlds, though, because now you have 
>>> some code dependency. Each of the BCSequence factory methods has to 
>>> have a counterpart in BCSequenceFactory. If you change the name of 
>>> one BCSequenceFactory method, you have to change the code in 
>>> BCSequence. Ah, ah! ;-)
>>
>> Why? No need to change the name in BCSequence, just call the existing 
>> method in BCSequence.
>
> What I meant is you have one method A in BCSequenceFactory that is 
> being called inside a method B in BCSequence. So if you change the 
> name of method A, you have to edit the code in method B. Or maybe more 
> relevant: if you add a method in BCSequenceFactory, you need to add 
> one in BCSequence too, e.g. if you add -(BCSequenceET 
> *)extraterrestrialSequenceWithString in BCSequenceFactory and you want 
> the factory method in BCSequence, you have to also write it (and it 
> will call the method in BCSequenceFactory). The bottom line: more 
> code!

That's certainly true, I'll rephrase that then to: "It's the best 
way to go at the moment" ;-)
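
The dependency Charles describes can be sketched roughly as follows. This is Python used as neutral pseudocode, not BioCocoa code; `SequenceFactory`, `Sequence` and `dna_sequence_with_string` are hypothetical stand-ins for BCSequenceFactory, BCSequence and one of their factory methods, and the tuple return value is just a placeholder for a real sequence object.

```python
class SequenceFactory:
    """Hypothetical stand-in for BCSequenceFactory."""

    @staticmethod
    def dna_sequence_with_string(text):
        # Placeholder result; a real factory would build a sequence object.
        return ("DNA", text.upper())


class Sequence:
    """Hypothetical stand-in for BCSequence."""

    @staticmethod
    def dna_sequence_with_string(text):
        # Forwarding wrapper ("method B"): it exists only to call the
        # factory ("method A"). Renaming A means editing this body, and
        # every new factory method needs a new wrapper like this one.
        return SequenceFactory.dna_sequence_with_string(text)
```

Both entry points give the same result, which is exactly the point: the second one adds code without adding behavior.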

> I just want to conclude about BCSequenceFactory. Sorry I have been 
> pushing this so far. I am OK using it if you feel it is safer, I just 
> want to make sure I am not missing something more subtle about it. 
> Thanks Alex and Koen for taking the time to answer my questions :-)

Absolutely no problem, in fact you have discovered by now that this 
still is a topic of discussion...

>>> Do you mean replacing BCSequenceFactory with BCSymbolListFactory, or 
>>> do you mean having two separate classes? Having two separate classes 
>>> seems a bit too much, no?
>>
>> I meant having two separate classes. We recently introduced the 
>> BCSymbolList as a class that only holds an NSArray of BCSymbols, no 
>> other info such as name and features. And it has an identifier for 
>> the sequence-type. The BCSequence class used to be like this, but it 
>> was changed to a subclass of BCSymbolList allowing it to have 
>> features, etc. We could have kept the original BCSequence, and create 
>> a new class BCAnnotatedSequence, but we found that would result in 
>> too long names, such as BCAnnotatedDNASequence. Therefore we 
>> introduced the intermediate class BCSymbolList. Actually a symbol list 
>> class can be very handy when doing calculations and manipulations of 
>> the sequence itself, without all the other info.
>>
>>> Having all the members of a class tree created in the same entity 
>>> seems more appropriate to me.
>>
>> What do you mean by this?
>
> I know and understand about BCSymbolList/BCSequence. What I call the 
> class tree is the whole family 
> BCSymbolList-->BCSequence-->BCSequenceDNA/Protein,... And I was just 
> saying that maybe one factory for all of them is enough.

I very much like this idea; in fact I was about to comment on Koen's 
original question that I felt we might be overdoing the factory thing. I 
know it has almost become a necessity because of the rather complicated 
architecture we had to come up with regarding sequences, but we should 
not do it unless absolutely necessary. The idea of having one cluster for 
the complete class tree might actually be a nice way to limit the rapid 
increase of factories. I think the BioJava framework is a nice (depends 
on how you see it) example where you need a factory for almost all 
things you do, something I don't really like and which is also quite 
non-Cocoa.

A nice story here perhaps: I was discussing how to implement alignments 
with Serge Cohen (yes, there's something in the works, but still at quite 
an early stage), when at some point we came to the number of 
basic classes we needed. And he said two, one for alignments and one 
for contigs. So I asked, where's the object that manages all this and 
does the actual alignment, "the alignment factory"? He answered, 
you don't need that, simply have a class method that does the alignment 
and returns the alignment as a class instance. It struck me how 
simple it indeed should be and how much I had come to think in 
terms of factories and controllers. Which leads me to ask Koen to 
convince me why we do need a BCSymbolList factory and can't make do with 
simple class methods?
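
As a rough illustration of the "class method instead of factory object" approach, here is a small sketch in Python (again just as neutral pseudocode). The `Alignment` class and `by_aligning` are hypothetical names, and the padding step is only a placeholder for a real alignment algorithm, not anything from the work in progress mentioned above.

```python
class Alignment:
    """Hypothetical alignment class; not an existing BioCocoa class."""

    def __init__(self, rows):
        self.rows = rows

    @classmethod
    def by_aligning(cls, first, second):
        """Do the work in a class method and return the result as an instance.

        Placeholder logic only: the shorter sequence is padded with
        trailing gaps so both rows have the same width.
        """
        width = max(len(first), len(second))
        return cls([first.ljust(width, "-"), second.ljust(width, "-")])
```

A caller writes `Alignment.by_aligning("GATTACA", "GATT")` and gets an `Alignment` back; no separate factory or controller object is involved.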

> BTW, one thing is not clear to me: is BCSymbolList an abstract class 
> (like BCSequence apparently was) and just used to separate code, or 
> is it going to be instantiable? In my previous emails, I was a little 
> confused about it (and probably confusing).

The latter is the plan I believe.

>>> To avoid having factory methods spread out in the superclass and the 
>>> subclasses, you could keep them all in the superclass (they could 
>>> still return instances of the subclasses). Actually, maybe this 
>>> would be a bit extreme, and that could confuse the user of the 
>>> framework, but it is something to think about. That would actually 
>>> be one step closer to a 'class cluster' pattern... (read below)
>>
>> The idea of using a class cluster pattern is intriguing, and 
>> definitely worth more thought. Thanks for bringing it up, Charles.  
>> To clarify it more, could you post some code snippets here using real 
>> BioCocoa examples?

Yep, copy that!
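
Until real BioCocoa snippets are posted, one possible shape of the idea (all factory methods on the superclass, returning instances of the subclasses) might look like the sketch below. It is Python rather than Objective-C, the class names are hypothetical stand-ins for the BCSymbolList --> BCSequence --> DNA/protein tree, and the "only A, C, G, T means DNA" rule is an assumption made purely for illustration.

```python
class SymbolList:
    """Hypothetical stand-in for BCSymbolList, the public face of the cluster."""

    def __init__(self, symbols):
        self.symbols = list(symbols)

    @staticmethod
    def with_string(text):
        """Single entry point that picks the concrete subclass for the caller."""
        symbols = text.upper()
        # Assumption for illustration: only A, C, G, T means DNA.
        if set(symbols) <= set("ACGT"):
            return DNASequence(symbols)
        return ProteinSequence(symbols)


class Sequence(SymbolList):
    """Hypothetical stand-in for BCSequence (symbol list plus annotations)."""


class DNASequence(Sequence):
    pass


class ProteinSequence(Sequence):
    pass
```

The caller only ever names `SymbolList`, yet receives a `DNASequence` or `ProteinSequence`, so no separate factory class is needed for each level of the tree.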

> Wow, now I need to write some real code, and not just be 
> super-theoretical without having to think about the real world? I am 
> in trouble ;-)
Sorry for introducing you to the project, Charles; actually I'm not so 
sorry at all ;-)
Cheers,
Alex

*********************************************************
                       ** Alexander Griekspoor **
*********************************************************
                 The Netherlands Cancer Institute
                 Department of Tumorbiology (H4)
           Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                    AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

           LabAssistant - Get your life organized!
           http://www.mekentosj.com/labassistant

*********************************************************