[Biococoa-dev] (no subject)

Charles PARNOT charles.parnot at stanford.edu
Thu Jan 6 01:35:48 EST 2005


>I had a better look at your code, and I see that you actually have 
>most methods in BCSequence, although I think most of them can 
>actually go in the superclass of BCSequence, BCSymbolList.
>
>So the class cluster should look like:
>
>BCSymbolList -> BCSequence -> BCSequenceDNA
>                           -> BCSequenceRNA
>                           -> BCSequenceProtein
>                           -> BCSequenceCodon
>
>BCSymbolList is supposed to be a 'barebones' sequence class; BCSequence 
>has additional annotations.

Yes, I understand that. I discussed it in the initial super long 
email (part 7, I believe?!? Even I have some trouble remembering it 
all...). The reason I picked BCSequence for the superclass name is 
that it would become the only public class, and I thought the name 
BCSequence would be a better public name than BCSymbolList. The 
problem, of course, is that this differs from the current 
implementation, which also had good reasons to use these names. So I 
am actually proposing the names:

old name          new name
--------          --------
(none)            BCSequence (no instance variables, only a public header)
BCSymbolList      BCSymbolList
BCSequence        BCSeq
BCSequenceDNA     BCSeqDNA

with an inheritance tree parallel to the existing one. An alternative 
is to not have an 'empty' class at the top and go with the following:

old name          new name
--------          --------
BCSymbolList      BCSequence
BCSequence        BCSeq
BCSequenceDNA     BCSeqDNA



We have to talk more about the role of BCSymbolList in the context of 
a class cluster, though. When creating an instance of a subclass of 
the class cluster, when should we choose BCSymbolList? In the case 
when there is no annotation? In this case, we have to make sure that 
BCSymbolList will respond to all the messages. Once an instance is 
created, we cannot suddenly decide to make it one of the typed 
subclasses just to be able to handle a message not implemented by 
BCSymbolList.
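
To make the concern concrete, here is a minimal sketch of a class 
cluster, written in Python rather than the framework's Objective-C. 
The class names come from the tables above, but the hierarchy is 
flattened, and the `kind` argument, the `complement` message and the 
`_CLUSTER` lookup table are hypothetical, invented only for this 
illustration. The public class picks the private subclass once, at 
creation time, so the fallback BCSymbolList has to answer every public 
message, even if only to refuse it.

```python
class BCSequence:
    """Public face of the cluster; callers never name a subclass."""

    def __new__(cls, symbols, kind=None):
        # Only the public class dispatches; subclasses construct normally.
        if cls is BCSequence:
            cls = _CLUSTER.get(kind, BCSymbolList)
        return super().__new__(cls)

    def __init__(self, symbols, kind=None):
        self.symbols = symbols

    def complement(self):
        # Part of the public interface, so every member must respond to it.
        raise NotImplementedError


class BCSymbolList(BCSequence):
    """Untyped member, assumed here to be the fallback when no type is given."""

    def complement(self):
        # The instance cannot turn into BCSeqDNA now; it can only refuse.
        raise TypeError("complement is undefined for an untyped symbol list")


class BCSeqDNA(BCSequence):
    """Typed member; only the DNA-specific behaviour lives here."""

    _PAIRS = str.maketrans("ACGT", "TGCA")

    def complement(self):
        return BCSequence(self.symbols.translate(self._PAIRS), kind="dna")


# Hypothetical lookup table from a type hint to the private subclass.
_CLUSTER = {"dna": BCSeqDNA}
```

With this sketch, `BCSequence("ACGT", kind="dna")` comes back as a 
BCSeqDNA, while `BCSequence("ACGT")` comes back as a BCSymbolList and 
stays one for the rest of its life.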

We could also decide to have BCSymbolList and BCSequence both public 
in the existing code. There might be some good reason for that... 
speaking of which, I have a question about how the current 
implementation is supposed to work. Would it be up to the user to 
decide whether to use BCSymbolList or BCSequence? And would this 
choice depend on what the user later wants to do with that 
object/sequence? I know I already asked some questions about it, but 
could you expand a bit more on the reasons and the usage? Thanks :-)

>
>>  I know you already had the discussion with Alex. No matter what 
>>you do, if a type of sequence has to be treated separately, you 
>>have to write two different versions of a particular piece of code. 
>>It is actually easier to separate the two cases in two separate 
>>methods for two different classes in two different files. I still 
>>think I see your point. The only problem with subclasses is that it 
>>is quite easy not to realize that you are duplicating code. It is 
>>more apparent when you have a series of if statements in front of 
>>your eyes, all with the same contents. You can also more easily 
>>spot the common stuff. Having subclasses does not force you to 
>>duplicate code, it just tends to happen more frequently.
>
>If the two methods are very different, then indeed it makes sense to 
>have them in two different subclasses. My point before was that the 
>methods in the various subclasses were almost identical, so there 
>was no use in duplicating them.

Sorry for being so 'dogmatic' in the previous email. It is now clear 
we have a very similar view of the whole thing! We all want to have a 
minimal number of methods in the subclasses. In a few cases, 
optimized versions may still be possible with this design.
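
As an illustration of that shared view, here is a small Python sketch 
(not BioCocoa code; the class names, the `alphabet` attribute and the 
`is_valid` method are all hypothetical). The algorithm is written once 
in the superclass, the subclasses contribute only data, and one 
subclass still overrides the method with an optimized version.

```python
class Seq:
    """Superclass: the shared algorithm is written once, here."""

    alphabet = ""  # each subclass supplies only its own alphabet

    def __init__(self, symbols):
        self.symbols = symbols

    def is_valid(self):
        # Generic version: check every symbol against the alphabet.
        return all(s in self.alphabet for s in self.symbols)


class SeqDNA(Seq):
    # No methods at all: the subclass differs only by its data.
    alphabet = "ACGT"


class SeqProtein(Seq):
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    _ALPHABET_SET = frozenset(alphabet)

    def is_valid(self):
        # Optional optimized override: one set comparison instead of a
        # linear scan of the alphabet for each symbol.
        return set(self.symbols) <= self._ALPHABET_SET
```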



>BTW, the reason that BioJava uses immutable sequences is: " It is 
>worth noting that many BioJava implementations of Sequence and 
>SymbolList do not allow edit operations as this may invalidate 
>underlying Features or Annotations." That's indeed something to keep 
>in the back of our minds.

Clearly, managing annotations on a mutable sequence is a problem. 
The same problem occurs when you cut a piece out of a sequence and 
return a new instance. I can clearly see that a separate class would 
be in charge of that. This would be the job of one of the BCTools, 
right?
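
A rough sketch of what such a helper would have to do when cutting a 
piece out of a sequence, again in Python and with hypothetical names 
throughout (`Feature`, `cut`, and the half-open coordinates are 
assumptions, not BioCocoa or BioJava API). Annotations that fit 
entirely inside the cut are shifted to the new coordinates; dropping 
those that straddle a boundary is only one possible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def cut(symbols, features, start, end):
    """Return symbols[start:end] and the features that survive the cut."""
    kept = [
        # Shift surviving features into the coordinates of the new piece.
        Feature(f.name, f.start - start, f.end - start)
        for f in features
        # Assumed policy: keep only features fully inside [start, end).
        if f.start >= start and f.end <= end
    ]
    return symbols[start:end], kept
```

For example, cutting positions 3 to 9 out of "ATGCCCTAA" annotated 
with a feature at 0-3 and another at 6-9 keeps only the second one, 
which now sits at 3-6 of the new piece.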


So, now the million-dollar question: you have not yet said what you 
think of the whole class cluster design, and whether you like the 
idea... ;-)
(I have to say, the more I discuss it, the more I like it, just for 
the simplicity and the level of abstraction it would give to the 
users of the framework...)

Charles

Charles Parnot
charles.parnot at stanford.edu

Help science go fast forward:
http://cmgm.stanford.edu/~cparnot/xgrid-stanford/

Room  B157 in Beckman Center
279, Campus Drive
Stanford University
Stanford, CA 94305 (USA)

Tel +1 650 725 7754
Fax +1 650 725 8021

