[Biococoa-dev] Subclassing BCSequence really needed?

Koen van der Drift kvddrift at earthlink.net
Sat Oct 23 18:10:51 EDT 2004


On Oct 23, 2004, at 11:37 AM, John Timmer wrote:
>
> If what's bothering you so much are the two range methods, I can easily
> change the names and have them more explicitly parallel the superclass -
> have the normal method just be the super's, and change the one that
> handles ambiguity to _loose (right now, the range method overrides the
> super's to allow ambiguity, and the super's method is called by a
> _strict method.  I'm sure that makes no sense, but I think I know what's
> bothering Koen there).  The key thing is that they're not as good as
> they can be, but that doesn't mean there's a need to throw the whole
> structure out.


As far as I know there aren't any other examples in BioCocoa right now,
which is why I used the subrange code as an example. But similar
situations may come up as we expand BioCocoa. I do remember now that
it's on your todo list to merge this code for all sequences (including
proteins, I guess). By no means was I trying to throw the whole
structure out; I was thinking aloud about how we could improve it. We
both agree that there is a need for improvement, and only differ on
how. It might well be a good idea to implement both solutions, so
there's more than one way to do it. I will look into that.
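
To make the strict versus loose subrange lookup concrete, here is a
rough sketch in Python rather than Objective-C. It is not the BioCocoa
code: the function names are made up, and I am assuming that "ambiguity"
means the IUPAC nucleotide codes, of which only a subset is listed.

```python
# Subset of the IUPAC nucleotide ambiguity codes (assumption: this is the
# kind of ambiguity the loose range method is meant to handle).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def symbols_match(a, b, strict):
    """Strict: symbols must be identical. Loose: their base sets overlap."""
    if strict:
        return a == b
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def find_subsequence(sequence, query, strict=True):
    """Return the zero-based start of the first match, or -1 if none."""
    for start in range(len(sequence) - len(query) + 1):
        window = sequence[start:start + len(query)]
        if all(symbols_match(s, q, strict) for s, q in zip(window, query)):
            return start
    return -1


strict_hit = find_subsequence("ATGNCC", "GAC", strict=True)
loose_hit = find_subsequence("ATGNCC", "GAC", strict=False)
```

Both variants share one loop and differ only in how two symbols are
compared, which is the part that would have to vary per sequence type.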

>
> There are two other reasons I favor this structure:  What you tend to do
> with DNA sequences and what you tend to do with protein sequences are
> typically very different - I don't see why all the differences should be
> forced to reside in the same class.

They shouldn't be, although I fail to understand why getting a 
subsequence is different for DNA than for a protein. But that might be 
because of my limited knowledge of the biology.


>  The second is the thing that Alex always champions (and I sometimes
> argue against ;) - we should try to represent every biological concept
> with an object, be it a codon, a sequence fragment, whatever.  To make
> things more useful, it's better to force the code to reflect the
> biology, rather than distort the biology in order to have it conform to
> how we want to code.

That's definitely a very good point. My argument is that we can have 
just a general sequence object, with additional information that 
identifies the type of sequence. I don't think that's distorting 
biology, and I am afraid that's where we disagree. The same 
approach is used in other bio frameworks, such as Biopython, BioPerl, 
and BioJava, so it has proven useful (if that means anything). 
Anyway, the popular vote seems to be winning, so I will rest my case.
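
For what it's worth, this is roughly what I mean by a general sequence
object that carries its type. It is a Python sketch and not BioCocoa
code; the names Sequence, SequenceType and subsequence are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SequenceType(Enum):
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"


@dataclass(frozen=True)
class Sequence:
    """One general sequence class; the type is data, not a subclass."""
    symbols: str
    seq_type: SequenceType

    def subsequence(self, start: int, length: int) -> "Sequence":
        # Zero-based start; the same code serves DNA and protein alike.
        if start < 0 or length < 0 or start + length > len(self.symbols):
            raise IndexError("range outside sequence")
        return Sequence(self.symbols[start:start + length], self.seq_type)


dna = Sequence("ATGGCCTAA", SequenceType.DNA)
protein = Sequence("MAKLV", SequenceType.PROTEIN)
dna_part = dna.subsequence(3, 3)
protein_part = protein.subsequence(1, 2)
```

The subsequence keeps the type of its parent, so code that does care
about the biology can still check seq_type before acting.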

>
>
>> An additional advantage of this approach is that when we think of a new
>> cool feature to add to BioCocoa, we only have to put it in a wrapper
>> class, instead of a slightly modified version in each of the subclasses
>> of BCSequence. This is much easier to maintain, and also less prone to
>> errors.
> Well, if it works for all sequences, it belongs in the superclass.  If it
> doesn't it can go in the subclass.  I'm not sure what the wrapper adds
> here or how it eliminates complexity.  If all it's doing is managing an
> array of symbols, then it doesn't seem to be doing much more than NSArray
> is.

The idea of using wrappers is to keep the superclass lightweight, and 
use small modules to add functionality. If we keep adding functionality 
to the superclass, things quickly become difficult to maintain (and I 
speak from experience ;). Wrappers may add a level of complexity, but 
each task is well defined in one class.
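
A small sketch of the wrapper idea, again in Python and with made-up
class names (SymbolCounter and SubsequenceFinder are not BioCocoa
classes). The sequence only stores symbols; each feature lives in its
own class that wraps a sequence.

```python
from collections import Counter


class Sequence:
    """Deliberately lightweight: it only stores the symbols."""

    def __init__(self, symbols: str):
        self.symbols = symbols


class SymbolCounter:
    """Wrapper that adds one feature: counting symbols."""

    def __init__(self, sequence: Sequence):
        self.sequence = sequence

    def counts(self) -> dict:
        return dict(Counter(self.sequence.symbols))


class SubsequenceFinder:
    """Wrapper that adds one feature: locating exact matches."""

    def __init__(self, sequence: Sequence):
        self.sequence = sequence

    def positions(self, query: str) -> list:
        # Zero-based starts of every match, overlapping ones included.
        found = []
        start = self.sequence.symbols.find(query)
        while start != -1:
            found.append(start)
            start = self.sequence.symbols.find(query, start + 1)
        return found


seq = Sequence("ATATAT")
counts = SymbolCounter(seq).counts()
hits = SubsequenceFinder(seq).positions("ATA")
```

A new feature then means a new small class, and Sequence itself does
not have to change.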

>
>
> While we're engaging in a bit of redesigning suggestions, I have
> problems with the BCSequence variables startPosition, endPosition, and
> range.  They don't reflect anything about the underlying sequence
> itself.  My understanding is that they are there for selection/display
> type usage.  Given that, if we're trying to follow a MVC design, they
> belong in a sequence controller object rather than in the sequence
> itself.

All the sequence manipulation we do now is zero-based, which is how 
NSArray works. We need to decide whether to keep it that way and only 
move to a 1-based index when interacting with a GUI. In that case 
startPosition, endPosition, and range could indeed go into some 
controller class. Or we make all our sequences 1-based (maybe by 
adding a dummy symbol at index 0).
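
The first option could look something like the sketch below: the
sequence stays zero-based, and a controller-side object holds the
1-based, inclusive positions a user would see. This is Python, not
BioCocoa code, and the class name SequenceSelection is hypothetical; the
(location, length) pair mirrors how NSRange is laid out.

```python
class SequenceSelection:
    """Controller-side selection using 1-based, inclusive positions."""

    def __init__(self, symbols: str, start_position: int, end_position: int):
        if not 1 <= start_position <= end_position <= len(symbols):
            raise ValueError("selection outside sequence")
        self.symbols = symbols
        self.start_position = start_position
        self.end_position = end_position

    @property
    def range(self) -> tuple:
        # Convert to a zero-based (location, length) pair for the model.
        location = self.start_position - 1
        length = self.end_position - self.start_position + 1
        return (location, length)

    def selected(self) -> str:
        location, length = self.range
        return self.symbols[location:location + length]


selection = SequenceSelection("ATGGCC", 2, 4)
selected_range = selection.range
selected_symbols = selection.selected()
```

The conversion happens in exactly one place, so nothing in the sequence
classes needs to know about 1-based positions or a dummy symbol.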




- Koen.



