[Biococoa-dev] more ramblings

Thu Nov 18 15:27:25 EST 2004

Hi guys,

Reading over the rest of the emails I missed, I think I have made my 
point already, so just a few remarks that came up while reading:

>  But should the reader methods be concerned with whether it is a 
> protein or nucleotide sequence? I don't think they should. The 
> introduction of a factory class is a well established design pattern 
> in OOP that deals with these sort of situations. An advantage is that 
> when you ever decide to change the way a sequence is created, or 
> introduce a new type of sequence, you only have to modify the code 
> once in the factory, not in each readXXX method. Or maybe later on we 
> decide to implement a new read method, or introduce a class to obtain 
> a sequence from a database. Maybe the user types in a sequence in an 
> NSTextView and wants to make a BCSequence. Should each of these 
> classes then try to figure out whether it's a protein or nucleotide 
> sequence? If we keep that code in one place (factory or whatever you 
> want to call it) it makes it much easier to maintain.

Again, although certainly a possibility, sometimes there are 
alternatives just as good. For example, the thing that I immediately 
thought after reading:
> An advantage is that when you ever decide to change the way a sequence 
> is created, or introduce a new type of sequence, you only have to 
> modify the code once in the factory, not in each readXXX method.
was "Why not do the type checking/bcsequence subclass creation in one 
method inside the current implementation?"
We don't check the type in each readXXX method either, so one method 
like "determineSequenceType" or "sequenceObjectForFile" would also 
allow you to keep all "factory" methods centralized and easy to change 
right?
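
As a minimal sketch of that idea (every class and method name below is 
hypothetical, not existing BioCocoa API, and the "only nucleotide 
letters means nucleotide" rule is just an assumed heuristic), the type 
check and the subclass creation could sit in one place like this:

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-ins for BCSequence and two of its subclasses.
@interface BCSketchSequence : NSObject {
    NSString *symbols;
}
+ (Class)determineSequenceType:(NSString *)aString;
+ (id)sequenceObjectForString:(NSString *)aString;
- (id)initWithSymbolString:(NSString *)aString;
- (NSString *)symbolString;
@end

@interface BCSketchDNA : BCSketchSequence
@end

@interface BCSketchProtein : BCSketchSequence
@end

@implementation BCSketchSequence

// The only place that decides which subclass fits the raw symbols.
// Assumed heuristic: nothing but A, C, G, T, U or N means nucleotide.
+ (Class)determineSequenceType:(NSString *)aString {
    NSCharacterSet *nucleotides =
        [NSCharacterSet characterSetWithCharactersInString:@"ACGTUNacgtun"];
    NSRange other =
        [aString rangeOfCharacterFromSet:[nucleotides invertedSet]];
    if (other.location == NSNotFound) {
        return [BCSketchDNA class];
    }
    return [BCSketchProtein class];
}

// Every reader (file, database, text view) would call this one method.
+ (id)sequenceObjectForString:(NSString *)aString {
    Class type = [self determineSequenceType:aString];
    return [(BCSketchSequence *)[type alloc] initWithSymbolString:aString];
}

- (id)initWithSymbolString:(NSString *)aString {
    self = [super init];
    if (self) {
        symbols = [aString copy];
    }
    return self;
}

- (NSString *)symbolString {
    return symbols;
}

@end

@implementation BCSketchDNA
@end

@implementation BCSketchProtein
@end
```

A readXXX method would then only parse its file format and hand the 
raw string to sequenceObjectForString:. A short peptide made up only 
of A, C, G and T would be misclassified by this heuristic, so a real 
version would need a smarter check, but it would still live in one 
method.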

> I have added some more examples in this reply, and hopefully showed 
> that this is also a good OOP design.
Again, there are many examples that prove both to be good OOP design 
IMHO.

> I am very guilty of supporting the BCSequence subclasses myself when 
> we just started. But now that BioCocoa is growing, I came to the 
> realization that we may have to shuffle things around to make the code 
> easier to use and maintain.
That's a good thing, Koen. It's never smart to keep going without 
reflection. Still, in this case I hope we can end up with a good mix.

>>  My first instinct would be to take
>> anything in BCFindSequence and work it back in to BCSequence.
>
> Please do so, but leave the BCFindSequence code as an alternative :)
Nope, let's choose!  We should all agree on one way again IMHO, we can 
provide convenience methods, but not two completely separate things 
please.

>> Another way to think about this - let's assume that Apple knows what 
>> they're
>> doing in designing their classes.  The most analogous item in Cocoa's
>> Foundation is NSMutableString.  There is only one utility class that's
>> directly related to strings (NSScanner - maybe two with 
>> NSCharacterSet).
>> Just about all the methods needed for handling the contents of 
>> strings are
>> either in NSMutableString or its superclass.  It's good design.
>>
>
> NSString indeed maintains a list of characters, and also does some 
> basic character manipulation, and substring searching. But it doesn't 
> translate a string to another language!

[myString UTF8String]; [myString fileSystemRepresentation]; [myString 
cString]; (no spellcheck) seem nice examples to me (maybe not so 
complicated, but they are representations in different "languages").

>> Right now, you have several very similar methods in BCSequence (and 
>> its
>> subclasses). As I said before, this is usually a situation in OOP when
>> one has to rethink the design, and try to find a way to avoid
>> duplicating code.
> Right, and last time this came up, I mentioned that I had every 
> intention of
> fixing it.  It's not a fundamental class structure problem - it was a
> problem with me trying to put something in place first, and fix it 
> later.  I
> don't know how else to possibly say that this situation is temporary, 
> and
> doesn't say anything informative about the class structure.

Work in progress ;-)

> I'd also like to point out that having 2 methods vs. 1 method with a 
> boolean flag, as
> yours apparently does, doesn't make any argument about class 
> complexity at
> all.  I went back and forth on which to do for a while, and settled on 
> 2.
> If people prefer 1, it can be changed.

As you might have understood (boy I almost sound like Steve Ballmer with 
his developers, developers, developers!!), I'm a great fan of 
convenience methods (convenience, convenience, convenience!!). Please 
provide two methods, one detailed, the other simple and convenient.
1 myMethodDoesThis:  withThisAsAnArgument: andThisAsAnArgument etc
2 myMethodDoesThis (with default arguments).
One of the things I absolutely love in the Cocoa frameworks.
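
To make the pairing concrete, here is a minimal sketch. The class 
BCSketchFinder, its method names, and the chosen defaults 
(case-insensitive, start at index 0) are all assumptions for 
illustration and not part of BioCocoa; only the NSString call 
underneath is real Foundation API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical search helper, used only to show the two-method pairing.
@interface BCSketchFinder : NSObject
// 1: the detailed method, every choice spelled out as an argument.
+ (NSRange)rangeOfPattern:(NSString *)pattern
                inSymbols:(NSString *)symbols
            caseSensitive:(BOOL)caseSensitive
               startingAt:(NSUInteger)start;
// 2: the convenience method, same job with default arguments.
+ (NSRange)rangeOfPattern:(NSString *)pattern
                inSymbols:(NSString *)symbols;
@end

@implementation BCSketchFinder

+ (NSRange)rangeOfPattern:(NSString *)pattern
                inSymbols:(NSString *)symbols
            caseSensitive:(BOOL)caseSensitive
               startingAt:(NSUInteger)start {
    NSUInteger length = [symbols length];
    NSStringCompareOptions options =
        caseSensitive ? 0 : NSCaseInsensitiveSearch;
    // Nothing left to search if the start is at or beyond the end.
    if (start >= length) {
        return NSMakeRange(NSNotFound, 0);
    }
    return [symbols rangeOfString:pattern
                          options:options
                            range:NSMakeRange(start, length - start)];
}

// The convenience method only fills in the assumed defaults and
// forwards, so the real logic exists exactly once.
+ (NSRange)rangeOfPattern:(NSString *)pattern
                inSymbols:(NSString *)symbols {
    return [self rangeOfPattern:pattern
                      inSymbols:symbols
                  caseSensitive:NO
                     startingAt:0];
}

@end
```

Because the short form only forwards to the long one, changing the 
search later means touching a single method.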

> It comes down to the design decision of whether you want to send the
> sequence off somewhere else to get information back on it, or whether 
> you
> want to ask the sequence to tell you something about itself.  I'd say 
> that
> for the most part, for someone trying to use this framework, it's much
> easier to ask the sequence, instead of trying to figure out what
> object/method they need to send the sequence to.  I also don't think 
> that it
> leads to a painful burden on us developers in terms of organization.

I think it all comes down to how we describe the border, or the 
guidelines, for when to choose internal or external methods. My gut 
feeling says that hardcoded properties, "one liner" calculations, and 
trivial methods can be done internally. Speed is also an issue: if it 
takes time to calculate things, it's way nicer to provide a wrapper 
object, because that allows for threading, asynchronous methods, and 
progress monitoring very elegantly. Something like length is so easy 
to calculate (a typical one-liner) that it would be ridiculous to have 
a helper calculate it.
Complex calculations with many lines of code, special conditions, 
many parameters etc. should definitely go outside (maybe the guideline 
for internal methods should be "no arguments in the method name" ;-). 
For some things I tend to let the gut feeling be determined by biology 
(strange, huh). Translation needs a complete machinery in the cell, so 
it should here ;-)
I would say properties and representations inside; conversions, 
calculations, and manipulations outside.

> I think the individual symbols are great examples of this approach - 
> they
> are incredibly powerful because, unlike a character, they know things 
> about
> themselves.

They have properties I would say in the light of the above.

> You don't have to dig around to find out which class/method are
> needed to find out what the complement of a base is - the base already 
> knows
> what its complement is.  I'd love to see the same power extended to
> sequences as a whole.

Right, so a BCSequence should have, for example, a GC% method or an MW 
method or something alike, right? So we add whatever favorite thing 
everyone would love to see (in the superclass if it's a general thing, 
which is the case for MW but not for GC%):
[mySequence gcPercentage] or [mySequence molecularWeight] (purely 
hypothetical).

But now comes the point: would the end user of our framework care that 
the actual method is a convenience one and that there's a 
helper/wrapper object handling the things needed behind the scenes? I 
wouldn't think so. Would we care? Absolutely!!!

1) it allows us to keep the codebase of the sequence object (in this 
case) clean and lightweight
2) it can centralize code that works on multiple types, all subclasses 
can call the same convenience method (so it can go in the superclass) 
if necessary and guess what, the wrapper knows instantly the type of 
sequence it's working on (simply ask the sender its type). Central code 
is easier to maintain, change and optimize.
3) caching, think of sharedHelper objects, one can keep it (and the 
data it requires to work like enzyme dictionaries) alive if you want to 
do batch conversions/processing!
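
A minimal sketch of how these three points could fit together, 
assuming a hypothetical BCSketchStrand sequence class and a 
hypothetical shared BCSketchCalculator helper (neither exists in 
BioCocoa, and the simple lazy singleton here is not thread-safe):

```objc
#import <Foundation/Foundation.h>

@class BCSketchStrand;

// Hypothetical helper that does the actual work behind the scenes.
@interface BCSketchCalculator : NSObject {
    NSCharacterSet *gcSet; // built once and kept alive between calls
}
+ (BCSketchCalculator *)sharedCalculator;
- (double)gcPercentageOfStrand:(BCSketchStrand *)strand;
@end

// Hypothetical sequence object: it stays small and only forwards.
@interface BCSketchStrand : NSObject {
    NSString *symbols;
}
- (id)initWithSymbolString:(NSString *)aString;
- (NSString *)symbolString;
- (double)gcPercentage;
@end

@implementation BCSketchCalculator

// Lazily created shared instance, so cached data survives a batch run.
+ (BCSketchCalculator *)sharedCalculator {
    static BCSketchCalculator *shared = nil;
    if (shared == nil) {
        shared = [[BCSketchCalculator alloc] init];
    }
    return shared;
}

- (id)init {
    self = [super init];
    if (self) {
        gcSet = [[NSCharacterSet
            characterSetWithCharactersInString:@"GCgc"] copy];
    }
    return self;
}

// Percentage of symbols that are G or C; 0.0 for an empty strand.
- (double)gcPercentageOfStrand:(BCSketchStrand *)strand {
    NSString *text = [strand symbolString];
    NSUInteger length = [text length];
    NSUInteger i, hits = 0;
    if (length == 0) {
        return 0.0;
    }
    for (i = 0; i < length; i++) {
        if ([gcSet characterIsMember:[text characterAtIndex:i]]) {
            hits++;
        }
    }
    return 100.0 * (double)hits / (double)length;
}

@end

@implementation BCSketchStrand

- (id)initWithSymbolString:(NSString *)aString {
    self = [super init];
    if (self) {
        symbols = [aString copy];
    }
    return self;
}

- (NSString *)symbolString {
    return symbols;
}

// Convenience method: the caller just asks the sequence, and the
// sequence hands the work to the shared helper.
- (double)gcPercentage {
    return [[BCSketchCalculator sharedCalculator]
        gcPercentageOfStrand:self];
}

@end
```

From the outside this is still just [myStrand gcPercentage]; whether 
a helper is involved is a detail we can document and change later.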

For users of our framework there's no problem to understand the code if 
we document our methods well and tell when certain methods make use of 
wrappers or not.

So then the final question, when to go for helpers or not? In the end 
we should decide on a per method basis I guess, it depends on how 
complicated things are to generate, how much it is shared by multiple 
sequence types etc... Let's leave that up to further discussions when 
we're actually getting there ;-)

Cheers,
Alex

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
              The Netherlands Cancer Institute
              Department of Tumorbiology (H4)
         Plesmanlaan 121, 1066 CX, Amsterdam
                    Tel:  + 31 20 - 512 2023
                    Fax:  + 31 20 - 512 2029
                    AIM: mekentosj at mac.com
                    E-mail: a.griekspoor at nki.nl
                Web: http://www.mekentosj.com

       Microsoft is not the answer,
       Microsoft is the question,
       NO is the answer

*********************************************************