[Biococoa-dev] more ramblings

Tue Nov 16 22:24:31 EST 2004

Once again, I have to say I think this is a really bad idea.  Let me count
the ways...

For starters, nucleotide and protein sequences have some things in common,
but in general, they're very different.  They have different information
content.  You do different things with them.  Why try to squish them
together?  Treating them as the same object reduces the object's information
content without gaining any clear benefit.

One of the core ideas of object-oriented programming is to group the data
with the methods that act on it.  Complement/reverse complement are things
that only work with nucleotide sequences - they belong in a class that
handles nucleotide sequences.  The structure you're proposing separates the
methods from the data.  As a result, we're going to end up in one of two
situations - a TON of small wrapper classes that each perform only a limited
number of functions, or a few giant conglomerations of utility functions.
Neither is going to make it easier to find and maintain methods.
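
To make the grouping argument concrete, here is a minimal sketch in Python
rather than BioCocoa's Objective-C.  The class and method names (Sequence,
NucleotideSequence, ProteinSequence, range_of_subsequence) are hypothetical
stand-ins, not the project's actual API, and the complement table assumes an
unambiguous ACGT alphabet.

```python
class Sequence:
    """Hypothetical superclass: behaviour shared by every sequence type."""

    def __init__(self, symbols):
        self.symbols = symbols.upper()

    def range_of_subsequence(self, sub):
        """Return (start, length) of the first match, or None if absent."""
        start = self.symbols.find(sub.upper())
        return None if start < 0 else (start, len(sub))


class NucleotideSequence(Sequence):
    """Hypothetical subclass: nucleotide-only operations live with the data."""

    # Assumption: plain ACGT only, no ambiguity codes.
    _PAIRS = str.maketrans("ACGT", "TGCA")

    def complement(self):
        return NucleotideSequence(self.symbols.translate(self._PAIRS))

    def reverse_complement(self):
        return NucleotideSequence(self.symbols.translate(self._PAIRS)[::-1])


class ProteinSequence(Sequence):
    """Hypothetical subclass: deliberately has no complement method."""
```

In this layout, searching works on any sequence because it lives in the
superclass, while asking a protein for its complement fails immediately
because the method simply does not exist on that class.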

The rangeOfSubsequence case isn't the horrible situation you make it out to
be.  We've got a set of related methods that work on all sequences in the
superclass.  When I get free time (hah!), I'll move the other set (handling
ambiguous sequences) into the superclass too - I'll just have it check for
the sequence type at the start.  The methods will go with the data they work
with.

The FASTA situation is also a bad example to support your case.  Some file
formats contain information regarding the type of sequence, others don't.
Why should we make a sequence object handle that, or create a new class to
act as an intermediary?  Dealing with differences in file format is the job
of the object that knows about file formats, not of a sequence.
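
One way to picture that division of labour is the following minimal Python
sketch (again not BioCocoa's Objective-C).  Everything named here is
hypothetical: read_fasta, the three classes, and especially the
NUCLEOTIDE_SYMBOLS heuristic, which is only an assumed, crude way of guessing
the type when the file format does not state it.

```python
class Sequence:
    """Hypothetical base class; it knows nothing about file formats."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = symbols


class NucleotideSequence(Sequence):
    pass


class ProteinSequence(Sequence):
    pass


# Assumption: a crude alphabet used only to guess the sequence type.
NUCLEOTIDE_SYMBOLS = set("ACGTUN")


def read_fasta(text):
    """Parse FASTA text; the reader, not the sequence, picks the class."""
    records = []
    name, chunks = None, []
    # The trailing ">" is a sentinel that flushes the last record.
    for line in text.splitlines() + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                symbols = "".join(chunks).upper()
                if set(symbols) <= NUCLEOTIDE_SYMBOLS:
                    cls = NucleotideSequence
                else:
                    cls = ProteinSequence
                records.append(cls(name, symbols))
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    return records
```

A reader for a format that does record the sequence type could skip the
guessing step entirely, and the sequence classes would not change either way.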

Given all these things I view as negatives, I still don't understand what
advantages a single sequence class would provide.  The concrete examples you
provide seem to me to be causing more organizational issues than they solve,
and not following good OOP design.  My first instinct would be to take
anything in BCFindSequence and work it back into BCSequence.

Another way to think about this - let's assume that Apple knows what they're
doing in designing their classes.  The most analogous item in Cocoa's
Foundation is NSMutableString.  There is only one utility class that's
directly related to strings (NSScanner - maybe two with NSCharacterSet).
Just about all the methods needed for handling the contents of strings are
either in NSMutableString or its superclass.  It's good design.

No shooting though!  At least not unless I ever invest in a copy of Halo...

JT

> By introducing BCFindSequence, I hope I showed that we don't need all
> the variations of rangeOfSubsequence in multiple locations. I am
> confident that the same applies for other sequence manipulations. For
> instance, code to calculate a complement or reverse complement could
> also go into a wrapper class. Code to translate a sequence is already
> in a wrapper class.
> 
> 
> You probably can guess where I am going next :-)
> 
> Having said all that, again I want to make a case that we don't have to
> subclass BCSequence. A sequence object IMO should only take care of
> maintaining the array of symbols, and maybe store additional
> information about the sequence, such as annotations and features. I
> don't think this is distorting biology, because in real life, DNA and
> proteins also use additional proteins to extend their behaviour
> (translate, get the complement, look for an epitope, digest, transport
> through the membrane, etc).
> 
> Another advantage is the following. Last week I asked for a way to
> determine if a FASTA file contains DNA or protein. We don't know in
> advance, so what should the readFasta method return, BCSequenceProtein
> or BCSequenceDNA? If we just have readFasta return a BCSequence the
> read-method doesn't have to worry about that! Of course, when actually
> creating the sequence, we could either set BCSequenceType or
> introduce a symbolset/alphabet, so at least we and the user know what
> we are dealing with. But this is not the responsibility of readFasta
> which only extracts the relevant information from a file, and passes it
> on to the code that creates a sequence.
> 
> I hope that by showing some concrete examples this time I can
> convince you guys that we don't have to subclass BCSequence, or at
> least use wrappers for all additional functionality.
> 
> please now go ahead and shoot me ;-)

_______________________________________________
This mind intentionally left blank