[Biococoa-dev] more ramblings
Koen van der Drift
kvddrift at earthlink.net
Tue Nov 16 20:30:47 EST 2004
In two recent code cleanups I did (rangeOfSubsequence and initializing
the symbols) I found that code that was originally in each subclass
could be moved either to the superclass or to an external wrapper. I
hope you can appreciate that the code became more transparent and also
easier to maintain. For example, during the coding of BCFindSequence, I
found an error in the rangeOfSubsequence code (see my post of October
30th). Once I found the problem, it was easy to fix in BCFindSequence,
because the code is in just one place, instead of in each variation of
rangeOfSubsequence in all the subclasses (which I haven't fixed
yet ;).
I would appreciate it if you could check and try out the code in
BCFindSequence. I already put some test code in the translation demo.
Here are the relevant lines in the demo:
BCFindSequence *sequenceFinder = [BCFindSequence
[sequenceFinder setStrict: NO];
[sequenceFinder setFirstOnly: NO];
NSArray *foundIt = [sequenceFinder findSequence:
[BCSequenceDNA DNASequenceWithString: @"AAT" skippingNonBases: YES]];
NSLog ( @"the found-array is %@", foundIt );
Try changing the setStrict and setFirstOnly values, and the @"AAT"
search string, and see if the results displayed by NSLog in the console
are what you expect. Note that the results in 'foundIt' are stored as
NSRanges wrapped in NSValue objects; we may have to change that. Maybe
you can try to put an ambiguous symbol in the search string. Try
feeding it a protein, or RNA. If I have done everything right,
BCFindSequence should behave like all the variations of
rangeOfSubsequence in BCSequence and its subclasses. If not, let me
know what went wrong and I can see if I can fix it.
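For anyone not used to NSRanges inside NSValues, here is a minimal sketch of how such a result array can be built and unpacked. It is not the BCFindSequence code: AllRangesOfString and LogRanges are hypothetical helpers that work on plain NSStrings, do strict matching only, and use nothing but Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a finder: returns every occurrence of
// `query` in `target` (overlapping hits included) as NSRanges
// wrapped in NSValue objects.
static NSArray *AllRangesOfString(NSString *query, NSString *target)
{
    NSMutableArray *found = [NSMutableArray array];
    NSUInteger start = 0;

    if ([query length] == 0) return found;   // nothing to search for

    while (start < [target length]) {
        NSRange searchRange = NSMakeRange(start, [target length] - start);
        NSRange hit = [target rangeOfString: query
                                    options: NSLiteralSearch
                                      range: searchRange];
        if (hit.location == NSNotFound) break;

        // NSRange is a C struct, so it has to be boxed before it can
        // go into an NSArray.
        [found addObject: [NSValue valueWithRange: hit]];

        // Advance by one symbol so overlapping matches are reported too.
        start = hit.location + 1;
    }
    return found;
}

// Unpack each NSValue again and print the range it holds.
static void LogRanges(NSArray *ranges)
{
    NSUInteger i;
    for (i = 0; i < [ranges count]; i++) {
        NSRange range = [[ranges objectAtIndex: i] rangeValue];
        NSLog(@"match: %@", NSStringFromRange(range));
    }
}
```

Calling LogRanges(AllRangesOfString(@"AAT", @"GAATTCAAT")) logs one line per hit. If we decide the NSValue wrapping is too clumsy, a small result object per match would be one alternative.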
By introducing BCFindSequence, I hope I showed that we don't need all
the variations of rangeOfSubsequence in multiple locations. I am
confident that the same applies to other sequence manipulations. For
instance, code to calculate a complement or reverse complement could
also go into a wrapper class. Code to translate a sequence is already
in a wrapper class.
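To make the complement idea concrete, here is a rough sketch of what such a wrapper could look like. DNAComplementSketch is a made-up name, not an existing BioCocoa class, and to keep it self-contained it works on plain uppercase NSStrings instead of BCSequence and its symbols. Ambiguous or unknown symbols are simply passed through unchanged, which a real version would have to handle properly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper class: all complement logic lives here, not in
// the sequence class or its subclasses.
@interface DNAComplementSketch : NSObject
+ (NSString *)complementOfString:(NSString *)dna;
+ (NSString *)reverseComplementOfString:(NSString *)dna;
@end

@implementation DNAComplementSketch

// Complement of a single uppercase base.
+ (unichar)complementOfBase:(unichar)base
{
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default:  return base;   // simplification: anything else is left as-is
    }
}

+ (NSString *)complementOfString:(NSString *)dna
{
    NSMutableString *result = [NSMutableString stringWithCapacity: [dna length]];
    NSUInteger i;
    for (i = 0; i < [dna length]; i++) {
        unichar c = [self complementOfBase: [dna characterAtIndex: i]];
        [result appendString: [NSString stringWithCharacters: &c length: 1]];
    }
    return result;
}

+ (NSString *)reverseComplementOfString:(NSString *)dna
{
    NSMutableString *result = [NSMutableString stringWithCapacity: [dna length]];
    NSUInteger i = [dna length];
    // Walk the string back to front, complementing each base on the way.
    while (i > 0) {
        i--;
        unichar c = [self complementOfBase: [dna characterAtIndex: i]];
        [result appendString: [NSString stringWithCharacters: &c length: 1]];
    }
    return result;
}

@end
```

The point is only that both methods sit in one place; the sequence object itself does not need to know anything about complements.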
You probably can guess where I am going next :-)
Having said all that, again I want to make a case that we don't have to
subclass BCSequence. A sequence object IMO should only take care of
maintaining the array of symbols, and maybe store additional
information about the sequence, such as annotations and features. I
don't think this is distorting biology, because in real life, DNA and
proteins also use additional proteins to extend their behaviour
(translate, get the complement, look for an epitope, digest, transport
through the membrane, etc.).
Another advantage is the following. Last week I asked for a way to
determine if a fasta file contains DNA or protein. We don't know in
advance, so what should the readFasta method return, BCSequenceProtein
or BCSequenceDNA? If we just have readFasta return a BCSequence, the
read-method doesn't have to worry about that! Of course, when actually
creating the sequence, we could either set BCSequenceType or
introduce a symbolset/alphabet, so at least we and the user know what
we are dealing with. But this is not the responsibility of readFasta,
which only extracts the relevant information from a file and passes it
on to the code that creates a sequence.
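A minimal sketch of that division of labour, assuming a plain FASTA layout (a '>' header line followed by one or more lines of symbols). FirstFastaRecord is a hypothetical helper, not the actual readFasta method; it only pulls out the header and the raw symbols of the first record and makes no attempt to decide whether they are DNA or protein.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: extracts the first record from FASTA text.
// Returns a dictionary with the keys @"header" and @"symbols", or nil
// if the text contains no header line. Deciding the sequence type is
// deliberately left to whoever creates the sequence object.
static NSDictionary *FirstFastaRecord(NSString *fastaText)
{
    NSArray *lines = [fastaText componentsSeparatedByString: @"\n"];
    NSCharacterSet *blanks = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableString *symbols = [NSMutableString string];
    NSString *header = nil;
    NSUInteger i;

    for (i = 0; i < [lines count]; i++) {
        // Trimming also removes a trailing \r from Windows line endings.
        NSString *line = [[lines objectAtIndex: i]
                             stringByTrimmingCharactersInSet: blanks];

        if ([line hasPrefix: @">"]) {
            if (header != nil) break;          // next record starts here
            header = [line substringFromIndex: 1];
        } else if (header != nil) {
            [symbols appendString: line];      // sequence may span several lines
        }
    }

    if (header == nil) return nil;
    return [NSDictionary dictionaryWithObjectsAndKeys:
               header, @"header", symbols, @"symbols", nil];
}
```

The code that receives this dictionary can then create a plain BCSequence and set the type or symbolset there, however we end up doing that.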
I hope that by showing some concrete examples I can convince you guys
this time that we don't have to subclass BCSequence, or that we should
at least use wrappers for all additional functionality.
Please now go ahead and shoot me ;-)