Fwd: [Biococoa-dev] more ramblings

Thu Nov 18 08:06:32 EST 2004

Sorry, forgot to include the biococoa list....
*******

Ok guys, this is going really fast now. I can hardly keep up, but 
that's a good thing ;-)
I'm not sure if this is the best thing, but I decided to comment 
personally on a per email basis, instead of aggregating them in one 
large one. Unfortunately, that probably means even more reading ;-)

Practically, I haven't had time to help solve the initWithString 
method in BCSymbolSet, nor have I had time to check whether 
BCFindSequence works properly. The work on BCFindSequence looks really 
promising though, Koen. Well done!

< snippet of nice work from Koen and suggestions to check it out, which 
I will certainly do >
>
> By introducing BCFindSequence, I hope I showed that we don't need all 
> the variations of rangeOfSubsequence in multiple locations. I am 
> confident that the same applies for other sequence manipulations. For 
> instance, code to calculate a complement or reverse complement could 
> also go into a wrapper class. Code to translate a sequence is already 
> in a wrapper class.

Yep, I totally agree in this case. I believe I expressed my preference 
before for keeping BCSequence a mere data store and putting 
manipulations like these in specialized wrapper classes that each do 
one task very well. The restriction enzyme / digester thing is another 
perfect example that has come up a number of times. Still, there's a 
fairly large grey area here. For example, complementing and reversing 
sequences are fairly simple operations, and I'm not sure you should 
need a wrapper for that. [mySequence complement] is so simple compared 
to a wrapper solution. NSString illustrates this nicely too: it has a 
lot of methods like these. Two remarks here: 1) I partially see why 
Koen wants to factor all these methods out of BCSequence, as making 
BCSequence a general "one-for-all-types" sequence object would no 
longer allow you to keep these kinds of methods very simple. 
2) Of course, one alternative would be to have the best of both worlds 
by making, as an example, [mySequence complement] a convenience method 
that internally calls the proper wrapper/helper object. We can then 
still have a simple interface AND a central place for the code 
that does the work behind the scenes.
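
To make option 2 a bit more concrete, here is a rough sketch. It is 
written in Python rather than Objective-C just to keep it short, and 
BCComplementTool and all method names are made up for illustration; 
they are not existing BioCocoa API. It also assumes plain, unambiguous 
DNA bases only.

```python
# Hypothetical sketch: the names below are illustrative, not existing BioCocoa API.

class BCComplementTool:
    """Wrapper/helper that keeps the actual complement logic in one place."""

    # Watson-Crick pairs for unambiguous DNA bases only (an assumption for brevity).
    _PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def complement(self, symbols):
        return "".join(self._PAIRS[s] for s in symbols)

    def reverse_complement(self, symbols):
        return self.complement(symbols)[::-1]


class BCSequenceDNA:
    """Data storage; the manipulation itself is delegated to the helper."""

    def __init__(self, symbols):
        self.symbols = symbols.upper()

    def complement(self):
        # Convenience method: simple interface, work done by the wrapper.
        return BCSequenceDNA(BCComplementTool().complement(self.symbols))

    def reverse_complement(self):
        return BCSequenceDNA(BCComplementTool().reverse_complement(self.symbols))


seq = BCSequenceDNA("AAAATTTG")
print(seq.complement().symbols)          # TTTTAAAC
print(seq.reverse_complement().symbols)  # CAAATTTT
```

The caller only ever sees the one-liner on the sequence, while the 
pairing table lives in exactly one class.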

> You probably can guess where I am going next :-)

Let me see... No, not really. Just kidding. So the question here is: 
do we go for one general BCSequence class or multiple ones?

> Having said all that, again I want to make a case that we don't have 
> to subclass BCSequence. A sequence object IMO should only take care of 
> maintaining the array of symbols, and maybe store additional 
> information about the sequence, such as annotations and features. I 
> don't think this is distorting biology, because in real life, DNA and 
> proteins also use additional proteins to extend their behaviour 
> (translate, get the complement, look for an epitope, digest, transport 
> through the membrane, etc).

True, but physically they are different as well, and use different 
enzymes to be synthesized and degraded, for instance. But in principle 
you're right, there's a lot to be said for this option.

> Another advantage is the following. Last week I asked for a way to 
> determine if a fasta file contains a dna or protein. We don't know in 
> advance, so what should the readFasta method return, BCSequenceProtein 
> or BCSequenceDNA? If we just have readFasta return a BCSequence the 
> read-method doesn't have to worry about that! Of course, when actually 
> creating the sequence, we could either set BCSequenceType or 
> introduce a symbolset/alphabet, so at least we and the user know what 
> we are dealing with. But this is not the responsibility of readFasta, 
> which only extracts the relevant information from a file and passes 
> it on to the code that creates a sequence.

Yes, but that's just pushing the problem ahead, and it has a few more 
consequences. For instance, in the case of the fasta file, say we have 
"AAAATTT" (a worst-case scenario, I agree). Sure, we can instantiate a 
very general class for the sequence, but then which symbol do you pick 
to fill it? The A for alanine, or the A for adenine? I hope not an "N" 
or "Unknown". In the end, you MUST choose which type to go for, and 
once you have made that choice, you can just as well set the BCSequence 
type, or in our case pick the proper subclass. Unless there is a better 
alternative that I don't see. But even if you could read a fasta file 
into an untyped BCSequence with "untyped" symbols, what happens if you 
feed this one to a "make_complement" wrapper? You get the same problem 
again and again: what is the complement of an A symbol? Either nothing 
in the protein world (or perhaps a codon ;-), or a T. (I know it 
doesn't make sense to ask a protein for its complement, but as an 
example I think it illustrates the problem well.)
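
A small sketch of that ambiguity, again in Python for brevity. The 
alphabets are deliberately simplified (four DNA bases, twenty standard 
amino acids), and the function names are hypothetical, not anything 
that exists in BioCocoa:

```python
# Hypothetical sketch: simplified alphabets, illustrative names only.

DNA_SYMBOLS = set("ACGT")
PROTEIN_SYMBOLS = set("ACDEFGHIKLMNPQRSTVWY")
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def candidate_types(text):
    """Return every sequence type whose alphabet contains all symbols in text."""
    symbols = set(text.upper())
    types = []
    if symbols <= DNA_SYMBOLS:
        types.append("dna")
    if symbols <= PROTEIN_SYMBOLS:
        types.append("protein")
    return types


def make_complement(text, sequence_type):
    """Complement wrapper: it can only do its job once the type is known."""
    if sequence_type != "dna":
        raise ValueError("complement is undefined for type %r" % sequence_type)
    return "".join(DNA_PAIRS[s] for s in text.upper())


print(candidate_types("AAAATTT"))         # ['dna', 'protein']
print(make_complement("AAAATTT", "dna"))  # TTTTAAA
```

The string alone fits both alphabets, so somebody (the reader, the 
user, or a heuristic) has to commit to a type before the wrapper can 
answer.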
>
> I hope that by showing some concrete examples, this time I can 
> convince you guys that we don't have to subclass BCSequence, or at 
> least that we should use wrappers for all additional functionality.

To a certain point yes, at least I agree with the latter part. I'm 
strongly in favor of the wrapper classes. I use them a lot myself and 
think they nicely separate the model from the "controller". Also, with 
convenience methods one can still keep things "hybrid", I think.
In principle, I don't believe in untyped sequences, but of course the 
biojava way shows one possibility: a general BCSequence that is typed 
by the BCSymbolSet (Alphabet) you attach to it. I'm not sure about it, 
as I can't foresee the many consequences and problems that this 
strategy might have. Most importantly, I think that the current 
solution works nicely, though more methods could be transferred to 
wrapper classes. Finally, the annotation stuff and the like are indeed 
very general, but hey, that's why they belong in the BCSequence 
superclass, right?

>
> please now go ahead and shoot me ;-)

No shooting please ;-)
Cheers,
Alex
>

>
> _______________________________________________
> Biococoa-dev mailing list
> Biococoa-dev at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>
>
*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   AIM: mekentosj at mac.com
                   E-mail: a.griekspoor at nki.nl
               Web: http://www.mekentosj.com

                             iRNAi, do you?
              http://www.mekentosj.com/irnai

*********************************************************
