Fwd: [Biococoa-dev] more ramblings
Alexander Griekspoor
mek at mekentosj.com
Thu Nov 18 08:06:32 EST 2004
Sorry, forgot to include the biococoa list....
*******
Ok guys, this is going really fast now, I can hardly keep up, it's a
good thing though ;-)
I'm not sure if this is the best approach, but I decided to reply to
each email individually instead of aggregating my comments into one
large one. Unfortunately, that probably means even more reading ;-)
Practically, I haven't had the time to help solve the initWithString
method in BCSymbolSet, nor have I had the time to check whether
BCFindSequence works fine. The work on BCFindSequence looks really
promising though, Koen, well done!
<snippet of nice work from Koen and suggestions to check it out, which
I will certainly do>
>
> By introducing BCFindSequence, I hope I showed that we don't need all
> the variations of rangeOfSubsequence in multiple locations. I am
> confident that the same applies for other sequence manipulations. For
> instance, code to calculate a complement or reverse complement could
> also go into a wrapper class. Code to translate a sequence is already
> in a wrapper class.
Yep, I totally agree in this case. I believe I expressed my preference
before for keeping BCSequence objects as mere data stores and putting
manipulations like these in specialized wrapper classes that each do
one task very well. The restriction enzyme / digester thing is another
perfect example that has come up a number of times. Still, there's a
fairly large gray area here. For example, complementing and reversing
a sequence are fairly simple operations and I'm not sure you should
need a wrapper for that. [mySequence complement] is so simple compared
to a wrapper solution. NSString nicely illustrates this as well; it
has a lot of such methods. Two remarks here: 1) I partially see why
Koen wants to factor all these methods out of BCSequence, as making
BCSequence a general "one-for-all-types" sequence object would no
longer allow you to keep these kinds of methods very simple. 2) Of
course, one alternative would be to have the best of both worlds by
making, as an example, [mySequence complement] a convenience method
that internally calls the proper wrapper/helper object. We can then
still have a simple interface, AND have a central place for the code
that does the work behind the scenes.
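
To make remark 2 a bit more concrete, here is a rough sketch of the
"convenience method that calls a helper" idea. It is written in Python
purely for brevity (BioCocoa itself is Objective-C), and every name in
it (ComplementTool, Sequence, and their methods) is made up for
illustration, not existing BioCocoa API. It also assumes plain
unambiguous DNA symbols only.

```python
# Illustrative sketch only: all class and method names are hypothetical.

class ComplementTool:
    """Wrapper/helper object: the one central place for the real work."""

    # Assumption: only the four unambiguous DNA bases are handled.
    _PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def complement(self, symbols):
        return "".join(self._PAIRS[s] for s in symbols)

    def reverse_complement(self, symbols):
        return self.complement(symbols)[::-1]


class Sequence:
    """Plain data holder with thin convenience methods on top."""

    def __init__(self, symbols):
        self.symbols = symbols

    def complement(self):
        # Convenience method: no logic here, it only delegates.
        return Sequence(ComplementTool().complement(self.symbols))

    def reverse_complement(self):
        return Sequence(ComplementTool().reverse_complement(self.symbols))


my_sequence = Sequence("AAGC")
print(my_sequence.complement().symbols)
print(my_sequence.reverse_complement().symbols)
```

The caller keeps the one-liner, while a bug fix in the helper reaches
everyone who uses either route.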
> You probably can guess where I am going next :-)
Let me see... No, not really. Kidding; so the question here is whether
we go for one general BCSequence class or multiple ones.
> Having said all that, again I want to make a case that we don't have
> to subclass BCSequence. A sequence object IMO should only take care of
> maintaining the array of symbols, and maybe store additional
> information about the sequence, such as annotations and features. I
> don't think this is distorting biology, because in real life, DNA and
> proteins also use additional proteins to extend their behaviour
> (translate, get the complement, look for an epitope, digest, transport
> through the membrane, etc).
True, but physically they are different as well, and different enzymes
synthesize and degrade them, for instance. But in principle you're
right, there's a lot to be said for this option.
> Another advantage is the following. Last week I asked for a way to
> determine if a fasta file contains DNA or protein. We don't know in
> advance, so what should the readFasta method return, BCSequenceProtein
> or BCSequenceDNA? If we just have readFasta return a BCSequence the
> read-method doesn't have to worry about that! Of course, when actually
> creating the sequence, we could either set BCSequenceType or
> introduce a symbolset/alphabet, so at least we and the user know what
> we are dealing with. But this is not the responsibility of readFasta
> which only extracts the relevant information from a file, and passes
> it on to the code that creates a sequence.
Yes, but that's just pushing the problem ahead, and it has a few more
consequences. For instance in the case of the fasta file, say we have
"AAAATTT" (worst case scenario, I agree). Sure, we can instantiate a
very general class for the sequence, but then which symbol do you pick
to fill it? The A for Alanine, or the A for Adenine? I hope not an "N"
or "Unknown". In the end, you MUST choose which type to go for, and
once you have made that choice, you might just as well set the
BCSequence type, or in our case pick the proper subclass. Unless there
is a better alternative that I'm not seeing. But even if you could
read a fasta file into an untyped BCSequence with "untyped" symbols,
what happens if you feed this one to a "make_complement" wrapper? You
get the same problem again and again: what is the complement of an A
symbol, either nothing in the protein world (or perhaps a codon ;-) or
a T (I know it doesn't make sense to ask a protein for its complement,
but as an example I think it illustrates the problem well).
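
A small sketch of the "AAAATTT" problem, again in Python for brevity
and with made-up names (candidate_types, complement), nothing from
BioCocoa. It assumes the simplest possible alphabets: the four DNA
bases and the twenty standard amino acid letters, no ambiguity codes.

```python
# Illustrative sketch only: function names and alphabets are assumptions.

DNA = frozenset("ACGT")
PROTEIN = frozenset("ACDEFGHIKLMNPQRSTVWY")


def candidate_types(text):
    """Return every sequence type whose alphabet covers the text."""
    symbols = set(text.upper())
    found = []
    if symbols <= DNA:
        found.append("dna")
    if symbols <= PROTEIN:
        found.append("protein")
    return found


def complement(text, sequence_type):
    """A wrapper cannot do its job until somebody has chosen a type."""
    if sequence_type != "dna":
        raise ValueError("complement is only defined for DNA in this sketch")
    return text.translate(str.maketrans("ACGT", "TGCA"))


print(candidate_types("AAAATTT"))  # both alphabets fit, so no decision
print(candidate_types("MKV"))      # only the protein alphabet fits
print(complement("AAAATTT", "dna"))
```

The letters alone cannot settle the type for "AAAATTT", so the choice
has to come from somewhere else (the user, the file, a default) before
any wrapper can work on the sequence.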
>
> I hope that with showing some concrete examples that this time I can
> convince you guys that we don't have to subclass BCSequence, or at
> least use wrappers for all additional functionality.
To a certain point yes, at least I agree with the latter part. I'm
strongly in favor of the wrapper classes, I use them a lot myself and
think they nicely separate the model from the "controller". Also, with
convenience methods one can still keep things "hybrid", I think.
In principle, I don't believe in untyped sequences, but of course the
biojava way shows one possibility: a general BCSequence that is typed
by the BCSymbolSet (Alphabet) you attach to it. I'm not sure about it,
as I can't oversee all the consequences and problems that this
strategy has. Most importantly, I think that the current solution
works nicely, though more methods could be transferred to wrapper
classes. Finally, the annotation stuff and the like are indeed very
general, but hey, that's why they belong in the BCSequence superclass,
right?
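
For comparison, this is roughly how I picture the symbol-set-typed
alternative: one general sequence class, typed by the alphabet handed
to it at creation. Python again, and SymbolSet, Sequence and
complement here are hypothetical names for illustration, not the real
BCSymbolSet/BCSequence or the biojava API.

```python
# Illustrative sketch only: names and behavior are assumptions.

class SymbolSet:
    def __init__(self, name, symbols, complements=None):
        self.name = name
        self.symbols = frozenset(symbols)
        # Empty mapping means "complement is not defined for this type".
        self.complements = complements or {}


DNA = SymbolSet("dna", "ACGT", {"A": "T", "T": "A", "C": "G", "G": "C"})
PROTEIN = SymbolSet("protein", "ACDEFGHIKLMNPQRSTVWY")


class Sequence:
    """One general class; the attached symbol set supplies the type."""

    def __init__(self, text, symbol_set):
        unknown = set(text) - symbol_set.symbols
        if unknown:
            raise ValueError(f"symbols not in {symbol_set.name}: {sorted(unknown)}")
        self.text = text
        self.symbol_set = symbol_set
        self.annotations = {}  # general data lives on the one class


def complement(sequence):
    """Wrapper function that asks the symbol set what to do."""
    table = sequence.symbol_set.complements
    if not table:
        raise TypeError(f"no complement for {sequence.symbol_set.name} sequences")
    text = "".join(table[s] for s in sequence.text)
    return Sequence(text, sequence.symbol_set)


print(complement(Sequence("AAAATTT", DNA)).text)
```

Note that the type still has to be chosen when the sequence is
created; it just moves from the class name to a constructor argument.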
>
> please now go ahead and shoot me ;-)
No shooting please ;-)
Cheers,
Alex
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
iRNAi, do you?
http://www.mekentosj.com/irnai
*********************************************************