[Biococoa-dev] more ramblings
Alexander Griekspoor
mek at mekentosj.com
Thu Nov 25 17:04:31 EST 2004
Koen,
>> Yes, but that's just pushing the problem ahead, and has a few more
>> consequences. For instance in the case of the fasta file, say we have
>> "AAAATTT" (worst case scenario I agree). Sure we can instantiate a
>> very general class for the sequence, but then which symbol do you
>> pick to fill it? The A for Alanine, or the A for Adenine? I hope not
>> a "N" or "Unknown". In the end, you MUST choose which type to go for,
>> and once you have made that choice, you can just as well set the
>> BCSequence type, or in our case pick the proper subclass. Unless I do
>> not see the better alternative. But even if you could read a fasta
>> file in an untyped bcsequence with "untyped" symbols, what happens
>> if you feed this one to a "make_complement" wrapper? You get the same
>> problem again and again, what is the complement of an A symbol,
>> either nothing in the protein world (or perhaps a codon ;-) or a T (I
>> know it doesn't make sense to ask a protein for its complement, but
>> as an example I think it illustrates the problem well).
>
> You are absolutely right that it is a problem to create an untyped
> BCSequence, that's not what I was trying to say. My point was that
> readFasta cannot always know if it is a protein or nucleotide
> sequence, so we let it just create a BCSequence.
Right, but what I'm trying to make clear is that this only pushes the
problem ahead... The question is whether, as a consequence, we want to
make all methods compatible with untyped sequences. I don't think so,
but perhaps you guys think differently.
> Even if it is clear what the sequence is, we should not have 2
> different readFasta methods, one for proteins, and one for dna/rna.
Totally agree! But this should be possible with typed BCSequences as
well.
> If we just create a BCSequence, the readFasta method will always work.
Sure, but I still haven't heard a solution to the most important
problem: those characters that have an equivalent BCSymbol in multiple
types, like A (Alanine and Adenine). You can only solve this problem
if you also introduce untyped BCSymbols, but as you can't add MWs and
other properties to them (because you don't know what they represent),
they are merely replacements for characters. Also, what in the world
would you return if you feed such a thing to an object that calculates
its molecular weight? Do you see the problems we would get ourselves into?
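
To make the ambiguity concrete, here is a minimal sketch. It is in Python purely for illustration, not BioCocoa code: the tables, the symbol_name function and the "dna"/"protein" type names are all made up for this example, and the alphabets are deliberately tiny. The point is only that a bare character cannot be turned into a symbol without knowing the type.

```python
# Hypothetical symbol tables, for illustration only (not the BCSymbol API).
# Each sequence type maps the same characters to different symbols.
SYMBOL_NAMES = {
    "dna": {"A": "adenine", "C": "cytosine", "G": "guanine", "T": "thymine"},
    "protein": {"A": "alanine", "C": "cysteine", "G": "glycine", "T": "threonine"},
}


def symbol_name(char, seq_type=None):
    """Resolve a character to a symbol name; fails when the type is unknown."""
    if seq_type is None:
        # Without a type, every table that knows the character is a candidate.
        candidates = sorted(
            names[char] for names in SYMBOL_NAMES.values() if char in names
        )
        raise ValueError(f"{char!r} is ambiguous without a type: {candidates}")
    return SYMBOL_NAMES[seq_type][char]
```

With a type, symbol_name("A", "dna") and symbol_name("A", "protein") give different answers; without one, there is nothing sensible to return, which is exactly the situation an untyped symbol would be in when asked for its molecular weight.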
> Its only task IMO should be to parse the file (which should have a
> constant structure, independent of the sequence type, so it works
> always), extract the requested data, and pass it on to the class that
> actually creates a new BCSequence object.
Hmm, ok, if you see it that way, that's a possibility, yes. Still, it
sounds more complicated than necessary. If you read a fasta file, you
want a BCSequence (or a group of them), right? Why do it in two steps? I
think there's plenty that can be distilled into general methods within
the sequenceIO class that all readXXX methods can use. That would also
keep things limited to one class.
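
As a rough sketch of what I mean by general methods shared by the readXXX methods (Python for illustration only; the class name SequenceIO, the helper _clean and the read_raw reader are assumptions of this example, not existing BioCocoa code):

```python
class SequenceIO:
    """Hypothetical reader class: every read_* method reuses one helper."""

    @staticmethod
    def _clean(lines):
        # Shared helper: strip all whitespace, join the lines, upper-case.
        return "".join("".join(line.split()) for line in lines).upper()

    def read_fasta(self, text):
        """Return a list of (header, sequence) pairs from FASTA text."""
        records, header, body = [], None, []
        for line in text.splitlines():
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, self._clean(body)))
                header, body = line[1:].strip(), []
            elif line.strip() and header is not None:
                body.append(line)
        if header is not None:
            records.append((header, self._clean(body)))
        return records

    def read_raw(self, text):
        """Return a single unnamed record from plain sequence text."""
        return [("", self._clean(text.splitlines()))]
```

Both readers live in one class and differ only in how they find the record boundaries; the clean-up of the sequence text is written once.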
> I think it is the responsibility of the user/caller to ask for either
> protein or dna or rna, by passing the right sequence type or symbol
> set.
So then you can just as well ask the caller to tell us right away, and
instantiate the right BCSequence type immediately!
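
Something along these lines, as a sketch only (Python for illustration; the class names, the make_sequence function and the "dna"/"protein" keys are invented for this example and are not the BCSequence API):

```python
class Sequence:
    """Hypothetical base class; subclasses define the allowed alphabet."""

    alphabet = frozenset()

    def __init__(self, text):
        text = text.upper()
        unknown = set(text) - self.alphabet
        if unknown:
            raise ValueError(f"symbols not in alphabet: {sorted(unknown)}")
        self.text = text


class DNASequence(Sequence):
    alphabet = frozenset("ACGT")

    def complement(self):
        # Only the DNA subclass knows what a complement is.
        return DNASequence(self.text.translate(str.maketrans("ACGT", "TGCA")))


class ProteinSequence(Sequence):
    # The 20 standard amino acids, one-letter codes.
    alphabet = frozenset("ACDEFGHIKLMNPQRSTVWY")


SEQUENCE_TYPES = {"dna": DNASequence, "protein": ProteinSequence}


def make_sequence(text, seq_type):
    """Instantiate the subclass the caller asked for, in one step."""
    return SEQUENCE_TYPES[seq_type](text)
```

Here make_sequence("AAAATTT", "dna") has a complement method and make_sequence("AAAATTT", "protein") simply does not, so the question "what is the complement of this A?" never reaches an object that cannot answer it.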
Cheers,
Alex
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
Claiming that the Macintosh is inferior to Windows
because most people use Windows, is like saying
that all other restaurants serve food that is
inferior to McDonalds
*********************************************************