[Biococoa-dev] more ramblings

Alexander Griekspoor mek at mekentosj.com
Thu Nov 25 17:04:31 EST 2004


>> Yes, but that's just pushing the problem ahead, and has a few more 
>> consequences. For instance in the case of the fasta file, say we have 
>> "AAAATTT" (worst case scenario I agree). Sure we can instantiate a 
>> very general class for the sequence, but then which symbol do you 
>> pick to fill it? The A for Alanine, or the A for Adenine? I hope not 
>> a "N" or "Unknown". In the end, you MUST choose for which type to go, 
>> and if you made that choice, then you can just as well set the 
>> BCSequence type, or in our case pick the proper subclass. Unless I do 
>> not see the better alternative. But even if you could read a fasta 
>> file in an untyped bcsequence with "untyped"  symbols, what happens 
>> if you feed this one to a "make_complement" wrapper? You get the same 
>> problem again and again, what is the complement of an A symbol, 
>> either nothing in the protein world (or perhaps a codon ;-) or a T (I 
>> know it doesn't make sense to ask a protein for its complement, but 
>> as an example I think it illustrates the problem well).
> You are absolutely right that it is a problem to create an untyped 
> BCSequence, that's not what I was trying to say. My point was that 
> readFasta cannot always know if it is a protein or nucleotide 
> sequence, so we let it just create a BCSequence.
Right, but what I am trying to make clear is that this only postpones 
the problem... The question is whether, as a consequence, we want to 
make all methods compatible with untyped sequences. I don't think so, 
but perhaps you guys think differently.
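
To make the make_complement example concrete, here is a minimal sketch in 
Python (not BioCocoa code; the class and function names are hypothetical 
stand-ins for BCSequence and its subclasses). It assumes only the four 
unambiguous DNA letters. The point it tries to show: the wrapper can 
answer for a typed DNA sequence, but has nothing sensible to return for 
an untyped or protein one.

```python
# Hypothetical Python stand-ins for the BCSequence idea; not BioCocoa API.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


class Sequence:
    """Untyped base class: holds letters but knows nothing about their meaning."""

    def __init__(self, letters):
        self.letters = letters.upper()


class DNASequence(Sequence):
    def complement(self):
        # Well defined: every DNA symbol has exactly one partner.
        return DNASequence("".join(DNA_COMPLEMENT[c] for c in self.letters))


class ProteinSequence(Sequence):
    pass  # no complement(): the question makes no sense for amino acids


def make_complement(seq):
    """Wrapper that only accepts sequences whose type defines a complement."""
    if not isinstance(seq, DNASequence):
        raise TypeError(f"{type(seq).__name__} has no complement")
    return seq.complement()
```

With "AAAATTT" wrapped as a DNASequence the call returns the letters 
"TTTTAAA"; wrapped as a plain Sequence or a ProteinSequence it raises 
a TypeError, which is the "again and again" problem every method would 
have to handle if untyped sequences were allowed.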

> Even if it is clear what the sequence is, we should not have 2 
> different readFasta methods, one for proteins, and one for dna/rna.
Totally agree! But this should be possible with typed BCSequences as 

> If we just create a BCSequence, the readFasta method will always work.
Sure, but I still haven't heard a solution to the most important 
problem: those characters that have an equivalent BCSymbol in multiple 
types, like A (Alanine and Adenine). You can only solve this problem 
if you also introduce untyped BCSymbols, but as you can't add MWs and 
other properties to them (because you don't know what they represent), 
they are merely replacements for characters. Also, what in the world 
would you return if you fed such a thing to an object that calculates 
its molecular weight? Do you see the problems we would get ourselves 
into?
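
A small Python sketch of the molecular weight problem, under stated 
assumptions: the tables and the function name are hypothetical, and the 
weights are approximate values for the free molecules (alanine about 
89.09 g/mol, adenine about 135.13 g/mol), used for illustration only.

```python
# Approximate molecular weights (g/mol) of the free molecules; illustration only.
PROTEIN_MW = {"A": 89.09}   # alanine
DNA_MW = {"A": 135.13}      # adenine (the base alone)

TABLES = {"protein": PROTEIN_MW, "dna": DNA_MW}


def symbol_weight(letter, alphabet=None):
    """Look up a weight; without a type, a shared letter cannot be resolved."""
    if alphabet is None:
        candidates = sorted(name for name, table in TABLES.items() if letter in table)
        if len(candidates) != 1:
            raise ValueError(f"{letter!r} is ambiguous or unknown: {candidates}")
        alphabet = candidates[0]
    return TABLES[alphabet][letter]
```

Asking for symbol_weight("A", "protein") or symbol_weight("A", "dna") 
gives an answer; asking for symbol_weight("A") can only raise an error, 
because an untyped symbol carries no properties to compute with.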

>  It's only task IMO should be to parse the file (which should have a 
> constant structure, independent of the sequence type, so it works 
> always), extract the requested data, and pass it on to the class that 
> actually creates a new BCSequence object.
Hmm, ok, if you see it that way, that's a possibility, yes. Still, it 
sounds more complicated than necessary. If you read a fasta file, you 
want a BCSequence (or a group of them), right? Why do it in two steps? I 
think there's plenty that can be distilled into general METHODS within 
the sequenceIO class that all readXXX methods can use, and that would 
keep things limited to one class.
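
A rough Python sketch of what I mean by general methods inside one I/O 
class. Everything here is hypothetical: the class name, the helper 
_clean, and the read_plain method (a headerless text format assumed just 
to have a second reader) are not existing BioCocoa API.

```python
class SequenceIO:
    """Hypothetical single I/O class; method names are illustrative."""

    @staticmethod
    def _clean(lines):
        # Shared helper: join lines, drop all whitespace, normalize case.
        return "".join("".join(line.split()) for line in lines).upper()

    @classmethod
    def read_fasta(cls, text):
        """Return a list of (header, letters) pairs from FASTA text."""
        records, header, body = [], None, []
        for line in text.splitlines():
            if line.startswith(">"):
                if header is not None:
                    records.append((header, cls._clean(body)))
                header, body = line[1:].strip(), []
            elif header is not None:
                body.append(line)
        if header is not None:
            records.append((header, cls._clean(body)))
        return records

    @classmethod
    def read_plain(cls, text):
        """Return the letters of a headerless, plain-text sequence."""
        return cls._clean(text.splitlines())
```

Both readers lean on the same helper, so the format-specific part of 
each readXXX method stays small without needing a second class.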

> I think it is the responsibility of the user/caller to ask for either 
> protein or dna or rna, by passing the right sequence type or symbol 
> set.
So, then you can just as well ask him to tell us right away, and 
instantiate the right BCSequence type immediately!
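
As a sketch of that one-step approach (Python, hypothetical names, not 
the actual BioCocoa classes): there is a single read_fasta, the caller 
names the type, and the right subclass is instantiated immediately. The 
alphabets are simplified assumptions (unambiguous DNA letters plus N, 
and the 20 standard amino acids).

```python
class Sequence:
    alphabet = frozenset()

    def __init__(self, name, letters):
        unknown = set(letters) - self.alphabet
        if unknown:
            raise ValueError(f"{sorted(unknown)} not valid in {type(self).__name__}")
        self.name, self.letters = name, letters


class DNASequence(Sequence):
    alphabet = frozenset("ACGTN")


class ProteinSequence(Sequence):
    alphabet = frozenset("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text, sequence_class):
    """Parse FASTA text straight into instances of the class the caller names."""
    sequences, name, chunks = [], None, []
    for line in text.splitlines() + [">"]:  # sentinel flushes the last record
        if line.startswith(">"):
            if name is not None:
                sequences.append(sequence_class(name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.strip())
    return sequences
```

The same "AAAATTT" record comes back as a DNASequence or a 
ProteinSequence depending only on what the caller asked for, and a 
record that does not fit the requested alphabet fails at read time 
instead of later.
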
                     ** Alexander Griekspoor **
              The Netherlands Cancer Institute
              Department of Tumorbiology (H4)
         Plesmanlaan 121, 1066 CX, Amsterdam
                    Tel:  + 31 20 - 512 2023
                    Fax:  + 31 20 - 512 2029
                   AIM: mekentosj at mac.com
                    E-mail: a.griekspoor at nki.nl
                Web: http://www.mekentosj.com


