[Biococoa-dev] more ramblings

Koen van der Drift kvddrift at earthlink.net
Thu Nov 18 20:33:32 EST 2004


On Nov 18, 2004, at 2:25 AM, Alexander Griekspoor wrote:

> Yes, but that's just pushing the problem ahead, and has a few more 
> consequences. For instance in the case of the fasta file, say we have 
> "AAAATTT" (worst case scenario I agree). Sure we can instantiate a 
> very general class for the sequence, but then which symbol do you pick 
> to fill it? The A for Alanine, or the A for Adenine? I hope not a "N" 
> or "Unknown". In the end, you MUST choose for which type to go, and if 
> you made that choice, then you can just as well set the BCSequence 
> type, or in our case pick the proper subclass. Unless I do not see the 
> better alternative. But even if you could read a fasta file in an 
> untyped bcsequence with "untyped"  symbols, what happens if you feed 
> this one to a "make_complement" wrapper? You get the same problem 
> again and again, what is the complement of an A symbol, either nothing 
> in the protein world (or perhaps a codon ;-) or a T (I know it doesn't 
> make sense to ask a protein for its complement, but as an example I 
> think it illustrates the problem well).


You are absolutely right that it is a problem to create an untyped 
BCSequence; that's not what I was trying to say. My point was that 
readFasta cannot always know if it is a protein or nucleotide sequence, 
so we let it just create a BCSequence. Even if it is clear what the 
sequence is, we should not have 2 different readFasta methods, one for 
proteins, and one for dna/rna. If we just create a BCSequence, the 
readFasta method will always work. Its only task IMO should be to 
parse the file (which should have a constant structure, independent of 
the sequence type, so it works always), extract the requested data, and 
pass it on to the class that actually creates a new BCSequence object. 
I think it is the responsibility of the user/caller to ask for either 
protein or dna or rna, by passing the right sequence type or symbol 
set.
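
A rough sketch of that split, written in Python only for illustration 
(the real code would be Objective-C). Every name here (SequenceType, 
parse_fasta, read_fasta, the dictionary standing in for a BCSequence) is 
hypothetical and not existing BioCocoa API. The parser only knows the 
file structure; the caller supplies the sequence type.

```python
from enum import Enum


class SequenceType(Enum):
    """Hypothetical stand-in for the sequence type / symbol set the caller passes."""
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"


def parse_fasta(text):
    """Split FASTA text into (header, residues) pairs.

    Knows only the file structure, nothing about sequence types.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_fasta(text, seq_type):
    """Parse first, then tag each record with the type the caller asked for.

    The dictionary is a placeholder for whatever object would be created.
    """
    return [
        {"name": header, "type": seq_type, "symbols": residues.upper()}
        for header, residues in parse_fasta(text)
    ]
```

With this shape the same parser serves "AAAATTT" whether the caller asks 
for DNA or protein; only the type attached to the result differs.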

Just for fun try the following. I have added two test sequences to the 
translation demo. Now edit the controller class so it will read the 
test2 file (a protein). Then start the program, and hit translate. Tadaa 
;) To prevent this sort of situation, we just let the wrapper test 
first what the sequence type is, and either return the complement or 
nil/an error if it is a protein.
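
The type check in the wrapper could look roughly like this. Again a 
Python sketch for illustration only; the function name, the string type 
tags, and the symbol table are assumptions, not the demo's actual code. 
It returns None (the analogue of nil) for anything that is not DNA, and 
computes the plain complement, not the reverse complement.

```python
# Hypothetical DNA complement table; "N" (unknown base) maps to itself.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def complement(symbols, seq_type):
    """Return the DNA complement, or None if the sequence is not DNA."""
    if seq_type != "dna":
        # A protein (or any other type) has no complement: refuse instead of guessing.
        return None
    return "".join(DNA_COMPLEMENT[s] for s in symbols.upper())
```

Called as complement("AAAATTT", "dna") this gives "TTTTAAA"; the same 
letters tagged as "protein" give None.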

- Koen.



