[Biococoa-dev] BCSequenceReader

Thu Nov 11 02:30:09 EST 2004

Hi Koen,

Very nice. In fact, I'll see if I can expand that further soon, 
because this is the part I'm most interested in in the short term. 
If it works OK, I'll use it immediately in the EnzymeX update. About 
the distinction between protein and DNA sequences: I think we need 
some form of "distinction algorithm" anyway, a small test of whether a 
sequence is DNA (with or without ambiguous bases), RNA, or protein. 
For some formats this won't be a problem because they support only one 
type, but in most cases we should first identify the type. These 
methods would come in handy in many situations; for instance, if 
someone enters text by hand to feed into some methods, we could 
quickly check what the type is.

The best way to do it is to check for the presence of certain 
characters, or to look at the overall percentage of certain characters. 
You can never distinguish a stretch of 7 alanines from 7 adenosines, 
though, I'm afraid. In that case I would default to one or the other, 
although it might be handy to have alternatives ready. For instance, a 
read method could take the type you want as an argument, plus an 
argument that says what to do if the check fails (i.e. skip or stop). 
The same holds for the "checkType" methods: it would be nice if they 
returned nil or a self-defined constant (BCSequenceTypeUnknown or 
something) when the type can't be determined.
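
To make the idea concrete, here is a rough sketch of such a check, 
written in Python rather than Objective-C only to keep it short. The 
names (SequenceType, guess_sequence_type) and the 0.9 threshold are 
made up for illustration and are not BioCocoa API; the letter sets 
follow the usual IUPAC one-letter codes. Ambiguous input such as 
"AAAAAAA" defaults to DNA here, which is just one of the two possible 
defaults.

```python
from enum import Enum


class SequenceType(Enum):
    # Hypothetical counterpart of a BCSequenceTypeUnknown-style constant.
    UNKNOWN = 0
    DNA = 1
    RNA = 2
    PROTEIN = 3


NUCLEOTIDE_CORE = set("ACGTU")               # unambiguous bases
NUCLEOTIDE_IUPAC = set("ACGTURYSWKMBDHVN")   # bases plus ambiguity codes
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBZX")
# Letters that can only occur in a protein: E, F, I, L, P, Q, Z, X.
PROTEIN_ONLY = PROTEIN_LETTERS - NUCLEOTIDE_IUPAC


def guess_sequence_type(text, threshold=0.9):
    """Guess the type of a raw sequence string (illustrative heuristic)."""
    # Ignore whitespace, digits, and gap characters.
    symbols = [c for c in text.upper() if c.isalpha()]
    if not symbols:
        return SequenceType.UNKNOWN
    present = set(symbols)

    # Step 1: presence of certain characters.
    if present & PROTEIN_ONLY:
        if present <= PROTEIN_LETTERS:
            return SequenceType.PROTEIN
        return SequenceType.UNKNOWN
    if not present <= NUCLEOTIDE_IUPAC:
        return SequenceType.UNKNOWN

    # Step 2: overall percentage of plain bases. A low fraction may be a
    # protein that happens to use only letters shared with nucleotides.
    core_fraction = sum(c in NUCLEOTIDE_CORE for c in symbols) / len(symbols)
    if core_fraction < threshold:
        return SequenceType.UNKNOWN

    if "U" in present and "T" in present:
        return SequenceType.UNKNOWN
    if "U" in present:
        return SequenceType.RNA
    # Includes undecidable cases such as "AAAAAAA"; DNA is the default.
    return SequenceType.DNA
```

A caller that already knows what it expects could compare the result 
with the wanted type and then skip or stop on a mismatch.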

Finally, one thing we might already think about a bit. The DNA Strider 
format is a binary one, and to test for it we need to work with paths 
instead of strings. Therefore, I suggest passing the path to the 
readFile method instead of the already-read file contents. That method 
can then determine the type, and either read the file into a string and 
pass it to methods like the one for FASTA files, or pass the path to 
methods that need direct access to the original file. The rest can stay 
the same, because I like that we can now also pass a string to the 
readFasta method without needing a file per se.
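
A minimal sketch of that dispatch, again in Python for brevity. All 
function names are hypothetical, and the "contains a NUL byte" test is 
only a generic stand-in for detecting a binary file; it is not the 
real DNA Strider signature, which would have to be checked against the 
format itself. The binary reader is left unimplemented on purpose.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_binary(path, probe_size=512):
    # Assumption: a NUL byte near the start marks a binary file.
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(probe_size)


def read_binary_file(path):
    # Placeholder for readers that need direct access to the file.
    raise NotImplementedError("binary formats are not parsed in this sketch")


def read_file(path):
    """Decide from the path how to read the file, then dispatch."""
    if looks_binary(path):
        return read_binary_file(path)
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    if text.lstrip().startswith(">"):
        return read_fasta(text)
    raise ValueError("unrecognised text format")
```

Because read_fasta takes a string, it can still be called directly 
with pasted or hand-typed text, while read_file handles everything 
that starts from a path.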

One minor thing: after updating from CVS I did see the new 
BCSequenceReader files, but not in the project. At first I thought this 
was an Xcode thing, but even after a clean checkout they weren't there. 
I guess you forgot to update the project file as well; could you still 
do that?

Cheers,
Alex

On 11 Nov 2004, at 1:42, Koen van der Drift wrote:

> Hi all,
>
> I have added an initial attempt for a new class BCSequenceReader. I 
> also added some code to the translation demo to test this. I am using 
> the original code from Peter, so the code figures out what the format 
> of the data is. For now I have only added a readFasta method. Fasta 
> files (and other formats as well) can contain DNA sequences or protein 
> sequences. But how do I figure out which of the two I am dealing with, 
> so I can return the proper subclass of BCSequence? Any suggestions how 
> to approach this?
>
> thanks,
>
> - Koen.
**************************************************************
                         ** Alexander Griekspoor **
**************************************************************
                  The Netherlands Cancer Institute
                  Department of Tumorbiology (H4)
             Plesmanlaan 121, 1066 CX, Amsterdam
                        Tel:  + 31 20 - 512 2023
                        Fax:  + 31 20 - 512 2029
                       AIM: mekentosj at mac.com
                       E-mail: a.griekspoor at nki.nl
                    Web: http://www.mekentosj.com

MacOS X: The power of UNIX with the simplicity of the Mac

***************************************************************