[Biococoa-dev] BCSequenceReader
Alexander Griekspoor
mek at mekentosj.com
Thu Nov 11 02:30:09 EST 2004
Hi Koen,
Very nice. In fact, I'll see if I can expand on that soon as well,
because in the short term this is the part I'm most interested in.
If that works OK, I'll implement it immediately in the update of
EnzymeX. About the distinction between DNA and protein sequences: I
think we need some form of "distinction algorithm" anyway, a small test
of whether a sequence is DNA (with or without ambiguous bases), RNA, or
protein. For some formats this won't be a problem because they only
support one or the other, but in most cases we should first identify
the type. These methods would come in handy in many cases; for instance,
if someone enters text by hand to feed it to some methods, we could
quickly check what the type is.
The best way to do it is to check for the presence of certain
characters, or to look at the overall percentage of certain characters.
You can never distinguish a stretch of 7 alanines from 7 adenosines,
though, I'm afraid. In that case I would default to either one, although
it might be handy to have alternatives ready. For instance, a read
method that takes the type you want as an argument, plus an argument
that says what to do if the read fails (i.e. skip or stop). The same
holds true for the "checkType" methods: it would be nice if they
returned nil or a self-defined constant (BCSequenceTypeUnknown or
something) if the type can't be determined.
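To make the idea concrete, here is a rough sketch in Python rather than
Objective-C, just to show the logic. Everything in it is an assumption
for illustration and not existing BioCocoa code: the names SequenceType,
guess_sequence_type and filter_by_type, the 0.85 threshold and the
minimum length of 10. It first looks for letters that only occur in
proteins, then falls back on the percentage of A/C/G/T/U, and returns an
"unknown" value (or a caller-supplied default) when it can't decide.

```python
from enum import Enum


class SequenceType(Enum):
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"
    UNKNOWN = "unknown"  # plays the role of a BCSequenceTypeUnknown-style constant


CORE_BASES = set("ACGTU")
# One-letter amino acid codes that are not IUPAC nucleotide codes.
PROTEIN_ONLY = set("EFILPQ")


def guess_sequence_type(text, default=SequenceType.UNKNOWN,
                        threshold=0.85, min_length=10):
    """Guess the type of a raw sequence string.

    `threshold` and `min_length` are arbitrary illustrative values.
    `default` is returned when the sequence is ambiguous.
    """
    letters = [c for c in text.upper() if c.isalpha()]
    if not letters:
        return SequenceType.UNKNOWN
    symbols = set(letters)

    # Presence test: any protein-only letter settles it.
    if symbols & PROTEIN_ONLY:
        return SequenceType.PROTEIN

    # Too short to call: 7 x "A" could be alanines or adenosines.
    if len(letters) < min_length:
        return default

    # Percentage test: mostly A/C/G/T/U means a nucleotide sequence.
    core_fraction = sum(c in CORE_BASES for c in letters) / len(letters)
    if core_fraction >= threshold:
        if "U" in symbols and "T" not in symbols:
            return SequenceType.RNA
        return SequenceType.DNA
    return default


def filter_by_type(sequences, wanted, on_failure="skip"):
    """Keep sequences of the wanted type; ambiguous ones count as wanted.

    on_failure="skip" drops mismatches, on_failure="stop" raises ValueError.
    """
    kept = []
    for seq in sequences:
        if guess_sequence_type(seq, default=wanted) == wanted:
            kept.append(seq)
        elif on_failure == "stop":
            raise ValueError(f"sequence is not {wanted.value}: {seq[:20]}")
    return kept
```

Passing the wanted type as the default is one way to handle the
alanine/adenosine case: the caller's request breaks the tie, and only
clear mismatches are skipped or stop the read.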
Finally, one thing we might already think about a bit. The DNA Strider
format is a binary one, and to test for it we need to work with paths
instead of strings. Therefore, I suggest passing the path to the
readFile method instead of the already-read file contents. That method
would then determine the file type, and either read the file into a
string and pass it to methods like the one for FASTA files, or pass the
path to methods that need direct access to the original file. The rest
can stay the same, because I like that we can now also pass a string to
the readFasta method without needing a file per se.
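As a rough sketch of that flow (again in Python, purely as
illustration), something like the following. The names read_file,
read_fasta, looks_binary and read_binary_format are made up for the
example. The NUL-byte check is only a generic "is this binary?"
heuristic and not the real DNA Strider signature, and the binary reader
is left as a placeholder.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Works on a plain string, so no file is needed.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def looks_binary(path, sample_size=1024):
    """Generic heuristic (an assumption): a NUL byte near the start means binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def read_binary_format(path):
    """Placeholder for readers that need direct access to the file."""
    raise NotImplementedError("binary formats need their own path-based reader")


def read_file(path):
    """Take a path, work out the file type, then dispatch."""
    path = Path(path)
    if looks_binary(path):
        # Binary readers get the path itself.
        return read_binary_format(path)
    # Text formats are read into a string and handed to string-based readers.
    text = path.read_text(encoding="utf-8", errors="replace")
    if text.lstrip().startswith(">"):
        return read_fasta(text)
    raise ValueError(f"unrecognised format: {path}")
```

The string-based read_fasta stays usable on its own, and only read_file
needs to know about paths.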
One minor thing: after updating from CVS I did see the new
BCSequenceReader files, but not in the project. At first I thought this
was an Xcode thing, but even after a clean checkout they weren't there.
I guess you forgot to update the project file as well; could you still
do that?
Cheers,
Alex
On 11 Nov 2004, at 1:42, Koen van der Drift wrote:
> Hi all,
>
> I have added an initial attempt for a new class BCSequenceReader. I
> also added some code to the translation demo to test this. I am using
> the original code from Peter, so the code figures out what the format
> of the data is. For now I have only added a readFasta method. Fasta
> files (and other formats as well) can contain DNA sequences or protein
> sequences. But how do I figure out which of the two I am dealing with,
> so I can return the proper subclass of BCSequence? Any suggestions how
> to approach this?
>
> thanks,
>
> - Koen.
**************************************************************
** Alexander Griekspoor **
**************************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
MacOS X: The power of UNIX with the simplicity of the Mac
***************************************************************