[Biophp-dev] Fasta filetype parser updated

S Clark biophp-dev@bioinformatics.org
Tue, 6 May 2003 19:08:48 -0600


On Tuesday 06 May 2003 06:47 pm, nicos@itsa.ucsf.edu wrote:
> Only had a quick glance, but the methods:
> isFromFile()
> setSource()
> findNextLabel()
> readtoLabel()
> might be needed (unchanged) in many other parsers.  Should we have a
> parser class that can be extended by the specific parsers, or is that
> getting too ugly?

I was thinking about that - it might be a good idea overall.  A lot 
of the parsers will end up doing very similar things, so I imagine there'll
be a fair amount of code that we can re-use.

I don't know how much of the code will actually be the same, but 
if nothing else the methods could be declared in an "abstract class"
(i.e. the functions exist but are empty).
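
A rough sketch of what such a base class might look like. Only the four
method names come from this thread; the signatures, the properties and the
FastaParser subclass are assumptions for illustration, not the committed code:

```php
<?php
// Sketch only: the four method names come from the thread; the signatures,
// the $source/$fromFile properties and the FastaParser subclass are assumed.
class Parser
{
    public $source = null;
    public $fromFile = false;

    // Remember where the data comes from and whether it is a file name.
    function setSource($source, $fromFile = false)
    {
        $this->source = $source;
        $this->fromFile = $fromFile;
    }

    function isFromFile()
    {
        return $this->fromFile;
    }

    // "Empty but existing": format-specific parsers override these.
    function findNextLabel()
    {
        return false;
    }

    function readtoLabel()
    {
        return false;
    }
}

class FastaParser extends Parser
{
    // A FASTA label line starts with ">"; return the first label found.
    function findNextLabel()
    {
        foreach (explode("\n", (string) $this->source) as $line) {
            if (strncmp($line, '>', 1) === 0) {
                return rtrim(substr($line, 1));
            }
        }
        return false;
    }
}
```

A specific parser then overrides only what differs and inherits the rest,
so the shared methods live in one place.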

> I'll try to get started on the Genbank parsers soon.  That will extend
> the array returned by the parsers dramatically, so we should have a
> careful look at that array once the Genbank parser is done.  Also, I'll
> probably take the approach to have the findNextInfile in that parser read
> a whole record into an array, and do the actual parsing only in the array.

I think MOST of the fields in GenBank have a direct representative
in Serge's seq object, so for that at least I don't think there'll
be too much to do in seq_factory and so on.
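
On reading a whole record into an array first, something along these lines
might do. This is a sketch, not the planned implementation: the function
names are made up, and it assumes only that a GenBank flat-file record ends
with a line containing just "//":

```php
<?php
// Hypothetical helper, not the BioPHP implementation. It assumes only that a
// GenBank flat-file record ends with a line containing just "//".
function readGenbankRecord($handle)
{
    $lines = array();
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '//') {
            return $lines;          // end of record reached
        }
        $lines[] = $line;
    }
    // End of input: return a trailing unterminated record, or false if none.
    return count($lines) > 0 ? $lines : false;
}

// Parsing then works on the array alone, e.g. pulling out the LOCUS line.
function findLocusLine($lines)
{
    foreach ($lines as $line) {
        if (strncmp($line, 'LOCUS', 5) === 0) {
            return $line;
        }
    }
    return false;
}
```

Keeping the file reading separate from the parsing means the field-level
code never touches the file handle.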

> Sean, do you want to have a go at the clustalw parser too?

Heh...just committed it - I BELIEVE both clustalw and clustalx use
the same file structure (at least as far as the parser can tell).
There should now be an updated parse.inc.php (with clustal
auto-detect), the filetype parser for clustal, and an updated test.php
which also tests clustal.  Oh, and a lamin.aln file to test with...
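
For anyone curious, auto-detection can be as simple as looking at the first
non-blank line. The sketch below is illustrative only and is not the code in
parse.inc.php; guessFiletype() is a made-up name. It relies on Clustal
alignment files starting with a "CLUSTAL" header line and FASTA files
starting with ">":

```php
<?php
// Hypothetical detection helper; the real parse.inc.php may work differently.
function guessFiletype($text)
{
    // Decide from the first non-blank line only.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (strncmp($line, 'CLUSTAL', 7) === 0) {
            return 'clustal';   // clustalw and clustalx headers both start this way
        }
        if ($line[0] === '>') {
            return 'fasta';
        }
        return 'unknown';
    }
    return 'unknown';
}
```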

Let me know if you spot any problems.