[Biophp-dev] Abstracting the parser backend

biophp-dev@bioinformatics.org
Tue, 29 Apr 2003 18:59:49 PST

> > OK.  Are you going to set this up? Once there is an example and some
> > structure in place it will be possible for others to extend it.
> I will if you'd like, but only if you don't mind - it's bad enough
> that I've inadvertently dumped all over all the work you've just
> done without me editing it without asking first...

No problem. I simply want it to work, to be as good as possible, and be
available ASAP. 

> Presuming you don't mind, code-wise, here is what I'll do:
> 1."wrap" the existing parser functions into classes 
> complete with file reading code

OK.  So they will all have their own move_Next(), move_Previous(), eof(),
bof(), fetch() and probably also move_First(), move_Last() and move_To()
functions?  I guess the parser class constructors should take either a
filename or a string as an argument.  Preferably, there should be a way
to maintain and use an index in a file (as in Serge's seqdb class).
Hmm, this is all straightforward to do in memory (like it is now), but
probably more difficult with a stream.  (What streams other than files are
there in PHP?  PHP treats URLs almost exactly like files, so...)  How
important is it to deal with data structures larger than the available
memory?

B.t.w., since we now have this large list of methods that every class
should have, Jo Dough will still have to do quite some work to get his
class to work in Biophp.  If possible, why not have them in the calling
class so that they don't have to be repeated over and over again?
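
To make the last point concrete, here is a minimal sketch of what
"navigation in the calling class" could look like.  The class name
RecordCursor and the function fasta_split_records() are hypothetical and
not taken from the Biophp code; the method names are the ones listed
above.  It only covers the in-memory case.

```php
<?php
// Hypothetical sketch: navigation lives in one shared class, so a
// format-specific parser only has to split raw text into records.

class RecordCursor
{
    private $records;   // list of raw records, one per sequence entry
    private $pos = 0;   // index of the current record

    public function __construct(array $records)
    {
        $this->records = array_values($records);
    }

    public function bof() { return $this->pos <= 0; }
    public function eof() { return $this->pos >= count($this->records); }

    public function move_First()    { $this->pos = 0; }
    public function move_Last()     { $this->pos = max(0, count($this->records) - 1); }
    public function move_Next()     { if (!$this->eof()) { $this->pos++; } }
    public function move_Previous() { if (!$this->bof()) { $this->pos--; } }

    // Clamp the target so the cursor never leaves the range 0..count.
    public function move_To($i)
    {
        $this->pos = max(0, min((int) $i, count($this->records)));
    }

    // Return the current record and advance, or false at the end.
    public function fetch()
    {
        if ($this->eof()) {
            return false;
        }
        return $this->records[$this->pos++];
    }
}

// The only format-specific part: a hypothetical FASTA record splitter.
// It splits in front of every line that starts with ">".
function fasta_split_records($text)
{
    $parts = preg_split('/^(?=>)/m', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}

$cursor = new RecordCursor(fasta_split_records(">a\nACGT\n>b\nGGCC\n"));
while (($rec = $cursor->fetch()) !== false) {
    echo strtok($rec, "\n"), "\n";   // print the header line of each record
}
```

With something like this, whoever writes a new parser would only have to
supply the splitter, not the eight navigation methods.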

> 2.add instantiation of the (filetype)_parser class 
Will this mean that I instantiate a class Parser, and class Parser finds
the appropriate (filetype)_parser class for me?  If so, it would be cool
to keep the current include scheme, where only the required parser is
actually included in the running script.
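
A rough sketch of how that lookup could keep the "include only what is
needed" behaviour.  The naming convention (filetype "fasta" maps to class
fasta_parser in parsers/fasta_parser.inc.php) and the function names are
assumptions for illustration, not the existing Biophp layout.

```php
<?php
// Hypothetical naming convention, not taken from the Biophp source:
// filetype "fasta" -> class "fasta_parser" in "parsers/fasta_parser.inc.php".

function parser_class_for($filetype)
{
    // Allow only simple names so that input can never become a path.
    if (!preg_match('/^[a-z0-9]+$/', $filetype)) {
        throw new InvalidArgumentException("Bad filetype: $filetype");
    }
    return $filetype . '_parser';
}

function load_parser($filetype, $dir = 'parsers')
{
    $class = parser_class_for($filetype);
    if (!class_exists($class, false)) {
        $file = "$dir/$class.inc.php";
        if (!is_file($file)) {
            throw new RuntimeException("No parser available for '$filetype'");
        }
        require_once $file;   // only this one parser file gets included
    }
    return new $class();
}

// Stand-in for a class that would normally live in its own include file.
class fasta_parser
{
    public function name() { return 'fasta'; }
}

$parser = load_parser('fasta');
echo get_class($parser), "\n";
```

The calling script never names a parser file itself; it only passes the
filetype, and unknown filetypes fail with an exception instead of a
missing-include warning.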

> 3.edit the fetch() function to reflect that it's calling a
> method from the filetype parser instead of one of its own
> methods

So fetch() will simply call (filetype)_parser->fetch()?

> 4.create a "seq_factory.inc.php" class to churn out the sequence
> objects

So fetch() will get a data structure from the parser, feed this to the
seq_factory object, and get a Seq object back, which it sends back to the
caller?
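
If I read that flow correctly, it would look roughly like the sketch
below.  The class names follow this thread; the method bodies, the
create() method and the array keys 'id' and 'sequence' are my own
assumptions, purely for illustration.

```php
<?php
// Hypothetical classes: the names follow the thread, the method bodies and
// the array keys ('id', 'sequence') are assumptions made for this sketch.

class Seq
{
    public $id;
    public $sequence;
}

class seq_factory
{
    // The single place where parser output is mapped onto a Seq object.
    public function create(array $data)
    {
        $seq = new Seq();
        $seq->id = $data['id'];
        $seq->sequence = strtoupper($data['sequence']);
        return $seq;
    }
}

class fasta_parser
{
    private $records;
    private $pos = 0;

    public function __construct($text)
    {
        // Split at every ">" that starts a line; the ">" itself is dropped.
        $this->records = preg_split('/^>/m', $text, -1, PREG_SPLIT_NO_EMPTY);
    }

    // Return a plain array for the next record, or false when done.
    public function fetch()
    {
        if ($this->pos >= count($this->records)) {
            return false;
        }
        $lines = explode("\n", trim($this->records[$this->pos++]));
        $id = array_shift($lines);
        return array('id' => $id, 'sequence' => implode('', $lines));
    }
}

class Parse
{
    private $parser;
    private $factory;

    public function __construct($parser, seq_factory $factory)
    {
        $this->parser = $parser;
        $this->factory = $factory;
    }

    // Delegate to the filetype parser, then let the factory build the Seq.
    public function fetch()
    {
        $data = $this->parser->fetch();
        return $data === false ? false : $this->factory->create($data);
    }
}

$parse = new Parse(new fasta_parser(">s1\nacgt\nTT\n>s2\nGG\n"), new seq_factory());
while (($seq = $parse->fetch()) !== false) {
    echo $seq->id, ' ', $seq->sequence, "\n";
}
```

The parser never sees the Seq class, so a change to Seq only touches the
factory.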

> 5.move the creation of the seq object up to the "Parse" class, via
> the "seq_factory".

Sounds OK to me.  I would still think about keeping the user functions
(move_Next(), etc.) in the Parse class, provided it is possible (I don't
think interleaved data are a big problem with this scheme).  That will
keep the individual parsers simpler.

The seq_factory is fine with me (it is a good idea to keep the
translation from what is in the file to our abstraction of the real world
in one place).

The only issue is how to deal with stuff both in memory (small files and
strings; these are currently kept in an array of strings, and an index of
the line numbers with sequence entries is maintained) and in streams (big
files, where we read from a file pointer).  Is there an easy way to deal
with both?
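
One possible way to handle both, sketched under the assumption that PHP
5.1 or later is available: a string can be wrapped in a php://memory
stream, so the parser only ever sees a file pointer and the index holds
byte offsets instead of line numbers.  The function names open_source(),
index_entries() and read_entry() are hypothetical, and the header test
assumes FASTA-style entries that start with ">".

```php
<?php
// Assumes PHP 5.1 or later for php://memory; function names are hypothetical.

// Return a readable stream for a string, or for a filename or URL.
function open_source($source, $is_string = false)
{
    if ($is_string) {
        $fp = fopen('php://memory', 'r+');
        fwrite($fp, $source);
        rewind($fp);
        return $fp;
    }
    $fp = fopen($source, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $source");
    }
    return $fp;
}

// Scan once and remember the byte offset of every header line.
function index_entries($fp)
{
    $offsets = array();
    rewind($fp);
    while (true) {
        $pos = ftell($fp);
        $line = fgets($fp);
        if ($line === false) {
            break;
        }
        if ($line[0] === '>') {
            $offsets[] = $pos;
        }
    }
    return $offsets;
}

// Read the entry that starts at a given offset without loading the rest.
function read_entry($fp, $offset)
{
    fseek($fp, $offset);
    $entry = fgets($fp);                       // the header line
    while (($line = fgets($fp)) !== false && $line[0] !== '>') {
        $entry .= $line;                       // sequence lines of this entry
    }
    return $entry;
}

$fp = open_source(">a\nAC\nGT\n>b\nTT\n", true);
$index = index_entries($fp);
echo read_entry($fp, $index[1]);
fclose($fp);
```

Seeking works for local files and memory streams, but not for plain HTTP
URLs, so remote data would first have to be copied into a local or
php://temp stream before it can be indexed this way.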


B.t.w., what a relief to write and read about stuff that matters!