[Biophp-dev] Egad, this is getting long :-)

biophp-dev@bioinformatics.org biophp-dev@bioinformatics.org
Tue, 29 Apr 2003 16:47:31 PST


-- snip about the usual amount of unnecessary information--

> I am advocating that the direct handling of the files (streams/strings) be
> abstracted out into dedicated classes, which concern themselves only with
> reading the data and separating it into sequence data.  Each would return
> a "standard" (by which I mean "agreed upon") ordinary array, containing at
> a minimum a "label" (short name) and the actual sequence as the first two
> elements.  (The returned data would NOT be different for every parser; it
> would simply not depend on externally declared data structures.)
> 
> The "Parser" class stays exactly as it is:
> 
> It handles detection of the data type if it is not specified.
> 
> It instantiates the appropriate parser object and tells it to go.
> 
> It hands out the sequence objects created from the parsers' output.
> 
> with only two minor changes:
> 
> 1) The actual "moving back and forth in the file" (the move_Next() and
> move_Previous() methods), EOF handling and so on end up down in the
> individual parsing classes.  Some file types may require special handling
> (e.g. "streams" can't really be rewound; in those cases a call to the
> move_Previous() method would simply return false).
> 
> 2) The CREATION of the sequence object is abstracted out to a "seq_factory"
> class.  This isn't really necessary to the goal of just abstracting the
> data parsers, but I think it would be very handy (and useful outside of
> the Parser object, for example in the event that someone wants a set of
> seq objects for fragments from the resten class instead of only strings).
> If the structure of the sequence object is ever changed, the abstraction
> provided by the "factory" object protects the objects further out from
> the "center" of the GenePHP structure from having to worry about it.  It
> also means that, for example, if we ever decide we want to add support
> for some OTHER sequence format (perhaps to let the BioJava guys make use
> of some of our PHP routines), all that would be needed is an additional
> "factory" object, and all of the classes that generate sequence data can
> immediately support it without any real changes.
> 
> That's all.  The "end-usage" of the Parser object doesn't change at all (like

OK.  Are you going to set this up? Once there is an example and some
structure in place it will be possible for others to extend it.
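
To make the proposal concrete, here is a rough PHP sketch of how the split
might look. It is only an illustration under assumptions: it handles FASTA
text only, and the names FastaReader, SeqFactory, Seq, current() and
create() are hypothetical. Only move_Next() and move_Previous() come from
the proposal quoted above, and none of this is existing GenePHP code.

```php
<?php
// Hypothetical sketch; class names and current()/create() are assumptions.

// Format-specific reader: knows only how to split raw text into records.
class FastaReader
{
    private $records = array();
    private $pos = -1; // index of the current record; -1 means "before first"

    public function __construct($text)
    {
        // Split on ">" at the start of a line; each chunk is one record.
        foreach (preg_split('/^>/m', $text, -1, PREG_SPLIT_NO_EMPTY) as $chunk) {
            $lines = preg_split('/\R/', trim($chunk));
            $label = strtok(array_shift($lines), " \t");
            // The "agreed upon" plain array: label first, sequence second.
            $this->records[] = array($label, implode('', array_map('trim', $lines)));
        }
    }

    // Advance to the next record; returns false at the end of the data.
    public function move_Next()
    {
        if ($this->pos + 1 >= count($this->records)) {
            return false;
        }
        $this->pos++;
        return true;
    }

    // Step back one record; a reader for a non-rewindable stream
    // could simply return false here.
    public function move_Previous()
    {
        if ($this->pos <= 0) {
            return false;
        }
        $this->pos--;
        return true;
    }

    // The current record as a plain array, or null before the first move_Next().
    public function current()
    {
        return $this->pos >= 0 ? $this->records[$this->pos] : null;
    }
}

// Minimal stand-in for the real sequence object.
class Seq
{
    public $id;
    public $sequence;
}

// The only place that knows how a Seq is put together.
class SeqFactory
{
    public function create(array $record)
    {
        $seq = new Seq();
        $seq->id = $record[0];
        $seq->sequence = $record[1];
        return $seq;
    }
}

// Usage: the calling code never touches the file format or the Seq layout.
$reader = new FastaReader(">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n");
$factory = new SeqFactory();
$seqs = array();
while ($reader->move_Next()) {
    $seqs[] = $factory->create($reader->current());
}
```

With this layout, supporting another file format would mean adding another
reader class that returns the same kind of array, and a change to the Seq
structure would touch only SeqFactory.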

Best,

Nico