[Biophp-dev] Seeking comments on CVS, XML, and other TLA's

S Clark biophp-dev@bioinformatics.org
Wed, 9 Apr 2003 14:48:43 -0600


On Wednesday 09 April 2003 12:03 am, Dong Gregorio wrote:
[...]
> Uhuh, and what is that code supposed to do?  Let's take a look
> at it.

Well, the part *I* was interested in was a SAX XML (or is it XML SAX?) 
parser for ESearch - I wanted to see how it was implemented.  (That
was in the set of code I mentioned in the previous message).

I was right - it IS a pain. :-)  At least, it LOOKS that way.

Put simply, it appears to need an individual "function" (/pattern) written
for every single tag that you want to get data out of, just as regular
expressions do, with the added overhead of having to break things up into
"recognize the start tag", "recognize the end tag", and "do stuff with the
contents between the two tags" functions, and of tracking the
"depth"/parentage via some sort of ueber-variable.  (All the online example
parser scripts seem to use global variables, but I suspect "everybody" just
uses top-level variables in parser objects...)
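
A minimal sketch of the handler pattern described above, written in Python
(not PHP) purely for illustration.  The class name, the function name, and
the sample document are assumptions made up for this example; the sample is
only a simplified, ESearch-like result, not real output.  Keeping the
"depth"/parentage stack on the handler object avoids the global variables
seen in the online examples.

```python
import xml.sax


class IdListHandler(xml.sax.ContentHandler):
    """Collect the text of every <Id> element that sits directly in <IdList>."""

    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open tags (depth/parentage)
        self.buffer = []  # character data seen inside the current <Id>
        self.ids = []     # results

    def _in_id(self):
        # True only while we are inside <IdList><Id> ... </Id></IdList>
        return self.path[-2:] == ["IdList", "Id"]

    def startElement(self, name, attrs):
        # "recognize the start tag"
        self.path.append(name)
        if self._in_id():
            self.buffer = []

    def characters(self, content):
        # "do stuff with the contents between the two tags";
        # the parser may deliver the text in several pieces
        if self._in_id():
            self.buffer.append(content)

    def endElement(self, name):
        # "recognize the end tag"
        if self._in_id():
            self.ids.append("".join(self.buffer).strip())
        self.path.pop()


# Hypothetical, simplified ESearch-like document (IDs are made up).
SAMPLE = b"""<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""


def parse_ids(xml_bytes):
    handler = IdListHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.ids


if __name__ == "__main__":
    print(parse_ids(SAMPLE))
```

Even for one tag this takes three callbacks plus the stack, which is the
overhead complained about above.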

For really simple XML documents (which I would count ESearch results
as) regular expressions just seem so much easier, though it seems
pretty clear that as documents get more complex the overhead of a 
formal XML parser becomes more worthwhile.  Plus, for consistency, 
I still think a real XML parser is called for wherever possible.
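
For comparison, a rough sketch of the regular-expression approach, again in
Python rather than PHP and with a made-up, simplified ESearch-like sample
(the pattern, the function name, and both documents are assumptions for
illustration).  It is a single line of matching for a simple document, but it
has no notion of parentage, so it also picks up an <Id> that sits under an
unrelated tag.

```python
import re

# Matches the digits inside any <Id>...</Id> pair, wherever it appears.
ID_PATTERN = re.compile(r"<Id>\s*(\d+)\s*</Id>")


def extract_ids(xml_text):
    """Return the contents of every <Id> element as a list of strings."""
    return ID_PATTERN.findall(xml_text)


# Hypothetical simple document: the regex is all that is needed.
SIMPLE = """<eSearchResult>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

# Hypothetical more complex document: an unrelated <Id> is matched too,
# because the pattern cannot see which element encloses it.
NESTED = """<Result>
  <IdList><Id>111</Id></IdList>
  <Other><Id>999</Id></Other>
</Result>"""

if __name__ == "__main__":
    print(extract_ids(SIMPLE))
    print(extract_ids(NESTED))
```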

It's just that I'm still going through the "mental temper tantrum" 
of convincing myself to do it...

> Well, that's a whole debate in itself.  To rephrase it:
>
> Do we work on features that would differentiate BioPHP from BioXXX (and
> work backwards later on) or do we work through the basics first (data file
> parsing, sequence analysis, etc.)?

The answer, of course, is "yes". :-)

I hadn't really intended to EXPLICITLY differentiate BioPHP from 
other BioXXXXX projects as such, just that I wasn't intending to
simply "re-implement" those other projects in PHP, though at the
same time I am hoping to have functionality for handling things
that (as far as I know) the other BioXXXX projects haven't gotten
around to, such as working with phylogeny (or perhaps dealing with HPLC
chromatograms of cell metabolites and other such analyses?)
[actually - I just looked at the BioPerl Docs and they do have some
support for parsing phylogenetic analysis from e.g. PAML]

I'm hoping to approach the "basics" as they become necessary to
accomplish specific tasks (which may or may not differentiate
BioPHP from BioXXXX - some tasks will, some won't), rather than
to approach with a focus of "BioPerl has a Bio::Tools::Phylo::PAML::Result
object, so we need to write one for BioPHP" (for example).

My personal (and admittedly inexperienced) opinion is that our structure
should be more "task oriented", so the EQUIVALENT to the aforementioned
module might be categorized more like "Bio::Phylogeny::Frontends::PAML" 
(comments to correct any gross ignorance of good design that this opinion
may reveal are welcome - after all, one of my primary reasons for this
project is educating myself.  Thankfully, I like to think I'm a pretty
fast learner.)  Incidentally, if anyone's bored and wants to critique
my current presumably-grossly-idiosyncratic style, the sequence list object
should be a fairly representative example.  I'm particularly interested in
how easy other people find it to understand, and how and where I tend to
depart from what is generally considered good design practice...

> Taking the first approach would mean developing interfaces to other
> BioXXX (because we can't even do the simplest tasks ourselves using
> BioPHP).  Taking the second approach means it would take some time
> before we earn "bragging rights" vis-a-vis BioXXX.  Bragging rights being,
> "My dog can sing while standing on its head, yours can't!" *Laugh out loud*

With the possible exception of optional interfaces to BioJava for
processor-intensive analyses, I don't foresee any need to have interfaces to
other BioXXXX projects at the moment.  Well, unless, of course, it would
be easier to just use BioPerl's Bio::Dog::Activity::Singing::SongParsers::XML
module instead of writing our own.  (I don't think PHP has been ported to
the "Chow-Mixed-Breed" platform that my dog runs on, anyway...)