[Biophp-dev] Seeking comments on CVS, XML, and other TLA's

Tue, 8 Apr 2003 00:32:47 -0600

On Sunday 06 April 2003 11:35 pm, Dong Gregorio wrote:
> Welcome, Greg!  Nice to see another "Greg" on board.  =)
>
> As for the build, I'm running PHP (4.2, I think) on a Windows machine,
> and Apache 1.3.  Haven't found the time yet to get the latest.
> How about you, Sean?  What are you using?

A CVS snapshot from about a week ago (4.3.2 pre-release) built
as CLI.  I need to update to a more recent snapshot as they've
apparently fixed the problem installing the Java support when you're
only building the CLI version...

On my server, it's 4.3.something with Apache 1.3.27 (as I recall).

> >Is it worth all the extra hassle? In the mood for opening a "can of
> >worms", I see ;]
>
> Hehe, well, Sean, how about it?  =)

As I once saw in a .sig on Slashdot:
"The can's already open, the worms are EVERYWHERE..." :-)

I just keep having trouble really CONVINCING myself that
the apparently complex framework of tag-handling functions
you seem to have to build to parse XML with an "official"
parser is really any better than easier-to-follow regular expressions
(DESPITE the fact that, on a purely "rational" level, I can certainly
see that past a certain point a set of parsing regular expressions
becomes more complex than the XML parser, but still...)
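
For comparison, here is a minimal sketch (in Python rather than PHP, and
not BioPHP code) that pulls the same field out of a toy record both ways:
with a regular expression, and with start/end/character-data handlers on
an expat parser. Expat is also the library underneath PHP's default xml
extension, so the handler shape is similar. The <Seq>/<Id> element names
and both function names are made up for illustration.

```python
import re
import xml.parsers.expat

# Hypothetical toy record; the element names are illustrative only.
DOC = "<Seq><Id>AB000263</Id><Id>AB000264</Id></Seq>"


def ids_by_regex(text):
    # Short and readable, but assumes <Id> never carries attributes,
    # comments, CDATA, etc.
    return re.findall(r"<Id>([^<]*)</Id>", text)


def ids_by_handlers(text):
    # The "framework" of tag-handling callbacks that expat drives.
    ids = []
    state = {"in_id": False, "buf": []}

    def start(name, attrs):
        if name == "Id":
            state["in_id"] = True
            state["buf"] = []

    def end(name):
        if name == "Id":
            ids.append("".join(state["buf"]))
            state["in_id"] = False

    def chars(data):
        # Character data may arrive in several pieces, so buffer it.
        if state["in_id"]:
            state["buf"].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return ids


print(ids_by_regex(DOC))     # ['AB000263', 'AB000264']
print(ids_by_handlers(DOC))  # ['AB000263', 'AB000264']
```

The regex version is shorter, but ids_by_regex('<Seq><Id db="x">A1</Id></Seq>')
returns [] because of the attribute, while the handler version still
returns ['A1']. That kind of edge case is where the extra machinery
starts paying for itself.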

I intend to stick as much as possible with "default" capabilities in PHP
(i.e., the SAX parser, which is included in PHP builds by default, rather
than XML-DOM, which has to be specifically requested at compile time), which
in this case is probably just as well: with DOM you have to load the entire
XML structure into memory before you can start doing anything with it, whereas
the SAX parser deals with it bit by bit as it comes in. Considering how large
some of the datafiles we may end up dealing with can be...
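
A minimal sketch of that "bit by bit" behavior, again in Python with expat
standing in for PHP's SAX-style parser (the function name and the
<Set>/<Entry>/<Acc> tags are hypothetical). The document is fed to the
parser in chunks and only a running count is kept, so memory use stays
roughly constant instead of growing with the file the way a full DOM tree
would.

```python
import io
import xml.parsers.expat


def count_elements(stream, tag, chunk_size=65536):
    """Count <tag> elements while reading `stream` in chunks.

    Only one chunk is held at a time; expat fires the start handler as
    each element is seen, so no document tree is ever built.
    """
    count = 0

    def start(name, attrs):
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # not final: more data may follow
    parser.Parse(b"", True)         # signal end of document
    return count


# Hypothetical data: 1000 small records wrapped in one root element.
data = b"<Set>" + b"<Entry><Acc>X1</Acc></Entry>" * 1000 + b"</Set>"
print(count_elements(io.BytesIO(data), "Entry", chunk_size=7))  # 1000
```

The tiny chunk_size=7 deliberately splits tags across chunk boundaries;
expat buffers partial tokens internally, so the count still comes out right.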

I'm thinking I should set up a "utilities" directory of general-purpose
(not bioinformatics-specific) classes for dealing with basic things
like POSTing queries to web interfaces (and the "core" XML parser class).
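
A sketch of what one of those utility pieces might look like, in Python
using only the standard library. build_post_query and the example.org URL
are hypothetical; the helper only builds the form-encoded POST request, and
actually sending it (e.g. with urllib.request.urlopen) is left to the caller.

```python
import urllib.parse
import urllib.request


def build_post_query(url, fields):
    """Return a POST request carrying `fields` as a form-encoded body.

    Hypothetical helper: nothing is sent over the network here.
    """
    body = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


req = build_post_query("https://example.org/search",
                       {"db": "nucleotide", "term": "BRCA1 human"})
print(req.get_method())  # POST
print(req.data)          # b'db=nucleotide&term=BRCA1+human'
```

Keeping the request-building separate from the sending makes the helper
easy to test without a live web interface.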

> >- BioPHP consistency -> many "bio" formats are moving to xml

This much is true, and my gut feeling is that, at least in my case,
once I FINALLY get a "feel" for how to actually use XML parsers,
it won't be too bad.

I have to confess my weekend has been spent "slacking off" (well, if 
you can count manual labor hauling boxes out of storage and such as
"slacking off") so I haven't yet gotten around to looking at the example code
that Greg was kind enough to forward - I'll try to get to that tomorrow.

With "parsing philosophy" being the real holdup at this point for me, I 
really need to get on that.  Once done, the rest ought to be comparatively
easy...

> >- Differentiate BioPHP as fundamentally supporting XML

That thought HAD crossed my mind, but since it seems you have to 
code up tag/structure handling functions for each document type
ANYWAY, and since ideally the "guts" of the BioPHP classes will
be a "black box" to people using it, I'm not sure that's inherently
worth more than mere "bragging rights" (not that I don't care about
"bragging rights", but...) 

(Of course, being open source, these "black boxes" have metaphorical
easy-open latches on them so people can look at the guts if they
really want to :-) )

> >- Why bother with flatfiles ? BioPerl/Python/Java probably do
> >  these already

BioPerl/Python/Java/Ruby/Lisp also already deal with DNA sequences...
should we ignore them as well, then? :-)

Besides, I'd consider formats such as Phylip, Clustal, ASN.1, etc. to
be "flat files", and parsers for them will be handy.
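
As an example of how simple some of these can be, here is a minimal sketch
of a Clustal alignment reader in Python. It assumes a simplified Clustal W
layout (a "CLUSTAL" header line, then blocks of "name sequence [count]"
lines, conservation lines that start with whitespace, and blank lines
between blocks); parse_clustal is a hypothetical name, and real-world files
have variations this does not handle.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from simplified Clustal W text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a Clustal alignment")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they begin with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]  # any trailing count is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


SAMPLE = """CLUSTAL W (1.83) multiple sequence alignment

seq1      ACGT-ACGT 8
seq2      ACGTTACGT 9
          **** ****

seq1      GGCC 12
seq2      GG-C 12
          ** *
"""

print(parse_clustal(SAMPLE))
# {'seq1': 'ACGT-ACGTGGCC', 'seq2': 'ACGTTACGTGG-C'}
```

Sequences are concatenated across blocks in the order they first appear;
the trailing residue counts are simply ignored rather than checked.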