[Biophp-dev] Seeking comments on CVS, XML, and other TLA's

Greg Tyrelle biophp-dev@bioinformatics.org
Mon, 7 Apr 2003 15:13:57 +1000


*** mail-lists+biophpdev@dogphilosophy.net wrote: 
  |It's probably about time I figure out how to get CVS going - that means some
  |thought is called for at this point on how to structure things, at least
  |initially...
  |
  |Every developer with their own subdirectory?  Broken up by purpose?  Some
  |combination?
  |
  |Any thoughts?

Yes, a few...

If you have existing code you want to put in CVS, my suggestion would
be to use some kind of loose module structure. This can be changed at
a later date. No sense worrying about a perfect package structure now;
that can be thrashed out later. As an initial guide, PEAR [1] might be
a good place to start.

Individual sandboxes for developers are often popular in open source
projects. You might like to try that?


  |And about XML:

Uh oh, here we go...

  |I am at this point working with the NCBI BLAST query module and the EUtils
  |modules.  I have this feeling that I "ought" to be using the XML-parser
  |capabilities in PHP, but as I keep looking at it I keep thinking it's more 
  |trouble than it needs to be.  For example, I have the ESearch module working
  |at a minimal but usable stage - it sends the search terms, retrieves the
  |results, and extracts the returned ID's.  The ID's are obtained with a single, 
  |simple preg_match_all...

I sent you (off list) a working SAX parser for the NCBI ESearch XML
format [2] that might give you a starting point for comparison. 
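
For the rest of the list, here is a minimal sketch of the SAX-style
approach. It is not the parser I sent off list: it is written in Python
with its expat bindings purely for illustration, the function name
extract_ids is made up, and the sample XML is an assumed, trimmed-down
ESearch result rather than real NCBI output.

```python
import xml.parsers.expat


def extract_ids(xml_text):
    """Collect the text of every <Id> element using expat's event callbacks."""
    ids = []
    state = {"in_id": False, "buffer": []}

    def start_element(name, attrs):
        # Start buffering when an <Id> element opens.
        if name == "Id":
            state["in_id"] = True
            state["buffer"] = []

    def char_data(data):
        # expat may deliver text in several chunks, so accumulate them.
        if state["in_id"]:
            state["buffer"].append(data)

    def end_element(name):
        # On </Id>, join the buffered chunks into one ID string.
        if name == "Id":
            ids.append("".join(state["buffer"]).strip())
            state["in_id"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    # Raises xml.parsers.expat.ExpatError if the input is not well-formed.
    parser.Parse(xml_text, True)
    return ids


# Assumed sample input, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>12345</Id>
    <Id>67890</Id>
  </IdList>
</eSearchResult>"""

print(extract_ids(SAMPLE))
```

The three handlers map roughly onto what you would register with
xml_set_element_handler() and xml_set_character_data_handler() in PHP,
so the overall shape should carry over.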

  |Anybody with experience using the SAX-type parser in the default builds of
  |PHP have an opinion of it?  Is it worth all the extra hassle to deal with
  |rather than using simpler regular expressions for most things?

Yes.

It's expat-based and therefore non-validating. I'm not sure about
other XML parser extensions for PHP; anyone?

Is it worth all the extra hassle? In the mood for opening a "can of
worms", I see ;]

There has been much debate recently on the xml-dev list [3] in
response to a weblog post by Tim Bray (co-author of the XML spec) [4]
to the effect that XML parsing is "too hard" and could be reasonably
done with regexes. 

In my opinion it all depends on the task at hand. Although the
ESearch result XML can be parsed with a simple regex, I would argue for
consistency in parsers for a project such as this. If you really think
it's overkill, try REX [5].

Further points:

- BioPHP consistency -> many "bio" formats are moving to XML
- Regexes are error-prone
- Differentiate BioPHP as fundamentally supporting XML
- Why bother with flat files? BioPerl/Python/Java probably do these already
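
To make the "error-prone" point concrete, here is a contrived sketch,
again in Python for illustration. The input is invented to show the
failure mode (I am not claiming NCBI ever emits a comment or extra
whitespace like this), and the names ids_by_regex and ids_by_parser are
hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Invented input: one <Id> is commented out, another has whitespace
# around its text. Both are legal XML.
TRICKY = """<eSearchResult>
  <IdList>
    <!-- <Id>11111</Id> withdrawn -->
    <Id>12345</Id>
    <Id>
      67890
    </Id>
  </IdList>
</eSearchResult>"""


def ids_by_regex(xml_text):
    # Naive pattern: digits directly between <Id> and </Id>.
    return re.findall(r"<Id>(\d+)</Id>", xml_text)


def ids_by_parser(xml_text):
    # A real parser skips comments and hands back element text as-is.
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("Id")]


print("regex: ", ids_by_regex(TRICKY))
print("parser:", ids_by_parser(TRICKY))
```

The regex picks up the commented-out ID and misses the one with
whitespace around it, while the parser returns the two live IDs. A more
careful regex can fix each case, but you end up rewriting a parser one
special case at a time.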

More arguments when I think of them. Let the fun begin...

_greg

[1] http://pear.php.net/manual/en/
[2] http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
[3] http://www.xml.com/pub/a/2003/04/02/deviant.html
[4] http://tbray.org/ongoing/When/200x/2003/03/16/XML-Prog
[5] http://traumwind.de/computer/php/REX/index.html

-- 
Greg Tyrelle  (http://www.kinglab.unsw.edu.au/~greg)

"Logic only gives man what he needs, 
 magic gives man what he wants" - Tom Robbins