[Biophp-dev] Back on track

S Clark biophp-dev@bioinformatics.org
Mon, 21 Apr 2003 19:07:40 -0600


Actually, I have it working now, and I'm making good
progress getting a "semi-official" version of the 
modules ready for upload...

CURL is not necessary at all.  What I've done is split out a
"generic" XML-parser module that accepts a filehandle to be
parsed.  Extending the parser for a specific type of document
just means writing functions for whatever tags you need to
handle, registering them in the tagname => function_name
array, and away it goes...
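
Roughly, the tag-dispatch idea looks like this - just a
sketch of the approach, not the actual module code (the
handler and variable names here are made up):

    <?php
    // Map tag names to handler functions.  Supporting a new
    // document type means writing handlers and adding entries
    // here.  (Expat upper-cases tag names by default, hence
    // the uppercase keys.)
    $tag_handlers = array(
        "COUNT" => "handle_count",
        "ID"    => "handle_id",
    );

    function handle_count($data) { echo "Hit count: $data\n"; }
    function handle_id($data)    { echo "Id: $data\n"; }

    $current_tag = "";

    function start_element($parser, $name, $attrs) {
        global $current_tag;
        $current_tag = $name;
    }

    function end_element($parser, $name) {
        global $current_tag;
        $current_tag = "";
    }

    function char_data($parser, $data) {
        global $current_tag, $tag_handlers;
        // Hand the text inside the currently open tag to its
        // registered handler; anything unregistered is skipped.
        if (isset($tag_handlers[$current_tag])) {
            call_user_func($tag_handlers[$current_tag], $data);
        }
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, "start_element", "end_element");
    xml_set_character_data_handler($parser, "char_data");
    ?>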

The SAX (expat) parser seems very appropriate for most of
the types of data we'll be dealing with - some of the data
streams might be quite large, so it's most efficient to
just grab the data we're interested in "on the fly" rather
than reading it all into memory (as you have to do with
the DOM-XML model).
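
The feeding loop is what keeps memory flat: the parser only
ever sees one fixed-size chunk at a time.  Something like
this (again just a sketch, with the handlers registered as
in the previous one):

    <?php
    $parser = xml_parser_create();
    // ... xml_set_element_handler() etc. as in the sketch above ...

    // Feed the parser in 4K chunks, so only one buffer is in
    // memory at a time instead of the whole document.
    $fp = fopen("esearch_result.xml", "r") or die("can't open file");
    while (!feof($fp)) {
        $chunk = fread($fp, 4096);
        if (!xml_parse($parser, $chunk, feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
        }
    }
    fclose($fp);
    xml_parser_free($parser);
    ?>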

My ESearch module based on this new code is working.  Once
I add support for relative and absolute dates, plus grabbing
the "term translation" and hit-count information, it'll be
ready to go.
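
On the dates: ESearch takes them straight as query parameters
(reldate for relative, mindate/maxdate for absolute ranges,
with datetype saying which date field to match), so it's
mostly URL building.  And since PHP's fopen() can open an
http:// URL directly (when allow_url_fopen is enabled), the
result comes back as an ordinary filehandle for the generic
parser - no CURL.  A hypothetical usage sketch (the base URL
and search term are just for illustration):

    <?php
    // reldate/mindate/maxdate/datetype are real ESearch query
    // parameters; everything else here is illustrative.
    $base = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";
    $term = urlencode("glycosylase");

    // Relative date: everything from the last 90 days.
    $url = "$base?db=pubmed&term=$term&datetype=pdat&reldate=90";

    // Or an absolute range (YYYY/MM/DD):
    $url = "$base?db=pubmed&term=$term&datetype=pdat"
         . "&mindate=2002/01/01&maxdate=2003/04/21";

    // fopen() hands back an ordinary filehandle, which is
    // exactly what the generic parser module accepts.
    $fp = fopen($url, "r");
    ?>

The hit count and term translations come back in the ESearch
result itself (the <Count> and <TranslationSet> elements), so
they just need entries in the dispatch array.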

ESummary won't take much longer to finish after that - all the
HARD work was figuring out the XML parsing in general...

P.S. Welcome back :-)

Sean

On Monday 21 April 2003 06:09 pm, Dong Gregorio wrote:
> Hello all!
>
> Sorry I've been away for the "mail" for a while...
>
> Sean, I've finally read about (at NCBI's site) the eSearch,
> eFetch, eUtils you were mentioning last time.  From what I
> gather, you'd need CURL to actually save the data returned
> by the NCBI site to your local machine.  And then you'd need
> to parse the data which is in XML format.  Is that correct?
>
> So, my next question is: what are you working on at the
> moment: the CURL part or the XML parsing part?  Both?
> Have you finally got PHP's XML parser functions (based
> on expat) to work?  Or are you using something else?
>
> Regards,
>
> Serge