[Biophp-dev] (insert profanity here) XML!...

biophp-dev@bioinformatics.org
Wed, 16 Apr 2003 22:33:50 -0600


I am bepuzzled.

I seem to be having some form of bizarre context problem.  The XML parsing is
actually working the way I expect it to now that I've found out how to
encapsulate the parser function events inside an object.  I've set it up
such that the particular tags I am watching for are keys in a top-level array
that maps each one to a handler function (the character data handler passes
data to that function if one is defined for the current tag, or just shoves
the data into a "generic" array otherwise).

Insertion of debug printing into the function associated with the "<Id>"
tag in the ESearch results shows that when the parser hits data in the Id
tag it is, as expected, passing the data to the character handler, which
is in turn correctly passing the data on to the "$this->add_id" function.
Debug printouts in this function show that, correctly, the Id string
has been added to the stack.

However, once the program returns to the level of the "xml_parse" call, the
IDs disappear.  There's no error regarding access to $this->IDs, but it's
empty.  Yet every time the parser calls the character handler (which
then redirects to add_id) the debug statement shows the data's still there.

Very strange.  At this point it's confusing enough that I'm convinced I'm
just missing something stupid...
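
One possible explanation, offered as a guess rather than a diagnosis: under
PHP 4, `$p = new esearch_parser_xml($fh);` assigns a copy of the freshly
constructed object, so callbacks registered with `&$this` inside the
constructor stay bound to the original, while parse_it() and count_IDs() run
on the copy.  If that is what is happening, registering the handlers from an
ordinary method (called after the object has been assigned) should keep
everything on one object.  The sketch below assumes exactly that; the class
name `id_collector` and its methods are hypothetical and not part of the
pasted class.

```php
<?php
// Hypothetical minimal collector: handlers are registered in parse(), not in
// the constructor, so they are bound to the object the caller actually holds.
class id_collector
{
    var $IDs = array();
    var $current_tag = "";

    function start_element($parser, $element, $attributes) {
        $this->current_tag = $element;
    }

    function stop_element($parser, $element) {
        $this->current_tag = "";
    }

    function char_data($parser, $data) {
        // Only text inside <Id> is kept in this sketch.
        if ($this->current_tag == "Id") {
            $this->IDs[] = $data;
        }
    }

    function parse($xml) {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        // Under PHP 4, write array(&$this, ...) here so the callback holds a
        // reference to this object rather than a copy of it.
        xml_set_element_handler($parser, array($this, "start_element"), array($this, "stop_element"));
        xml_set_character_data_handler($parser, array($this, "char_data"));
        return xml_parse($parser, $xml, true);
    }
}

$collector = new id_collector();
$collector->parse("<IdList><Id>111</Id><Id>222</Id></IdList>");
print(count($collector->IDs) . " IDs collected\n");
?>
```

Under PHP 4 the other commonly cited workaround is to create the object with
`=& new`, which avoids the copy in the first place.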

Later I will try splitting out the results into a SEPARATE object from the XML
parser object (rather than encapsulating the results inside the parser
object) and just putting a reference to it in the parser.
Meanwhile, if any bored souls want to examine what I'm trying to use now,
it's pasted below.  Constructive criticism and "Hey, dummy, look at line XXX,
you missed a letter"-type comments as appropriate are welcome.  Currently, the
only tag it specifically watches for is the one holding the returned IDs, but
adding the other relevant tags will be easy...once I figure out what I'm doing
wrong with saving the data in the first place.

I THINK the model that I'm trying to implement for XML parsers ought to be
easy to abstract out once I get it working and just extend for each type
of data, but we'll see...
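
A rough sketch of what that abstraction might look like, assuming a base class
that owns the expat plumbing and the tag dispatch, with one small subclass per
result type.  The names `tag_dispatch_parser`, `esearch_ids` and
`parse_string` are hypothetical, and the sketch parses a string rather than a
file handle to stay short.

```php
<?php
// Hypothetical base class: owns the parser setup and the tag dispatch table.
class tag_dispatch_parser
{
    var $taghandlers = array();   // tag name => name of the method to call
    var $tag_stack = array();     // currently open tags, innermost last
    var $other_results = array(); // text from tags that have no handler

    function start_element($parser, $element, $attributes) {
        array_push($this->tag_stack, $element);
    }

    function stop_element($parser, $element) {
        array_pop($this->tag_stack);
    }

    function char_data($parser, $data) {
        $tag = end($this->tag_stack);
        if (isset($this->taghandlers[$tag])) {
            // Dispatch to the method registered for the current tag.
            $handler = $this->taghandlers[$tag];
            $this->$handler($data);
        } elseif (trim($data) != "") {
            if (!isset($this->other_results[$tag])) {
                $this->other_results[$tag] = "";
            }
            $this->other_results[$tag] .= $data . "\n";
        }
    }

    function parse_string($xml) {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        // Under PHP 4, array(&$this, ...) would be needed in place of array($this, ...).
        xml_set_element_handler($parser, array($this, "start_element"), array($this, "stop_element"));
        xml_set_character_data_handler($parser, array($this, "char_data"));
        return xml_parse($parser, $xml, true);
    }
}

// Hypothetical subclass: only declares which tags it cares about and how to store them.
class esearch_ids extends tag_dispatch_parser
{
    var $taghandlers = array("Id" => "add_id");
    var $IDs = array();

    function add_id($id) {
        $this->IDs[] = $id;
    }
}

$p = new esearch_ids();
$p->parse_string("<eSearchResult><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>");
print(count($p->IDs) . " IDs, Count tag says " . trim($p->other_results["Count"]) . "\n");
?>
```

A parser for another result type would then only need its own `$taghandlers`
table and the storage methods it names.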

----------------------------pasted below - esearch_parser_xml class----------
<?php
class esearch_parser_xml
{
    var $xml_parser;
    var $taghandlers=Array("Id"=>"add_id"); //functions for handling data within specific tags
    var $current_tag;
    var $current_tag_attribs;  //array, where applicable, of attrib=>value for current tag
    var $parent_tag=Array(); //"parent tag" stack
    var $parent_tag_attribs=Array(); //stack holding arrays of attributes for each tag
    var $chardata;
    var $filehandle;

    var $IDs=Array();
    var $other_results=Array();  //array of tag=>data, for data in tags not otherwise handled

//#####################Class Constructor#################
    function esearch_parser_xml($filehandle) {
        $this->filehandle=$filehandle;
        $this->xml_parser=xml_parser_create();
        xml_parser_set_option($this->xml_parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($this->xml_parser,array(&$this,"start_element"),array(&$this,"stop_element"));
        xml_set_character_data_handler($this->xml_parser,array(&$this,"data_handler_wrapper"));
    }

//#####################XML setup functions###############
    function start_element($parser, $element, $attributes) {
        if($this->current_tag!="") {
            array_push($this->parent_tag,$this->current_tag);
            array_push($this->parent_tag_attribs,$this->current_tag_attribs);
        }
        $this->current_tag=$element;
        $this->current_tag_attribs=$attributes;
    }

    function stop_element($parser, $element) {
        $this->current_tag=array_pop($this->parent_tag);
        $this->current_tag_attribs=array_pop($this->parent_tag_attribs);
    }

    function data_handler_wrapper($parser, $data) {
        if(isset($this->taghandlers[$this->current_tag])) {
            //call the handler method registered for this tag (no eval needed)
            $handler=$this->taghandlers[$this->current_tag];
            $this->$handler($data);
        }
        else {
            if($data!="") {
                $this->other_results[$this->current_tag].=$data."\n";
            }
        }
    }

    function add_id($id) {
        //this one's easy...
        $this->IDs[]=$id;
    }

    function parse_it() {
        while($data=fgets($this->filehandle,4096)) {
            //debug
            print("parsing: $data\n");
            if (!xml_parse($this->xml_parser, $data, feof($this->filehandle))) {
                print("\nLike, DUDE!  Something went wrong!\n XML error:");
                $xmlerr=xml_get_error_code($this->xml_parser);
                print("$xmlerr (".xml_error_string($xmlerr).")\nAt line:".
                xml_get_current_line_number($this->xml_parser)."\n");
            }
        }
        fclose($this->filehandle);
    }
//#####################Get/Set interfaces#####################

    function count_IDs() {
        return (count($this->IDs));
    }

    function get_IDs() {
        return $this->IDs;  //was $this->id, a property that is never set
    }

    function get_other_result($tagname) {
        return $this->other_results[$tagname];
    }

//###################Clean up/"Destructor"###################

    function xml_done() {
        xml_parser_free($this->xml_parser);
    }
}
?>