
Module: Iterator Martel/Iterator.py

Iterate over records of an XML parse tree

The standard parser is callback-based: it reports every element in a file through callbacks. If the file contains records, many people would like to iterate over each record and use the callback parser only to analyze that record.
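For comparison, the Python standard library's xml.dom.pulldom supports a similar record-at-a-time pattern for input that is already XML: you walk a stream of parse events and expand only the chosen record elements into DOM subtrees. This is an analogy, not Martel's API; the sample document and the record tag name "record" are made up for illustration, and the code is written for Python 3.

```python
from xml.dom import pulldom

# A tiny, made-up XML document holding two records.
SAMPLE = """<?xml version="1.0"?>
<entries>
  <record><ac_number>P12345</ac_number></record>
  <record><ac_number>Q67890</ac_number><ac_number>Q11111</ac_number></record>
</entries>"""


def iterate_records(text, record_tag="record"):
    """Yield one DOM subtree per record element, built lazily."""
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == record_tag:
            # Pull the rest of this record into a DOM subtree; everything
            # outside the records is only ever seen as events.
            events.expandNode(node)
            yield node


for record in iterate_records(SAMPLE):
    accs = [acc.firstChild.data for acc in record.getElementsByTagName("ac_number")]
    print("Read a record with AC numbers:", accs)
```

Only one record is materialized as a DOM at a time, which mirrors the memory argument made for record-oriented parsing.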

If the expression is a ParseRecords, then the code to do this is easy: use its make_reader to grab records and its record_expression to parse them. However, this isn't general enough. Using a ParseRecords in the format definition should be strictly an implementation decision for better memory use, so there needs to be an API that supports both full and record-oriented parsers.
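The split between "grab records" and "parse them" can be sketched so that one record-level parser backs both a full parser and a record-oriented one. Everything here is hypothetical: the "//" record terminator mimics SWISS-PROT, and make_reader, parse_record, parse_full, parse_each and Collect are illustrative stand-ins, not Martel's actual methods or signatures.

```python
from xml.sax.handler import ContentHandler

# Hypothetical flat-file input: each record ends with a "//" line, as in SWISS-PROT.
TEXT = "ID  ONE\nAC  P12345;\n//\nID  TWO\nAC  Q67890;\n//\n"


def make_reader(text):
    """Hypothetical reader: yield one record string at a time."""
    record = []
    for line in text.splitlines(keepends=True):
        record.append(line)
        if line.startswith("//"):
            yield "".join(record)
            record = []


def parse_record(record, handler):
    """Hypothetical record parser: emit SAX callbacks for one record."""
    handler.startElement("record", {})
    for line in record.splitlines():
        if line.startswith("AC"):
            handler.startElement("ac_number", {})
            handler.characters(line[2:].strip().rstrip(";"))
            handler.endElement("ac_number")
    handler.endElement("record")


def parse_full(text, handler):
    """Full parser: one document with callbacks for every record."""
    handler.startDocument()
    handler.startElement("entries", {})
    for record in make_reader(text):
        parse_record(record, handler)
    handler.endElement("entries")
    handler.endDocument()


def parse_each(text, make_handler):
    """Record-oriented parser: yield a freshly filled handler per record."""
    for record in make_reader(text):
        handler = make_handler()
        parse_record(record, handler)
        yield handler


class Collect(ContentHandler):
    """Collects the text of every ac_number element it sees."""

    def __init__(self):
        super().__init__()
        self.acs = []
        self._in_ac = False

    def startElement(self, name, attrs):
        if name == "ac_number":
            self._in_ac = True
            self.acs.append("")

    def endElement(self, name):
        if name == "ac_number":
            self._in_ac = False

    def characters(self, content):
        if self._in_ac:
            self.acs[-1] += content


full = Collect()
parse_full(TEXT, full)
print(full.acs)                                    # ['P12345', 'Q67890']
print([h.acs for h in parse_each(TEXT, Collect)])  # [['P12345'], ['Q67890']]
```

Because both entry points share parse_record, the handler cannot tell whether records were split out first, which is the property the paragraph asks for.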

Here's an example use of the API:

    >>> import sys
    >>> import swissprot38  # one is in Martel/test/testformats
    >>> from xml.dom import pulldom
    >>> iterator = swissprot38.format.make_iterator("swissprot38_record")
    >>> text = open("sample.swissprot").read()
    >>> for record in iterator.iterateString(text, pulldom.SAX2DOM()):
    ...     print "Read a record with the following AC numbers:"
    ...     for acc in record.document.getElementsByTagName("ac_number"):
    ...         acc.writexml(sys.stdout)
    ...         sys.stdout.write("\n")
    ...

There are several parts to this API. First is the 'Iterator'.

There are two parts to the API. One is the EventStream. It has a single method, next(), which returns a list of SAX events, each a 2-tuple (event_name, args). It is called repeatedly to return successive event lists, and it returns None when no more events are available.
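A minimal sketch of the EventStream contract as described: next() hands back a list of (event_name, args) 2-tuples on each call and returns None once the events run out. ListEventStream, replay and render_all are hypothetical names, and replay assumes each event_name matches a SAX ContentHandler method name such as startElement, which this page does not confirm.

```python
import io
from xml.sax.saxutils import XMLGenerator


class ListEventStream:
    """Hypothetical EventStream: each next() call returns one list of
    (event_name, args) tuples, then None when no events remain."""

    def __init__(self, event_lists):
        self._event_lists = iter(event_lists)

    def next(self):
        return next(self._event_lists, None)


def replay(events, handler):
    """Send each (event_name, args) pair to the handler method of that name."""
    for event_name, args in events:
        getattr(handler, event_name)(*args)


def render_all(stream):
    """Call next() until it returns None, rendering each event list as XML."""
    rendered = []
    while True:
        events = stream.next()
        if events is None:
            return rendered
        out = io.StringIO()
        replay(events, XMLGenerator(out))
        rendered.append(out.getvalue())


EVENTS = [
    [("startElement", ("record", {})), ("characters", ("P12345",)),
     ("endElement", ("record",))],
    [("startElement", ("record", {})), ("characters", ("Q67890",)),
     ("endElement", ("record",))],
]

print(render_all(ListEventStream(EVENTS)))
# ['<record>P12345</record>', '<record>Q67890</record>']
```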

The other is the Iterator.
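One way an Iterator could sit on top of an EventStream: pull one event list per record, replay it into a fresh SAX handler, and yield that handler. This is a hedged illustration only; RecordIterator, TextCollector, the stand-in ListEventStream and the make_handler factory parameter are all assumptions, whereas the module's own example passes a handler instance (pulldom.SAX2DOM()) to iterateString.

```python
from xml.sax.handler import ContentHandler


class ListEventStream:
    """Hypothetical stand-in EventStream: next() returns a list of
    (event_name, args) tuples, or None when no events remain."""

    def __init__(self, event_lists):
        self._lists = iter(event_lists)

    def next(self):
        return next(self._lists, None)


class RecordIterator:
    """Hypothetical Iterator: replays each event list into a new handler
    and yields that handler, one per record."""

    def __init__(self, stream, make_handler):
        self._stream = stream
        self._make_handler = make_handler

    def __iter__(self):
        while True:
            events = self._stream.next()
            if events is None:
                return
            handler = self._make_handler()
            for event_name, args in events:
                getattr(handler, event_name)(*args)
            yield handler


class TextCollector(ContentHandler):
    """Gathers all character data seen for one record."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def characters(self, content):
        self.text += content


EVENTS = [
    [("startElement", ("record", {})), ("characters", ("P12345",)),
     ("endElement", ("record",))],
    [("startElement", ("record", {})), ("characters", ("Q67890",)),
     ("endElement", ("record",))],
]

for record in RecordIterator(ListEventStream(EVENTS), TextCollector):
    print("Read a record:", record.text)
```

Creating a new handler per record keeps state from one record from leaking into the next; reusing a single handler instance would instead require resetting it between records.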

Sean McGrath has a RAX parser (Record API for XML) which uses a concept similar to this.

Imported modules   
import Parser
import sys
import traceback
import urllib
from xml.sax import saxutils
Functions   
_get_next_text(reader)

Classes   
EventStream
HeaderFooterEventStream
Iterate
Iterator
IteratorHeaderFooter
IteratorRecords
RecordEventStream
StoreEvents


This document was automatically generated on Mon Jul 1 12:03:20 2002 by HappyDoc version 2.0.1