Implements the Martel parsers.
The classes in this module are used by other Martel modules and not
typically by external users.
There are two major parsers, Parser and RecordParser. The first
is the standard one, which parses the file as one string in memory
then generates the SAX events. The other reads a record at a time
using a RecordReader and generates events after each read. The
generated event callbacks are identical.
At some level, both parsers use "_do_callback" to convert mxTextTools
tags into SAX events.
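The claim that both strategies produce identical callbacks can be illustrated with a toy sketch. Nothing here is Martel code: `EventLog`, `emit_records`, `parse_whole` and `parse_by_record` are hypothetical names, a "record" is assumed to be one line of text, and the sketch is written for current Python 3 rather than the Python 2 this module targets.

```python
import io
from xml.sax.handler import ContentHandler


class EventLog(ContentHandler):
    """Record every callback so two parsing strategies can be compared."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


def emit_records(text, handler):
    """Toy stand-in for parsing: one 'record' element per line of text."""
    for line in text.splitlines(keepends=True):
        handler.startElement("record", {})
        handler.characters(line)
        handler.endElement("record")


def parse_whole(fileobj, handler):
    # Whole-file strategy: read everything into memory, then emit events.
    emit_records(fileobj.read(), handler)


def parse_by_record(fileobj, handler):
    # Record-at-a-time strategy: emit events after each record is read.
    for record in fileobj:
        emit_records(record, handler)


data = "ID 1\nID 2\nID 3\n"
whole, by_record = EventLog(), EventLog()
parse_whole(io.StringIO(data), whole)
parse_by_record(io.StringIO(data), by_record)
print(whole.events == by_record.events)
```

The record-at-a-time variant never holds more than one record in memory, which is the practical reason to prefer it for large files.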
XXX finish this documentation
XXX need a better way to get closer to the likely error position when
parsing.
XXX need to implement Locator
Imported modules
import Dispatch
import pprint
import string
import sys
import traceback
import urllib
from xml.sax import xmlreader, _exceptions, handler, saxutils
Functions
_do_callback
_do_dispatch_callback
_parse_elements
_do_callback
_do_callback(s, begin, end, taglist, cont_handler, attrlookup)
Internal function to convert the taglist into ContentHandler events.
s is the input text
begin is the current position in the text
end is 1 past the last position of the text allowed to be parsed
taglist is the tag list from mxTextTools.parse
cont_handler is the SAX ContentHandler
attrlookup is a dict mapping the encoded tag name to the element info
Exceptions
AssertionError("Unknown special tag %s" % repr( tag ) )
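As a rough illustration of this kind of conversion, the sketch below walks a nested taglist of `(tag, left, right, subtags)` tuples, the shape mxTextTools produces, and emits SAX events. It is a simplified assumption and not Martel's implementation: special tags are not handled, `walk_taglist` and `Recorder` are hypothetical names, and `attrlookup` is assumed here to map a tag to a `(name, attributes)` pair. It is written for current Python 3.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def walk_taglist(s, begin, end, taglist, cont_handler, attrlookup):
    """Emit SAX events for a nested taglist covering s[begin:end]."""
    pos = begin
    for tag, left, right, subtags in taglist:
        # Untagged text between tagged regions is passed through as characters.
        if pos < left:
            cont_handler.characters(s[pos:left])
        # Assumed shape: attrlookup[tag] == (element name, attribute dict).
        name, attrs = attrlookup.get(tag, (tag, {}))
        cont_handler.startElement(name, AttributesImpl(attrs))
        if subtags:
            walk_taglist(s, left, right, subtags, cont_handler, attrlookup)
        elif left < right:
            cont_handler.characters(s[left:right])
        cont_handler.endElement(name)
        pos = right
    # Trailing untagged text up to the allowed end position.
    if pos < end:
        cont_handler.characters(s[pos:end])


class Recorder(ContentHandler):
    """Collect events in a list for inspection."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


text = "ID 42\n"
# Hand-written taglist; a real one would come from the tagging engine.
taglist = [("entry", 0, 6, [("key", 0, 2, None), ("value", 3, 5, None)])]
attrlookup = {"entry": ("entry", {"format": "demo"})}

rec = Recorder()
walk_taglist(text, 0, len(text), taglist, rec, attrlookup)
for event in rec.events:
    print(event)
```

Because untagged gaps are forwarded as `characters` events, concatenating every character event reproduces the input text exactly.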
_do_dispatch_callback
_do_dispatch_callback(s, begin, end, taglist, start_table_get, cont_handler, save_stack, end_table_get, attrlookup)
Internal function to convert the taglist into ContentHandler events.
This is a special case for Dispatch.Dispatcher objects.
s is the input text
begin is the current position in the text
end is 1 past the last position of the text allowed to be parsed
taglist is the tag list from mxTextTools.parse
start_table_get is the Dispatcher._start_table
cont_handler is the Dispatcher, which also acts as the SAX ContentHandler
end_table_get is the Dispatcher._end_table
attrlookup is a dict mapping the encoded tag name to the element info
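A minimal sketch of table-driven dispatch, under loudly stated assumptions: `ToyDispatcher` is a hypothetical stand-in and not `Dispatch.Dispatcher`, its tables are assumed to be plain dicts keyed by tag name, and `start_table_get` and `end_table_get` are assumed to be the bound `.get` methods of those dicts. `save_stack` and `attrlookup` are left out entirely. The point is only that tags with no registered handler cost a single failed lookup.

```python
class ToyDispatcher:
    """Hypothetical dispatcher: builds tables from start_*/end_* methods."""

    def __init__(self):
        self.log = []
        self._start_table = {}
        self._end_table = {}
        for attr in dir(self):
            if attr.startswith("start_"):
                self._start_table[attr[len("start_"):]] = getattr(self, attr)
            elif attr.startswith("end_"):
                self._end_table[attr[len("end_"):]] = getattr(self, attr)

    def start_key(self, tag, attrs):
        self.log.append("start " + tag)

    def end_key(self, tag):
        self.log.append("end " + tag)

    def characters(self, content):
        self.log.append("chars " + repr(content))


def dispatch_taglist(s, begin, end, taglist,
                     start_table_get, cont_handler, end_table_get):
    """Walk the taglist, calling only the handlers found in the tables."""
    pos = begin
    for tag, left, right, subtags in taglist:
        if pos < left:
            cont_handler.characters(s[pos:left])
        start = start_table_get(tag)  # None when no handler is registered
        if start is not None:
            start(tag, {})
        if subtags:
            dispatch_taglist(s, left, right, subtags,
                             start_table_get, cont_handler, end_table_get)
        elif left < right:
            cont_handler.characters(s[left:right])
        finish = end_table_get(tag)
        if finish is not None:
            finish(tag)
        pos = right
    if pos < end:
        cont_handler.characters(s[pos:end])


text = "ID 42"
taglist = [("key", 0, 2, None), ("value", 3, 5, None)]
d = ToyDispatcher()
dispatch_taglist(text, 0, len(text), taglist,
                 d._start_table.get, d, d._end_table.get)
print(d.log)
```

Passing the bound `.get` methods in as arguments, rather than looking them up on the dispatcher for every tag, avoids repeated attribute access inside the loop.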
_parse_elements
_parse_elements(s, tagtable, cont_handler, debug_level, attrlookup)
Parse the string with the tagtable and send the ContentHandler events.
Specifically, it sends the startElement, endElement and characters
events but not startDocument and endDocument.
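Since only element-level events are sent at this stage, a caller has to supply the document-level ones. The sketch below shows that division of labour with hypothetical names (`send_element_events`, `parse_document`, `DocLog`); the element step is a trivial stand-in and not the real tagging logic. It is written for current Python 3.

```python
from xml.sax.handler import ContentHandler


class DocLog(ContentHandler):
    """Record the name of each callback in the order it arrives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement")

    def endElement(self, name):
        self.events.append("endElement")

    def characters(self, content):
        self.events.append("characters")


def send_element_events(s, cont_handler):
    """Stand-in for the element-level step: no document events here."""
    cont_handler.startElement("record", {})
    cont_handler.characters(s)
    cont_handler.endElement("record")


def parse_document(s, cont_handler):
    """Caller's responsibility: bracket the element events with the
    document-level events."""
    cont_handler.startDocument()
    send_element_events(s, cont_handler)
    cont_handler.endDocument()


log = DocLog()
parse_document("ID 42\n", log)
print(log.events)
```

Keeping the document events in the caller lets the same element-level routine be invoked once for a whole file or repeatedly for successive records inside a single document.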
Classes