Module: LAX | Martel/LAX.py | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Many XML formats are very simple: all the fields are needed, there is no tree hierarchy, all the text inside of the tags is used, and the text is short (it can easily fit inside of memory). SAX is pretty good for this but it's still somewhat complicated to use. DOM is designed to handle tree structures so is a bit too much for a simple flat data structure. This module implements a new, simpler API, which I'll call LAX. It only works well when the elements are small and non-hierarchical. LAX has three callbacks.
LAX.LAX is an content handler which converts the SAX events to LAX events. Here is an example use: >>> from Martel import Word, Whitespace, Group, Integer, Rep1, AnyEol
>>> format = Rep1(Group("line", Word("name") + Whitespace() +
... Integer("age")) + AnyEol())
>>> parser = format.make_parser()
>>>
>>> from Martel import LAX
>>> class PrintFields(LAX.LAX):
... def element(self, tag, attrs, text):
... print tag, "has", repr(text)
...
>>> parser.setContentHandler(PrintFields())
>>> text = "Maggie 3\nPorter 1\n"
>>> parser.parseString(text)
name has Callbacks take some getting used to. Many people prefer an iterative
solution which returns all of the fields of a given record at one
time. The default implementation of LAX.LAX helps this case.
The If you need the element attributes as well as the name, use the LAX.LAXAttrs class, which stores a list of 2-ples (text, attrs) instead of just the text. For examples: >>> iterator = format.make_iterator("line") >>> for record in iterator.iterateString(text, LAX.LAX()): ... print record.groups["name"][0], "is", record.groups["age"][0] ... Maggie is 3 Porter is 1 >>> If you only want a few fields, you can pass the list to constructor, as in: >>> lax = LAX.LAX(["name", "sequence"]) >>>
|