Imported modules
|
|
import Expression
import Parser
import convert_re
import msre_parse
import string
|
Functions
|
|
|
|
_find_wanted_groupref_names
|
_find_wanted_groupref_names ( expression )
expression -> dict of group names wanted by elements of the tree
The dict is used during tagtable generation to specify which
groups need to save their match text. There's match-time overhead
for doing that, and the code isn't thread safe, so the intent is
to save only those groups that are needed.
The dict value is 1 if the group name is needed, else there is
no entry in the dict.
XXX need to make this a method!
Exceptions
|
|
NotImplementedError, "What is a %s?" % repr( expression )
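As an illustration of the mapping described for _find_wanted_groupref_names, the following minimal sketch walks a toy expression tree and records each referenced group name with the value 1. The node classes (Literal, Group, GroupRef, Seq) and the function name are hypothetical stand-ins invented for this example; they are not the classes of the Expression module.

```python
# Hypothetical node types, used only for this illustration.
class Literal:
    def __init__(self, char):
        self.char = char

class Group:
    def __init__(self, name, child):
        self.name = name
        self.child = child

class GroupRef:
    def __init__(self, name):
        self.name = name

class Seq:
    def __init__(self, children):
        self.children = children

def find_wanted_groupref_names(node, wanted=None):
    """Return a dict whose keys are the group names that some node refers back to."""
    if wanted is None:
        wanted = {}
    if isinstance(node, GroupRef):
        wanted[node.name] = 1          # needed: the value is always 1
    elif isinstance(node, Group):
        find_wanted_groupref_names(node.child, wanted)
    elif isinstance(node, Seq):
        for child in node.children:
            find_wanted_groupref_names(child, wanted)
    elif isinstance(node, Literal):
        pass                           # leaves reference nothing
    else:
        raise NotImplementedError("What is a %s?" % repr(node))
    return wanted

tree = Seq([Group("tag", Literal("a")),
            Group("other", Literal("b")),
            GroupRef("tag")])
wanted = find_wanted_groupref_names(tree)
```

Only "tag" ends up in the dict, because "other" is defined but never referenced; a group without an entry does not need to save its match text.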
|
|
|
_generate
|
_generate ( expression, genstate )
The internal recursive call
Exceptions
|
|
AssertionError, "Unknown Expression object: %s" % repr( expression )
AssertionError, "Unknown debug level: %s" % genstate.debug_level
|
|
|
check_assert
|
check_assert (
text,
x,
end,
tag_words,
)
Used during a positive lookahead test.
Note the +1/-1 trick.
|
|
check_assert_not
|
check_assert_not (
text,
x,
end,
tagtable,
)
Used during a negative lookahead test.
Note the +1/-1 trick.
|
|
check_at_beginning
|
check_at_beginning (
text,
x,
end,
)
XXX Is this correct? This is the multiline behaviour which allows
"^" to match the beginning of a line.
|
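The multiline behaviour mentioned for check_at_beginning can be modelled as a predicate on the text and the current position. This is a sketch of the semantics only: the helper name is hypothetical, only "\n" is treated as a line break (an assumption), and it does not reproduce how check_at_beginning reports its result to the tagging engine.

```python
def is_at_line_beginning(text, x):
    """True at the start of the text or immediately after a newline."""
    return x == 0 or text[x - 1] == "\n"

text = "ab\ncd"
# Positions inside the text at which "^" would be allowed to match.
line_starts = [x for x in range(len(text)) if is_at_line_beginning(text, x)]
```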
|
generate
|
generate ( expression, debug_level=0 )
expression -> Parser for the Expression tree
Get the tagtable and the want_groupref_names
Main entry point for record oriented readers
|
|
generate_alt
|
generate_alt ( expression, genstate )
a|b|c|d|...
Implemented by creating N subtables. If table i fails, try i+1. If
it succeeds, jump to 1 past the end.
1. try table1: on fail +1, on success +N+1
2. try table2: on fail +1, on success +N
3. try table3: on fail +1, on success +N-1
N. try tableN: on fail +1, on success +2
N+1. Fail
N+2. <past the end of table>
XXX Don't need to create a Table. Could use a single tagtable by
merging into one table and retargeting "succeeded" transitions
(which jumped to the end+1 of the given table) to transition to the
end of the merged table.
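The jump arithmetic listed for generate_alt can be checked with a small model. The function below is a hypothetical helper, not part of this module; it only computes the relative jumps for entries 1 through N and does not build any tagtables.

```python
def alt_jump_offsets(n):
    """Return [(on_fail, on_success), ...] for entries 1..n of an n-way alternation."""
    offsets = []
    for i in range(1, n + 1):
        on_fail = 1              # fall through to the next alternative (or to Fail)
        on_success = n + 2 - i   # land on entry N+2, one past the Fail entry
        offsets.append((on_fail, on_success))
    return offsets

offsets = alt_jump_offsets(4)
```

Every success jump lands on the same absolute entry, N+2, while a failure in entry N falls through to entry N+1, the unconditional Fail.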
|
|
generate_any
|
generate_any ( expression, genstate )
Any character in the given list of characters
|
|
generate_assert
|
generate_assert ( expression, genstate )
Create the tagtable for doing a lookahead assertion.
Uses the +1/-1 trick.
|
|
generate_at_beginning
|
generate_at_beginning ( expression, genstate )
XXX Consider this code broken!
Uses the +1/-1 trick.
|
|
generate_at_end
|
generate_at_end ( expression, genstate )
XXX Consider this code broken!
XXX If check_at_beginning is correct, then this is wrong since it
doesn't implement the multiline behaviour.
|
|
generate_debug
|
generate_debug ( expression, genstate )
|
|
generate_dot
|
generate_dot ( expression, genstate )
Match any character except newline (by which I mean just "\012")
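In Python, "\012" is the octal escape for the line feed character, so it is the same one-character string as "\n". The sketch below, with a hypothetical helper name, models the rule stated for generate_dot: a carriage return still matches, a line feed does not.

```python
def dot_matches(char):
    """Model of "." as described: any single character except the line feed."""
    return len(char) == 1 and char != "\012"

same_char = "\012" == "\n"   # octal 012 is decimal 10, the line feed
```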
|
|
generate_eol
|
generate_eol ( expression, genstate )
Match any of the three standard newline conventions
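Assuming the three conventions meant by generate_eol are "\n" (Unix), "\r\n" (DOS/Windows) and "\r" (classic Mac OS), a position-based matcher can be sketched as follows. The helper name and its return convention are hypothetical. The two-character form has to be tried before the lone "\r", otherwise "\r\n" would be consumed as two separate line endings.

```python
def match_eol(text, x):
    """Return the position just after a line ending that starts at x, or None."""
    for ending in ("\r\n", "\n", "\r"):   # longest form first
        if text.startswith(ending, x):
            return x + len(ending)
    return None
```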
|
|
generate_group
|
generate_group ( expression, genstate )
A group, either named or unnamed
|
|
generate_groupref
|
generate_groupref ( expression, genstate )
Make the tagtable needed for a named group backreference.
Uses the +1/-1 trick.
|
|
generate_literal
|
generate_literal ( expression, genstate )
A literal character, or a character which isn't the given character
|
|
generate_max_repeat
|
generate_max_repeat ( expression, genstate )
It isn't as bad as it looks :)
Basically, call some other code for named group repeats.
Everything else is of the form {i,j}.
Get the subexpression table:
generate i copies which must work
generate j-i copies which may work, but fail okay.
special case when j == 65535, which stands for "unbounded"
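The {i,j} scheme described for generate_max_repeat can be modelled at match time rather than at tagtable-generation time. In this sketch, match_one is a hypothetical callable that returns the new position on success or None on failure; the function name and the guard against zero-width matches are assumptions made for the example.

```python
UNBOUNDED = 65535   # the value the description uses for "no upper limit"

def match_repeat(match_one, text, pos, i, j):
    """Match between i and j copies of a subexpression, as many as possible."""
    # i copies which must work
    for _ in range(i):
        pos = match_one(text, pos)
        if pos is None:
            return None
    # up to j-i further copies which may work, but fail okay
    count = i
    while j == UNBOUNDED or count < j:
        next_pos = match_one(text, pos)
        if next_pos is None or next_pos == pos:   # failure, or no progress
            break
        pos = next_pos
        count += 1
    return pos

def match_a(text, pos):
    """Toy subexpression: the single character "a"."""
    return pos + 1 if text.startswith("a", pos) else None
```

For example, match_repeat(match_a, "aaaaa", 0, 2, 4) stops after four copies, while a count of (0, UNBOUNDED) keeps consuming until the subexpression first fails.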
|
|
generate_named_max_repeat
|
generate_named_max_repeat ( expression, genstate )
Exceptions
|
|
NotImplementedError( "Cannot mix numeric and named repeat counts" )
NotImplementedError( "Only a single named repeat count allowed" )
|
|
|
generate_null_op
|
generate_null_op ( expression, genstate )
Doesn't do anything
|
|
generate_parser
|
generate_parser ( expression, debug_level=0 )
Get the parser. Main entry point for everything except record
oriented readers
|
|
generate_pass_through
|
generate_pass_through ( expression, genstate )
Used to define parsers which read a record at a time. They contain no
parse information themselves; it is held only in their children.
|
|
generate_seq
|
generate_seq ( expression, genstate )
sequence of successive regexps: abcdef...
Simply catenate the tagtables together, in order. Works because falling
off the end of one table means success for that table, so try the next.
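The idea behind generate_seq, that falling off the end of a table counts as success, can be illustrated with a toy model in which a table is a plain list of matcher functions. This is a sketch under that assumption; real tagtables have a different structure, and the names literal and run_table are hypothetical.

```python
def literal(char):
    """Build a toy table entry that matches exactly one given character."""
    def match(text, pos):
        return pos + 1 if text[pos:pos + 1] == char else None
    return match

def run_table(table, text, pos=0):
    """Run the entries in order; reaching the end of the list means success."""
    for entry in table:
        pos = entry(text, pos)
        if pos is None:
            return None
    return pos

table_ab = [literal("a"), literal("b")]
table_c = [literal("c")]
seq_table = table_ab + table_c   # catenate the tables, in order
```

Because table_ab signals success simply by running out of entries, appending table_c makes the run continue straight into the next subexpression.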
|
|
generate_str
|
generate_str ( expression, genstate )
A string
|
|
track_position
|
track_position (
text,
x,
end,
)
Store the start position of the farthest successful match.
This value is more useful than mxTextTools' default, which only
points out the last text region successfully tagged at the top
level. This value is the last region successfully tagged
anywhere.
Uses a global variable so this is SINGLE THREADED!
|
Classes
|
|
|