Experimental transition-table-driven scanner.
Public Member Functions
add_pattern ($name, $pattern, $end=null, $consume=true)
Adds a pattern.
add_transition ($from, $to)
Adds a state transition.
load_transitions ()
Loads legal state transitions for the current state.
main ()
next_end_data ()
Looks for the next state-pop sequence (close/end) for the current state.
next_start_data ()
Looks for the next legal state transition.
pop_state ()
Pops a state from the stack.
push_child ($child)
push_state ($state_data)
Pushes a state.
record ($str, $dummy1=null, $dummy2=null)
record_range ($from, $to, $type=null)
Helper function to record a range of the string.
record_token ($str, $type)
Records a complete token. This is shorthand for pushing a new node onto the stack, recording its text, and then popping it.
state_name ()
Gets the name of the current state.
tagged ()
Returns the XML representation of the token stream.
Protected Member Functions
collapse_token_tree ($node)
setup ()
Sets up the FSM.
Protected Attributes
$legal_transitions = array()
Legal transitions for the current state.
$patterns = array()
Pattern list.
$token_tree_stack = array()
The token tree.
$transitions = array()
Transition table.
Protected Attributes inherited from LuminousSimpleScanner
$overrides = array()
Overrides array.
Protected Attributes inherited from LuminousScanner
$case_sensitive = true
Whether or not the language is case sensitive.
$filters = array()
Individual token filters.
$ident_map = array()
A map of identifiers and their corresponding token names.
$rule_tag_map = array()
Rule remappings.
$state_ = array()
State stack.
$stream_filters = array()
Token stream filters.
$tokens = array()
The token stream.
$user_defs
Identifier remappings based on definitions identified in the source code.
Private Attributes
$last_state = null
$setup = false
$transition_rule_cache = array()
Additional Inherited Members
Static Public Member Functions inherited from LuminousScanner
static guess_language ($src, $info)
Language guessing.
Public Attributes inherited from LuminousScanner
$version = 'master'
Scanner version.
Experimental transition-table-driven scanner.
The stateful scanner follows a transition table and generates a hierarchical token tree. As such, the states follow a hierarchical parent->child relationship rather than a strict from->to relationship.
A node in the token tree looks like this:
Children is an ordered list and its elements may be either other token nodes or just strings. We override tagged to try to collapse this into XML while still applying filters.
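As a rough illustration of such a node and of collapsing it into XML, here is a minimal standalone sketch. The key names 'token_name' and 'children', the sample token names, and the function sketch_collapse are assumptions made for this example and are not taken from the class; filters are not modeled, and a node whose name is null (standing in for the root) is emitted without a tag.

```php
<?php
// Hypothetical node layout: the key names 'token_name' and 'children'
// are assumptions made for this sketch.
$tree = array(
    'token_name' => null,   // root node: emitted without a tag
    'children'   => array(
        'x = ',
        array(
            'token_name' => 'STRING',
            'children'   => array(
                '"a ',
                array('token_name' => 'ESC', 'children' => array('\\n')),
                '"',
            ),
        ),
        ';',
    ),
);

// Recursively collapse a node (or a plain string child) into XML.
function sketch_collapse($node) {
    if (is_string($node)) {
        // Escape &, < and > so the text is safe inside an element.
        return htmlspecialchars($node, ENT_NOQUOTES);
    }
    $inner = '';
    foreach ($node['children'] as $child) {
        $inner .= sketch_collapse($child);
    }
    $name = $node['token_name'];
    return $name === null ? $inner : "<$name>$inner</$name>";
}

echo sketch_collapse($tree), "\n";
```

Children are visited in order, so the output preserves the original text order with nested tags around each child node.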
We now store patterns as the following tuple: ($name, $open_pattern, $terminate_pattern).
The termination pattern may be null, in which case the $open_pattern is complete. No transitions can occur within a complete state because the pattern's match is fixed.
We have two stacks. One is LuminousStatefulScanner::$token_tree_stack, which stores the token tree, and the other is a standard state stack which stores the current state data. State data is currently a pattern, as the above tuple.
LuminousStatefulScanner::add_pattern ($name, $pattern, $end = null, $consume = true)
Adds a pattern.
$name | the name of the pattern/state
$pattern | Either the entire pattern, or just its opening delimiter
$end | If $pattern was just the opening delimiter, $end is the closing delimiter. Separating the two delimiters like this makes the state flexible length, as state transitions can occur inside it.
$consume | Not currently observed. Might never be. Don't specify this yet.
LuminousStatefulScanner::add_transition ($from, $to)
Adds a state transition.
This is a helper function for populating LuminousStatefulScanner::transitions; you can also set that array directly instead.
$from | The parent state
$to | The child state
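To make the relationship between patterns and transitions concrete, here is a standalone sketch. SketchTable is a hypothetical stand-in and not the Luminous class: it only assumes that patterns are kept as ($name, $open_pattern, $terminate_pattern) tuples and that the transition table maps a parent state name to a list of child state names. The state names and the name 'initial' are likewise assumptions.

```php
<?php
// Hypothetical stand-in for illustration only, not the Luminous class.
class SketchTable {
    public $patterns = array();
    public $transitions = array();

    // Store the pattern as a ($name, $open, $terminate) tuple.
    function add_pattern($name, $pattern, $end = null) {
        $this->patterns[] = array($name, $pattern, $end);
    }

    // Register $to as a legal child state of $from.
    function add_transition($from, $to) {
        if (!isset($this->transitions[$from])) {
            $this->transitions[$from] = array();
        }
        $this->transitions[$from][] = $to;
    }
}

$t = new SketchTable();
$t->add_pattern('COMMENT', '/#.*/');        // complete pattern, no $end
$t->add_pattern('STRING', '/"/', '/"/');    // opening and closing delimiters
$t->add_pattern('ESC', '/\\\\./');          // a backslash followed by any character
$t->add_transition('initial', 'COMMENT');
$t->add_transition('initial', 'STRING');
$t->add_transition('STRING', 'ESC');        // escapes are only legal inside strings
```

Because STRING is given separate delimiters, it has flexible length and can contain the ESC child state; COMMENT is a complete pattern, so nothing can transition inside it.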
LuminousStatefulScanner::collapse_token_tree ($node) [protected]
Recursive function to collapse the token tree into XML.
LuminousStatefulScanner::load_transitions ()
Loads legal state transitions for the current state.
Loads the legal state transitions for the current state into the legal_transitions array.
LuminousStatefulScanner::main ()
Generic main function which observes the transition table.
Reimplemented from LuminousSimpleScanner.
LuminousStatefulScanner::next_end_data ()
Looks for the next state-pop sequence (close/end) for the current state.
LuminousStatefulScanner::next_start_data ()
Looks for the next legal state transition.
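One plausible way to look for the next legal transition is to try the opening pattern of every legal child state from the current offset and keep the earliest match. The sketch below shows that idea only; the function sketch_next_start, its return shape, and the sample patterns are assumptions, and it does not reflect how the class caches or orders its rules.

```php
<?php
// Hypothetical helper: $legal_patterns is a list of
// ($name, $open_pattern, $terminate_pattern) tuples.
function sketch_next_start(array $legal_patterns, $source, $offset) {
    $best = null;
    foreach ($legal_patterns as $p) {
        list($name, $open) = $p;
        // PREG_OFFSET_CAPTURE makes $m[0] an array of (text, byte offset).
        if (preg_match($open, $source, $m, PREG_OFFSET_CAPTURE, $offset)) {
            $pos = $m[0][1];
            if ($best === null || $pos < $best['pos']) {
                $best = array('name' => $name, 'pos' => $pos, 'text' => $m[0][0]);
            }
        }
    }
    return $best;   // null when no legal transition matches
}

$legal = array(
    array('COMMENT', '/#.*/', null),
    array('STRING', '/"/', '/"/'),
);
$src = 'x = "s" # c';
$first = sketch_next_start($legal, $src, 0);
$later = sketch_next_start($legal, $src, 7);
```

Looking for the close sequence of the current state could follow the same shape, using the state's terminate pattern instead of the children's opening patterns.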
LuminousStatefulScanner::pop_state ()
Pops a state from the stack.
The top token on the token_tree_stack is popped and appended as a child to the new top token.
The top state on the state stack is popped and discarded.
Exception | if there is only the initial state on the stack (we cannot pop the initial state, because then we have no state at all)
LuminousStatefulScanner::push_child ($child)
Pushes a new token onto the stack as a child of the currently active token.
LuminousStatefulScanner::push_state ($state_data)
Pushes a state.
$state_data | A tuple of ($name, $open_pattern, $terminate_pattern). This should be as it is stored in LuminousStatefulScanner::patterns
This actually causes two push operations. One is onto the token_tree_stack, and the other is onto the state stack. The former creates a new token; the latter holds the state information.
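The interplay of the two stacks during a push and a pop can be sketched as follows. SketchStacks is a hypothetical stand-in written for this example; the property name $state_stack and the node keys 'token_name' and 'children' are assumptions, and only the behaviour described above is modeled.

```php
<?php
// Hypothetical stand-in for illustration only, not the Luminous class.
class SketchStacks {
    public $state_stack = array();
    public $token_tree_stack = array();

    function push_state($state_data) {
        // First push: a new, empty token node named after the state.
        $this->token_tree_stack[] = array(
            'token_name' => $state_data[0],
            'children'   => array(),
        );
        // Second push: the state data itself.
        $this->state_stack[] = $state_data;
    }

    // Record a string as a child of the currently active token.
    function record($str) {
        $top = count($this->token_tree_stack) - 1;
        $this->token_tree_stack[$top]['children'][] = $str;
    }

    function pop_state() {
        if (count($this->state_stack) <= 1) {
            throw new Exception('Cannot pop the initial state');
        }
        // The finished node becomes a child of the new top node.
        $node = array_pop($this->token_tree_stack);
        $top = count($this->token_tree_stack) - 1;
        $this->token_tree_stack[$top]['children'][] = $node;
        // The state data is popped and discarded.
        array_pop($this->state_stack);
    }
}

$s = new SketchStacks();
$s->push_state(array('initial', null, null));
$s->record('x = ');
$s->push_state(array('STRING', '/"/', '/"/'));
$s->record('"hi"');
$s->pop_state();
```

After the final pop, the tree stack holds a single root node whose children are the string 'x = ' followed by the completed STRING node.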
LuminousStatefulScanner::record ($str, $dummy1 = null, $dummy2 = null)
Records a string as a child of the currently active token.
Reimplemented from LuminousScanner.
LuminousStatefulScanner::record_range ($from, $to, $type = null)
Helper function to record a range of the string.
$from | the start index
$to | the end index
$type | dummy argument
This is shorthand for $this->record(substr($this->string(), $from, $to - $from)).
RangeException | if the range is invalid (i.e. $to < $from)
An empty range (i.e. $to === $from) is allowed, but it is essentially a no-op.
Reimplemented from LuminousScanner.
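The range handling described for record_range can be sketched as a free function. sketch_record_range is a hypothetical name; it takes the source string and an output array explicitly instead of using $this->string() and $this->record(), and returning early on an empty range is an assumption about what "essentially a no-op" means.

```php
<?php
// Hypothetical free-function version, for illustration only.
function sketch_record_range($source, $from, $to, array &$recorded) {
    if ($to < $from) {
        throw new RangeException("Invalid range: $from to $to");
    }
    if ($to === $from) {
        return;   // empty range: nothing to record
    }
    // substr takes a length, hence $to - $from.
    $recorded[] = substr($source, $from, $to - $from);
}

$out = array();
sketch_record_range('hello world', 6, 11, $out);   // records 'world'
sketch_record_range('hello world', 3, 3, $out);    // empty range, no effect
```

Note that $to is an exclusive end index in this sketch, which is what the $to - $from length implies.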
LuminousStatefulScanner::record_token ($str, $type)
Records a complete token. This is shorthand for pushing a new node onto the stack, recording its text, and then popping it.
$str | the string
$type | the token type
LuminousStatefulScanner::setup () [protected]
Sets up the FSM.
If the caller has not specified an initial state, one is created with valid transitions to all other known states. We also push the initial state onto the tree stack and add a type mapping from the initial type to NULL.
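The default initial state can be illustrated with a small sketch. The function sketch_setup and the state name 'initial' are assumptions for this example; it models only the creation of the default transitions, not the push onto the tree stack or the type mapping to NULL.

```php
<?php
// Hypothetical: $patterns is a list of ($name, $open, $terminate) tuples and
// $transitions maps a parent state name to a list of legal child state names.
function sketch_setup(array $patterns, array $transitions) {
    if (!isset($transitions['initial'])) {
        // No initial state given: allow transitions to every known state.
        $names = array();
        foreach ($patterns as $p) {
            $names[] = $p[0];
        }
        $transitions['initial'] = $names;
    }
    return $transitions;
}

$patterns = array(
    array('COMMENT', '/#.*/', null),
    array('STRING', '/"/', '/"/'),
);
$table = sketch_setup($patterns, array('STRING' => array('ESC')));
```

A caller-supplied 'initial' entry is left untouched, so the default only applies when none was specified.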
LuminousStatefulScanner::state_name ()
Gets the name of the current state.
LuminousStatefulScanner::tagged ()
Returns the XML representation of the token stream.
This function triggers the generation of the XML output.
Reimplemented from LuminousScanner.
LuminousStatefulScanner::$last_state = null [private]
Remembers the state on the last iteration so we know whether or not to load in a new transition set.
LuminousStatefulScanner::$legal_transitions = array() [protected]
Legal transitions for the current state.
LuminousStatefulScanner::$patterns = array() [protected]
Pattern list.
Pattern array. Each pattern is a tuple of ($name, $open_pattern, $terminate_pattern).
LuminousStatefulScanner::$setup = false [private]
Records whether or not the FSM has been set up for the first time.
LuminousStatefulScanner::$token_tree_stack = array() [protected]
The token tree.
The tokens we end up with form a tree, which we build as we go along. The easiest way to build it is to keep track of the currently active node on top of a stack. When the node is completed, we pop it and insert it as a child of the element which is now at the top of the stack.
At the end of the process, one element remains here: the root node.
LuminousStatefulScanner::$transition_rule_cache = array() [private]
Cache of transition rules.