Superclass for languages which may nest, i.e. web languages.
Public Member Functions | |
add_child_scanner ($name, $scanner) | |
Adds a child scanner and indexes it against a name (convenience function). | |
dirty_exit ($token_name) | |
Sets the exit data to signify the exit is dirty and will need recovering from. | |
resume () | |
Attempts to recover from a dirty exit. | |
script_break ($token_name, $match=null, $pos=null) | |
Checks for a script terminator tag inside a matched token. | |
server_break ($token_name, $match=null, $pos=null) | |
Checks for a server-side script inside a matched token. | |
string ($str=null) | |
Getter and setter for the source string. | |
Public Member Functions inherited from LuminousScanner | |
__construct ($src=null) | |
constructor | |
add_filter ($arg1, $arg2, $arg3=null) | |
Add an individual token filter. | |
add_identifier_mapping ($name, $matches) | |
Adds an identifier mapping which is later analysed by map_identifier_filter. | |
add_stream_filter ($arg1, $arg2=null) | |
Adds a stream filter. | |
highlight ($src) | |
Public convenience function for setting the string and highlighting it. | |
init () | |
Set up the scanner immediately prior to tokenization. | |
main () | |
the method responsible for tokenization | |
map_identifier_filter ($token) | |
Identifier mapping filter. | |
nestable_token ($token_name, $open, $close) | |
Handles tokens that may nest inside themselves. | |
pop () | |
Pops the top element of the stack, and returns it. | |
push ($state) | |
Pushes some data onto the stack. | |
record ($string, $type, $pre_escaped=false) | |
Records a string as a given token type. | |
record_range ($from, $to, $type) | |
Helper function to record a range of the string. | |
remove_filter ($name) | |
Removes the individual filter(s) with the given name. | |
remove_stream_filter ($name) | |
Removes the stream filter(s) with the given name. | |
skip_whitespace () | |
Skips whitespace, and records it as a null token. | |
start () | |
Flushes the token stream. | |
state () | |
Gets the top element on $state_ or null if it is empty. | |
tagged () | |
Returns the XML representation of the token stream. | |
token_array () | |
Gets the token array. | |
Public Member Functions inherited from Scanner | |
add_pattern ($name, $pattern) | |
Allows the caller to add a predefined named pattern. | |
bol () | |
Beginning of line? | |
check ($pattern) | |
Non-consuming lookahead. | |
eol () | |
End of line? | |
eos () | |
End of string? | |
get ($n=1) | |
Consume a given number of bytes. | |
get_next ($patterns) | |
Look for the next occurrence of a set of patterns. | |
get_next_named ($patterns) | |
Find the index of the next occurrence of a named pattern. | |
get_next_strpos ($patterns) | |
Look for the next occurrence of a set of substrings. | |
index ($pattern) | |
Find the index of the next occurrence of a pattern. | |
match () | |
Get the result of the most recent match operation. | |
match_group ($g=0) | |
Get a group from the most recent match operation. | |
match_groups () | |
Get the match groups of the most recent match operation. | |
match_pos () | |
Get the position (offset) of the most recent match. | |
next_match ($consume_and_log=true) | |
Automation function: returns the next occurrence of any known patterns. | |
peek ($n=1) | |
Lookahead into the string a given number of bytes. | |
pos ($new_pos=null) | |
Getter and setter for the current position (string pointer). | |
pos_shift ($offset) | |
Moves the string pointer by a given offset. | |
remove_pattern ($name) | |
Allows the caller to remove a named pattern. | |
reset () | |
Reset the scanner. | |
rest () | |
Gets the remaining string. | |
scan ($pattern) | |
Scans at the current pointer. | |
scan_until ($pattern) | |
Scans until the start of a pattern. | |
terminate () | |
Ends scanning of a string. | |
unscan () | |
Revert the most recent scanning operation. |
Public Attributes | |
$clean_exit = true | |
Clean exit or inconvenient, mid-token forced exit. | |
$embedded_html = false | |
Is the source embedded in HTML? | |
$embedded_server = false | |
Is the source embedded in a server-side script (e.g. PHP)? | |
$interrupt = false | |
I think this is ignored and obsolete. | |
$script_tags | |
Closing HTML tag for our code, e.g. </script> | |
$server_tags = '/<\?/' | |
Opening tag for server-side code. This is a regular expression. | |
Public Attributes inherited from LuminousScanner | |
$version = 'master' | |
scanner version. |
Protected Attributes | |
$child_scanners = array() | |
Child scanners. | |
$dirty_exit_recovery = array() | |
Recovery patterns for when we reach an untimely interrupt. | |
$exit_state = null | |
Name of interrupted token, in case of a dirty exit. | |
Protected Attributes inherited from LuminousScanner | |
$case_sensitive = true | |
Whether or not the language is case sensitive. | |
$filters = array() | |
Individual token filters. | |
$ident_map = array() | |
A map of identifiers and their corresponding token names. | |
$rule_tag_map = array() | |
Rule remappings. | |
$state_ = array() | |
State stack. | |
$stream_filters = array() | |
Token stream filters. | |
$tokens = array() | |
The token stream. | |
$user_defs | |
Identifier remappings based on definitions identified in the source code. |
Additional Inherited Members | |
Static Public Member Functions inherited from LuminousScanner | |
static | guess_language ($src, $info) |
Language guessing. | |
Protected Member Functions inherited from LuminousScanner | |
rule_mapper_filter ($tokens) | |
Rule re-mapper filter. | |
user_def_filter ($token) | |
Filter to highlight identifiers whose definitions are in the source. |
Superclass for languages which may nest, i.e. web languages.
Web languages get their own special class because they have to deal with server-script code embedded inside them and the potential for languages nested under them (PHP has HTML, HTML has CSS and JavaScript).
The relationship is strictly hierarchical, not recursive descent. Meeting a '<?' in CSS bubbles up to HTML and then up to PHP (or whatever). The top-level scanner is ultimately what should have sub-scanner code embedded in its own token stream.
The scanners should be persistent, so only one JavaScript scanner exists even if there are 20 javascript tags. This is so they can keep persistent state, which might be necessary if they are interrupted by server-side tags. For this reason, the main() method might be called multiple times, therefore each web sub-scanner should
The init method of the class should be used to set relevant rules based on whether or not the embedded flags are set; and therefore the embedded flags should be set before init is called.
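The ordering requirement above (flags first, then init) can be illustrated with a minimal standalone sketch. SketchCssScanner, its $patterns array and the pattern names are hypothetical stand-ins, not part of Luminous; only the $embedded_html and $embedded_server flag names and the default server tag pattern come from this page.

```php
<?php
// Hypothetical stand-in for a web sub-scanner; not the Luminous class itself.
class SketchCssScanner {
    public $embedded_html = false;
    public $embedded_server = false;
    public $patterns = array();   // assumed storage for named rules

    // Builds the rule set once, based on the flags as they are at call time.
    public function init() {
        $this->patterns = array('COMMENT' => '%/\*.*?\*/%s');
        if ($this->embedded_html) {
            // Only watch for a closing tag when nested inside HTML.
            $this->patterns['TERMINATOR'] = '%</style%i';
        }
        if ($this->embedded_server) {
            // Only watch for server-side tags when nested inside e.g. PHP.
            $this->patterns['SERVER'] = '/<\?/';
        }
    }
}

$scanner = new SketchCssScanner();
$scanner->embedded_html = true;   // set the flags first...
$scanner->init();                 // ...then call init
```

Setting a flag after init() would have no effect in this sketch, which is the reason for the ordering.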
LuminousEmbeddedWebScript::dirty_exit ($token_name)
Sets the exit data to signify the exit is dirty and will need recovering from.
$token_name | the name of the token which is being interrupted |
Exception | if no recovery data is associated with the given token. |
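As a rough illustration of the contract described above, the following standalone sketch marks the exit as dirty and throws when no recovery data exists for the token. SketchDirtyExit, the exit_state() accessor and the 'DSTRING' token name are assumptions made for the example; the real method body may differ.

```php
<?php
// Hypothetical model of the dirty-exit bookkeeping; not the Luminous source.
class SketchDirtyExit {
    public $clean_exit = true;
    protected $exit_state = null;
    // rule => recovery pattern; the single entry here is illustrative only.
    protected $dirty_exit_recovery = array(
        'DSTRING' => '/(?:[^\\\\"]|\\\\.)*(?:"|$)/',
    );

    public function dirty_exit($token_name) {
        if (!isset($this->dirty_exit_recovery[$token_name])) {
            throw new Exception("No dirty exit recovery data for '$token_name'");
        }
        $this->exit_state = $token_name;  // remember which token was interrupted
        $this->clean_exit = false;        // signal that resume is needed later
    }

    // Accessor added for the example only.
    public function exit_state() {
        return $this->exit_state;
    }
}

$sketch = new SketchDirtyExit();
$sketch->dirty_exit('DSTRING');
```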
LuminousEmbeddedWebScript::resume ()
Attempts to recover from a dirty exit.
This method should be called on every iteration of the main loop when LuminousEmbeddedWebScript::$clean_exit is FALSE. It will attempt to recover from an interruption which left the scanner in the middle of a token. The remainder of the token will be in Scanner::match() as usual.
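One way to picture the recovery step is a pattern that matches the remainder of the interrupted token, applied at the current position. The sketch below is standalone and hypothetical: sketch_resume, the 'DSTRING' rule name and the pattern are assumptions, and the real method works on the scanner's own state rather than on arguments.

```php
<?php
// Hypothetical helper: continue an interrupted token starting at $pos.
// Returns the remainder of the token and the new position, or null.
function sketch_resume($src, $pos, $recovery_pattern) {
    // \G in the pattern anchors the match at the offset given to preg_match.
    if (!preg_match($recovery_pattern, $src, $m, 0, $pos)) {
        return null;
    }
    return array('rest' => $m[0], 'pos' => $pos + strlen($m[0]));
}

// A double-quoted string without its opening delimiter (assumed rule name).
$recovery = array('DSTRING' => '/\G(?:[^\\\\"]|\\\\.)*(?:"|$)/');

// What is left of the source after the server-side block has been handled.
$src = ' string"; var y = 1;';
$resumed = sketch_resume($src, 0, $recovery['DSTRING']);
```

Here $resumed['rest'] holds the tail of the string token, which mirrors the statement above that the remainder is made available as the current match.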
LuminousEmbeddedWebScript::script_break ($token_name, $match = null, $pos = null)
Checks for a script terminator tag inside a matched token.
$token_name | The token name of the matched text |
$match | The string from the last match. If this is left NULL then Scanner::match() is assumed to hold the match. |
$pos | The position of the last match. If this is left NULL then Scanner::match_pos() is assumed to hold the offset. |
Returns TRUE if the scanner should break, else FALSE.
This method checks whether the string provided as $match contains the string in LuminousEmbeddedWebScript::$script_tags. If it does, it records the substring preceding the tag as $token_name, advances the scan pointer to immediately before the script tag, and returns TRUE. Returning TRUE is a signal that the scanner should break immediately and let its parent scanner take over.
This condition is a 'clean_exit'.
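The check can be sketched as a plain substring search. The function below is a standalone illustration, not the Luminous implementation: sketch_script_break and its return shape are hypothetical, and the case-insensitive search is an assumption.

```php
<?php
// Hypothetical model of the terminator check. Returns null when the
// terminator is absent; otherwise the text to record and the new pointer.
function sketch_script_break($match, $match_pos, $script_tags) {
    // Assumption: HTML tag names are matched case-insensitively.
    $offset = stripos($match, $script_tags);
    if ($offset === false) {
        return null;   // no terminator: the scanner keeps going
    }
    return array(
        'recorded' => substr($match, 0, $offset),  // text before the tag
        'pos'      => $match_pos + $offset,        // pointer just before the tag
    );
}

// A line comment that swallowed the closing tag, matched at offset 10.
$broken = sketch_script_break('// a comment </script>', 10, '</script>');
$intact = sketch_script_break('// a comment', 10, '</script>');
```

A non-null result corresponds to the TRUE return value: the caller would stop and hand control back to its parent scanner.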
LuminousEmbeddedWebScript::server_break ($token_name, $match = null, $pos = null)
Checks for a server-side script inside a matched token.
$token_name | The token name of the matched text |
$match | The string from the last match. If this is left NULL then Scanner::match() is assumed to hold the match. |
$pos | The position of the last match. If this is left NULL then Scanner::match_pos() is assumed to hold the offset. |
Returns TRUE if the scanner should break, else FALSE.
This method checks whether an interruption by a server-side script tag, LuminousEmbeddedWebScript::$server_tags, occurs within a matched token. If it does, this method records the substring up until that point as the provided $token_name, and also sets up a 'dirty exit'. This means that a token was interrupted and we expect to have to recover from it when the server-side language's scanner has ended.
Returning TRUE is a signal that the scanner should break immediately and let its parent scanner take over.
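The following standalone sketch models the same idea with a regular expression search inside the matched token. sketch_server_break, the returned array and the 'DSTRING' token name are hypothetical; only the default pattern '/<\?/' is taken from this page.

```php
<?php
// Hypothetical model of the server-tag check. Returns null when no server
// tag occurs in the match; otherwise what to record plus dirty-exit data.
function sketch_server_break($token_name, $match, $match_pos, $server_tags = '/<\?/') {
    if (!preg_match($server_tags, $match, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $offset = $m[0][1];   // byte offset of the server tag within the match
    return array(
        'recorded'   => substr($match, 0, $offset),  // token text so far
        'pos'        => $match_pos + $offset,        // pointer at the server tag
        'clean_exit' => false,                       // interrupted mid-token
        'exit_state' => $token_name,                 // which rule to resume later
    );
}

// A JavaScript string interrupted by a PHP block, matched at offset 8.
$js_string = '"this is <?php echo \'a\' ?> string"';
$interrupted = sketch_server_break('DSTRING', $js_string, 8);
```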
LuminousEmbeddedWebScript::string ($s = null)
Getter and setter for the source string.
LuminousEmbeddedWebScript::$child_scanners = array() | protected |
Child scanners.
Persistent storage of child scanners, name => scanner (instance)
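A minimal sketch of why the name => instance map matters: the same child object is reused on every visit, so state survives between calls to main(). SketchParentScanner, SketchChildScanner, the child() accessor and the $calls counter are all hypothetical and exist only for this example.

```php
<?php
// Hypothetical parent that keeps one persistent child scanner per name.
class SketchParentScanner {
    protected $child_scanners = array();

    public function add_child_scanner($name, $scanner) {
        $this->child_scanners[$name] = $scanner;
    }

    // Accessor added for the example only.
    public function child($name) {
        return $this->child_scanners[$name];
    }
}

// Hypothetical child that counts how often main() runs.
class SketchChildScanner {
    public $calls = 0;

    public function main() {
        $this->calls++;
    }
}

$parent = new SketchParentScanner();
$parent->add_child_scanner('js', new SketchChildScanner());

// Two script blocks in one document reuse the same child instance.
$parent->child('js')->main();
$parent->child('js')->main();
```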
LuminousEmbeddedWebScript::$clean_exit = true |
Clean exit or inconvenient, mid-token forced exit.
Signifies whether the program exited due to inconvenient interruption by a parent language (i.e. a server-side language), or whether it reached a legitimate break. A server-side language isn't necessarily a dirty exit, but if it comes in the middle of a token it is, because we need to resume from it later. e.g.:
var x = "this is <?php echo 'a' ?> string";
LuminousEmbeddedWebScript::$dirty_exit_recovery = array() | protected |
Recovery patterns for when we reach an untimely interrupt.
If we reach a dirty exit, when we resume we need to figure out how to continue consuming the rule that was interrupted. So essentially, this will be a regex which matches the rule without start delimiters.
This is a map of rule => pattern
LuminousEmbeddedWebScript::$embedded_html = false |
Is the source embedded in HTML?
Embedded in HTML? i.e. do we need to observe tag terminators like </script>?
LuminousEmbeddedWebScript::$embedded_server = false |
Is the source embedded in a server-side script (e.g. PHP)?
Embedded in a server side language? i.e. do we need to break at (for example) <? tags?
LuminousEmbeddedWebScript::$exit_state = null | protected |
Name of interrupted token, in case of a dirty exit.
Logs the exit state in the case of a dirty exit: this is the name of the rule that was interrupted.