Base string scanning class.
Public Member Functions
__construct ($src=null) | Constructor. |
add_pattern ($name, $pattern) | Allows the caller to add a predefined named pattern. |
bol () | Beginning of line? |
check ($pattern) | Non-consuming lookahead. |
eol () | End of line? |
eos () | End of string? |
get ($n=1) | Consume a given number of bytes. |
get_next ($patterns) | Look for the next occurrence of a set of patterns. |
get_next_named ($patterns) | Find the index of the next occurrence of a named pattern. |
get_next_strpos ($patterns) | Look for the next occurrence of a set of substrings. |
index ($pattern) | Find the index of the next occurrence of a pattern. |
match () | Get the result of the most recent match operation. |
match_group ($g=0) | Get a group from the most recent match operation. |
match_groups () | Get the match groups of the most recent match operation. |
match_pos () | Get the position (offset) of the most recent match. |
next_match ($consume_and_log=true) | Automation function: returns the next occurrence of any known patterns. |
peek ($n=1) | Lookahead into the string a given number of bytes. |
pos ($new_pos=null) | Getter and setter for the current position (string pointer). |
pos_shift ($offset) | Moves the string pointer by a given offset. |
remove_pattern ($name) | Allows the caller to remove a named pattern. |
reset () | Reset the scanner. |
rest () | Gets the remaining string. |
scan ($pattern) | Scans at the current pointer. |
scan_until ($pattern) | Scans until the start of a pattern. |
string ($s=null) | Getter and setter for the source string. |
terminate () | Ends scanning of a string. |
unscan () | Revert the most recent scanning operation. |
Private Member Functions
__check ($pattern, $instant=true, $consume=true, $consume_match=true, $log=true) | The real scanning function. |
__consume ($pos, $consume_match, $match_data) | Helper function to consume a match. |
__log_match ($index, $match_pos, $match_data) | Helper function to log a match into the history. |
Private Attributes
$index | The current scan pointer (AKA the offset or index). |
$match_history = array(null, null) | Match history. |
$patterns = array() | Caller-defined patterns used by next_match(). |
$src | Our local copy of the input string to be scanned. |
$src_len | Length of the input string (cached for performance). |
$ss | LuminousStringSearch instance (caches preg_* results). |
Base string scanning class.
The Scanner class is the base class that handles traversing a string while searching for tokens. It is loosely based on Ruby's StringScanner.
The rough idea is that we keep track of the position (a string pointer) and use scan() to see what matches at the current position.
It also provides some automation methods, but as a string scanner it remains fairly low-level.
Scanner is abstract as far as Luminous is concerned. LuminousScanner extends Scanner significantly with methods for recording highlighting-related data.
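The pointer-plus-scan() idea described above can be illustrated with a small stand-alone sketch. MiniScanner and its method bodies are hypothetical and written only for this example; they are not the Luminous source, and the real class may implement matching differently (for instance through the cached LuminousStringSearch instance).

```php
<?php
// Hypothetical illustration only; this is not the Luminous Scanner source.
final class MiniScanner
{
    private string $src;
    private int $index = 0; // the scan pointer

    public function __construct(string $src)
    {
        $this->src = $src;
    }

    public function pos(): int
    {
        return $this->index;
    }

    // Non-consuming lookahead: what matches exactly at the pointer?
    public function check(string $pattern): ?string
    {
        // PREG_OFFSET_CAPTURE reports where the match starts, so matches
        // that begin beyond the pointer can be rejected.
        $found = preg_match($pattern, $this->src, $m, PREG_OFFSET_CAPTURE, $this->index);
        if ($found !== 1 || $m[0][1] !== $this->index) {
            return null;
        }
        return $m[0][0];
    }

    // Like check(), but also advances the pointer past the match.
    public function scan(string $pattern): ?string
    {
        $match = $this->check($pattern);
        if ($match !== null) {
            $this->index += strlen($match);
        }
        return $match;
    }
}

$s = new MiniScanner('foo = 42');
$ident = $s->scan('/[a-z]+/');   // matches at offset 0, pointer moves to 3
$miss  = $s->scan('/\d+/');      // digits exist, but not at offset 3
$ws    = $s->scan('/\s*=\s*/');  // matches at offset 3, pointer moves to 6
$num   = $s->scan('/\d+/');      // matches at offset 6
```

The failed scan in the middle leaves the pointer where it was, which is what lets a caller try one token pattern after another at the same position.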
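Scanner::__check | ( | $pattern, $instant = true, $consume = true, $consume_match = true, $log = true | ) | private |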
The real scanning function.
$pattern | The pattern to scan for. |
$instant | Whether the only legal match is one at the current scan pointer, or whether a match beyond the scan pointer is also legal. |
$consume | Whether or not to consume string as a result of matching. |
$consume_match | Whether or not to consume the actual matched string. This only has an effect if $consume is TRUE. If $instant is FALSE, $consume is TRUE and $consume_match is FALSE, the intermediate substring is consumed, the scan pointer is moved to the beginning of the match, and the substring is recorded as a single-group match. |
$log | Whether or not to log the matches into the match history. |
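How the flags in the table above interact is easier to see in code. The following is a minimal sketch under stated assumptions: sketch_check() is a hypothetical free function, it returns a string rather than PCRE match groups, and it omits logging entirely. It is not the private method itself.

```php
<?php
// Hypothetical stand-in for the private scanning routine; the name, the
// by-reference pointer and the return shape are assumptions for this sketch.
function sketch_check(string $src, int &$index, string $pattern,
                      bool $instant = true, bool $consume = true,
                      bool $consume_match = true): ?string
{
    if (preg_match($pattern, $src, $m, PREG_OFFSET_CAPTURE, $index) !== 1) {
        return null;
    }
    $match_pos = $m[0][1];
    if ($instant && $match_pos !== $index) {
        return null; // a match further along is not legal in instant mode
    }
    if (!$consume) {
        return $m[0][0]; // pure lookahead: the pointer is untouched
    }
    if ($consume_match) {
        $index = $match_pos + strlen($m[0][0]); // skip past the match
        return $m[0][0];
    }
    // Consume only the text before the match and report that text.
    $skipped = substr($src, $index, $match_pos - $index);
    $index = $match_pos;
    return $skipped;
}

$src = 'abc */ tail';
$i = 0;
$a = sketch_check($src, $i, '#\*/#');                     // instant: no match at 0
$b = sketch_check($src, $i, '#\*/#', false, true, false); // scan_until-like call
$c = sketch_check($src, $i, '#\*/#');                     // now the match is at the pointer
```

In this sketch the second call behaves like scan_until() as documented here: it stops at the start of the match and hands back the skipped text.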
Scanner::__consume | ( | $pos, $consume_match, $match_data | ) | private |
Helper function to consume a match.
$pos | (int) The match position |
$consume_match | (bool) Whether or not to consume the actual matched text |
$match_data | The matching groups, as returned by PCRE. |
Scanner::__log_match | ( | $index, $match_pos, $match_data | ) | private |
Helper function to log a match into the history.
Scanner::add_pattern | ( | $name, $pattern | ) |
Allows the caller to add a predefined named pattern.
Adds a predefined pattern which is visible to next_match.
$name | A name for the pattern. This does not have to be unique. |
$pattern | A regular expression pattern. |
Scanner::bol | ( | ) |
Beginning of line?
Returns TRUE if the scan pointer is at the beginning of a line (i.e. immediately following a newline character), or at the beginning of the string, else FALSE.
Scanner::check | ( | $pattern | ) |
Non-consuming lookahead.
Looks for the given pattern at the current index and logs it if it is found, but does not consume it. This is a look-ahead.
$pattern | the pattern to search for |
Returns null if not found, else the matched string.
Scanner::eol | ( | ) |
End of line?
Returns TRUE if the scan pointer is at the end of a line (i.e. immediately preceding a newline character), or at the end of the string, else FALSE.
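The bol() and eol() conditions described above translate into simple byte comparisons. This is a minimal sketch, assuming the pointer always lies between 0 and the string length inclusive; sketch_bol() and sketch_eol() are hypothetical helper names, not part of the class.

```php
<?php
// Hypothetical helpers; they assume 0 <= $index <= strlen($src).
function sketch_bol(string $src, int $index): bool
{
    // Start of string, or the previous byte is a newline.
    return $index === 0 || $src[$index - 1] === "\n";
}

function sketch_eol(string $src, int $index): bool
{
    // End of string, or the byte under the pointer is a newline.
    return $index >= strlen($src) || $src[$index] === "\n";
}

$text = "ab\ncd";
$bol_at_start    = sketch_bol($text, 0); // beginning of the string
$bol_after_break = sketch_bol($text, 3); // just after the newline
$eol_at_break    = sketch_eol($text, 2); // just before the newline
$eol_at_end      = sketch_eol($text, 5); // end of the string
```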
Scanner::eos | ( | ) |
End of string?
Returns TRUE if the scan pointer is at the end of the string, else FALSE.
Scanner::get | ( | $n = 1 | ) |
Consume a given number of bytes.
$n | The number of bytes. |
Scanner::get_next | ( | $patterns | ) |
Look for the next occurrence of a set of patterns.
Finds the next match of the given patterns and returns it. The string is not consumed or logged. Convenience function.
$patterns | an array of regular expressions |
Scanner::get_next_named | ( | $patterns | ) |
Find the index of the next occurrence of a named pattern.
$patterns | A map of $name=>$pattern |
Scanner::get_next_strpos | ( | $patterns | ) |
Look for the next occurrence of a set of substrings.
Like get_next(), but uses strpos instead of preg_*.
Scanner::index | ( | $pattern | ) |
Find the index of the next occurrence of a pattern.
$pattern | the pattern to search for |
Scanner::match | ( | ) |
Get the result of the most recent match operation.
Returns the matched string or NULL, depending on whether or not the most recent scanning function matched anything.
Exception | if no matches have been recorded. |
Scanner::match_group | ( | $g = 0 | ) |
Get a group from the most recent match operation.
$g | the group's numerical index or name, in the case of named subpatterns. |
Exception | if no matches have been recorded. |
Exception | if matches have been recorded, but the group does not exist. |
Scanner::match_groups | ( | ) |
Get the match groups of the most recent match operation.
Returns the map of match groups or NULL, depending on whether or not the most recent scanning function was successful. The map is the same as PCRE returns, i.e. group_name => match_string, where group_name may be a string or numerical index.
Exception | if no matches have been recorded. |
Scanner::match_pos | ( | ) |
Get the position (offset) of the most recent match.
Exception | if no matches have been recorded. |
Scanner::next_match | ( | $consume_and_log = true | ) |
Automation function: returns the next occurrence of any known patterns.
Iterates over the predefined patterns array (add_pattern) and consumes/logs the nearest match, skipping unrecognised segments of string.
If no match is found, the return value is NULL and nothing is logged.
$consume_and_log | If this is FALSE, the pattern is not consumed or logged. |
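The "nearest match among the known patterns" selection can be sketched as follows. This is an illustration under assumptions: sketch_next_match() is a hypothetical function, it takes the patterns as a name => regex map (the real class allows non-unique names, so its storage presumably differs), its return shape is invented for the example, and it neither consumes nor logs.

```php
<?php
// Hypothetical sketch: returns [name, match_offset, match_text] for the
// nearest match at or after $index, or null when nothing matches.
function sketch_next_match(string $src, int $index, array $patterns): ?array
{
    $best = null;
    foreach ($patterns as $name => $pattern) {
        if (preg_match($pattern, $src, $m, PREG_OFFSET_CAPTURE, $index) !== 1) {
            continue;
        }
        [$text, $offset] = $m[0];
        // Keep the match that starts earliest; ties go to the pattern
        // listed first.
        if ($best === null || $offset < $best[1]) {
            $best = [$name, $offset, $text];
        }
    }
    return $best;
}

$patterns = ['number' => '/\d+/', 'string' => '/"[^"]*"/'];
$src = 'x = "hi" + 10';
$first  = sketch_next_match($src, 0, $patterns);  // the string literal comes first
$second = sketch_next_match($src, 8, $patterns);  // then the number
$third  = sketch_next_match($src, 13, $patterns); // nothing left
```

A caller that consumes would move the pointer to the end of each reported match before asking again, which is how the unrecognised text in between gets skipped.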
Scanner::peek | ( | $n = 1 | ) |
Lookahead into the string a given number of bytes.
$n | The number of bytes. |
Scanner::pos | ( | $new_pos = null | ) |
Getter and setter for the current position (string pointer).
$new_pos | The new position (leave NULL to use as a getter). This will be clipped to a legal string index if you specify a negative number or an index greater than the string's length. |
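The clipping described for $new_pos amounts to clamping into a range. A minimal sketch, assuming the legal range runs from 0 up to and including the string length (the end-of-string position); sketch_clip_pos() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: clamp a requested position into [0, $src_len].
// Treating $src_len itself as legal is an assumption of this sketch.
function sketch_clip_pos(int $new_pos, int $src_len): int
{
    return max(0, min($new_pos, $src_len));
}

$len = strlen('0123456789');
$too_low  = sketch_clip_pos(-5, $len); // clipped up to 0
$in_range = sketch_clip_pos(3, $len);  // unchanged
$too_high = sketch_clip_pos(99, $len); // clipped down to the length
```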
Scanner::pos_shift | ( | $offset | ) |
Moves the string pointer by a given offset.
Scanner::remove_pattern | ( | $name | ) |
Allows the caller to remove a named pattern.
$name | The name of the pattern to remove. This should be as it was supplied to add_pattern(). |
Scanner::reset | ( | ) |
Reset the scanner.
Resets the scanner: sets the scan pointer to 0 and clears the match history.
Scanner::rest | ( | ) |
Gets the remaining string.
Scanner::scan | ( | $pattern | ) |
Scans at the current pointer.
Looks for the given pattern at the current index and consumes and logs it if it is found.
$pattern | the pattern to search for |
Returns null if not found, else the full match.
Scanner::scan_until | ( | $pattern | ) |
Scans until the start of a pattern.
Looks for the given pattern anywhere beyond the current index and advances the scan pointer to the start of the pattern. The match is logged.
The match itself is not consumed.
$pattern | the pattern to search for |
Returns null if it is not found.
Scanner::string | ( | $s = null | ) |
Getter and setter for the source string.
$s | The new source string (leave as NULL to use this method as a getter) |
Reimplemented in LuminousEmbeddedWebScript.
Scanner::terminate | ( | ) |
Ends scanning of a string.
Moves the scan pointer to the end of the string, terminating the current scan.
Scanner::unscan | ( | ) |
Revert the most recent scanning operation.
Unscans the most recent match. The match is removed from the history, and the scan pointer is moved to where it was before the match.
Calls to get() and peek() are not logged and are therefore not unscannable.
Scanner::$match_history = array(null, null) | private |
Match history.
History of matches. This is an array (queue) which should have at most two elements. Each element is itself an array:
0 => Scan pointer when the match was found
1 => Match index (probably the same as the scan pointer, but not necessarily)
2 => Match data (match groups, as a map, as returned by PCRE)
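The two-slot history and its use by unscan() can be sketched as below. HistorySketch is a hypothetical class written for this example. How the real class shifts entries, and whether unscan() throws when the history is empty, are assumptions here rather than documented behaviour.

```php
<?php
// Hypothetical sketch of a two-element match history; not the Luminous source.
final class HistorySketch
{
    // Two slots: [older, most recent]; null marks an empty slot.
    private array $history = [null, null];
    public int $index = 0; // the scan pointer

    public function log(int $pointer_before, int $match_pos, ?array $groups): void
    {
        // Drop the older entry and append the new one.
        $this->history = [$this->history[1], [$pointer_before, $match_pos, $groups]];
    }

    public function unscan(): void
    {
        $last = $this->history[1];
        if ($last === null) {
            // Assumption: the sketch treats an empty history as an error.
            throw new Exception('nothing to unscan');
        }
        $this->index = $last[0]; // restore the pointer from before the match
        $this->history = [null, $this->history[0]];
    }
}

$h = new HistorySketch();
$h->log(0, 0, ['foo']); // match found with the pointer at 0
$h->index = 3;
$h->log(3, 4, ['bar']); // match found at 4 with the pointer at 3
$h->index = 7;
$h->unscan();           // pointer returns to 3
```

Because only two entries are kept, at most two consecutive unscans can succeed in this sketch.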