Luminous  git-master
Scanner Class Reference

Base string scanning class.


Public Member Functions

 __construct ($src=null)
 constructor
 add_pattern ($name, $pattern)
 Allows the caller to add a predefined named pattern.
 bol ()
 Beginning of line?
 check ($pattern)
 Non-consuming lookahead.
 eol ()
 End of line?
 eos ()
 End of string?
 get ($n=1)
 Consume a given number of bytes.
 get_next ($patterns)
 Look for the next occurrence of a set of patterns.
 get_next_named ($patterns)
 Find the index of the next occurrence of a named pattern.
 get_next_strpos ($patterns)
 Look for the next occurrence of a set of substrings.
 index ($pattern)
 Find the index of the next occurrence of a pattern.
 match ()
 Get the result of the most recent match operation.
 match_group ($g=0)
 Get a group from the most recent match operation.
 match_groups ()
 Get the match groups of the most recent match operation.
 match_pos ()
 Get the position (offset) of the most recent match.
 next_match ($consume_and_log=true)
 Automation function: returns the next occurrence of any of the known patterns.
 peek ($n=1)
 Lookahead into the string a given number of bytes.
 pos ($new_pos=null)
 Getter and setter for the current position (string pointer).
 pos_shift ($offset)
 Moves the string pointer by a given offset.
 remove_pattern ($name)
 Allows the caller to remove a named pattern.
 reset ()
 Reset the scanner.
 rest ()
 Gets the remaining string.
 scan ($pattern)
 Scans at the current pointer.
 scan_until ($pattern)
 Scans until the start of a pattern.
 string ($s=null)
 Getter and setter for the source string.
 terminate ()
 Ends scanning of a string.
 unscan ()
 Revert the most recent scanning operation.

Private Member Functions

 __check ($pattern, $instant=true, $consume=true, $consume_match=true, $log=true)
 The real scanning function.
 __consume ($pos, $consume_match, $match_data)
 Helper function to consume a match.
 __log_match ($index, $match_pos, $match_data)
 Helper function to log a match into the history.

Private Attributes

 $index
 The current scan pointer (AKA the offset or index)
 $match_history = array(null, null)
 Match history.
 $patterns = array()
 Caller-defined patterns used by next_match()
 $src
 Our local copy of the input string to be scanned.
 $src_len
 Length of input string (cached for performance)
 $ss
 LuminousStringSearch instance (caches preg_* results)

Detailed Description

Base string scanning class.

The Scanner class is the base class that handles traversing a string while searching for tokens. It is loosely based on Ruby's StringScanner.

The basic idea is that the scanner keeps track of a position (the scan pointer, or string pointer) and uses scan() to test what matches at that position.

It also provides some automation methods, such as next_match(), but on the whole it is a fairly low-level string-scanning tool.

As far as Luminous is concerned, Scanner is abstract: LuminousScanner extends it significantly with methods that are useful for recording highlighting-related data.

See Also
LuminousScanner
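To make the scan-pointer idea concrete, here is a minimal PHP sketch. It is not Luminous's implementation: the class name MiniScanner is hypothetical, and it supports only scan(), eos(), pos() and rest(), with no match history. A pattern counts as matching only if the match starts exactly at the pointer, in which case the pointer moves past it.

```php
<?php
// Minimal sketch of the scan-pointer idea; MiniScanner is hypothetical,
// not Luminous's Scanner class.
class MiniScanner
{
    private $src;
    private $index = 0;

    public function __construct(string $src)
    {
        $this->src = $src;
    }

    // Try $pattern at the scan pointer: on success, consume and return the match.
    public function scan(string $pattern): ?string
    {
        if (preg_match($pattern, $this->src, $m, PREG_OFFSET_CAPTURE, $this->index) === 1
            && $m[0][1] === $this->index) {
            $this->index += strlen($m[0][0]);
            return $m[0][0];
        }
        return null; // no match at the pointer; nothing consumed
    }

    public function eos(): bool
    {
        return $this->index >= strlen($this->src);
    }

    public function pos(): int
    {
        return $this->index;
    }

    public function rest(): string
    {
        return (string) substr($this->src, $this->index);
    }
}

// Usage: split a line into words, numbers, whitespace and single characters.
$s = new MiniScanner('foo = 42');
$tokens = [];
while (!$s->eos()) {
    $tokens[] = $s->scan('/[a-z]+/') ?? $s->scan('/\d+/')
        ?? $s->scan('/\s+/') ?? $s->scan('/./s');
}
// $tokens is ['foo', ' ', '=', ' ', '42']
```

Trying the patterns in a fixed order at each position is the simplest way to drive such a scanner; next_match() automates a nearest-match variant of the same loop.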

Member Function Documentation

Scanner::__check ($pattern, $instant = true, $consume = true, $consume_match = true, $log = true)
private

The real scanning function.

Parameters
$pattern: The pattern to scan for.
$instant: Whether the only legal match is one at the current scan pointer (TRUE), or whether a match beyond the scan pointer is also legal (FALSE).
$consume: Whether or not to consume the string as a result of matching.
$consume_match: Whether or not to consume the matched string itself. This only has an effect if $consume is TRUE. If $instant is FALSE, $consume is TRUE and $consume_match is FALSE, the intermediate substring (between the scan pointer and the match) is consumed, the scan pointer is moved to the beginning of the match, and the substring is recorded as a single-group match.
$log: Whether or not to log the matches into the match history.
Returns
The matched string, or null. This is subsequently equivalent to match(), match_groups()[0] or match_group(0).
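The interaction of $instant, $consume and $consume_match is easiest to see in isolation. The following PHP sketch is a hypothetical standalone function (sketch_check() is not part of Luminous) that follows the parameter descriptions above; it returns the result together with the new scan pointer and leaves out logging.

```php
<?php
// Hypothetical standalone sketch of the documented __check() parameters.
// Returns [result, new scan pointer]; match logging is left out.
function sketch_check(string $src, int $index, string $pattern,
                      bool $instant = true, bool $consume = true,
                      bool $consume_match = true): array
{
    if (preg_match($pattern, $src, $m, PREG_OFFSET_CAPTURE, $index) !== 1) {
        return [null, $index]; // no match anywhere beyond the pointer
    }
    $pos = $m[0][1];
    $text = $m[0][0];
    if ($instant && $pos !== $index) {
        return [null, $index]; // a match exists, but not at the pointer
    }
    if (!$instant && !$consume_match) {
        // The result is the intermediate substring; the pointer stops at the match start.
        return [substr($src, $index, $pos - $index), $consume ? $pos : $index];
    }
    if (!$consume) {
        return [$text, $index]; // lookahead only
    }
    return [$text, $consume_match ? $pos + strlen($text) : $pos];
}

$src = 'abc123';
[$r1, $p1] = sketch_check($src, 0, '/\d+/');                     // null, 0
[$r2, $p2] = sketch_check($src, 0, '/\d+/', false);              // '123', 6
[$r3, $p3] = sketch_check($src, 0, '/\d+/', false, true, false); // 'abc', 3 (as scan_until())
[$r4, $p4] = sketch_check($src, 0, '/[a-z]+/', true, false);     // 'abc', 0 (as check())
```

The third and fourth calls reproduce the documented behaviour of scan_until() and check() respectively.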
Scanner::__consume ($pos, $consume_match, $match_data)
private

Helper function to consume a match.

Parameters
$pos: (int) The match position.
$consume_match: (bool) Whether or not to consume the actual matched text.
$match_data: The matching groups, as returned by PCRE.
Scanner::__log_match ($index, $match_pos, $match_data)
private

Helper function to log a match into the history.

Scanner::add_pattern ($name, $pattern)

Allows the caller to add a predefined named pattern.

Adds a predefined pattern which is visible to next_match().

Parameters
$name: A name for the pattern. This does not have to be unique.
$pattern: A regular expression pattern.
Scanner::bol ( )

Beginning of line?

Returns
TRUE if the scan pointer is at the beginning of a line (i.e. immediately following a newline character), or at the beginning of the string, else FALSE
Scanner::check ($pattern)

Non-consuming lookahead.

Looks for the given pattern at the current index and logs it if it is found, but does not consume it. This is a look-ahead.

Parameters
$pattern: The pattern to search for.
Returns
null if not found, else the matched string.
Scanner::eol ( )

End of line?

Returns
TRUE if the scan pointer is at the end of a line (i.e. immediately preceding a newline character), or at the end of the string, else FALSE
Scanner::eos ( )

End of string?

Returns
TRUE if the scan pointer is at the end of the string, else FALSE.
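The three position predicates can be expressed directly in terms of the source string and the scan pointer. The PHP sketch below uses hypothetical free functions rather than Scanner methods, and assumes that line endings have already been normalised to \n (which string() guarantees) and that the pointer lies between 0 and the string length.

```php
<?php
// Hypothetical free-function versions of bol(), eol() and eos().
// Assumes \n line endings and 0 <= $i <= strlen($src).

// Beginning of line: start of string, or just after a newline.
function sketch_bol(string $src, int $i): bool
{
    return $i === 0 || $src[$i - 1] === "\n";
}

// End of line: end of string, or just before a newline.
function sketch_eol(string $src, int $i): bool
{
    return $i >= strlen($src) || $src[$i] === "\n";
}

// End of string: the pointer has reached the string length.
function sketch_eos(string $src, int $i): bool
{
    return $i >= strlen($src);
}

$text = "ab\ncd";
// Offsets 0 and 3 are line starts; offsets 2 and 5 are line ends; 5 is also eos.
```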
Scanner::get ($n = 1)

Consume a given number of bytes.

Parameters
$n: The number of bytes.
Returns
The given number of bytes from the string, starting at the current scan pointer. The returned string is at most $n bytes long; it may be shorter, or the empty string if the scanner is in the termination position.
Note
This method is identical to peek(), but it does consume the string.
Neither get() nor peek() logs its matches into the match history.
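The difference between get() and peek() comes down to whether the pointer moves. A minimal PHP sketch, using a hypothetical ByteReader class rather than Luminous's Scanner, might look like this:

```php
<?php
// Hypothetical ByteReader: peek() and get() both return up to $n bytes from the
// pointer; only get() advances it. Neither records a match history.
class ByteReader
{
    private $src;
    private $index = 0;

    public function __construct(string $src)
    {
        $this->src = $src;
    }

    public function peek(int $n = 1): string
    {
        // substr() clips at the end of the string, so the result may be shorter than $n.
        return (string) substr($this->src, $this->index, $n);
    }

    public function get(int $n = 1): string
    {
        $bytes = $this->peek($n);
        $this->index += strlen($bytes);
        return $bytes;
    }

    public function pos(): int
    {
        return $this->index;
    }
}

$r = new ByteReader('hello');
$a = $r->peek(2);  // 'he', pointer still 0
$b = $r->get(2);   // 'he', pointer now 2
$c = $r->get(10);  // 'llo': only 3 bytes remain
$d = $r->get();    // '': the reader is at the end
```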
Scanner::get_next ($patterns)

Look for the next occurrence of a set of patterns.

Finds the next match of the given patterns and returns it. The string is not consumed or logged. Convenience function.

Parameters
$patterns: An array of regular expressions.
Returns
An array of (0 => index, 1 => match_groups). The index may be -1 if no pattern is found.
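A get_next()-style search can be sketched by running each pattern from the scan pointer and keeping the earliest match. This PHP version is a hypothetical free function; in particular, resolving ties in favour of the earlier pattern in the array is an assumption, not documented behaviour.

```php
<?php
// Hypothetical sketch of a get_next()-style search: the earliest match of any
// pattern at or after $index, without consuming anything.
// Returns [position, groups] or [-1, null]; ties keep the earlier pattern (assumed).
function sketch_get_next(string $src, int $index, array $patterns): array
{
    $best = [-1, null];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $src, $m, PREG_OFFSET_CAPTURE, $index) !== 1) {
            continue;
        }
        $pos = $m[0][1];
        if ($best[0] === -1 || $pos < $best[0]) {
            // Drop the offsets so the groups look like plain preg_match() output.
            $best = [$pos, array_map(fn($g) => $g[0], $m)];
        }
    }
    return $best;
}

$line  = 'x = "a" // c';
$first = sketch_get_next($line, 0, ['/"[^"]*"/', '/\/\/.*/']);  // [4, ['"a"']]
$after = sketch_get_next($line, 7, ['/"[^"]*"/', '/\/\/.*/']);  // [8, ['// c']]
$none  = sketch_get_next($line, 0, ['/\d+/']);                   // [-1, null]
```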
Scanner::get_next_named ($patterns)

Find the index of the next occurrence of a named pattern.

Parameters
$patterns: A map of $name => $pattern.
Returns
An array: ($name, $index, $matches). If there is no next match, name will be null, index will be -1 and matches will be null.
Note
Consider using this method to build a transition table.
Scanner::get_next_strpos ($patterns)

Look for the next occurrence of a set of substrings.

Like get_next(), but uses strpos() instead of the preg_* functions.

Returns
An array: 0 => index, 1 => substring. If no substring is found, index is -1 and substring is null.
See Also
get_next()
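When the delimiters are literal strings, the same nearest-match search can use strpos() instead of a regular expression. Again this is a hypothetical sketch rather than Luminous's code; the guard against an offset beyond the string is there because PHP 8's strpos() throws a ValueError in that case.

```php
<?php
// Hypothetical sketch of a get_next_strpos()-style search over literal substrings.
// Returns [index, substring], or [-1, null] if none of the substrings occurs.
function sketch_get_next_strpos(string $src, int $index, array $needles): array
{
    $best = [-1, null];
    if ($index > strlen($src)) {
        return $best; // strpos() rejects offsets past the end of the string
    }
    foreach ($needles as $needle) {
        $pos = strpos($src, $needle, $index);
        if ($pos !== false && ($best[0] === -1 || $pos < $best[0])) {
            $best = [$pos, $needle];
        }
    }
    return $best;
}

$code = 'a /* b */ // c';
$hit  = sketch_get_next_strpos($code, 0, ['//', '/*']);  // [2, '/*']
$miss = sketch_get_next_strpos($code, 0, ['#']);         // [-1, null]
```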
Scanner::index ($pattern)

Find the index of the next occurrence of a pattern.

Parameters
$pattern: The pattern to search for.
Returns
The next index of the pattern, or -1 if it is not found.
Scanner::match ( )

Get the result of the most recent match operation.

Returns
The return value is either a string or NULL depending on whether or not the most recent scanning function matched anything.
Exceptions
Exception: thrown if no matches have been recorded.
Scanner::match_group ($g = 0)

Get a group from the most recent match operation.

Parameters
$g: The group's numerical index or, in the case of named subpatterns, its name.
Returns
A string representing the group's contents.
See Also
match_groups()
Exceptions
Exception: thrown if no matches have been recorded.
Exception: thrown if matches have been recorded, but the group does not exist.
Scanner::match_groups ( )

Get the match groups of the most recent match operation.

Returns
The return value is either an array/map or NULL depending on whether or not the most recent scanning function was successful. The map is the same as PCRE returns, i.e. group_name => match_string, where group_name may be a string or numerical index.
Exceptions
Exception: thrown if no matches have been recorded.
Scanner::match_pos ( )

Get the position (offset) of the most recent match.

Returns
The position, as integer. This is a standard zero-indexed offset into the string. It is independent of the scan pointer.
Exceptions
Exception: thrown if no matches have been recorded.
Scanner::next_match ($consume_and_log = true)

Automation function: returns the next occurrence of any of the known patterns.

Iterates over the predefined patterns (see add_pattern()) and consumes and logs the nearest match, skipping unrecognised segments of the string.

Returns
An array: 0 => pattern name (as given to add_pattern()), 1 => match index (note that the scan pointer will already have progressed to the end of the match if the pattern is consumed). When no more matches are found, the return value is NULL and nothing is logged.
Parameters
$consume_and_log: If this is FALSE, the pattern is not consumed or logged.
Warning
This method is not the same as get_next(). It returns a pattern name instead of the match groups, and its return array holds different data in a different order.
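The add_pattern()/next_match() pair supports a simple tokenising loop: register named patterns, then repeatedly take the nearest match. The PHP sketch below uses a hypothetical PatternWalker class; it always consumes, does no logging, breaks ties by insertion order (an assumption), and expects patterns that cannot match the empty string, since a zero-length match would never advance the pointer.

```php
<?php
// Hypothetical sketch of add_pattern()/next_match(). Patterns are searched from
// the scan pointer; the nearest match is consumed (skipping any unrecognised text
// before it) and [name, match index] is returned. Patterns must not match "".
class PatternWalker
{
    private $src;
    private $index = 0;
    private $patterns = []; // list of [name, regex]; names need not be unique

    public function __construct(string $src)
    {
        $this->src = $src;
    }

    public function add_pattern(string $name, string $pattern): void
    {
        $this->patterns[] = [$name, $pattern];
    }

    public function next_match(): ?array
    {
        $best = null; // [name, position, length]
        foreach ($this->patterns as [$name, $pattern]) {
            if (preg_match($pattern, $this->src, $m, PREG_OFFSET_CAPTURE, $this->index) === 1
                && ($best === null || $m[0][1] < $best[1])) {
                $best = [$name, $m[0][1], strlen($m[0][0])];
            }
        }
        if ($best === null) {
            return null; // no further matches
        }
        $this->index = $best[1] + $best[2];
        return [$best[0], $best[1]];
    }
}

$w = new PatternWalker('let x = 10; // done');
$w->add_pattern('keyword', '/\blet\b/');
$w->add_pattern('number', '/\d+/');
$w->add_pattern('comment', '/\/\/.*/');
$found = [];
while (($hit = $w->next_match()) !== null) {
    $found[] = $hit;
}
// $found is [['keyword', 0], ['number', 8], ['comment', 12]]
```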
Scanner::peek ($n = 1)

Lookahead into the string a given number of bytes.

Parameters
$n: The number of bytes.
Returns
The given number of bytes from the string, starting at the current scan pointer. The returned string is at most $n bytes long; it may be shorter, or the empty string if the scanner is in the termination position.
Note
This method is identical to get(), but it does not consume the string.
Neither get() nor peek() logs its matches into the match history.
Scanner::pos ($new_pos = null)

Getter and setter for the current position (string pointer).

Parameters
$new_pos: The new position (leave NULL to use as a getter). Note that this will be clipped to a legal string index if you specify a negative number or an index greater than the string's length.
Returns
The current string pointer.
Scanner::pos_shift ($offset)

Moves the string pointer by a given offset.

Parameters
$offset: The offset by which to move the pointer. This can be positive or negative, but a negative offset is currently unsafe in general; use unscan() to revert the last operation instead.
See Also
pos()
unscan()
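The clipping that pos() applies to out-of-range positions amounts to clamping into the range 0 to strlen($src), the end of the string being the termination position. The helper below is a hypothetical sketch of that rule; expressing pos_shift($offset) as a clipped pos(pos() + $offset) is likewise an assumption about how the two relate.

```php
<?php
// Hypothetical sketch of pos()'s clipping rule: clamp into 0..strlen($src).
// strlen($src) itself is legal: it is the termination position.
function sketch_clip_pos(string $src, int $new_pos): int
{
    return max(0, min($new_pos, strlen($src)));
}

// Assumed pos_shift() equivalent: move relative to the current pointer, then clip.
function sketch_pos_shift(string $src, int $current, int $offset): int
{
    return sketch_clip_pos($src, $current + $offset);
}

// sketch_clip_pos('abc', -5) gives 0; sketch_clip_pos('abc', 10) gives 3.
```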
Scanner::remove_pattern ($name)

Allows the caller to remove a named pattern.

Parameters
$name: The name of the pattern to remove, as it was supplied to add_pattern().
Warning
If there are multiple patterns with the same name, they will all be removed.
Scanner::reset ( )

Reset the scanner.

Resets the scanner: sets the scan pointer to 0 and clears the match history.

Scanner::rest ( )

Gets the remaining string.

Returns
The rest of the string, which has not yet been consumed
Scanner::scan ($pattern)

Scans at the current pointer.

Looks for the given pattern at the current index and consumes and logs it if it is found.

Parameters
$pattern: The pattern to search for.
Returns
null if not found, else the full match.
Scanner::scan_until ($pattern)

Scans until the start of a pattern.

Looks for the given pattern anywhere beyond the current index and advances the scan pointer to the start of the pattern. The match is logged.

The match itself is not consumed.

Parameters
$pattern: The pattern to search for.
Returns
The substring between the scan pointer and the start of the pattern, or null if the pattern is not found.
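A typical use of scan_until() is to skip ordinary text up to the next interesting delimiter. The PHP sketch below is a hypothetical free function that takes the scan pointer by reference; it reproduces the documented return value and pointer movement but does not log anything.

```php
<?php
// Hypothetical sketch of scan_until(): returns the text between the scan pointer
// and the start of the next match, moving the pointer to the match start.
// The match itself is left unconsumed; history logging is omitted.
function sketch_scan_until(string $src, int &$index, string $pattern): ?string
{
    if (preg_match($pattern, $src, $m, PREG_OFFSET_CAPTURE, $index) !== 1) {
        return null; // pattern not found: pointer unchanged
    }
    $start = $m[0][1];
    $between = substr($src, $index, $start - $index);
    $index = $start;
    return $between;
}

$i = 0;
$code = 'int x; /* note */';
$before = sketch_scan_until($code, $i, '/\/\*/'); // 'int x; ', pointer now 7
```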
Scanner::string ($s = null)

Getter and setter for the source string.

Parameters
$s: The new source string (leave as NULL to use this method as a getter).
Returns
The current source string.
Note
This method triggers a reset().
Any strings passed into this method are converted to Unix line endings, i.e. \n.

Reimplemented in LuminousEmbeddedWebScript.
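The line-ending normalisation performed by string() can be sketched with a single str_replace() call. Treating a lone \r (classic Mac OS style) as a line ending, in addition to \r\n, is an assumption here; the documentation only says the result uses Unix line endings.

```php
<?php
// Hypothetical sketch of string()'s line-ending normalisation.
// str_replace() applies the search array in order, so "\r\n" is collapsed
// before any remaining lone "\r" is converted (the lone-\r case is an assumption).
function sketch_normalize_eol(string $s): string
{
    return str_replace(["\r\n", "\r"], "\n", $s);
}

$unix = sketch_normalize_eol("a\r\nb\rc\n"); // "a\nb\nc\n"
```

Normalising once on input lets bol(), eol() and every pattern treat "\n" as the only line terminator.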

Scanner::terminate ( )

Ends scanning of a string.

Moves the scan pointer to the end of the string, terminating the current scan.

Scanner::unscan ( )

Revert the most recent scanning operation.

Unscans the most recent match. The match is removed from the history, and the scan pointer is moved to where it was before the match.

Calls to get() and peek() are not logged and therefore cannot be unscanned.

Warning
Do not call unscan() more than once before calling a scanning function; the behaviour of doing so is currently undefined.

Member Data Documentation

Scanner::$match_history = array(null, null)
private

Match history.

History of matches. This is an array (queue), which should have at most two elements. Each element is an array:

0 => scan pointer when the match was found,
1 => match index (probably the same as the scan pointer, but not necessarily),
2 => match data (the match groups, as a map, as returned by PCRE).

Note
Numerical indices are used for performance.
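The two-element history is what makes a single unscan() possible: each entry remembers where the pointer was before the match. The PHP sketch below is hypothetical (the class HistoryScanner and the newest-first slot order are assumptions). Consistent with match() returning NULL after an unsuccessful scan, failed scans are logged with null match data, and the usage calls unscan() only once between scanning calls, as the warning on unscan() requires.

```php
<?php
// Hypothetical sketch of a two-slot match history supporting unscan().
// Each entry: [scan pointer before the match, match position, match groups].
class HistoryScanner
{
    private $src;
    private $index = 0;
    private $history = [null, null]; // [newest, previous] (slot order assumed)

    public function __construct(string $src)
    {
        $this->src = $src;
    }

    public function scan(string $pattern): ?string
    {
        $ok = preg_match($pattern, $this->src, $m, PREG_OFFSET_CAPTURE, $this->index) === 1
            && $m[0][1] === $this->index;
        $groups = $ok ? array_map(fn($g) => $g[0], $m) : null;
        $entry = [$this->index, $ok ? $m[0][1] : null, $groups];
        // Push the new entry and drop the oldest, so at most two are kept.
        $this->history = [$entry, $this->history[0]];
        if (!$ok) {
            return null;
        }
        $this->index += strlen($m[0][0]);
        return $m[0][0];
    }

    public function unscan(): void
    {
        if ($this->history[0] === null) {
            throw new Exception('No match to unscan');
        }
        $this->index = $this->history[0][0]; // pointer from before the match
        $this->history = [$this->history[1], null];
    }

    public function pos(): int
    {
        return $this->index;
    }
}

$h = new HistoryScanner('abc def');
$h->scan('/\w+/');  // 'abc', pointer 3
$h->scan('/\s+/');  // ' ', pointer 4
$h->unscan();       // pointer back to 3
```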

The documentation for this class was generated from the following file: