Luminous  git-master
LuminousScanner Class Reference

The base class for all scanners.

Public Member Functions

 __construct ($src=null)
 Constructor.
 add_filter ($arg1, $arg2, $arg3=null)
 Add an individual token filter.
 add_identifier_mapping ($name, $matches)
 Adds an identifier mapping which is later analysed by map_identifier_filter.
 add_stream_filter ($arg1, $arg2=null)
 Adds a stream filter.
 highlight ($src)
 Public convenience function for setting the string and highlighting it.
 init ()
 Set up the scanner immediately prior to tokenization.
 main ()
 The method responsible for tokenization.
 map_identifier_filter ($token)
 Identifier mapping filter.
 nestable_token ($token_name, $open, $close)
 Handles tokens that may nest inside themselves.
 pop ()
 Pops the top element of the stack, and returns it.
 push ($state)
 Pushes some data onto the stack.
 record ($string, $type, $pre_escaped=false)
 Records a string as a given token type.
 record_range ($from, $to, $type)
 Helper function to record a range of the string.
 remove_filter ($name)
 Removes the individual filter(s) with the given name.
 remove_stream_filter ($name)
 Removes the stream filter(s) with the given name.
 skip_whitespace ()
 Skips whitespace, and records it as a null token.
 start ()
 Flushes the token stream.
 state ()
 Gets the top element on $state_ or null if it is empty.
 tagged ()
 Returns the XML representation of the token stream.
 token_array ()
 Gets the token array.
Public Member Functions inherited from Scanner
 add_pattern ($name, $pattern)
 Allows the caller to add a predefined named pattern.
 bol ()
 Beginning of line?
 check ($pattern)
 Non-consuming lookahead.
 eol ()
 End of line?
 eos ()
 End of string?
 get ($n=1)
 Consume a given number of bytes.
 get_next ($patterns)
 Look for the next occurrence of a set of patterns.
 get_next_named ($patterns)
 Find the index of the next occurrence of a named pattern.
 get_next_strpos ($patterns)
 Look for the next occurrence of a set of substrings.
 index ($pattern)
 Find the index of the next occurrence of a pattern.
 match ()
 Get the result of the most recent match operation.
 match_group ($g=0)
 Get a group from the most recent match operation.
 match_groups ()
 Get the match groups of the most recent match operation.
 match_pos ()
 Get the position (offset) of the most recent match.
 next_match ($consume_and_log=true)
 Automation function: returns the next occurrence of any known patterns.
 peek ($n=1)
 Lookahead into the string a given number of bytes.
 pos ($new_pos=null)
 Getter and setter for the current position (string pointer).
 pos_shift ($offset)
 Moves the string pointer by a given offset.
 remove_pattern ($name)
 Allows the caller to remove a named pattern.
 reset ()
 Reset the scanner.
 rest ()
 Gets the remaining string.
 scan ($pattern)
 Scans at the current pointer.
 scan_until ($pattern)
 Scans until the start of a pattern.
 string ($s=null)
 Getter and setter for the source string.
 terminate ()
 Ends scanning of a string.
 unscan ()
 Revert the most recent scanning operation.

Static Public Member Functions

static guess_language ($src, $info)
 Language guessing.

Public Attributes

 $version = 'master'
 Scanner version.

Protected Member Functions

 rule_mapper_filter ($tokens)
 Rule re-mapper filter.
 user_def_filter ($token)
 Filter to highlight identifiers whose definitions are in the source.

Protected Attributes

 $case_sensitive = true
 Whether or not the language is case sensitive.
 $filters = array()
 Individual token filters.
 $ident_map = array()
 A map of identifiers and their corresponding token names.
 $rule_tag_map = array()
 Rule remappings.
 $state_ = array()
 State stack.
 $stream_filters = array()
 Token stream filters.
 $tokens = array()
 The token stream.
 $user_defs
 Identifier remappings based on definitions identified in the source code.

Detailed Description

The base class for all scanners.

LuminousScanner is the base class for all language scanners. It provides the methods that make up the highlighting layer: recording a token stream and, ultimately, producing the XML that represents that stream.

We also define here some filters which rely on state information expected to be recorded into the instance variables.

Highlighting a string at this level is a four-stage process:

  • string() - set the string
  • init() - set up the scanner
  • main() - perform tokenization
  • tagged() - build the XML

A note on tokens: each token is stored as an array with the following indices:

  • 0 => Token name
  • 1 => Token string (from input source code)
  • 2 => XML-Escaped?
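
The four stages and the token layout can be sketched with a deliberately tiny stand-in class. ToyScanner below is hypothetical and does not use Luminous: it only mirrors the method names listed above, splits the source into words and whitespace, and assumes htmlspecialchars() as the escaping step and <NAME>...</NAME> as the output shape.

```php
<?php
// Hypothetical stand-in, NOT the Luminous implementation: it only mirrors
// the string() / init() / main() / tagged() sequence described above.
class ToyScanner {
  private $src = '';
  private $tokens = array();

  // Stage 1: set the source string.
  function string($s) { $this->src = $s; }

  // Stage 2: last-minute set-up; here we just clear old tokens.
  function init() { $this->tokens = array(); }

  // Stage 3: tokenize. Each token is array(name, string, xml_escaped).
  function main() {
    preg_match_all('/\s+|\S+/', $this->src, $m);
    foreach ($m[0] as $chunk) {
      // Whitespace is recorded with a null name, everything else as WORD.
      $name = (trim($chunk) === '') ? null : 'WORD';
      $this->tokens[] = array($name, $chunk, false);
    }
  }

  // Stage 4: build an XML string from the token stream.
  function tagged() {
    $out = '';
    foreach ($this->tokens as $t) {
      list($name, $text, $escaped) = $t;
      if (!$escaped) $text = htmlspecialchars($text, ENT_NOQUOTES);
      $out .= ($name === null) ? $text : "<$name>$text</$name>";
    }
    return $out;
  }

  function token_array() { return $this->tokens; }
}

$s = new ToyScanner();
$s->string('a <b');
$s->init();
$s->main();
$xml = $s->tagged();
```

A real scanner subclass would normally only need to supply main() (and perhaps init()); the other stages are provided by the base class.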

Member Function Documentation

LuminousScanner::add_filter (   $arg1,
  $arg2,
  $arg3 = null 
)

Add an individual token filter.

Adds an individual token filter. The filter is bound to the given token_name. The filter is a callback which should take a token and return a token.

The arguments are: ([name], token_name, filter). Name is an optional argument.
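
Because the name comes first and is optional, the same method accepts either two or three arguments. One way such a signature might be normalised is sketched below as a free function. normalise_filter_args() and upper_keyword() are hypothetical names, and the code is an illustration rather than an excerpt from Luminous.

```php
<?php
// Hypothetical helper: turn ([name], token_name, filter) into a fixed
// (name, token_name, callback) triple.
function normalise_filter_args($arg1, $arg2, $arg3 = null) {
  if ($arg3 === null) {
    // Two-argument form: (token_name, filter), no name given.
    return array(null, $arg1, $arg2);
  }
  // Three-argument form: (name, token_name, filter).
  return array($arg1, $arg2, $arg3);
}

// A token filter takes one token and returns one token.
// Tokens are array(name, string, xml_escaped).
function upper_keyword($token) {
  $token[1] = strtoupper($token[1]);
  return $token;
}

$named   = normalise_filter_args('upper', 'KEYWORD', 'upper_keyword');
$unnamed = normalise_filter_args('KEYWORD', 'upper_keyword');

// Apply the filter only to tokens whose name matches token_name.
$token = array('KEYWORD', 'select', false);
list($name, $token_name, $callback) = $named;
if ($token[0] === $token_name) {
  $token = call_user_func($callback, $token);
}
```

Giving a filter a name is what makes it removable later with remove_filter($name).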

LuminousScanner::add_identifier_mapping (   $name,
  $matches 
)

Adds an identifier mapping which is later analysed by map_identifier_filter.

Parameters
  $name     The token name.
  $matches  An array of identifiers which correspond to this token name, e.g. add_identifier_mapping('KEYWORD', array('if', 'else', ...));

This method observes LuminousScanner::$case_sensitive
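
A minimal sketch of how an identifier map and the matching lookup could honour case sensitivity follows. build_ident_map() and map_ident() are hypothetical free functions, and folding with strtolower() is an assumption about how a case-insensitive language might be handled, not a statement about the Luminous source.

```php
<?php
// Hypothetical sketch: build identifier => TOKEN_NAME, folding case when
// the language is case-insensitive.
function build_ident_map($mappings, $case_sensitive) {
  $map = array();
  foreach ($mappings as $name => $matches) {
    foreach ($matches as $ident) {
      if (!$case_sensitive) $ident = strtolower($ident);
      $map[$ident] = $name;
    }
  }
  return $map;
}

// Remap an 'IDENT' token if its text is a known identifier; any other
// token is returned untouched.
function map_ident($token, $map, $case_sensitive) {
  if ($token[0] !== 'IDENT') return $token;
  $key = $case_sensitive ? $token[1] : strtolower($token[1]);
  if (isset($map[$key])) $token[0] = $map[$key];
  return $token;
}

$map = build_ident_map(array('KEYWORD' => array('if', 'else')), false);
$a = map_ident(array('IDENT', 'ELSE', false), $map, false);
$b = map_ident(array('IDENT', 'foo', false), $map, false);
```

Only the lookup key is folded; the token keeps its original spelling, so the highlighted output still matches the source text.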

LuminousScanner::add_stream_filter (   $arg1,
  $arg2 = null 
)

Adds a stream filter.

A stream filter receives the entire token stream and should return it.

The parameters are: ([name], filter). Name is an optional argument.
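
As an illustration of the contract (whole stream in, whole stream out), here is a sketch of a stream filter that merges adjacent tokens of the same name. merge_adjacent() is a hypothetical example, not a filter that ships with Luminous.

```php
<?php
// Hypothetical stream filter: receives the whole token stream and returns
// a (possibly modified) stream. Tokens are array(name, string, escaped).
function merge_adjacent($tokens) {
  $out = array();
  foreach ($tokens as $t) {
    $last = count($out) - 1;
    // Merge only when both the name and the escaped flag agree.
    if ($last >= 0 && $out[$last][0] === $t[0] && $out[$last][2] === $t[2]) {
      $out[$last][1] .= $t[1];
    } else {
      $out[] = $t;
    }
  }
  return $out;
}

$stream = array(
  array('COMMENT', '// a', false),
  array('COMMENT', ' b', false),
  array('IDENT', 'x', false),
);
$merged = merge_adjacent($stream);
```

Going by the parameter list above, such a callback would be registered with something like add_stream_filter('merge', 'merge_adjacent').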

static LuminousScanner::guess_language (   $src,
  $info 
)
static

Language guessing.

Each real language scanner should override this method and implement a simple guessing function which estimates how likely the input source code is to be in the language that the scanner recognises.

Parameters
  $src  The input source code.
Returns
  The estimated chance that the source code is in the same language as the one the scanner tokenizes, as a real number between 0 (least likely) and 1 (most likely), inclusive.
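
A guessing function in this spirit might look like the sketch below, written as a free function so that it runs on its own. guess_php() and its weights are invented for illustration. $info is accepted only to match the documented signature and is ignored, since its contents are not described in this reference.

```php
<?php
// Hypothetical heuristic for a PHP scanner. The weights are arbitrary.
function guess_php($src, $info = null) {
  // An opening tag is strong evidence.
  if (strpos($src, '<?php') !== false) return 1.0;
  $p = 0.0;
  // Weaker hints: $variables and the -> operator.
  if (preg_match('/\$[a-zA-Z_]\w*/', $src)) $p += 0.3;
  if (strpos($src, '->') !== false) $p += 0.2;
  // Clamp so the result always lies in [0, 1].
  return max(0.0, min(1.0, $p));
}

$certain = guess_php('<?php echo 1;');
$maybe   = guess_php('$x->y();');
$no      = guess_php('int main(void) { return 0; }');
```

In a real scanner the same logic would live in the overridden static guess_language($src, $info) method.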
LuminousScanner::highlight (   $src)

Public convenience function for setting the string and highlighting it.

Equivalent to: $s->string($src); $s->init(); $s->main(); return $s->tagged();

Returns
the highlighted string, as an XML string
LuminousScanner::init ( )

Set up the scanner immediately prior to tokenization.

The init method is always called prior to main(). At this stage, all configuration variables are assumed to have been set, and it is now time for the scanner to perform any last set-up steps. This may include actually finalizing its rule patterns. Scanners which are in no way dynamic may not need to override this.

LuminousScanner::main ( )

The method responsible for tokenization.

The main method is fully responsible for tokenizing the string stored in string() at the time of its call. By the time main returns, it should have consumed the whole of the string and populated the token array.

Reimplemented in LuminousStatefulScanner and LuminousSimpleScanner.

LuminousScanner::map_identifier_filter (   $token)

Identifier mapping filter.

Tries to map any 'IDENT' token to a TOKEN_NAME in LuminousScanner::$ident_map. This is implemented as the filter 'map-ident'.

LuminousScanner::nestable_token (   $token_name,
  $open,
  $close 
)

Handles tokens that may nest inside themselves.

Convenience function. Many languages allow constructs such as nestable comments. Handling these is easy but fairly long-winded, so this function takes an opening and a closing delimiter and consumes the token until it is fully closed, or until the end of the string if it is unterminated.

When the function returns, the token will have been consumed and appended to the token stream.

Parameters
  $token_name  The name of the token.
  $open        The opening delimiter pattern (regex), e.g. '% /\* %x'
  $close       The closing delimiter pattern (regex), e.g. '% \* / %x'
Warning
  Although PCRE provides recursive regular expressions, this function is far preferable. A recursive regex will easily crash PCRE on garbage input because PCRE has a fairly small stack; this function is much more resilient.
Exceptions
  Exception  If called at a non-matching point (i.e. $this->scan($open) does not match).
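
The depth-counting idea can be shown without the scanner. The sketch below is a standalone illustration, not the Luminous implementation: nested_span() is a hypothetical helper which returns the length of the nested token starting at a given offset, using only preg_match() with an offset.

```php
<?php
// Standalone sketch of depth counting for nestable delimiters.
// Returns the length of the token starting at $pos, or throws if $open
// does not match exactly at $pos.
function nested_span($src, $pos, $open, $close) {
  if (!preg_match($open, $src, $m, PREG_OFFSET_CAPTURE, $pos)
      || $m[0][1] !== $pos) {
    throw new Exception('called at a non-matching point');
  }
  $depth = 0;
  $i = $pos;
  do {
    // Find the next opener and the next closer from the current offset.
    $o = preg_match($open, $src, $mo, PREG_OFFSET_CAPTURE, $i) ? $mo[0] : null;
    $c = preg_match($close, $src, $mc, PREG_OFFSET_CAPTURE, $i) ? $mc[0] : null;
    if ($c === null) {
      // Unterminated: consume to the end of the string.
      return strlen($src) - $pos;
    }
    if ($o !== null && $o[1] <= $c[1]) {
      // An opener comes first: one level deeper.
      $depth++;
      $i = $o[1] + strlen($o[0]);
    } else {
      // A closer comes first: one level shallower.
      $depth--;
      $i = $c[1] + strlen($c[0]);
    }
  } while ($depth > 0);
  return $i - $pos;
}

$src = 'x /* a /* b */ c */ y';
$len = nested_span($src, 2, '%/\*%', '%\*/%');
$token = substr($src, 2, $len);
$unterminated = nested_span('/* a /* b */', 0, '%/\*%', '%\*/%');
```

The loop keeps only an integer depth, so deeply nested or malformed input costs time but no regex recursion.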
LuminousScanner::pop ( )

Pops the top element of the stack, and returns it.

Exceptions
  Exception  If the state stack is empty.
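
The stack helpers behave like an ordinary array-backed stack. The sketch below uses a hypothetical ToyStateStack class which mirrors only the documented behaviour of push(), pop() and state(); the state names are made up.

```php
<?php
// Hypothetical stand-in for the state stack helpers.
class ToyStateStack {
  private $state_ = array();

  function push($state) { $this->state_[] = $state; }

  // Returns the top element without removing it, or null if empty.
  function state() {
    $n = count($this->state_);
    return ($n === 0) ? null : $this->state_[$n - 1];
  }

  // Removes and returns the top element; an empty stack is an error.
  function pop() {
    if (count($this->state_) === 0) {
      throw new Exception('state stack is empty');
    }
    return array_pop($this->state_);
  }
}

$st = new ToyStateStack();
$before = $st->state();      // nothing pushed yet
$st->push('string');
$st->push('interpolation');
$top = $st->state();         // peek only
$popped = $st->pop();
$threw = false;
try {
  $st->pop();                // removes 'string'
  $st->pop();                // stack is now empty, so this throws
} catch (Exception $e) {
  $threw = true;
}
```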
LuminousScanner::record (   $string,
  $type,
  $pre_escaped = false 
)

Records a string as a given token type.

Parameters
  $string       The string to record.
  $type         The name of the token the string represents.
  $pre_escaped  Luminous ultimately emits XML, so $string has to be escaped at some point. If it has already been escaped (or if it came from another scanner), set this to TRUE.
See Also
  LuminousUtils::escape_string
Exceptions
  Exception  If $string is NULL.

Reimplemented in LuminousStatefulScanner.
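
The point of $pre_escaped is to avoid escaping the same text twice. The sketch below shows the effect with a hypothetical emit() helper; htmlspecialchars() stands in for LuminousUtils::escape_string, which is not reproduced here.

```php
<?php
// Hypothetical helper: return the XML-safe text for one token, escaping
// only if the token has not been escaped already.
function emit($token) {
  list($type, $string, $pre_escaped) = $token;
  if ($string === null) {
    throw new Exception('cannot record a NULL string');
  }
  return $pre_escaped ? $string : htmlspecialchars($string, ENT_NOQUOTES);
}

$raw     = emit(array('OPERATOR', 'a < b', false));     // escaped once
$escaped = emit(array('OPERATOR', 'a &lt; b', true));   // left alone
$double  = emit(array('OPERATOR', 'a &lt; b', false));  // wrong flag
```

Passing already-escaped text with the flag left at false produces the doubly escaped third result, which would show up as a literal "&lt;" in the rendered output.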

LuminousScanner::record_range (   $from,
  $to,
  $type 
)

Helper function to record a range of the string.

Parameters
  $from  The start index.
  $to    The end index.
  $type  The type of the token.

This is shorthand for $this->record(substr($this->string(), $from, $to-$from), $type).

Exceptions
  RangeException  If the range is invalid (i.e. $to < $from).

An empty range (i.e. $to === $from) is allowed, but it is essentially a no-op.

Reimplemented in LuminousStatefulScanner.
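
The three cases (valid, empty and invalid range) can be sketched as a free function. range_token() is hypothetical: it returns the token instead of appending it to a scanner's stream.

```php
<?php
// Hypothetical free-function version of the documented behaviour.
function range_token($src, $from, $to, $type) {
  if ($to < $from) {
    throw new RangeException("invalid range: $from to $to");
  }
  if ($to === $from) {
    return null;  // empty range: nothing to record
  }
  return array($type, substr($src, $from, $to - $from), false);
}

$src = 'int x = 42;';
$tok = range_token($src, 8, 10, 'NUMERIC');
$empty = range_token($src, 3, 3, 'NUMERIC');
$threw = false;
try {
  range_token($src, 5, 2, 'NUMERIC');
} catch (RangeException $e) {
  $threw = true;
}
```

Note that $to is an end index, not a length, which is why the substr() length is $to - $from.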

LuminousScanner::rule_mapper_filter (   $tokens)
protected

Rule re-mapper filter.

Re-maps token rules according to the LuminousScanner::$rule_tag_map map. This is called as the filter 'rule-map'.
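
A renaming pass of this kind is short enough to sketch in full. remap_rules() is a hypothetical free function, and the token names in the example are invented.

```php
<?php
// Hypothetical stream filter in the spirit of 'rule-map': rename tokens
// according to an OLD_TOKEN_NAME => NEW_TOKEN_NAME table.
function remap_rules($tokens, $rule_tag_map) {
  foreach ($tokens as &$t) {
    // Null-named tokens (e.g. whitespace) are never remapped.
    if ($t[0] !== null && isset($rule_tag_map[$t[0]])) {
      $t[0] = $rule_tag_map[$t[0]];
    }
  }
  unset($t);  // break the reference left behind by foreach
  return $tokens;
}

$out = remap_rules(
  array(
    array('DOCCOMMENT', '/** x */', false),
    array(null, ' ', false),
    array('IDENT', 'y', false),
  ),
  array('DOCCOMMENT' => 'COMMENT')
);
```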

LuminousScanner::skip_whitespace ( )

Skips whitespace, and records it as a null token.

Convenience function

LuminousScanner::tagged ( )

Returns the XML representation of the token stream.

This function triggers the generation of the XML output.

Returns
An XML-string which represents the tokens recorded by the scanner.

Reimplemented in LuminousStatefulScanner.

LuminousScanner::token_array ( )

Gets the token array.

Returns
The token array
LuminousScanner::user_def_filter (   $token)
protected

Filter to highlight identifiers whose definitions are in the source.

Maps anything recorded in LuminousScanner::$user_defs to the recorded type. This is called as the filter 'user-defs'.

Member Data Documentation

LuminousScanner::$case_sensitive = true
protected

Whether or not the language is case sensitive.

Whether or not the scanner is dealing with a case sensitive language. This currently affects map_identifier_filter

LuminousScanner::$filters = array()
protected

Individual token filters.

A list of lists, each filter is an array: (name, token_name, callback)

LuminousScanner::$ident_map = array()
protected

A map of identifiers and their corresponding token names.

A map of recognised identifiers, in the form identifier_string => TOKEN_NAME

This is currently used by map_identifier_filter

LuminousScanner::$rule_tag_map = array()
protected

Rule remappings.

A map to handle re-mapping of rules, in the form: OLD_TOKEN_NAME => NEW_TOKEN_NAME

This is used by rule_mapper_filter()

LuminousScanner::$state_ = array()
protected

State stack.

A stack of the scanner's state, should the scanner wish to use a stack based state mechanism.

The top element can be retrieved (but not popped) with state().

TODO More useful functions for manipulating the stack

LuminousScanner::$stream_filters = array()
protected

Token stream filters.

A list of lists, each filter is an array: (name, callback)

LuminousScanner::$tokens = array()
protected

The token stream.

The token stream is recorded as a flat array of tokens. A token is made up of 3 parts, and stored as an array:

  • 0 => Token name
  • 1 => Token string (from input source code)
  • 2 => XML-Escaped?
LuminousScanner::$user_defs
protected

Identifier remappings based on definitions identified in the source code.

A map of remappings of user-defined types/functions. This is a map of identifier_string => TOKEN_NAME

This is used by user_def_filter()
