The base class for all scanners.

Public Member Functions
	__construct ($src=null)
	constructor
	add_filter ($arg1, $arg2, $arg3=null)
	Add an individual token filter.
	add_identifier_mapping ($name, $matches)
	Adds an identifier mapping which is later analysed by map_identifier_filter.
	add_stream_filter ($arg1, $arg2=null)
	Adds a stream filter.
	highlight ($src)
	Public convenience function for setting the string and highlighting it.
	init ()
	Set up the scanner immediately prior to tokenization.
	main ()
	the method responsible for tokenization
	map_identifier_filter ($token)
	Identifier mapping filter.
	nestable_token ($token_name, $open, $close)
	Handles tokens that may nest inside themselves.
	pop ()
	Pops the top element of the stack, and returns it.
	push ($state)
	Pushes some data onto the stack.
	record ($string, $type, $pre_escaped=false)
	Records a string as a given token type.
	record_range ($from, $to, $type)
	Helper function to record a range of the string.
	remove_filter ($name)
	Removes the individual filter(s) with the given name.
	remove_stream_filter ($name)
	Removes the stream filter(s) with the given name.
	skip_whitespace ()
	Skips whitespace, and records it as a null token.
	start ()
	Flushes the token stream.
	state ()
	Gets the top element on $state_ or null if it is empty.
	tagged ()
	Returns the XML representation of the token stream.
	token_array ()
	Gets the token array.
Public Member Functions inherited from Scanner
	add_pattern ($name, $pattern)
	Allows the caller to add a predefined named pattern.
	bol ()
	Beginning of line?
	check ($pattern)
	Non-consuming lookahead.
	eol ()
	End of line?
	eos ()
	End of string?
	get ($n=1)
	Consume a given number of bytes.
	get_next ($patterns)
	Look for the next occurrence of a set of patterns.
	get_next_named ($patterns)
	Find the index of the next occurrence of a named pattern.
	get_next_strpos ($patterns)
	Look for the next occurrence of a set of substrings.
	index ($pattern)
	Find the index of the next occurrence of a pattern.
	match ()
	Get the result of the most recent match operation.
	match_group ($g=0)
	Get a group from the most recent match operation.
	match_groups ()
	Get the match groups of the most recent match operation.
	match_pos ()
	Get the position (offset) of the most recent match.
	next_match ($consume_and_log=true)
	Automation function: returns the next occurrence of any known patterns.
	peek ($n=1)
	Lookahead into the string a given number of bytes.
	pos ($new_pos=null)
	Getter and setter for the current position (string pointer).
	pos_shift ($offset)
	Moves the string pointer by a given offset.
	remove_pattern ($name)
	Allows the caller to remove a named pattern.
	reset ()
	Reset the scanner.
	rest ()
	Gets the remaining string.
	scan ($pattern)
	Scans at the current pointer.
	scan_until ($pattern)
	Scans until the start of a pattern.
	string ($s=null)
	Getter and setter for the source string.
	terminate ()
	Ends scanning of a string.
	unscan ()
	Revert the most recent scanning operation.

Static Public Member Functions
static	guess_language ($src, $info)
	Language guessing.

Public Attributes
	$version = 'master'
	scanner version.

Protected Member Functions
	rule_mapper_filter ($tokens)
	Rule re-mapper filter.
	user_def_filter ($token)
	Filter to highlight identifiers whose definitions are in the source.

Protected Attributes
	$case_sensitive = true
	Whether or not the language is case sensitive.
	$filters = array()
	Individual token filters.
	$ident_map = array()
	A map of identifiers and their corresponding token names.
	$rule_tag_map = array()
	Rule remappings.
	$state_ = array()
	State stack.
	$stream_filters = array()
	Token stream filters.
	$tokens = array()
	The token stream.
	$user_defs
	Identifier remappings based on definitions identified in the source code.

Detailed Description

the base class for all scanners

LuminousScanner is the base class for all language scanners. It provides the methods that make up the highlighting layer: recording a token stream and, ultimately, producing XML that represents that stream.

We also define here some filters which rely on state information expected to be recorded into the instance variables.

Highlighting a string at this level is a four-stage process:

 1. string() - set the string
 2. init() - set up the scanner
 3. main() - perform tokenization
 4. tagged() - build the XML

A note on tokens: each token is stored as an array with the following indices:

0: Token name (e.g. 'COMMENT')
1: Token string (e.g. '// foo')
2: escaped? (bool) Because it's often more convenient to embed nested tokens by tagging the token string, we need to escape it. This index stores whether or not it has been escaped.
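
As a rough illustration of this layout, the sketch below builds tokens as three-element arrays and escapes a token's string at most once. The helper names `sketch_make_token` and `sketch_escape_token` are hypothetical, and `htmlspecialchars()` merely stands in for whatever escaping Luminous applies internally.

```php
<?php
// Hypothetical helper (not part of Luminous): builds a token in the
// three-index layout described above.
function sketch_make_token(?string $name, string $text, bool $escaped = false): array
{
    return array($name, $text, $escaped);
}

// Hypothetical helper: escapes the token string once, then sets index 2.
// htmlspecialchars() is an assumption standing in for Luminous's escaping.
function sketch_escape_token(array $token): array
{
    if (!$token[2]) {
        $token[1] = htmlspecialchars($token[1], ENT_NOQUOTES);
        $token[2] = true;
    }
    return $token;
}

$comment = sketch_make_token('COMMENT', '// a < b');
$escaped = sketch_escape_token($comment);
// Escaping again changes nothing because index 2 is already true.
$twice = sketch_escape_token($escaped);
```

Tracking the flag per token is what makes it safe to pass a token through several stages without double-escaping it.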

Member Function Documentation

LuminousScanner::add_filter	(	$arg1,
		$arg2,
		$arg3 = `null`
	)

Add an individual token filter.

Adds an individual token filter. The filter is bound to the given token_name. The filter is a callback which should take a token and return a token.

The arguments are: [name], token_name, filter

Name is an optional argument.
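
The optional leading argument can be handled as in the following sketch. `SketchFilterRegistry` and its `apply()` method are hypothetical stand-ins, not Luminous code, and storing `null` for an omitted name is an assumption.

```php
<?php
// Hypothetical stand-in for the documented behaviour, not Luminous's code.
class SketchFilterRegistry
{
    // Each entry is array(name, token_name, callback).
    public $filters = array();

    // Arguments: [name], token_name, filter.
    // With only two arguments the name is treated as omitted; this sketch
    // then stores null as the name (an assumption).
    public function add_filter($arg1, $arg2, $arg3 = null)
    {
        if ($arg3 === null) {
            $this->filters[] = array(null, $arg1, $arg2);
        } else {
            $this->filters[] = array($arg1, $arg2, $arg3);
        }
    }

    // Runs every filter bound to the token's current name.
    // Each callback takes a token and returns a token.
    public function apply(array $token)
    {
        foreach ($this->filters as $filter) {
            if ($filter[1] === $token[0]) {
                $token = call_user_func($filter[2], $token);
            }
        }
        return $token;
    }
}

$registry = new SketchFilterRegistry();
$registry->add_filter('upper-kw', 'KEYWORD', function (array $token) {
    $token[1] = strtoupper($token[1]);
    return $token;
});
$keyword = $registry->apply(array('KEYWORD', 'if', false));
$ident = $registry->apply(array('IDENT', 'foo', false));
```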

LuminousScanner::add_identifier_mapping	(	$name,
		$matches
	)

Adds an identifier mapping which is later analysed by map_identifier_filter.

Parameters

$name	The token name
$matches	an array of identifiers which correspond to this token name, e.g. add_identifier_mapping('KEYWORD', array('if', 'else', ...));

This method observes LuminousScanner::$case_sensitive

LuminousScanner::add_stream_filter	(	$arg1,
		$arg2 = `null`
	)

Adds a stream filter.

A stream filter receives the entire token stream and should return it.

The parameters are: ([name], filter). Name is an optional argument.
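
A stream filter might look like the sketch below, which merges adjacent tokens that share a name and an escaped flag. The function `sketch_merge_adjacent` is hypothetical; it only assumes the token layout (name, string, escaped) documented for this class.

```php
<?php
// Hypothetical stream filter: receives the whole token stream and returns
// a new stream in which adjacent same-named tokens are merged.
function sketch_merge_adjacent(array $tokens): array
{
    $out = array();
    foreach ($tokens as $token) {
        $last = count($out) - 1;
        if ($last >= 0
            && $out[$last][0] === $token[0]
            && $out[$last][2] === $token[2]
        ) {
            // Same name and same escaped state: append the string.
            $out[$last][1] .= $token[1];
        } else {
            $out[] = $token;
        }
    }
    return $out;
}

$stream = array(
    array('COMMENT', '// a', false),
    array('COMMENT', ' b', false),
    array('IDENT', 'x', false),
);
$merged = sketch_merge_adjacent($stream);
```

Given the parameter order above, a callback like this would presumably be registered as `$scanner->add_stream_filter('merge', 'sketch_merge_adjacent')`.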

static LuminousScanner::guess_language	(	$src,
		$info
	)

static

Language guessing.

Each real language scanner should override this method and implement a simple guessing function to estimate how likely the input source code is to be the language which they recognise.

Parameters

$src	the input source code

Returns: The estimated chance that the source code is in the same language as the one the scanner tokenizes, as a real number between 0 (least likely) and 1 (most likely), inclusive
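
An override might be shaped like the following sketch. The class `SketchPhpGuesser`, its heuristics, and the weights 0.7 and 0.2 are all illustrative assumptions; `$info` is accepted but unused here because its contents are not described above.

```php
<?php
// Hypothetical scanner-like class; only the guessing heuristic is shown.
class SketchPhpGuesser
{
    // Returns an estimate in [0, 1]. The weights are arbitrary assumptions.
    public static function guess_language($src, $info)
    {
        $p = 0.0;
        // An opening PHP tag is treated as strong evidence.
        if (strpos($src, '<?php') !== false) {
            $p += 0.7;
        }
        // Dollar-prefixed identifiers are treated as weaker evidence.
        if (preg_match('/\$[a-zA-Z_]\w*/', $src)) {
            $p += 0.2;
        }
        return min(1.0, $p);
    }
}

$likely = SketchPhpGuesser::guess_language('<?php echo $x; ?>', array());
$unlikely = SketchPhpGuesser::guess_language('int main(void) { return 0; }', array());
```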

LuminousScanner::highlight ( $src )

Public convenience function for setting the string and highlighting it.

Alias for: $s->string($src); $s->init(); $s->main(); return $s->tagged();

Returns: the highlighted string, as an XML string
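
The call order can be seen in the miniature stand-in below. `SketchScanner` is hypothetical: its tokenization (digits versus everything else), its `init()` behaviour, and its tag format are assumptions made for illustration, and only the sequence string(), init(), main(), tagged() comes from this page.

```php
<?php
// Hypothetical miniature scanner; it mirrors only the documented call
// order, not Luminous's real tokenization or XML format.
class SketchScanner
{
    protected $src = '';
    protected $tokens = array();

    // Getter and setter for the source string.
    public function string($s = null)
    {
        if ($s !== null) {
            $this->src = $s;
        }
        return $this->src;
    }

    // Last-minute set-up; this sketch just empties the token list.
    public function init()
    {
        $this->tokens = array();
    }

    // Consumes the whole string: digit runs become NUMERIC tokens,
    // everything else is recorded with a null name.
    public function main()
    {
        preg_match_all('/\d+|\D+/', $this->src, $m);
        foreach ($m[0] as $chunk) {
            $name = ctype_digit($chunk) ? 'NUMERIC' : null;
            $this->tokens[] = array($name, $chunk, false);
        }
    }

    // Builds the output, escaping any token not already escaped.
    public function tagged()
    {
        $out = '';
        foreach ($this->tokens as $t) {
            $text = $t[2] ? $t[1] : htmlspecialchars($t[1], ENT_NOQUOTES);
            $out .= $t[0] === null ? $text : "<{$t[0]}>{$text}</{$t[0]}>";
        }
        return $out;
    }

    public function highlight($src)
    {
        $this->string($src);
        $this->init();
        $this->main();
        return $this->tagged();
    }
}

$xml = (new SketchScanner())->highlight('x < 42');
```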

LuminousScanner::init ( )

Set up the scanner immediately prior to tokenization.

The init method is always called prior to main(). At this stage, all configuration variables are assumed to have been set, and it's now time for the scanner to perform any last set-up. This may include actually finalizing its rule patterns. Some scanners may not need to override this if they are in no way dynamic.

LuminousScanner::main ( )

the method responsible for tokenization

The main method is fully responsible for tokenizing the string stored in string() at the time of its call. By the time main returns, it should have consumed the whole of the string and populated the token array.

Reimplemented in LuminousStatefulScanner, and LuminousSimpleScanner.

LuminousScanner::map_identifier_filter ( $token )

Identifier mapping filter.

Tries to map any 'IDENT' token to a TOKEN_NAME in LuminousScanner::$ident_map. This is implemented as the filter 'map-ident'.
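
The interplay between add_identifier_mapping() and this filter can be sketched as follows. `SketchIdentMapper` is a hypothetical stand-in, and lower-casing keys when `$case_sensitive` is false is an assumption about how case insensitivity might be handled.

```php
<?php
// Hypothetical stand-in, not Luminous's implementation.
class SketchIdentMapper
{
    public $case_sensitive = true;
    // identifier_string => TOKEN_NAME
    protected $ident_map = array();

    public function add_identifier_mapping($name, array $matches)
    {
        foreach ($matches as $ident) {
            // Assumption: case-insensitive languages store lower-cased keys.
            if (!$this->case_sensitive) {
                $ident = strtolower($ident);
            }
            $this->ident_map[$ident] = $name;
        }
    }

    // Renames an IDENT token if its string is a known identifier.
    public function map_identifier_filter(array $token)
    {
        if ($token[0] !== 'IDENT') {
            return $token;
        }
        $key = $this->case_sensitive ? $token[1] : strtolower($token[1]);
        if (isset($this->ident_map[$key])) {
            $token[0] = $this->ident_map[$key];
        }
        return $token;
    }
}

$mapper = new SketchIdentMapper();
$mapper->case_sensitive = false;
$mapper->add_identifier_mapping('KEYWORD', array('if', 'else'));
$mapped = $mapper->map_identifier_filter(array('IDENT', 'ELSE', false));
$unmapped = $mapper->map_identifier_filter(array('IDENT', 'foo', false));
```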

LuminousScanner::nestable_token	(	$token_name,
		$open,
		$close
	)

Handles tokens that may nest inside themselves.

Convenience function. It's fairly common for many languages to allow things like nestable comments. Handling these is easy but fairly long winded, so this function will take an opening and closing delimiter and consume the token until it is fully closed, or until the end of the string in the case that it is unterminated.

When the function returns, the token will have been consumed and appended to the token stream.

Parameters

$token_name	the name of the token
$open	the opening delimiter pattern (regex), e.g. '% /\* %x'
$close	the closing delimiter pattern (regex), e.g. '% \* /%x'

Warning: Although PCRE provides recursive regular expressions, this function is far preferable. A recursive regex will easily crash PCRE on garbage input due to it having a fairly small stack: this function is much more resilient.

Exceptions

Exception if called at a non-matching point (i.e. $this->scan($open) does not match)
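
The depth-counting idea behind this function can be sketched as a free function. `sketch_nestable_token` is hypothetical and is not Luminous's implementation: it takes the source string and a position explicitly, assumes both patterns match non-empty text, and returns the token together with the new position instead of appending to a token stream.

```php
<?php
// Hypothetical illustration of consuming a self-nesting token by counting
// depth. Returns array(token, new_position).
function sketch_nestable_token(string $src, int $pos, string $name, string $open, string $close): array
{
    // The opening delimiter must match exactly at $pos.
    if (!preg_match($open, $src, $m, PREG_OFFSET_CAPTURE, $pos) || $m[0][1] !== $pos) {
        throw new Exception('nestable token does not start here');
    }
    $start = $pos;
    $pos += strlen($m[0][0]);
    $depth = 1;
    $length = strlen($src);
    while ($depth > 0 && $pos < $length) {
        $o = preg_match($open, $src, $mo, PREG_OFFSET_CAPTURE, $pos) ? $mo[0] : null;
        $c = preg_match($close, $src, $mc, PREG_OFFSET_CAPTURE, $pos) ? $mc[0] : null;
        if ($c === null) {
            // Unterminated: consume to the end of the string.
            $pos = $length;
            break;
        }
        if ($o !== null && $o[1] < $c[1]) {
            // Another opener comes first, so we nest one level deeper.
            $depth++;
            $pos = $o[1] + strlen($o[0]);
        } else {
            $depth--;
            $pos = $c[1] + strlen($c[0]);
        }
    }
    $token = array($name, substr($src, $start, $pos - $start), false);
    return array($token, $pos);
}

list($token, $end) = sketch_nestable_token(
    'a /* x /* y */ z */ b', 2, 'COMMENT', '%/\*%', '%\*/%'
);
```

Because the loop keeps only an integer depth, its memory use does not grow with nesting depth, which is the resilience the warning above contrasts with recursive patterns.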

LuminousScanner::pop ( )

Pops the top element of the stack, and returns it.

Exceptions

Exception if the state stack is empty
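
The contract of push(), pop() and state() can be summarised with a small stand-in. `SketchStateStack` is hypothetical; it only mirrors what this page documents: pop() throws on an empty stack, while state() returns null.

```php
<?php
// Hypothetical stand-in mirroring the documented push/pop/state contract.
class SketchStateStack
{
    protected $state_ = array();

    public function push($state)
    {
        $this->state_[] = $state;
    }

    // Removes and returns the top element; throws if there is none.
    public function pop()
    {
        if (empty($this->state_)) {
            throw new Exception('Cannot pop an empty state stack');
        }
        return array_pop($this->state_);
    }

    // Returns the top element without removing it, or null when empty.
    public function state()
    {
        if (empty($this->state_)) {
            return null;
        }
        return $this->state_[count($this->state_) - 1];
    }
}

$stack = new SketchStateStack();
$stack->push('string');
$stack->push('interpolation');
$top = $stack->state();
$popped = $stack->pop();
```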

LuminousScanner::record	(	$string,
		$type,
		$pre_escaped = `false`
	)

Records a string as a given token type.

Parameters

$string	The string to record
$type	The name of the token the string represents
$pre_escaped	Luminous works towards getting this in XML and therefore at some point, the $string has to be escaped. If you have already escaped it for some reason (or if you got it from another scanner), then you want to set this to `TRUE`

See Also: LuminousUtils::escape_string

Exceptions

Exception if $string is NULL

Reimplemented in LuminousStatefulScanner.

LuminousScanner::record_range	(	$from,
		$to,
		$type
	)

Helper function to record a range of the string.

Parameters

$from	the start index
$to	the end index
$type	the type of the token

This is shorthand for `$this->record(substr($this->string(), $from, $to - $from), $type)`.

Exceptions

RangeException if the range is invalid (i.e. $to < $from)

An empty range (i.e. $to === $from) is allowed, but it is essentially a no-op.
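
The relationship between record() and record_range() might be sketched like this. `SketchRecorder` is a hypothetical stand-in that stores tokens in a public array; the exception messages are invented for illustration.

```php
<?php
// Hypothetical stand-in, not Luminous's implementation.
class SketchRecorder
{
    protected $src;
    public $tokens = array();

    public function __construct($src)
    {
        $this->src = $src;
    }

    public function record($string, $type, $pre_escaped = false)
    {
        if ($string === null) {
            throw new Exception('Tried to record null as a token');
        }
        $this->tokens[] = array($type, $string, $pre_escaped);
    }

    public function record_range($from, $to, $type)
    {
        if ($to < $from) {
            throw new RangeException("Invalid range: $from to $to");
        }
        if ($to === $from) {
            return; // empty range: nothing to record
        }
        $this->record(substr($this->src, $from, $to - $from), $type);
    }
}

$recorder = new SketchRecorder('int x;');
$recorder->record_range(0, 3, 'TYPE');
$recorder->record_range(3, 3, 'TYPE'); // allowed, records nothing
```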

Reimplemented in LuminousStatefulScanner.

LuminousScanner::rule_mapper_filter ( $tokens )

protected

Rule re-mapper filter.

Re-maps token rules according to the LuminousScanner::$rule_tag_map map. This is called as the filter 'rule-map'.
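
In outline, the re-mapping amounts to a lookup on each token's name. The free function `sketch_rule_mapper` below is hypothetical; it takes the map as a parameter rather than reading it from the instance.

```php
<?php
// Hypothetical stream filter: renames tokens according to a map of
// OLD_TOKEN_NAME => NEW_TOKEN_NAME and leaves other tokens untouched.
function sketch_rule_mapper(array $tokens, array $rule_tag_map): array
{
    foreach ($tokens as &$token) {
        // Tokens with a null name have nothing to re-map.
        if ($token[0] !== null && isset($rule_tag_map[$token[0]])) {
            $token[0] = $rule_tag_map[$token[0]];
        }
    }
    unset($token); // break the reference left by the loop
    return $tokens;
}

$remapped = sketch_rule_mapper(
    array(
        array('DOCCOMMENT', '/** x */', false),
        array(null, ' ', false),
        array('IDENT', 'y', false),
    ),
    array('DOCCOMMENT' => 'COMMENT')
);
```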

LuminousScanner::skip_whitespace ( )

Skips whitespace, and records it as a null token.

Convenience function

LuminousScanner::tagged ( )

Returns the XML representation of the token stream.

This function triggers the generation of the XML output.

Returns: An XML-string which represents the tokens recorded by the scanner.

Reimplemented in LuminousStatefulScanner.

LuminousScanner::token_array ( )

Gets the token array.

Returns: The token array

LuminousScanner::user_def_filter ( $token )

protected

Filter to highlight identifiers whose definitions are in the source.

Maps anything recorded in LuminousScanner::$user_defs to the recorded type. This is called as the filter 'user-defs'.

Member Data Documentation

LuminousScanner::$case_sensitive = true

protected

Whether or not the language is case sensitive.

Whether or not the scanner is dealing with a case sensitive language. This currently affects map_identifier_filter

LuminousScanner::$filters = array()

protected

Individual token filters.

A list of lists; each filter is an array: (name, token_name, callback)

LuminousScanner::$ident_map = array()

protected

A map of identifiers and their corresponding token names.

A map of recognised identifiers, in the form identifier_string => TOKEN_NAME

This is currently used by map_identifier_filter

LuminousScanner::$rule_tag_map = array()

protected

Rule remappings.

A map to handle re-mapping of rules, in the form: OLD_TOKEN_NAME => NEW_TOKEN_NAME

This is used by rule_mapper_filter()

LuminousScanner::$state_ = array()

protected

State stack.

A stack of the scanner's state, should the scanner wish to use a stack based state mechanism.

The top element can be retrieved (but not popped) with state()

TODO More useful functions for manipulating the stack

LuminousScanner::$stream_filters = array()

protected

Token stream filters.

A list of lists; each filter is an array: (name, callback)

LuminousScanner::$tokens = array()

protected

The token stream.

The token stream is recorded as a flat array of tokens. A token is made up of 3 parts, and stored as an array:

0 => Token name
1 => Token string (from input source code)
2 => XML-Escaped?

LuminousScanner::$user_defs

protected

Identifier remappings based on definitions identified in the source code.

A map of remappings of user-defined types/functions. This is a map of identifier_string => TOKEN_NAME

This is used by user_def_filter()

The documentation for this class was generated from the following file:

src/core/scanner.class.php
