Luminous  git-master
LuminousEmbeddedWebScript Class Reference

Superclass for languages which may nest, i.e. web languages.

Inheritance diagram for LuminousEmbeddedWebScript: [figure]
Collaboration diagram for LuminousEmbeddedWebScript: [figure]

Public Member Functions

 add_child_scanner ($name, $scanner)
 Adds a child scanner and indexes it against a name (convenience function).
 dirty_exit ($token_name)
 Sets the exit data to signify the exit is dirty and will need recovering from.
 resume ()
 Attempts to recover from a dirty exit.
 script_break ($token_name, $match=null, $pos=null)
 Checks for a script terminator tag inside a matched token.
 server_break ($token_name, $match=null, $pos=null)
 Checks for a server-side script inside a matched token.
 string ($str=null)
 Getter and setter for the source string.
- Public Member Functions inherited from LuminousScanner
 __construct ($src=null)
 Constructor.
 add_filter ($arg1, $arg2, $arg3=null)
 Add an individual token filter.
 add_identifier_mapping ($name, $matches)
 Adds an identifier mapping which is later analysed by map_identifier_filter.
 add_stream_filter ($arg1, $arg2=null)
 Adds a stream filter.
 highlight ($src)
 Public convenience function for setting the string and highlighting it.
 init ()
 Set up the scanner immediately prior to tokenization.
 main ()
 The method responsible for tokenization.
 map_identifier_filter ($token)
 Identifier mapping filter.
 nestable_token ($token_name, $open, $close)
 Handles tokens that may nest inside themselves.
 pop ()
 Pops the top element of the stack, and returns it.
 push ($state)
 Pushes some data onto the stack.
 record ($string, $type, $pre_escaped=false)
 Records a string as a given token type.
 record_range ($from, $to, $type)
 Helper function to record a range of the string.
 remove_filter ($name)
 Removes the individual filter(s) with the given name.
 remove_stream_filter ($name)
 Removes the stream filter(s) with the given name.
 skip_whitespace ()
 Skips whitespace, and records it as a null token.
 start ()
 Flushes the token stream.
 state ()
 Gets the top element on $state_ or null if it is empty.
 tagged ()
 Returns the XML representation of the token stream.
 token_array ()
 Gets the token array.
- Public Member Functions inherited from Scanner
 add_pattern ($name, $pattern)
 Allows the caller to add a predefined named pattern.
 bol ()
 Beginning of line?
 check ($pattern)
 Non-consuming lookahead.
 eol ()
 End of line?
 eos ()
 End of string?
 get ($n=1)
 Consume a given number of bytes.
 get_next ($patterns)
 Look for the next occurrence of a set of patterns.
 get_next_named ($patterns)
 Find the index of the next occurrence of a named pattern.
 get_next_strpos ($patterns)
 Look for the next occurrence of a set of substrings.
 index ($pattern)
 Find the index of the next occurrence of a pattern.
 match ()
 Get the result of the most recent match operation.
 match_group ($g=0)
 Get a group from the most recent match operation.
 match_groups ()
 Get the match groups of the most recent match operation.
 match_pos ()
 Get the position (offset) of the most recent match.
 next_match ($consume_and_log=true)
 Automation function: returns the next occurrence of any known patterns.
 peek ($n=1)
 Lookahead into the string a given number of bytes.
 pos ($new_pos=null)
 Getter and setter for the current position (string pointer).
 pos_shift ($offset)
 Moves the string pointer by a given offset.
 remove_pattern ($name)
 Allows the caller to remove a named pattern.
 reset ()
 Reset the scanner.
 rest ()
 Gets the remaining string.
 scan ($pattern)
 Scans at the current pointer.
 scan_until ($pattern)
 Scans until the start of a pattern.
 terminate ()
 Ends scanning of a string.
 unscan ()
 Revert the most recent scanning operation.

Public Attributes

 $clean_exit = true
 Clean exit or inconvenient, mid-token forced exit.
 $embedded_html = false
 Is the source embedded in HTML?
 $embedded_server = false
 Is the source embedded in a server-side script (e.g. PHP)?
 $interrupt = false
 I think this is ignored and obsolete.
 $script_tags
 Closing HTML tag for our code, e.g. </script>
 $server_tags = '/<\?/'
 Opening tag for server-side code. This is a regular expression.
- Public Attributes inherited from LuminousScanner
 $version = 'master'
 Scanner version.

Protected Attributes

 $child_scanners = array()
 Child scanners.
 $dirty_exit_recovery = array()
 Recovery patterns for when we reach an untimely interrupt.
 $exit_state = null
 Name of interrupted token, in case of a dirty exit.
- Protected Attributes inherited from LuminousScanner
 $case_sensitive = true
 Whether or not the language is case sensitive.
 $filters = array()
 Individual token filters.
 $ident_map = array()
 A map of identifiers and their corresponding token names.
 $rule_tag_map = array()
 Rule remappings.
 $state_ = array()
 State stack.
 $stream_filters = array()
 Token stream filters.
 $tokens = array()
 The token stream.
 $user_defs
 Identifier remappings based on definitions identified in the source code.

Additional Inherited Members

- Static Public Member Functions inherited from LuminousScanner
static guess_language ($src, $info)
 Language guessing.
- Protected Member Functions inherited from LuminousScanner
 rule_mapper_filter ($tokens)
 Rule re-mapper filter.
 user_def_filter ($token)
 Filter to highlight identifiers whose definitions are in the source.

Detailed Description

Superclass for languages which may nest, i.e. web languages.

Web languages get their own special class because they have to deal with server-script code embedded inside them and the potential for languages nested under them (PHP has HTML, HTML has CSS and JavaScript).

The relationship is strictly hierarchical, not recursive descent. Meeting a '<?' in CSS bubbles up to HTML and then up to PHP (or whatever). The top-level scanner is ultimately what should have sub-scanner code embedded in its own token stream.

The scanners should be persistent, so only one JavaScript scanner exists even if there are 20 JavaScript tags. This is so they can keep persistent state, which might be necessary if they are interrupted by server-side tags. For this reason, the main() method might be called multiple times, therefore each web sub-scanner should

The init method of the class should be used to set relevant rules based on whether or not the embedded flags are set; and therefore the embedded flags should be set before init is called.
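
The ordering requirement above can be illustrated with a minimal, self-contained sketch. SketchWebScanner, its $rules property and the rule names are hypothetical stand-ins invented for this example, not part of Luminous; only the two flag names come from this page.

```php
<?php
// Hypothetical stand-in for a web sub-scanner; NOT the Luminous class.
class SketchWebScanner {
    public $embedded_html = false;
    public $embedded_server = false;
    public $rules = array(); // hypothetical: names of active rules

    // init() reads the flags to decide which rules to enable,
    // so the flags must already hold their final values.
    public function init() {
        $this->rules = array('string', 'comment');
        if ($this->embedded_html) {
            $this->rules[] = 'script_terminator';
        }
        if ($this->embedded_server) {
            $this->rules[] = 'server_interrupt';
        }
    }
}

$scanner = new SketchWebScanner();
$scanner->embedded_html = true;   // set the flags first ...
$scanner->embedded_server = true;
$scanner->init();                 // ... then initialise
```

Setting a flag after init() would have no effect in this sketch, which is the reason for the ordering.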

Member Function Documentation

LuminousEmbeddedWebScript::dirty_exit ($token_name)

Sets the exit data to signify the exit is dirty and will need recovering from.

Parameters
$token_name: the name of the token which is being interrupted
Exceptions
Exception: if no recovery data is associated with the given token.
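
The documented contract (record the interrupted token, or throw if it has no recovery data) might look roughly like the following. This is a sketch under assumptions, not the Luminous source: SketchDirtyExit, the exit_state() accessor, the 'DSTRING' and 'KEYWORD' token names and the recovery pattern are all hypothetical.

```php
<?php
// Hypothetical stand-in; NOT the Luminous implementation.
class SketchDirtyExit {
    public $clean_exit = true;
    protected $exit_state = null;
    // rule => pattern; this single entry is an invented example
    protected $dirty_exit_recovery = array(
        'DSTRING' => '/[^"]*"?/A',
    );

    public function dirty_exit($token_name) {
        if (!isset($this->dirty_exit_recovery[$token_name])) {
            throw new Exception("No recovery data for token '$token_name'");
        }
        $this->exit_state = $token_name; // remember what was interrupted
        $this->clean_exit = false;       // flag that recovery is needed
    }

    // Hypothetical accessor, only so the example can inspect the state.
    public function exit_state() {
        return $this->exit_state;
    }
}

$s = new SketchDirtyExit();
$s->dirty_exit('DSTRING');

$threw = false;
try {
    $s->dirty_exit('KEYWORD'); // no recovery pattern registered
} catch (Exception $e) {
    $threw = true;
}
```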
LuminousEmbeddedWebScript::resume ( )

Attempts to recover from a dirty exit.

This method should be called on every iteration of the main loop when LuminousEmbeddedWebScript::$clean_exit is FALSE. It will attempt to recover from an interruption which left the scanner in the middle of a token. The remainder of the token will be in Scanner::match() as usual.

Returns
the name of the token which was interrupted
Note
There is no reason why a scanner should fail to recover from this, so a failure is classed as an implementation error: assertions will fail and errors will be raised. A failure means either that no recovery regex is set or that the recovery regex did not match. In the former case the exit should never have been tagged as dirty; in the latter the regex should be rewritten so that it always matches, even if the match is zero-length or the remainder of the string.
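
As a rough illustration of the recovery step, the sketch below resumes an interrupted double-quoted string. The function sketch_resume(), its return shape, the 'DSTRING' token name and the pattern are assumptions made for this example; the real method works on the scanner's internal state instead of taking arguments.

```php
<?php
// Hypothetical recovery map: token name => regex that matches the rest of
// the token WITHOUT its opening delimiter. The pattern can match the empty
// string or run to the end of input, so it cannot fail to match.
$recovery = array(
    'DSTRING' => '/(?:[^"\\\\]|\\\\.)*(?:"|$)/As',
);

// Returns array(token name, recovered text, new position).
function sketch_resume($src, $pos, $exit_state, array $recovery) {
    if (!isset($recovery[$exit_state])) {
        throw new Exception("No recovery pattern for $exit_state");
    }
    // The A modifier anchors the match at the given offset.
    if (!preg_match($recovery[$exit_state], $src, $m, 0, $pos)) {
        throw new Exception("Recovery pattern for $exit_state did not match");
    }
    return array($exit_state, $m[0], $pos + strlen($m[0]));
}

// Source text as seen after the server-side block has ended.
$result = sketch_resume('string"; var y = 1;', 0, 'DSTRING', $recovery);
// $result is array('DSTRING', 'string"', 7)
```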
LuminousEmbeddedWebScript::script_break ($token_name, $match = null, $pos = null)

Checks for a script terminator tag inside a matched token.

Parameters
$token_name: The token name of the matched text
$match: The string from the last match. If this is left NULL then Scanner::match() is assumed to hold the match.
$pos: The position of the last match. If this is left NULL then Scanner::match_pos() is assumed to hold the offset.
Returns
TRUE if the scanner should break, else FALSE

This method checks whether the string provided as $match contains the string in LuminousEmbeddedWebScript::$script_tags. If it does, the method records the substring before the tag as $token_name, advances the scan pointer to immediately before the script tag, and returns TRUE. Returning TRUE is a signal that the scanner should break immediately and let its parent scanner take over.

This condition is a 'clean_exit'.
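
The check described above can be sketched in isolation. sketch_script_break() and its return shape are hypothetical, and the case-insensitive search is an assumption of this example; the real method records the token and moves the scan pointer itself.

```php
<?php
// Returns array(should_break, text to record, new scan position).
// Standalone sketch; NOT the Luminous implementation.
function sketch_script_break($match, $pos, $script_tags = '</script>') {
    // Assumption: the terminator is searched for case-insensitively.
    $offset = stripos($match, $script_tags);
    if ($offset === false) {
        // No terminator inside the token: consume all of it, no break.
        return array(false, $match, $pos + strlen($match));
    }
    // Record only the text before the tag and stop just in front of it,
    // so the parent (HTML) scanner sees the closing tag itself.
    return array(true, substr($match, 0, $offset), $pos + $offset);
}

// A line comment that swallowed the closing tag, matched at offset 40.
$result = sketch_script_break('// comment </script>', 40);
// $result is array(true, '// comment ', 51)
```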

LuminousEmbeddedWebScript::server_break ($token_name, $match = null, $pos = null)

Checks for a server-side script inside a matched token.

Parameters
$token_name: The token name of the matched text
$match: The string from the last match. If this is left NULL then Scanner::match() is assumed to hold the match.
$pos: The position of the last match. If this is left NULL then Scanner::match_pos() is assumed to hold the offset.
Returns
TRUE if the scanner should break, else FALSE

This method checks whether an interruption by a server-side script tag, LuminousEmbeddedWebScript::$server_tags, occurs within a matched token. If it does, this method records the substring up until that point as the provided $token_name, and also sets up a 'dirty exit'. This means that a token was interrupted and we expect to have to recover from it when the server-side language's scanner has ended.

Returning TRUE is a signal that the scanner should break immediately and let its parent scanner take over.
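
A standalone sketch of the same idea follows, using the default $server_tags pattern shown on this page. sketch_server_break() and the keys of its return array are hypothetical; the real method updates the scanner's own state and calls dirty_exit().

```php
<?php
// Standalone sketch; NOT the Luminous implementation.
function sketch_server_break($match, $pos, $server_tags = '/<\?/') {
    if (!preg_match($server_tags, $match, $m, PREG_OFFSET_CAPTURE)) {
        // No server tag inside the token: nothing to interrupt.
        return array(
            'break' => false,
            'recorded' => $match,
            'pos' => $pos + strlen($match),
            'clean_exit' => true,
        );
    }
    $offset = $m[0][1]; // byte offset of the server tag within the token
    return array(
        'break' => true,
        'recorded' => substr($match, 0, $offset), // token text so far
        'pos' => $pos + $offset,                  // stop before the tag
        'clean_exit' => false,                    // mid-token: dirty exit
    );
}

// The string literal from: var x = "this is <?php echo 'a' ?> string";
$token = '"this is <?php echo \'a\' ?> string"';
$result = sketch_server_break($token, 8);
// 'recorded' is '"this is ' and 'pos' is 17
```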

LuminousEmbeddedWebScript::string ($s = null)

Getter and setter for the source string.

Parameters
$s: The new source string (leave as NULL to use this method as a getter)
Returns
The current source string
Note
This method triggers a reset()
Any strings passed into this method are converted to Unix line endings, i.e. \n

Reimplemented from Scanner.
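
The line-ending conversion mentioned in the note could be done as follows. This is one plausible way to normalise line endings, not necessarily how Luminous does it, and sketch_normalize_newlines() is a hypothetical helper name.

```php
<?php
// Converts Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
// The order matters: \r\n is replaced first so that it does not
// turn into two newlines.
function sketch_normalize_newlines($s) {
    return str_replace(array("\r\n", "\r"), "\n", $s);
}

$normalized = sketch_normalize_newlines("a\r\nb\rc\nd");
// $normalized is "a\nb\nc\nd"
```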

Member Data Documentation

LuminousEmbeddedWebScript::$child_scanners = array()
protected

Child scanners.

Persistent storage of child scanners, name => scanner (instance)

LuminousEmbeddedWebScript::$clean_exit = true

Clean exit or inconvenient, mid-token forced exit.

Signifies whether the program exited due to inconvenient interruption by a parent language (i.e. a server-side language), or whether it reached a legitimate break. A server-side language isn't necessarily a dirty exit, but if it comes in the middle of a token it is, because we need to resume from it later, e.g.:

var x = "this is <?php echo 'a' ?> string";

LuminousEmbeddedWebScript::$dirty_exit_recovery = array()
protected

Recovery patterns for when we reach an untimely interrupt.

If we reach a dirty exit, when we resume we need to figure out how to continue consuming the rule that was interrupted. So essentially, this will be a regex which matches the rule without start delimiters.

This is a map of rule => pattern

LuminousEmbeddedWebScript::$embedded_html = false

Is the source embedded in HTML?

Embedded in HTML? i.e. do we need to observe tag terminators like </script>

LuminousEmbeddedWebScript::$embedded_server = false

Is the source embedded in a server-side script (e.g. PHP)?

Embedded in a server side language? i.e. do we need to break at (for example) <? tags?

LuminousEmbeddedWebScript::$exit_state = null
protected

Name of interrupted token, in case of a dirty exit.

Logs our exit state in the case of a dirty exit: this is the rule that was interrupted.


The documentation for this class was generated from the following file: