\seekquarry\yioop\librarySuffixTree

Data structure used to maintain a suffix tree for a passage of words.

The suffix tree is constructed using the linear time algorithm of Ukkonen, E. (1995). "On-line construction of suffix trees". Algorithmica 14 (3): 249–260.

Summary

Methods
Properties
Constants
__construct()
buildTree()
makeNode()
edgeLength()
addSuffixLink()
walkDown()
suffixTreeExtend()
outputMaximal()
$root
$last_added
$pos
$need_sym_link
$remainder
$active_index
$active_edge_index
$active_len
$size
$text
$tree
INFTY
No protected methods found
No protected properties found
N/A
No private methods found
No private properties found
N/A

Constants

INFTY

INFTY

Upper bound on the length of any path in the tree

Properties

$root

$root : array

The root node of the suffix trees

Type

array

$last_added

$last_added : integer

Index of last node added to the suffix tree in the array used to hold the suffix tree data structures

Type

integer

$pos

$pos : integer

Position in the $this->text up to which we have created a suffix tree so far

Type

integer

$need_sym_link

$need_sym_link : integer

If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. This variable keeps track of the index of the first node so we can do this.

Type

integer

$remainder

$remainder : integer

At a given stage in building the suffix tree how many new suffixes we need to insert

Type

integer

$active_index

$active_index : integer

Node which represents the left hand the start of the active edge This is the edge that contains the last suffix inserted

Type

integer

$active_edge_index

$active_edge_index : integer

Index into $this->text of starting word of active edge

Type

integer

$active_len

$active_len : integer

How many words from the start of the active edge label to get the last suffix. If active edge label was: "a black cat a black" and $active_len was 2, then would have "a black" from the first two chars.

Type

integer

$size

$size : integer

Number of elements in $this->text. i.e., count($this->text)

Type

integer

$text

$text : array

The sequence of terms, one array entry per term, that a suffix tree is to be made from

Type

array

$tree

$tree : array

Used to hold the suffix tree data structure (represented as a sequence of nodes)

Type

array

Methods

__construct()

__construct(array  $text) 

Initializes a suffix tree based on the supplied array of terms.

Parameters

array $text

a sequence of terms to build the suffix tree for

buildTree()

buildTree() 

Builds the complete suffix tree for the text currently stored in $this->text. If you change this text and call this method again, it build a new tree based on the new text. Uses Ukkonen

makeNode()

makeNode(integer  $start, integer  $end = self::INFTY) 

Makes a new node for the suffix tree structure. This node is inserted at the end of the tree so far. A node is associative array consisting of the fields "start" whose value is the starting location in $this->text for this node, "end" location in $this->text up to which this node is responsible, "sym_link" is a link to an isomorphic subtree for the purposes of building the suffix tree, and "next" is an array of next children in the tree.

Parameters

integer $start

what to use as the start value mentioned above

integer $end

what to use as the start value mentioned above

edgeLength()

edgeLength(\seekquarry\yioop\library\array&  $node) 

The number of elements out of $this->text that this node is currently responsible for

Parameters

\seekquarry\yioop\library\array& $node

the node to compute the length of

addSuffixLink()

addSuffixLink(integer  $index) 

If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. If in the current step it is necessary to add a sym_link this method sets the $this->need_sym_link node's "sym_link" field to $index which is supposed be the index of the second created node.

Parameters

integer $index

the index of the a created node in a given step. ($this->need_sym_link will be greater than 0 if it is the second created node of the step)

walkDown()

walkDown(integer  $index) : \seekquarry\yioop\library\if

Used to set the active point to the node given by $index

Parameters

integer $index

which node to use for setting

Returns

\seekquarry\yioop\library\if —

the current active edge is longer than $index's edge length then don't update and return false; otherwise, return true

suffixTreeExtend()

suffixTreeExtend() 

Given a suffix tree of the array of terms in $this->text up to $this->pos, adds one to pos and build the suffix tree up to this new value. i.e., the text with one more term added.

outputMaximal()

outputMaximal(integer  $index, string  $path, integer  $len, \seekquarry\yioop\library\array&  $maximal) 

Recursive function used to compute the maximal phrases in a document as well as their conditional maximal subphrases.

Parameters

integer $index

a node in the suffix tree

string $path

from root to current node

integer $len

number of nodes from root to current node in suffix tree

\seekquarry\yioop\library\array& $maximal

assoc array of phrase => (cond_max => pos of conditional maximal subphrase, [0] => pos_1st_occurrence of phrase, [1]=>pos_2nd_occurrence of phrase, etc)