\seekquarry\yioop\library\WikiParser

Class with methods to parse mediawiki documents, both within Yioop's own wiki and when Yioop indexes mediawiki dumps such as those from Wikipedia.

Summary

Public methods: __construct(), parse(), processRegexes(), processProvidedRegexes(), cleanLinksAndParagraphs(), makeTableOfContents(), makeReferences(), insertTableOfContents(), insertReferences(), fetchLinks()
Public properties: $esc, $minimal
Constants: none
Protected methods: none
Protected properties: none
Private methods: none
Private properties: none

Properties

$esc

$esc : string

Escape string used to try to prevent incorrect nesting of div tags in some of the substitutions.

Type

string

$minimal

$minimal : boolean

Whether the parser should be configured to do only minimal substitutions or all available ones (minimal might be used for posts in discussion groups)

Type

boolean

Methods

__construct()

__construct(string  $base_address = "", array  $add_substitutions = array(), boolean  $minimal = false) 

Used to initialize the arrays of match/replacement pairs used to format mediawiki syntax into HTML (imperfectly, since only regexes are used)

Parameters

string $base_address

base url for link substitutions

array $add_substitutions

wiki rule substitutions that this wiki parser should use in addition to the default ones

boolean $minimal

if true, a shorter substitution list is used, suitable for posts to discussion groups

parse()

parse(string  $document, boolean  $parse_head_vars = true, boolean  $handle_big_files = false) : string

Parses a mediawiki document to produce an HTML equivalent

Parameters

string $document

a document which might have mediawiki markup

boolean $parse_head_vars

header variables are an extension of mediawiki syntax used to add meta variables and titles to the head tag of an HTML document. This flag controls whether this extension is supported

boolean $handle_big_files

For indexing purposes, Yioop by default truncates long documents before indexing them. If true, this method skips that default truncation. The true value is more useful when using Yioop's built-in wiki.

Returns

string —

HTML document obtained by parsing mediawiki markup in $document

processRegexes()

processRegexes(string  $document) : string

Applies all the wiki substitutions of this WikiParser to the document to create an HTML document. Makes use of processProvidedRegexes().

Parameters

string $document

a document with wiki syntax

Returns

string —

result of substitutions to make HTML

processProvidedRegexes()

processProvidedRegexes(array  $matches, array  $replaces, string  $document) : string

Applies a set of transformations from wiki syntax to html to a document

Parameters

array $matches

an array of things to match for

array $replaces

what to replace matches with

string $document

wiki document to fix

Returns

string —

document after substitutions
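
The parallel $matches/$replaces arrays described above map naturally onto PHP's preg_replace(), which accepts arrays of patterns and replacements. The following is a minimal standalone sketch of that idea, not Yioop's actual implementation: the function name sketchApplyRegexes and the two bold/italic rules are hypothetical and chosen only for illustration.

```php
<?php
/**
 * Hypothetical stand-in for the idea behind processProvidedRegexes():
 * apply each pattern in $matches with its counterpart in $replaces.
 */
function sketchApplyRegexes(array $matches, array $replaces,
    string $document): string
{
    // preg_replace() accepts parallel arrays and applies the pairs in order
    $result = preg_replace($matches, $replaces, $document);
    // preg_replace() returns null on error; fall back to the input then
    return $result ?? $document;
}

// Illustrative rules only, not Yioop's actual substitution list.
// The bold rule must run before the italic rule because ''' contains ''.
$matches = ["/'''(.+?)'''/", "/''(.+?)''/"];
$replaces = ['<b>$1</b>', '<i>$1</i>'];

echo sketchApplyRegexes($matches, $replaces, "'''bold''' and ''italic''"), "\n";
// prints: <b>bold</b> and <i>italic</i>
```

Because the pairs are applied in array order, the order of rules matters whenever one pattern is a prefix of another, as with the two quote rules here.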

cleanLinksAndParagraphs()

cleanLinksAndParagraphs(string  $document) : string

Replaces spaces in links with underscores, and fixes newline issues within span tags

Parameters

string $document

wiki document to fix

Returns

string —

document after substitutions
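
One way to read the link cleanup described for this method is as a pass over href attributes that swaps spaces for underscores. The sketch below illustrates only that reading. The function name sketchUnderscoreLinks is hypothetical, the assumption that links appear as double-quoted href attributes is mine, and the span/newline handling is not attempted.

```php
<?php
/**
 * Hypothetical sketch: replace spaces with underscores inside
 * double-quoted href attribute values, leaving link text untouched.
 */
function sketchUnderscoreLinks(string $document): string
{
    $result = preg_replace_callback(
        '/href="([^"]*)"/',
        function (array $m): string {
            // $m[1] is the attribute value without the quotes
            return 'href="' . str_replace(" ", "_", $m[1]) . '"';
        },
        $document
    );
    // preg_replace_callback() returns null on error; keep the input then
    return $result ?? $document;
}

echo sketchUnderscoreLinks('<a href="Main Page">Main Page</a>'), "\n";
// prints: <a href="Main_Page">Main Page</a>
```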

makeTableOfContents()

makeTableOfContents(string  $page) : string

Used to make a table of contents for a wiki page based on the level two headings on that page.

Parameters

string $page

a wiki document

Returns

string —

HTML table of contents to be inserted after wiki page processed
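
To make the idea of a table of contents built from level two headings concrete, here is a small standalone sketch. It assumes level two headings are written as ==Heading== on their own line (standard mediawiki syntax); the function name sketchMakeToc, the ordered-list output, and the anchor naming scheme are hypothetical and may differ from what Yioop produces.

```php
<?php
/**
 * Hypothetical sketch: collect ==Heading== lines from wiki text and
 * return an HTML ordered list linking to an anchor for each heading.
 */
function sketchMakeToc(string $page): string
{
    // Match ==Heading== at line start; [^=] rejects ===Subheading===
    preg_match_all('/^==([^=].*?)==\s*$/m', $page, $found);
    if (empty($found[1])) {
        return ""; // no level two headings, so no table of contents
    }
    $toc = "<ol>\n";
    foreach ($found[1] as $heading) {
        $title = trim($heading);
        // Assumed anchor scheme: heading text with spaces as underscores
        $anchor = str_replace(" ", "_", $title);
        $toc .= '<li><a href="#' . htmlspecialchars($anchor) . '">' .
            htmlspecialchars($title) . "</a></li>\n";
    }
    return $toc . "</ol>\n";
}

$page = "Intro text\n==History==\nSome text\n===Details===\n== Usage ==\n";
echo sketchMakeToc($page);
```

In this sketch the level three heading Details is skipped, so the list contains entries for History and Usage only.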

makeReferences()

makeReferences(string  $page) : string

Used to make a reference list for a wiki page based on the cite tags on that page.

Parameters

string $page

a wiki document

Returns

string —

HTML reference list to be inserted after wiki page processed

insertTableOfContents()

insertTableOfContents(string  $page, string  $toc) : string

After regex processing has been done on a wiki page, this function inserts a table of contents into the resulting page just before the first h2 tag, then returns the resulting page

Parameters

string $page

page in which to insert table of contents

string $toc

HTML table of contents

Returns

string —

resulting page after insert
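
The insertion step described above, placing the table of contents just before the first h2 tag, can be sketched with plain string functions. This is an illustration under assumptions rather than Yioop's code: the name sketchInsertToc is hypothetical, and returning the page unchanged when no h2 tag exists is my choice, not documented behavior.

```php
<?php
/**
 * Hypothetical sketch: splice $toc into $page immediately before the
 * first <h2 occurrence.
 */
function sketchInsertToc(string $page, string $toc): string
{
    // stripos() is case-insensitive, so <H2> is found as well
    $pos = stripos($page, "<h2");
    if ($pos === false) {
        return $page; // assumption: leave pages without h2 tags unchanged
    }
    return substr($page, 0, $pos) . $toc . substr($page, $pos);
}

$page = "<p>Intro</p><h2>History</h2><p>Text</p><h2>Usage</h2>";
echo sketchInsertToc($page, "<ol><li>History</li><li>Usage</li></ol>"), "\n";
// prints: <p>Intro</p><ol><li>History</li><li>Usage</li></ol><h2>History</h2><p>Text</p><h2>Usage</h2>
```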

insertReferences()

insertReferences(string  $page, string  $references) : string

After regex processing has been done on a wiki page, this function inserts the reference list into the resulting page at {{reflist locations, then returns the resulting page

Parameters

string $page

page in which to insert the reference lists

string $references

HTML reference list

Returns

string —

resulting page after insert
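
As a rough illustration of marker-based insertion, the sketch below replaces each reflist marker with the reference list HTML. The exact marker form Yioop looks for is not stated here, so the pattern used, {{reflist}} optionally followed by parameters before the closing braces, is an assumption, as is the function name sketchInsertReferences.

```php
<?php
/**
 * Hypothetical sketch: replace every assumed {{reflist...}} marker in
 * $page with the HTML in $references.
 */
function sketchInsertReferences(string $page, string $references): string
{
    $result = preg_replace_callback(
        // Assumed marker form: {{reflist}} or {{reflist|...}}
        '/\{\{reflist[^}]*\}\}/i',
        // A callback avoids $1-style interpretation of $references
        function (array $m) use ($references): string {
            return $references;
        },
        $page
    );
    // preg_replace_callback() returns null on error; keep the input then
    return $result ?? $page;
}

$page = "<p>Text</p>{{reflist|2}}";
echo sketchInsertReferences($page, "<ol><li>Source A</li></ol>"), "\n";
// prints: <p>Text</p><ol><li>Source A</li></ol>
```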

fetchLinks()

fetchLinks(array  $document) : array

Fetches internal links from wiki syntax.

Parameters

array $document

a wiki document

Returns

array —

of linked page names in the format page_name|relationship_type
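
For orientation, internal links in mediawiki syntax are written as [[Page name]] or [[Page name|label]]. The standalone sketch below extracts the target page names from such links. It is not Yioop's implementation: the name sketchFetchLinks is hypothetical, the sketch takes a string, and it returns bare page names only, without attempting the relationship_type part of the documented return format.

```php
<?php
/**
 * Hypothetical sketch: return the distinct targets of [[...]] links,
 * with spaces converted to underscores.
 */
function sketchFetchLinks(string $document): array
{
    // Group 1 is the target; an optional |label part is ignored
    preg_match_all('/\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/', $document, $found);
    $links = [];
    foreach ($found[1] as $target) {
        $links[] = str_replace(" ", "_", trim($target));
    }
    // Drop duplicates and reindex from zero
    return array_values(array_unique($links));
}

$doc = "See [[Main Page]] and [[Search Engine|engines]], " .
    "then [[Main Page]] again.";
echo implode(", ", sketchFetchLinks($doc)), "\n";
// prints: Main_Page, Search_Engine
```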