RegexXMLReader Documentation: An Implementation of the XMLReader Interface
Chapter 2. The RegexXMLReader Stylesheet
The for-each directive splits up the contextual text stream according to the criteria given in one of its two attributes and then applies its children to each relevant part of the text.
Table 2-1. for-each Attributes
regex | Each part of this element's contextual text that matches is separated out, and the children of this node are processed against each matching part individually. |
split | The contextual text is broken up on the regular expression in this attribute, and the children of this node are applied to each individual part. |
During each iteration over a matching part of the incoming text stream, or over a split part of it, the contextual text stream is narrowed to only the part that is pertinent. For example, consider <for-each regex=".">. Because this particular regular expression (".") matches every character in the incoming contextual text stream, its children are processed once per character, and that single character is all they are aware of and have access to (there is one exception to this, however).
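The narrowing described above can be pictured with ordinary regular-expression calls. The following is a minimal sketch in Python, not the RegexXMLReader implementation: the helper name `for_each` and its keyword arguments are hypothetical, and edge cases such as empty pieces may be handled differently by the real processor.

```python
import re

def for_each(context, regex=None, split=None):
    """Return the narrowed contexts that the children would see, one per iteration.

    Hypothetical helper: `regex` yields each matching part of the text,
    `split` yields the pieces found between matches of the separator.
    """
    if (regex is None) == (split is None):
        raise ValueError("give exactly one of regex or split")
    if regex is not None:
        return [m.group(0) for m in re.finditer(regex, context)]
    return re.split(split, context)

# Each list item is the only text the children have access to on that pass.
print(for_each("abc", regex="."))          # ['a', 'b', 'c']
print(for_each("a, b,c", split=r",\s*"))   # ['a', 'b', 'c']
```

With regex=".", the children run three times for the input "abc" and see one character each time, which mirrors the <for-each regex="."> case.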
This directive modifies the current textual context according to the given criteria and passes the modified text to its children as their context. It is useful for such things as stripping out newlines or certain characters, as well as for normalizing text.
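The kinds of clean-up mentioned here can be illustrated with plain regular-expression substitution. This is only a sketch of the effect on the text: the function names are hypothetical, and the attributes the directive actually accepts are not described in this section.

```python
import re

def strip_newlines(context):
    """Replace each run of line breaks with a single space (assumed behaviour)."""
    return re.sub(r"[\r\n]+", " ", context)

def normalize_whitespace(context):
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", context).strip()

raw = "  Hex bolt,\n  10 mm  \n"
print(repr(normalize_whitespace(raw)))  # 'Hex bolt, 10 mm'
```

The children would then work with the cleaned string rather than the raw one.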
The group directive is used for processing either a regular expression that contains grouping constructs (e.g. "([^,]+)") or an individual part produced by the split directive. The text of the group becomes the current contextual text.
A group's location determines which part of the match or split is used, unless the item attribute is given.
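As a rough picture of positional selection, the sketch below maps the first group directive to the first parenthesised group, the second to the second, and so on. The helper `group_contexts` is hypothetical, and the item attribute is left out because its numbering is not described here.

```python
import re

def group_contexts(context, regex):
    """Return the text that each successive group directive would receive.

    Hypothetical helper: position n in the returned list corresponds to the
    n-th group directive under the matching node.
    """
    m = re.search(regex, context)
    return list(m.groups()) if m else []

first, second = group_contexts("WIDGET 42", r"^([A-Z]+) +([0-9]+)$")
print(first)   # WIDGET
print(second)  # 42
```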
The match-string directive sends the entire contextual text string to the output document.
The text directive sends literal text to the output document.
The warning directive sends any textual-node children to the ErrorHandler as a warning.
The error directive sends textual-node children (not raw text) to the ErrorHandler as a non-fatal error; an exception is then thrown and processing stops. In effect this is a fatal error, although it is not reported to the ErrorHandler as such.
Example 2-3. error Example
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <title re:match="^[A-Z]+$">
    <re:match-string/>
  </title>
  <re:otherwise>
    <re:error>
      <re:text>No match for: </re:text>
      <re:match-string/>
      <re:text>Please correct this and try again.</re:text>
    </re:error>
  </re:otherwise>
</output>
During processing, a count of matches against the contextual text stream is kept for each group of siblings (in other words, the children of each node have their own count, beginning at zero). The children of the otherwise directive are processed only if that count is still zero, which is to say only if there has not been a match prior to this element.
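The counting rule can be sketched as a small loop over one group of siblings. This is an illustration under stated assumptions, not the processor's code: the `(kind, pattern)` tuple shape and the function `process_siblings` are invented for the example, and the sketch assumes every matching sibling is tried in document order.

```python
import re

def process_siblings(context, siblings):
    """Return the indexes of the sibling nodes whose children would be processed.

    `siblings` is a list of ("match", pattern) or ("otherwise", None) pairs,
    a shape invented for this sketch.
    """
    match_count = 0  # every group of siblings starts its own count at zero
    fired = []
    for index, (kind, pattern) in enumerate(siblings):
        if kind == "match":
            if re.search(pattern, context):
                match_count += 1
                fired.append(index)
        elif kind == "otherwise":
            if match_count == 0:  # runs only when nothing matched before it
                fired.append(index)
    return fired

rules = [
    ("match", r"^[0-9]+$"),
    ("match", r"^[0-9]+-[0-9]+$"),
    ("otherwise", None),
]
print(process_siblings("12-15", rules))  # [1]
print(process_siblings("xii", rules))    # [2]
```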
The attribute directive adds an attribute to the previous output element; it is not for adding an attribute to any stylesheet directive.
Example 2-5. attribute Example
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <part re:match="^([A-Z]+)(.*)$">
    <re:group>
      <re:attribute name="title">
        <re:match-string/>
      </re:attribute>
    </re:group>
    <re:group>
      <re:match-string/>
    </re:group>
  </part>
</output>
The above example places the first matching group in an attribute entitled "title" within the <part> element.
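To make the intended result concrete, the sketch below reproduces the same mapping with Python's standard library: the first group becomes the title attribute and the second group becomes the element text. The function `build_part` and the sample input are assumptions for illustration, and the exact whitespace and the enclosing <output> element that the real processor emits are not modelled.

```python
import re
import xml.etree.ElementTree as ET

def build_part(context):
    """Build a <part> element the way the example stylesheet is described to."""
    m = re.search(r"^([A-Z]+)(.*)$", context)
    if m is None:
        return None
    part = ET.Element("part")
    part.set("title", m.group(1))  # first group goes into the attribute
    part.text = m.group(2)         # second group becomes the element text
    return ET.tostring(part, encoding="unicode")

print(build_part("BOLT-10"))  # <part title="BOLT">-10</part>
```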
The match directive applies the regular expression given in its regex attribute to the current contextual text stream; if there is a match, the children of this directive are processed.
The split directive splits the current contextual text using the regular expression found in one of two attributes, split or regex. It is important to note that both of these attributes do exactly the same thing; two different attributes are provided for convenience.
The split directive does not iterate over the resulting parts of the contextual text. Rather, it simply breaks up the contextual text stream based on the given criteria. split is usually followed soon after by one or more group directives. For iteration, the for-each directive is provided.
Table 2-6. split Attributes
split | The regular expression on which the contextual text is split. |
regex | The regular expression on which the contextual text is split. |
Example 2-7. split Example
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <information>
    <re:split re:split=",">
      <re:group>
        <re:attribute name="title">
          <re:match-string/>
        </re:attribute>
      </re:group>
      <re:group>
        <part-number>
          <re:match-string/>
        </part-number>
      </re:group>
    </re:split>
  </information>
</output>
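The split example can be read as "cut the text at each comma, then hand the first piece to the first group and the second piece to the second group". A minimal sketch of that reading follows; `build_information` and the sample input are hypothetical, and the real output may differ in whitespace and in its enclosing <output> element.

```python
import re
import xml.etree.ElementTree as ET

def build_information(context, separator=","):
    """Split once into parts, then consume the parts by position (no iteration)."""
    parts = re.split(separator, context)
    info = ET.Element("information")
    info.set("title", parts[0])                  # first group: attribute on <information>
    part_number = ET.SubElement(info, "part-number")
    part_number.text = parts[1]                  # second group: child element text
    return ET.tostring(info, encoding="unicode")

print(build_information("Hex bolt,HB-1042"))
# <information title="Hex bolt"><part-number>HB-1042</part-number></information>
```

Unlike the for-each directive, nothing here loops over the pieces; each piece is used exactly once, by position.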
The call-template directive changes the execution path to the node referenced by its ref-id attribute. The referenced id must be in the processor's namespace (http://regexxmlreader.sourceforge.net/1.0), but the node that carries it does not need to be a directive.
Example 2-8. call-template Example
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <page-number re:id="process-page-number" re:match="[0-9]+">
    <re:match-string />
  </page-number>
  <entry re:match="^([A-Z]+) +([0-9]+)$">
    <re:group>
      <title>
        <re:match-string />
      </title>
    </re:group>
    <re:group>
      <re:call-template ref-id="process-page-number"/>
    </re:group>
  </entry>
</output>
The above operates exactly the same as:
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <page-number re:match="[0-9]+">
    <re:match-string />
  </page-number>
  <entry re:match="^([A-Z]+) +([0-9]+)$">
    <re:group>
      <title>
        <re:match-string />
      </title>
    </re:group>
    <re:group>
      <page-number re:match="[0-9]+">
        <re:match-string />
      </page-number>
    </re:group>
  </entry>
</output>
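The equivalence of the two stylesheets is the same idea as calling a named function instead of repeating its body. The sketch below models only the <entry> branch; the `TEMPLATES` registry and the function names are invented for illustration, and the string building skips XML escaping because the sample text contains no special characters.

```python
import re

def page_number(context):
    """Stand-in for the node carrying id "process-page-number"."""
    m = re.search(r"[0-9]+", context)
    return f"<page-number>{m.group(0)}</page-number>" if m else ""

# Hypothetical registry mapping ids to the nodes they name.
TEMPLATES = {"process-page-number": page_number}

def call_template(ref_id, context):
    """Jump to the referenced node, handing it the current contextual text."""
    return TEMPLATES[ref_id](context)

def entry(context):
    m = re.search(r"^([A-Z]+) +([0-9]+)$", context)
    if m is None:
        return ""
    title = f"<title>{m.group(1)}</title>"
    page = call_template("process-page-number", m.group(2))
    return f"<entry>{title}{page}</entry>"

print(entry("INDEX 42"))
# <entry><title>INDEX</title><page-number>42</page-number></entry>
```

Replacing the `call_template` call with a direct call to `page_number` gives the same string, which corresponds to the inlined form of the stylesheet.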
The wrapper directive is simply a placeholder and is meant to be used in conjunction with the call-template directive.
Example 2-9. wrapper Example
<output xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
  <information>
    <title re:match="^[A-Z]+$" re:id="handle-title">
      <re:match-string/>
    </title>
    <re:wrapper id="deal-with-pages">
      <page re:match="^[0-9]+$">
        <re:match-string/>
      </page>
      <page-range re:match="^([0-9]+)-([0-9]+)$">
        <re:group>
          <start>
            <re:match-string/>
          </start>
        </re:group>
        <re:group>
          <end>
            <re:match-string/>
          </end>
        </re:group>
      </page-range>
      <!-- note that wrapper resets the match count -->
      <re:otherwise>
        <re:warning>
          <re:text>Pages did not match: </re:text>
          <re:match-string/>
        </re:warning>
      </re:otherwise>
    </re:wrapper>
    <re:match regex="^([A-Z]+) ([-0-9]+)$">
      <re:group>
        <re:call-template ref-id="handle-title"/>
      </re:group>
      <re:group>
        <re:call-template ref-id="deal-with-pages"/>
      </re:group>
    </re:match>
  </information>
</output>