As you may or may not know, CSV stands for "comma-separated values". It is a common way to store records made up of fields: the fields are separated by commas and the records are separated by newlines.
For this example, we shall use the following text file:
    one,two,three,four,five
    six,seven,eight,nine,ten
    eleven,twelve,thirteen,fourteen,fifteen
    sixteen,seventeen,eighteen,nineteen,twenty
In order to turn this into XML, we'll need to iterate over each line and then split each line on the comma.
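Before writing the stylesheet, it may help to see the same two-level idea (lines first, then commas) in plain Java. This sketch is not part of RegexXMLReader; the class `CsvSplitSketch` and its `parse` method are hypothetical names, and like the rest of this tutorial it assumes that no field contains a quoted comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvSplitSketch {
    // Hypothetical helper: split the input into records (lines), then each record into fields (commas).
    // Assumes simple CSV with no quoted fields.
    static List<List<String>> parse(String csv) {
        List<List<String>> records = new ArrayList<>();
        for (String line : csv.split("\\R")) {              // \R matches any line break (Java 8+)
            if (line.isEmpty()) continue;                    // skip blank lines
            records.add(Arrays.asList(line.split(",", -1))); // -1 keeps empty trailing fields
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "one,two,three,four,five\nsix,seven,eight,nine,ten\n";
        for (List<String> record : parse(csv)) {
            System.out.println(record); // prints [one, two, three, four, five], then the second record
        }
    }
}
```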
The iteration bit is rather simple, so let's flesh out a Regular Expression Stylesheet that uses "csv-data" as the root element, iterates over the lines themselves, and prints each line wrapped in a "line" tag:
    <?xml version="1.0"?>
    <csv-data xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
        <re:for-each regex="(?m)^.*$">
            <line>
                <re:match-string />
            </line>
        </re:for-each>
    </csv-data>
One of the first things to note is the (?m) at the beginning of the regex in the for-each directive. This is Java's inline flag for multiline mode: it makes ^ and $ match at the start and end of every line, rather than only at the start and end of the whole input. Thus this regular expression matches from the beginning of each line to the end of that same line, and for each match found, the children of the for-each directive are applied.
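To see what the flag changes, here is a small sketch using `java.util.regex` directly. It assumes the for-each directive behaves like repeated calls to `Matcher.find()`, which is our reading of the description above rather than something confirmed from the library's source; `MultilineDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collect every substring matched by the regex, in order, by calling find() repeatedly.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "one,two,three,four,five\nsix,seven,eight,nine,ten";
        // With (?m), ^ and $ match at every line boundary: one match per line.
        System.out.println(matches("(?m)^.*$", input));
        // Without it, ^ and $ only match at the start and end of the whole input,
        // and . does not match the newline, so this finds nothing.
        System.out.println(matches("^.*$", input));
    }
}
```

The first call yields two matches, one per line; the second yields an empty list for this two-line input.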
Let's take a look at the output by invoking the command-line processor:

    java net.sourceforge.regexxmlreader.Process -in tutorial-example-01.txt -rxl tutorial-example-01.rxl
    <?xml version="1.0" encoding="us-ascii"?>
    <csv-data>
        <line>one,two,three,four,five</line>
        <line>six,seven,eight,nine,ten</line>
        <line>eleven,twelve,thirteen,fourteen,fifteen</line>
        <line>sixteen,seventeen,eighteen,nineteen,twenty</line>
    </csv-data>
Now we are close to what we want, but not quite there: we still need to break each line up on the comma. There are a few ways we can do this:
1. Split the text with the split directive and access each part with the group directive.
2. Split the text by iterating over each part with the for-each directive.
3. Use grouping within the regular expression of a match attribute within an external element.
4. Do essentially the same thing using the match directive.
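The approaches listed above boil down to three regex strategies: splitting on the separator, iterating over the parts, and reading capture groups. Here they are in plain Java. This is only an analogy for the directives, not how RegexXMLReader implements them; `SplitStrategies` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitStrategies {
    // Strategy 1: split on the separator, then pick parts by index.
    static String[] bySplit(String line) {
        return line.split(",");
    }

    // Strategy 2: iterate over every run of non-comma characters.
    // Note: [^,]+ skips empty fields, e.g. the middle of "a,,b".
    static List<String> byIteration(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile("[^,]+").matcher(line);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    // Strategy 3: match the whole line once and read each field from a capture group.
    // Assumption: exactly five fields per line; otherwise an empty list is returned.
    static List<String> byGroups(String line) {
        Matcher m = Pattern.compile("([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)").matcher(line);
        List<String> parts = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "one,two,three,four,five";
        System.out.println(bySplit(line)[1]);      // second field via split: two
        System.out.println(byIteration(line));     // every field via iteration
        System.out.println(byGroups(line).get(2)); // third field via capture group: three
    }
}
```

Splitting is the simplest when fields are addressed by position; capture groups also validate the shape of the line, at the cost of a pattern tied to a fixed field count.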
Now for the final bit of this part of the tutorial: we shall convert the CSV file into XML using the split directive, and also place an attribute on the containing output element. Here is the stylesheet:
    <?xml version="1.0"?>
    <csv-data xmlns:re="http://regexxmlreader.sourceforge.net/1.0">
        <re:for-each regex="(?m)^.*$">
            <line>
                <re:split regex=",">
                    <re:group>
                        <element-one>
                            <re:match-string />
                        </element-one>
                    </re:group>
                    <re:group>
                        <re:attribute name="element-two">
                            <re:match-string />
                        </re:attribute>
                    </re:group>
                    <re:group>
                        <element-three>
                            <re:match-string />
                        </element-three>
                    </re:group>
                </re:split>
            </line>
        </re:for-each>
    </csv-data>
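And here is the result. Notice that only the first three fields of each line appear, one per group directive, and that the second field has become the element-two attribute of its line element: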
    <?xml version="1.0" encoding="us-ascii"?>
    <csv-data>
        <line element-two="two">
            <element-one>one</element-one>
            <element-three>three</element-three>
        </line>
        <line element-two="seven">
            <element-one>six</element-one>
            <element-three>eight</element-three>
        </line>
        <line element-two="twelve">
            <element-one>eleven</element-one>
            <element-three>thirteen</element-three>
        </line>
        <line element-two="seventeen">
            <element-one>sixteen</element-one>
            <element-three>eighteen</element-three>
        </line>
    </csv-data>
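For comparison, here is a hypothetical plain-Java program that produces the same structure as the final stylesheet. It is a sketch under our own assumptions: it skips lines with fewer than three fields, performs no XML escaping, and omits the XML declaration. `CsvToXmlSketch` is not part of RegexXMLReader.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvToXmlSketch {
    // Hypothetical equivalent of the final stylesheet: for each line,
    // field 1 -> <element-one>, field 2 -> element-two attribute, field 3 -> <element-three>.
    // Remaining fields are ignored. Field text is NOT XML-escaped (assumes plain words).
    static String convert(String csv) {
        StringBuilder out = new StringBuilder("<csv-data>\n");
        Matcher lines = Pattern.compile("(?m)^.*$").matcher(csv); // same regex as the for-each
        while (lines.find()) {
            String line = lines.group();
            if (line.isEmpty()) continue;
            String[] f = line.split(",");
            if (f.length < 3) continue; // assumption: skip lines with too few fields
            out.append("  <line element-two=\"").append(f[1]).append("\">\n")
               .append("    <element-one>").append(f[0]).append("</element-one>\n")
               .append("    <element-three>").append(f[2]).append("</element-three>\n")
               .append("  </line>\n");
        }
        return out.append("</csv-data>\n").toString();
    }

    public static void main(String[] args) {
        String csv = "one,two,three,four,five\nsix,seven,eight,nine,ten\n";
        System.out.print(convert(csv));
    }
}
```

For real data, a DOM or StAX writer from the JDK would handle escaping; the string building here just keeps the correspondence with the stylesheet easy to see.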