Using XML Queries and Transformations

XSLT

In Chapter 2 we saw how to use validation rules to specify the XML format that our application can work with. When we want to exchange information with other applications, it would be ideal if everyone used the same document types (that is, the same validation rules). However, it is inevitable that several document types will emerge for comparable kinds of data. Repositories where schemas and DTDs can be stored and shared, often set up as industry-wide initiatives, will help, but several schemas for the same data will still exist.

Therefore, it would be very useful to have tools that convert a document from one schema to another. Such a tool would consist of a set of rules that describe exactly how and where each piece of content in document type A should appear in document type B. These rules might as well be written in XML themselves. This is exactly what XSLT is: a language for specifying how to transform an XML document of one type into another document type.
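To make the idea of conversion rules concrete, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. Python is used only for illustration, and this is not XSLT. Both document types are hypothetical, as are the element names `customer`, `name` and `city` and the attribute names `fullName` and `location`. Each rule pairs a piece of content in type A with the place it should appear in type B. In XSLT the equivalent rules would themselves be written in XML.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical document type A: customer details stored as child elements.
SOURCE_A = (
    "<customers>"
    "<customer><name>Ann Smith</name><city>Leeds</city></customer>"
    "<customer><name>Bob Jones</name><city>York</city></customer>"
    "</customers>"
)

# Hypothetical conversion rules: (child element of an A <customer>,
# attribute of a type-B <client> where that child's text should go).
RULES: List[Tuple[str, str]] = [("name", "fullName"), ("city", "location")]


def convert_a_to_b(xml_text: str) -> str:
    """Rewrite a type-A document as a type-B document by applying RULES."""
    source = ET.fromstring(xml_text)
    result = ET.Element("clients")
    for customer in source.findall("customer"):
        client = ET.SubElement(result, "client")
        for child_name, attr_name in RULES:
            value = customer.findtext(child_name)
            if value is not None:  # skip content that type A did not supply
                client.set(attr_name, value)
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(convert_a_to_b(SOURCE_A))
```

The rules live in a table that is separate from the code that applies them. This mirrors the split XSLT makes between the stylesheet (the rules) and the processor (the code that runs them).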

To be completely honest with you, this was not the goal when the XSLT initiative was started. Back then it was called XSL (eXtensible Stylesheet Language), and its aim was to convert an XML document to HTML. The specification was later divided into two parts: the transformation part (which became XSLT) and the formatting objects part (XSL-FO). This split was made because the two parts of the XSL specification were developing at different rates. Indeed, XSLT has recently become a W3C Recommendation, while XSL-FO is still in the early stages of development. In addition, the XSL query language from the earlier XSL specification was removed and combined with the path syntax from XPointer to form XPath.

So we have two Recommendations, XPath and XSLT, plus some specifications that will still undergo serious changes. Because XSL-FO is still so immature, it will not be covered in this book.

While the work was in progress, the editors realized that their work could be applied far more broadly than just creating HTML. HTML generation is still one of the purposes of XSLT, but only one of many. In the remainder of this chapter we will focus on the broader possibilities of XSLT, and we will show how to use it for HTML generation at the end of the chapter.

How Transformation Works

Transforming an XML document from one format into another always involves three documents: the source document, the destination document, and the document holding the transformation rules, which is the XSLT stylesheet.

Each stylesheet in XSLT consists of a number of templates. A template defines how a certain kind of content in the source document appears in the destination document. Every template has an XPath expression that describes which nodes in the source document the template applies to.
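The following Python sketch is a simplified, hypothetical model of how a processor might choose among templates when several patterns match the same node. It understands only three pattern forms: `*`, a single element name, and a two-step `parent/name` path. It uses the default priorities that XSLT 1.0 assigns to these forms (-0.5, 0 and 0.5), so the more specific pattern wins. A real processor also supports full XPath patterns, explicit priorities and import precedence. The function names and sample data are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Simplified, hypothetical model of XSLT template matching.
# Supported patterns: "*", "name" and "parent/name" (exactly one slash).


def default_priority(pattern: str) -> float:
    """Default priorities modelled on XSLT 1.0's conflict-resolution rules."""
    if pattern == "*":
        return -0.5  # bare wildcard: least specific
    if "/" in pattern:
        return 0.5   # multi-step pattern: more specific than a bare name
    return 0.0       # single element name


def matches(pattern: str, element: ET.Element, parent: Optional[ET.Element]) -> bool:
    """True if pattern applies to element. ElementTree has no parent links, so parent is passed in."""
    if pattern == "*":
        return True
    if "/" in pattern:
        parent_name, name = pattern.split("/")
        return element.tag == name and parent is not None and parent.tag == parent_name
    return element.tag == pattern


def best_template(patterns: List[str], element: ET.Element,
                  parent: Optional[ET.Element]) -> Optional[str]:
    """Return the matching pattern with the highest priority, or None if none match."""
    best: Optional[str] = None
    for pattern in patterns:
        if not matches(pattern, element, parent):
            continue
        # On a tie the later pattern wins, like XSLT's recovery rule.
        if best is None or default_priority(pattern) >= default_priority(best):
            best = pattern
    return best


if __name__ == "__main__":
    book = ET.fromstring("<book><title>XSLT</title><author>Smith</author></book>")
    patterns = ["*", "title", "book/title"]
    print(best_template(patterns, book.find("title"), book))   # book/title
    print(best_template(patterns, book.find("author"), book))  # *
```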

Most programming languages start execution at a specific place in the program code (in Visual Basic, this is Sub Main()). XSLT is different: it starts with the data and searches for the right code to execute with that data. When a document is transformed with an XSLT stylesheet, the start node is the document root. The processor then takes the following steps:

1.    The processor searches the stylesheet for the most suitable template for transforming this node. (We'll discuss what makes a template suitable later.)

2.    This template defines certain output nodes, which are added to the result document.

3.    The template can also specify which nodes should be processed next. For all of these nodes, go to step 1.

The process ends when there are no more nodes left to process. In the most common case, every template tells the processor to continue by processing the children of the current node. Because each step then moves further down the tree, every node gets processed and no infinite loop can occur.
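The three steps above can be sketched as a small recursive function. This is a hypothetical Python model and does not show how any real XSLT processor is implemented. It makes several simplifications:

- Templates are looked up by element name only.
- Output is collected as strings rather than as a node tree.
- Processing starts at the root element rather than the document root.
- Text that follows a child element (mixed content) is ignored.

When no template matches, the sketch falls back on something like XSLT's built-in rules: it emits the element's trimmed text and continues with its children. The `catalog`/`item`/`price` stylesheet is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical template signature: (source node, apply_children) -> output pieces.
ApplyChildren = Callable[[ET.Element], List[str]]
Template = Callable[[ET.Element, ApplyChildren], List[str]]


def transform(root: ET.Element, templates: Dict[str, Template]) -> str:
    """Run the three-step processing loop, starting at the root element."""

    def apply_children(node: ET.Element) -> List[str]:
        # Step 3: the nodes to process next are the children of `node`.
        output: List[str] = []
        for child in node:
            output.extend(process(child))
        return output

    def process(node: ET.Element) -> List[str]:
        # Step 1: find a template (simplified here to a lookup by tag name).
        template = templates.get(node.tag)
        if template is None:
            # Fallback similar to XSLT's built-in rules: emit the element's
            # own text, then carry on with its children.
            text = (node.text or "").strip()
            return ([text] if text else []) + apply_children(node)
        # Step 2: the template produces output (and may call apply_children).
        return template(node, apply_children)

    return "".join(process(root))


# Hypothetical stylesheet turning a small catalog into an HTML list.
TEMPLATES: Dict[str, Template] = {
    "catalog": lambda node, apply: ["<ul>"] + apply(node) + ["</ul>"],
    "item": lambda node, apply: ["<li>"] + apply(node) + ["</li>"],
    "price": lambda node, apply: [" ($" + (node.text or "") + ")"],
}

if __name__ == "__main__":
    source = ET.fromstring(
        "<catalog><item><name>Pen</name><price>2</price></item>"
        "<item><name>Ink</name><price>5</price></item></catalog>"
    )
    print(transform(source, TEMPLATES))
    # <ul><li>Pen ($2)</li><li>Ink ($5)</li></ul>
```

The `name` element has no template of its own, so the fallback outputs its text. Because `apply_children` only ever descends to child elements, the recursion always ends at the leaves of the tree, which is why this common form of stylesheet terminates.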

Programming stylesheets is an art of its own, and the deeply recursive nature of the task will sometimes puzzle the average VB programmer. It can help to think of a template as an event handler. At the start of the transformation, the event for processing the root is raised. The processor selects the best handler and executes it. This event handler produces nodes in the output document, but it can also raise events itself. For each of these raised events, the XSLT processor again searches the stylesheet for a handler, and so on.

Before we look at writing stylesheets, let's take a look at the other requirement for transformations – XSLT processors.

Some Good XSLT Processors

At the time of writing, the XSLT specification was still very fresh, so implementations of the full specification were still scarce. The best one at the time was SAXON (at least, the best implementation that I could find). SAXON is implemented in Java and its source code is available, but a Win32 binary can also be downloaded (http://users.iclway.co.uk/mhkay/saxon/). The binary can be called as follows:

saxon -o destination.xml source.xml stylesheet.xsl

Another well-known implementation is XT by James Clark. Clark was one of the main contributors to the XSLT specification and has always tried to keep his implementation as close to the specification as possible. At the time of writing a few features were still unimplemented in XT, but a full version will no doubt follow (download from www.jclark.com/xml/xt.html). Like SAXON, XT is distributed as Java classes and source code, but it can also be downloaded in binary form, allowing use like this:

xt source.xml stylesheet.xsl destination.xml

The third implementation that should be mentioned in a book for VB programmers is the Microsoft MSXML library. The version available at the time of writing was dated March 1999 and is therefore rather out of date. Microsoft has promised that the full specification will be implemented in the next release. Because these libraries can be used as COM objects from VB code or scripts, they have a huge advantage over their command-line competitors. At the moment the MSXML library also performs much better than the Java-based implementations, although implementations with different functionality are, of course, hard to compare.

To give developers a head start before the newer library is released, Microsoft published a developer preview in January 2000 (this is the same preview that was mentioned when we discussed XPath). This preview can be installed side by side with the older library and partially implements the final XSLT specification. (Check Appendix D to see exactly which parts are implemented.) With the MSXML library, you could do something like this:

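  ' Requires a project reference to the Microsoft XML (MSXML) library
  Dim oDoc As New DOMDocument
  Dim oXSLT As New DOMDocument
  Dim sResult As String

  ' Load synchronously so both documents are ready before transforming
  oDoc.async = False
  oXSLT.async = False
  oDoc.load "http://www.comp.com/sourceDocument.xml"
  oXSLT.load "http://www.comp.com/stylesheet.xsl"

  ' Apply the stylesheet to the source document
  sResult = oDoc.transformNode(oXSLT)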

The transformNode method returns a string holding the full transformed document. The current version of MSXML can be downloaded from http://msdn.microsoft.com/downloads/tools/xmlparser/xmlredist.exe, and the developer preview from http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp.
