Using XML Queries and Transformations

XSLT Elements

An XSLT document defines rules for transforming a specific kind of XML document into another kind of document. These rules are themselves defined in an XML-based document syntax. Most of this chapter will be used to describe all of the available elements in an XSLT document.

To differentiate the XSLT-specific elements in a stylesheet from other XML content, XSLT uses namespaces. The official XSLT namespace is http://www.w3.org/1999/XSL/Transform. Remember that this URI does not necessarily point to any resource. It only specifies to the XSLT processor that these elements are part of an XSLT stylesheet. In this chapter we will always use the xsl namespace prefix for XSLT elements. This assumes that all our stylesheets contain this namespace declaration:

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

For example, when we talk about the template element in the XSLT namespace, we will write it as xsl:template. The prefix itself is arbitrary; only the namespace URI it is bound to identifies an element as part of XSLT and distinguishes it from all other elements in the stylesheet.

stylesheet

The root element of any XSLT stylesheet document is normally the stylesheet element (the exceptions are the transform element and the simplified syntax; both will be explained later). It holds a number of templates and can hold some further elements that specify settings. Elements that can appear as children of the stylesheet element (and only there) are called top-level elements. The syntax of the stylesheet element is:

  <xsl:stylesheet
    id = id
    extension-element-prefixes = tokens
    exclude-result-prefixes = tokens
    version = number>
  </xsl:stylesheet>

The version attribute of the stylesheet element ensures that later additions to the XSLT specification can be implemented without changing old stylesheets. The current version is 1.0. When newer versions of the recommendation are published, the version number can be increased (but the XSLT namespace will remain stable, including the '1999'). Setting the version to anything higher than 1.0 also affects the way a 1.0 processor works: it switches to forwards-compatible mode, in which it ignores unknown XSLT elements and XSLT elements in unexpected places.

You will rarely use the other attributes of the stylesheet element, but we'll discuss them here briefly anyway.

With the extension-element-prefixes attribute, you can designate a number of namespace prefixes, other than the XSLT prefix, as extension prefixes. Elements in these namespaces are then treated as extension elements (instructions that the processor may or may not implement) rather than as literal output. Each prefix listed must be bound to a namespace by a declaration that is in scope on the stylesheet element.
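As a minimal sketch of the declaration only: the prefix ext and the namespace URI http://example.com/hypothetical-extensions are assumptions made up for illustration, not a real extension library.

```xml
<?xml version="1.0"?>
<!-- "ext" is a hypothetical prefix; it must be declared with xmlns:ext
     before it can be listed in extension-element-prefixes -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:ext="http://example.com/hypothetical-extensions"
  extension-element-prefixes="ext"
  version="1.0">
  <!-- top-level elements go here -->
</xsl:stylesheet>
```

Several prefixes can be listed in the attribute, separated by whitespace.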

If the stylesheet contains namespace declarations, these will normally appear automatically on the literal result elements in the result document as well. The only exceptions are the XSLT namespace itself and any namespaces designated as extension namespaces. If the stylesheet declares any other namespaces that you do not want to show up in the output, these can be excluded with the exclude-result-prefixes attribute.

Just to give you the idea, we'll have a look at an extremely simple stylesheet here. We'll use some elements that we have not described yet, but we'll describe what happens afterwards.

  <?xml version="1.0"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="/">
      <root_node/>
    </xsl:template>
  </xsl:stylesheet>

You will recognize the stylesheet element carrying the namespace declaration to indicate that this is an XSLT stylesheet. Inside the stylesheet is one xsl:template element. This element has a match attribute set to "/" and a child element root_node. This template matches ('is a suitable template for') the document root (indicated by '/'). The only content of the template is the root_node element. This is not an XSLT element, but a literal element that is added to the output when this template is executed.

When this stylesheet is used to transform an arbitrary XML document, the processor will start processing the document root of the source document. It will find a suitable template in the stylesheet (the only template we have) and use it to process the document root. The only thing the template does is create a root_node element in the output document. This stylesheet will transform an arbitrary XML source document to:

<root_node/>
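The effect of the exclude-result-prefixes attribute described earlier can be seen by extending this simple stylesheet. The following is a minimal sketch; the prefix my and the namespace URI http://example.com/hypothetical-helper are assumptions introduced only for illustration.

```xml
<?xml version="1.0"?>
<!-- "my" is a hypothetical helper namespace declared in the stylesheet -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://example.com/hypothetical-helper"
  exclude-result-prefixes="my"
  version="1.0">
  <xsl:template match="/">
    <!-- Literal result element: in-scope stylesheet namespaces are copied
         onto it unless they are excluded -->
    <root_node/>
  </xsl:template>
</xsl:stylesheet>
```

Without the attribute, the declaration for my would be copied onto root_node in the output; with it, the result should again be a plain root_node element.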

transform

The transform element is a synonym for the stylesheet element. It is included because the uses for XSLT have grown much wider than just giving style to XML content, but the stylesheet element is still the most common way to define a transformation. Functionally, there is no difference.
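As a sketch, the simple stylesheet from the previous section can be written with transform as its root element; only the element name changes.

```xml
<?xml version="1.0"?>
<!-- Same rules as before, with xsl:transform instead of xsl:stylesheet -->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="/">
    <root_node/>
  </xsl:template>
</xsl:transform>
```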

import

To construct a stylesheet from several reusable fragments, the XSLT specification supports the importing of external stylesheet document fragments. This is done with either the import or include elements, for example:

<xsl:import href=uri-reference/>

The document retrieved from the URI must itself be a stylesheet document; the children of its stylesheet element are imported into the main stylesheet. The import element can only be used as a top-level element and must appear before all other top-level elements in the document. When the XSLT processor tries to match a node in the source document to a template, templates in the importing document take precedence over imported templates. This allows for creating rules that are shared by many stylesheets: a shared rule can be overridden by defining it again locally.

A stylesheet must never import or include itself, either directly or indirectly.
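The override behaviour can be sketched with two small stylesheets. The file names common.xsl and main.xsl and the output element names are assumptions chosen for illustration; both stylesheets use xsl:apply-templates, which is described later in this chapter. First, the shared rules:

```xml
<?xml version="1.0"?>
<!-- common.xsl (hypothetical file name): rules shared by several stylesheets -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="chapter">
    <section><xsl:apply-templates/></section>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Then the importing stylesheet, which redefines one of them:

```xml
<?xml version="1.0"?>
<!-- main.xsl (hypothetical file name) -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- xsl:import comes first among the top-level elements -->
  <xsl:import href="common.xsl"/>
  <!-- Local rule for para: preferred over the imported one -->
  <xsl:template match="para">
    <p class="body"><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

When main.xsl is used, para elements should be handled by the local template, while chapter elements still fall through to the imported rule.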

include

The include element is the simpler brother of the import element:

<xsl:include href=uri-reference/>

It just inserts the rules from the referenced URI. These are parsed as if they were in the original document.

Like the import element, include can only appear at the top-level. There is no restriction on the location of this element in the document (unlike import).
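A minimal sketch of include, assuming a hypothetical file tables.xsl that holds further templates. Note that the include element appears after a template here, which would not be allowed for import.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="/">
    <root_node>
      <xsl:apply-templates/>
    </root_node>
  </xsl:template>
  <!-- tables.xsl is a hypothetical file; its rules are treated as if
       they were written at this point in the stylesheet -->
  <xsl:include href="tables.xsl"/>
</xsl:stylesheet>
```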

 

template

The template element is one of the main building blocks of an XSLT stylesheet. It consists of two parts, the matching pattern and the implementation. Roughly, you can say that the pattern defines which nodes will be acceptable as input for the template. The implementation defines what the output will look like. We will cover the implementation later, when we discuss the elements that generate output.

  <xsl:template
    match = pattern 
    name = qname 
    priority = number
    mode = qname>
    <!-- Content: implementation-->
  </xsl:template>

The attributes name, priority and mode are used to differentiate between several templates that match on the same node. In these cases several rules exist for preference of templates over each other. In the section titled "What if Several Templates Match?" we will show the use of these attributes.

The match attribute holds the matching pattern for the template. The matching pattern defines for which nodes in the source document this template is the appropriate processing rule. The syntax used is a subset of XPath. It contains only the child and attribute axes (but it is also legal to use "//" from the abbreviated syntax, so the descendant axis is also available). A template matches a node, if the node is part of the result set of the pattern from any available context, which basically says that a node should be "selectable" with the pattern. We'll take a look at a few examples to clear this up.

Imagine that we are processing a document with chapters and paragraphs. The paragraphs are marked up with the element para, the chapters with chapter. We will look at possible values for the match attribute of the xsl:template element. This matches any para element that has a chapter element as a parent:

<xsl:template match="child::chapter/child::para"> </xsl:template>

Note that this will only work when the chapter element has a parent node. This parent node is the context we need to select the para element from with this pattern. Fortunately, all elements have a parent (the root element has the document root for a parent), so this pattern matches all para elements that have a chapter as a parent. This example will match with all para elements:

<xsl:template match="para"> </xsl:template>

This matches any para element as well as any chapter element:

<xsl:template match="chapter|para"> </xsl:template>

This matches any para element that has a chapter element as an ancestor:

<xsl:template match="chapter//para"> </xsl:template>

This matches the root node:

<xsl:template match="/"> </xsl:template>

This matches any node except attribute nodes and the root node:

<xsl:template match="node()"> </xsl:template>

This matches any para element that is the first para child of its parent:

<xsl:template match="para[position() = 1]"> </xsl:template>

This matches any title attribute (not an element that has a title attribute):

<xsl:template match="@title"> </xsl:template>

This matches every odd-numbered para element among the para children of its parent:

<xsl:template match="para[position() mod 2 = 1]"> </xsl:template>

Two interesting extra functions that you can use in the pattern are id() and key(). id('someLiteral') evaluates to the node that has 'someLiteral' as its ID value. This pattern matches all para elements that are children of the element with its ID attribute set to 'Table1':

<xsl:template match="id('Table1')/para"> </xsl:template>

Note that the ID attribute is not necessarily called ID; it can be any attribute that is declared as having type ID in the DTD or Schema. The key() function does something similar, but refers to defined keys instead of elements by ID. Refer to the section covering the xsl:key element to learn more about the key() function.
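To make the role of the DTD concrete, here is a small sketch. The element names doc and section and the attribute name label are assumptions made up for this example; the point is that label is declared with type ID in the internal DTD subset.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (section*)>
  <!ELEMENT section (para*)>
  <!-- "label" is the ID attribute here, even though it is not called ID -->
  <!ATTLIST section label ID #REQUIRED>
  <!ELEMENT para (#PCDATA)>
]>
<doc>
  <section label="Table1">
    <para>Child of the element whose ID is Table1.</para>
  </section>
  <section label="Table2">
    <para>Child of a different element.</para>
  </section>
</doc>
```

A stylesheet using the id() pattern on this document could look like this:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- Matches only the para inside the section with label="Table1" -->
  <xsl:template match="id('Table1')/para">
    <highlighted><xsl:apply-templates/></highlighted>
  </xsl:template>
</xsl:stylesheet>
```

This only works if the XML parser actually reads the DTD; without the attribute declaration, the processor has no way of knowing that label is an ID.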

apply-templates

In the simple and rather non-functional example we looked at in the section about the stylesheet element, we had only one template, which matched the document root. When the XSLT processor starts transforming a document with that stylesheet, it first searches for a template that matches the document root. Our only template does, so it is executed. It generates an output element and processing stops. None of the content held by nodes other than the document root is processed. We need a way to tell the processor to carry on and process other nodes.

  <xsl:apply-templates
    select = node-set-expression
    mode = qname>
  </xsl:apply-templates>

This is done using the xsl:apply-templates element. It selects the nodes that should be processed next using an XPath expression. Each node in the node set selected by this expression becomes the new context node in turn, and the processor searches for a matching template for it. The transformed output of these nodes will appear within the output generated by the current template.

You may compare the use of the apply-templates element with calling a subroutine in a procedural programming language. There are only two possible attributes for the apply-templates element: select and mode.

The select attribute is the more important one. It specifies which nodes should be transformed now and have their transformed output shown. It holds an XPath expression. The expression is evaluated with the current context node. For each node in the result set, the processor will search for the appropriate template and transform it.

The default value for the select attribute is 'child::node()'. This matches all child nodes, but not attributes.
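Because attributes are not child nodes, they have to be selected explicitly if they are to be processed. The following sketch reuses the chapter and para elements from the pattern examples; the title attribute on chapter and the output element names are assumptions for illustration, and xsl:value-of (used to write out the attribute value) has not been introduced yet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="chapter">
    <section>
      <!-- The default select would skip attributes, so ask for @title -->
      <xsl:apply-templates select="@title"/>
      <!-- Process only the para children, in document order -->
      <xsl:apply-templates select="para"/>
    </section>
  </xsl:template>
  <xsl:template match="@title">
    <heading><xsl:value-of select="."/></heading>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```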

Let's make a few changes to our example and use xsl:apply-templates:

  <?xml version="1.0"?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:template match="/">
      <root_node>
        <xsl:apply-templates/>
      </root_node>
    </xsl:template>
    <xsl:template match="*">
      <result_node>
        <xsl:apply-templates/>
      </result_node>
    </xsl:template>
  </xsl:stylesheet>

Now we'll use the following source document to test the transformation:

  <?xml version="1.0" ?>
  <FAMILY>
    <PERSON name="Freddy" />
    <PERSON name="Maartje" />
    <PERSON name="Gerard"/>
    <PERSON name="Peter"/>
    <PET name="Bonzo" type="dog"/>
    <PET name="Arnie" type="cat"/>
  </FAMILY>

Let's first have a look at the changes in the stylesheet. Something was added to the original template: the root_node element now has a child element, xsl:apply-templates. This means that when the template is executed, it will still write a root_node element to the output document, but between outputting the start tag and the end tag, it will process all nodes that are selected by the xsl:apply-templates element. This element has no select attribute, so it defaults to child::node(), which selects all child nodes of the current context node (excluding attributes).

Another change is that we added a new template, matching on "*". All it does is generate a result_node element in the output document (which does not mean anything, it is just test output). This node too has an xsl:apply-templates child element.

We saved the sample XML source as family.xml and the stylesheet as test.xsl. Then we called the SAXON processor like this:

saxon -o destination.xml family.xml test.xsl

We'll follow the XSLT processor step-by-step as it creates an output document from the sample source document and our test stylesheet:

1.    Try to match the root to one of the templates: the first template matches.

2.    Process the implementation of the first template, using the root as the context node.

3.    The implementation causes the output of a root_node element to the destination document and tells us to process all the child nodes of the root. Here that is only the FAMILY element; the XML declaration (<?xml version="1.0" ?>) is not a node in the XPath data model, so it is never selected.

4.    The processor tries to match the FAMILY element to one of the templates: the second template matches.

5.    The implementation causes the output of a result_node element to the destination document (as a child of the root_node element) and tells us to process all the child nodes of FAMILY. These are all the PERSON and PET elements (the whitespace-only text nodes between them match neither template and are handled by the built-in rule for text, which copies them through).

6.    The processor tries to match the PERSON element to one of the templates: the second template matches.

7.    The second template generates a result_node element in the output and tells the processor to process the children of the element. It finds no children.

8.    Steps 6 and 7 are repeated for all PERSON and PET elements.

The result of all this processing looks like this:

  <root_node>
    <result_node>
      <result_node/>
      <result_node/>
      <result_node/>
      <result_node/>
      <result_node/>
      <result_node/>
    </result_node>
  </root_node>

The outer element (root_node) is the transformed result of the document root; the element within the root_node is the transformed result of the FAMILY element in the source. All of the PERSON and PET elements are transformed to the six empty result_node elements.
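For comparison, here is a sketch of a variant that uses the select attribute to process only the PERSON elements. It assumes the same family.xml source document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:template match="/">
    <root_node>
      <!-- Select only the PERSON children of FAMILY; PET elements
           are never selected, so no template is applied to them -->
      <xsl:apply-templates select="FAMILY/PERSON"/>
    </root_node>
  </xsl:template>
  <xsl:template match="PERSON">
    <result_node/>
  </xsl:template>
</xsl:stylesheet>
```

With this stylesheet the output should be a root_node element containing four empty result_node elements, one for each PERSON. The FAMILY element itself produces no output, because it is only used as a step in the select expression and is never processed by a template.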

So, what about the mode attribute? We will discuss that in the section "What if Several Templates Match?"
