Using XML Queries and Transformations

XSLT Examples

Let's look at some more examples that demonstrate the use of XSLT. The last part of this section covers using XSLT to style an XML document as HTML, and there will be more examples there. Here we cover examples that are not HTML-related but are aimed at converting one XML dialect into another. This is a very common case in business-to-business e-commerce, where XML documents containing orders, inventories, product descriptions, etc., are sent automatically and converted on the fly to a format that is suitable for the target system.

Product Information Import

Think of a system that retrieves product descriptions from several suppliers to present users in the organization with a coherent view of all available products. Some of these suppliers will have their product range available in an XML format. In an ideal world, an agreement could be made with all suppliers about the format used for delivering the data. Unfortunately, in the real world suppliers will not be willing to do that, so the receiving organization has to settle for what it can get. Some suppliers will conform to an industry standard but, in the end, a transformation from some other format to the required one will be necessary.

The format that can be natively imported by our application looks like this:

<?xml version="1.0"?>
<Product>
  <ID>21456</ID>
  <Name>
    Nail clipper
  </Name>
  <Product_category>Personal care</Product_category>
  <Supplier>
    <Name>Clippers Inc.</Name>
    <Address>
      <Street_address>
      234, Wood lane
      Humblestown, MA
      </Street_address>
      <Country>USA</Country>
    </Address>
    <Contact>Macy Marble</Contact>
  </Supplier>
</Product>

The XML descriptions we receive from Clippers Inc look like this:

<Clipper product-reference="21456">
  <FullName>Solid quality nail clipper, San Juanito steel</FullName>
  <Short>Nail clipper</Short>
</Clipper>

We want to transform this delivered format into our native format using XSLT. We could create a stylesheet for the transformation like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <Product>
      <ID><xsl:value-of select="Clipper/@product-reference"/></ID>
      <Name>
        <xsl:value-of select="Clipper/Short"/>
      </Name>
      <Product_category>Personal care</Product_category>
      <xsl:copy-of
        select="document('http://ourserver/supplier_lookup.xml')/suppliers/supplier[Name = 'Clippers Inc.']"/>
    </Product>
  </xsl:template>
</xsl:stylesheet>

Let's go through the sample step by step. There is only one template, matching the root. This template contains a framework for the output document. The Product element and its ID child element are inserted as literals. The content of the ID element is fetched from the source document by inserting the value of its product-reference attribute. The same is done for the name: we create a Name element as a literal and insert a value from the source document into it. Note that we chose to use the short name from the source and discard the long name. The Product_category element is hard-coded, because we expect only products in this category from this supplier.

Now comes the hard part. The supplier information is not provided in this case; some suppliers will provide it, some will not. We could hard-code the supplier information in the stylesheet, but that would force us to update the stylesheet every time the supplier changes its address or we get a new contact person. Instead, we decided to store all supplier information in our own format in one file. While transforming the document, the processor uses the document() function to look up the supplier in supplier_lookup.xml and copies the whole matching fragment to the destination document using copy-of.
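
To make the lookup step concrete, here is a minimal sketch in Python (standard library only) of what the stylesheet above does. It is not XSLT and not part of the original system. The inline SUPPLIERS_XML string stands in for supplier_lookup.xml, whose exact layout is an assumption inferred from the suppliers/supplier path used in the stylesheet, and the function name clipper_to_product is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Source document as delivered by the supplier (taken from the text above).
CLIPPER_XML = """
<Clipper product-reference="21456">
  <FullName>Solid quality nail clipper, San Juanito steel</FullName>
  <Short>Nail clipper</Short>
</Clipper>
"""

# Assumed stand-in for supplier_lookup.xml; the real file layout is not shown.
# The second entry is a made-up placeholder so the lookup has something to skip.
SUPPLIERS_XML = """
<suppliers>
  <supplier>
    <Name>Clippers Inc.</Name>
    <Contact>Macy Marble</Contact>
  </supplier>
  <supplier>
    <Name>Placeholder Supplier</Name>
    <Contact>Placeholder Contact</Contact>
  </supplier>
</suppliers>
"""


def clipper_to_product(clipper_xml, suppliers_xml, supplier_name):
    """Build a Product element the way the stylesheet's single template does."""
    clipper = ET.fromstring(clipper_xml)
    suppliers = ET.fromstring(suppliers_xml)

    product = ET.Element("Product")
    # <ID> takes its value from the product-reference attribute.
    ET.SubElement(product, "ID").text = clipper.get("product-reference")
    # <Name> uses the short name; element names are case-sensitive ("Short").
    ET.SubElement(product, "Name").text = clipper.findtext("Short")
    # The category is hard-coded, as in the stylesheet.
    ET.SubElement(product, "Product_category").text = "Personal care"

    # Rough equivalent of copy-of with the predicate [Name = '...']:
    # deep-copy every matching supplier element into the result.
    for supplier in suppliers.findall("supplier"):
        if supplier.findtext("Name") == supplier_name:
            product.append(copy.deepcopy(supplier))
    return product


product = clipper_to_product(CLIPPER_XML, SUPPLIERS_XML, "Clippers Inc.")
print(ET.tostring(product, encoding="unicode"))
```

As with copy-of, the supplier fragment is copied exactly as it is stored, including its element name, so the lookup file has to hold the supplier data in the shape the target format expects.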

Author Summary

Our second example is for a publishing company; all books are stored in a giant XML document (in fact it is stored in a database, but this database allows access to the data as if it were an XML document). A fragment of this document looks like:

<publisher>
  <books>
    <book>
      <title>Stranger in a strange land</title>
      <ISBN>0441788386</ISBN>
      <author-ref ref="rh"/>
      <sold>2300000</sold>
    </book>
    <book>
      <title>Starman Jones</title>
      <ISBN>0345328116</ISBN>
      <author-ref ref="rh"/>
      <author-ref ref="jldr"/>
      <sold>80000</sold>
    </book>
    ...
 
  </books>
  <authors>
    <author id="rh">
      <first_name>Robert</first_name>
      <last_name>Heinlein</last_name>
    </author>
    <author id="jldr">
      <first_name>Judy-Lyn</first_name>
      <last_name>Del Rey</last_name>
    </author>
  </authors>
</publisher>

Note how the second book has several authors. For making an overview of the most successful authors, the publisher wants to transform this huge books file to something like this:

<author>
  <name>Heinlein, Robert</name>
  <total_publications>67</total_publications>
  <total_sold>7343990</total_sold>
  <rank>1</rank>
</author>

Authors will be ranked by the total number of copies of books sold, and this should also determine their position in the document. So, the best selling author in the books document should be the highest on the list. This can be accomplished by this stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<bestsellers-list>
  <xsl:apply-templates select="/publisher/authors/author">
    <!-- Numeric, descending sort: the best-selling author comes first -->
    <xsl:sort select="sum(/publisher/books/book[author-ref/@ref=current()/@id]/sold)"
              data-type="number" order="descending"/>
    <xsl:sort select="last_name"/>
  </xsl:apply-templates>
</bestsellers-list>
</xsl:template>
<xsl:template match="author">
  <!-- xsl:copy produces an empty author element to hold the summary -->
  <xsl:copy>
    <name><xsl:value-of select="concat(last_name, ', ', first_name)"/></name>
    <total_publications>
      <xsl:value-of select="count(/publisher/books/book[author-ref/@ref=current()/@id])"/>
    </total_publications>
    <total_sold>
      <xsl:value-of select="sum(/publisher/books/book[author-ref/@ref=current()/@id]/sold)"/>
    </total_sold>
    <rank><xsl:value-of select="position()"/></rank>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

Some things in this stylesheet are worthy of further comment. First, note how the sum() and count() functions are used, both in the author template for calculating the number of publications and total number sold for each author, and in the sort element within the apply-templates element. Note how the current() function is used to match the author-ref elements to the author elements they refer to. An interesting thing to note is that within the sort element inside apply-templates, current() refers to each node of the newly selected set in turn (here, each author being sorted), not to the node that was current when apply-templates was called.
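
For readers who find the nested XPath predicates hard to follow, the same calculation can be sketched in Python (standard library only). This is an illustration of the logic, not a replacement for the stylesheet: the function name summarize is hypothetical, and the input is cut down to the two books shown in the fragment, so the totals differ from the sample output above.

```python
import xml.etree.ElementTree as ET

# The two books and two authors shown in the fragment above.
PUBLISHER_XML = """
<publisher>
  <books>
    <book>
      <title>Stranger in a strange land</title>
      <author-ref ref="rh"/>
      <sold>2300000</sold>
    </book>
    <book>
      <title>Starman Jones</title>
      <author-ref ref="rh"/>
      <author-ref ref="jldr"/>
      <sold>80000</sold>
    </book>
  </books>
  <authors>
    <author id="rh">
      <first_name>Robert</first_name>
      <last_name>Heinlein</last_name>
    </author>
    <author id="jldr">
      <first_name>Judy-Lyn</first_name>
      <last_name>Del Rey</last_name>
    </author>
  </authors>
</publisher>
"""


def summarize(root):
    """Return one summary dict per author, best seller first."""
    books = root.findall("books/book")
    rows = []
    for author in root.findall("authors/author"):
        author_id = author.get("id")
        # Same test as book[author-ref/@ref = current()/@id]:
        # every author triggers a full scan of all books.
        own = [
            book for book in books
            if any(ref.get("ref") == author_id
                   for ref in book.findall("author-ref"))
        ]
        rows.append({
            "name": f'{author.findtext("last_name")}, {author.findtext("first_name")}',
            "total_publications": len(own),
            "total_sold": sum(int(book.findtext("sold")) for book in own),
        })
    # Highest total first; ties are broken alphabetically by the name string.
    rows.sort(key=lambda row: (-row["total_sold"], row["name"]))
    # The rank is simply the position in the sorted list, like position().
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


rows = summarize(ET.fromstring(PUBLISHER_XML))
for row in rows:
    print(row)
```

The list comprehension inside the loop is the part that becomes expensive on a large document, because it walks over every book once per author.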

If the source document is large, this stylesheet will probably take a long time to process. For every author, the count() and sum() expressions search all book elements for an author-ref child with a matching ref attribute. We could also implement this using a key. If the processor is optimized for using keys, this will speed things up significantly (but I don't know of any such processor at the time of writing). Even if it doesn't give us a performance gain now (it still might in the future), our code becomes somewhat cleaner. The stylesheet would then look like this. See if you can figure it out.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Index every book under the ref value of each of its author-ref children -->
<xsl:key name="books-by-author"
         match="/publisher/books/book"
         use="author-ref/@ref"/>
<xsl:template match="/">
<bestsellers-list>
  <xsl:apply-templates select="/publisher/authors/author">
    <!-- Numeric, descending sort: the best-selling author comes first -->
    <xsl:sort select="sum(key('books-by-author', @id)/sold)"
              data-type="number" order="descending"/>
    <xsl:sort select="last_name"/>
  </xsl:apply-templates>
</bestsellers-list>
</xsl:template>
<xsl:template match="author">
  <!-- xsl:copy produces an empty author element to hold the summary -->
  <xsl:copy>
    <name>
      <xsl:value-of select="concat(last_name, ', ', first_name)"/>
    </name>
    <total_publications>
      <xsl:value-of select="count(key('books-by-author', @id))"/>
    </total_publications>
    <total_sold>
      <xsl:value-of select="sum(key('books-by-author', @id)/sold)"/>
    </total_sold>
    <rank>
      <xsl:value-of select="position()"/>
    </rank>
  </xsl:copy>
</xsl:template>
</xsl:stylesheet>

At the beginning of the document, we added an xsl:key element. It is called 'books-by-author'. The key will give us direct access to a set of nodes from the source document. With the match attribute we specify which nodes we want to be able to access. In our case, we want access through the key to all book elements in the document (match="/publisher/books/book"). With the use attribute we specify the key value we want to use to access a book element. Here that is the ref attribute on the author-ref child element(s) of the book (use="author-ref/@ref"), so a book with several authors can be reached through each of their ref values.

Now suppose we call the key() function anywhere in the stylesheet like this:

key('books-by-author', 'rh')

This returns a node set containing all book elements that have an author-ref child element with ref="rh". Effectively, these are all books by Robert Heinlein. This is what allows the long book[author-ref/@ref=current()/@id] predicates to be replaced by a short key() call.
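
Conceptually, a key behaves much like a dictionary that is built once and then queried many times. The following Python sketch (standard library only) shows that idea. It illustrates the concept and makes no claim about how any XSLT processor actually implements xsl:key; the function name build_books_by_author is hypothetical, and the data is limited to the two books shown earlier.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# The two books from the fragment shown earlier.
BOOKS_XML = """
<books>
  <book>
    <title>Stranger in a strange land</title>
    <author-ref ref="rh"/>
    <sold>2300000</sold>
  </book>
  <book>
    <title>Starman Jones</title>
    <author-ref ref="rh"/>
    <author-ref ref="jldr"/>
    <sold>80000</sold>
  </book>
</books>
"""


def build_books_by_author(books_root):
    """One pass over all books, filing each under every author it references."""
    index = defaultdict(list)
    for book in books_root.findall("book"):      # the "match" pattern
        # The "use" expression: every author-ref/@ref value of this book.
        # A set keeps a book from being filed twice under the same value.
        for ref in {r.get("ref") for r in book.findall("author-ref")}:
            index[ref].append(book)
    return index


books_by_author = build_books_by_author(ET.fromstring(BOOKS_XML))

# Counterpart of key('books-by-author', 'rh')
rh_books = books_by_author["rh"]
rh_titles = [book.findtext("title") for book in rh_books]
# Counterpart of sum(key('books-by-author', 'rh')/sold)
rh_total = sum(int(book.findtext("sold")) for book in rh_books)

print(rh_titles, rh_total)
```

After the index has been built, each lookup no longer needs to scan all books, which is where the potential speed-up of a key-aware processor would come from.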
