Using XML Queries and Transformations

Giving Style to XML

Because XML documents contain only data, with no information about presentation, we need a way to specify how that content should be displayed. This is commonly referred to as 'styling the content'. At the time of writing, there are two W3C standard stylesheet languages: CSS (Cascading Stylesheets) and XSLT. Both can be used to assign a visual appearance to specific element types in an XML document.

So far we have seen how XSLT can be used to transform XML documents from one format to another. The original goal of XSLT, however, was to serve as a styling mechanism. Before we use XSLT to transform XML into HTML documents, we'll first have a look at Cascading Stylesheets.

Using CSS in HTML

You have probably seen CSS before in an HTML context. It is a syntax for specifying the appearance of the elements in an HTML document in a structured way. It allows one stylesheet to be associated with many content documents, so the common layout is kept in one place. The 'Cascading' part of the name refers to the way rules from several sources (browser defaults, external stylesheets, style blocks in the document and inline style attributes) are combined, so that a property defined globally can be overridden locally. In addition, many properties are inherited from parent elements. CSS strikes a fine balance between centralization and developer flexibility.

At the time of writing, the current W3C recommendation is CSS level 2. The most important differences between versions 1 and 2 are that CSS2 added support for media types other than the screen (such as printed documents) and introduced more complex selectors (a selector in CSS2 can be compared with the match attribute of an XSLT template). CSS properties on elements can specify:

  • Font size, family, color, variant (for example small-caps), style (for example italic)
  • Color, background color, background image (including tiling and positioning)
  • Line, word and letter spacing
  • Alignment, underlining, overlining
  • The margins, borders, etc. of boxes (not only TABLE elements, but also elements such as P and BODY are rendered as boxes)
  • List styles (square bullets, etc.), display (not displayed, as block, inline)
  • Very detailed positioning and units (inches, cm, pixels, points)

Some simple uses of CSS in HTML are shown here before we head on to styling XML.

<P style="text-align:center;text-decoration:underline">This is text</P>

A browser would display the paragraph text, "This is text", centered and underlined.

The same result can be achieved by placing this code in the HEAD section of the HTML document:

<STYLE type="text/css">
P {
  text-align:center;
  text-decoration:underline;
}
</STYLE>

Or by associating the HTML document with an external stylesheet by doing this:

<LINK href="mystyle.css" rel="stylesheet" type="text/css">

where mystyle.css is a file in the same directory with this content:

P {
  text-align:center;
  text-decoration:underline;
}

There is a lot more to using CSS from HTML, especially when you change styles from script while the page is displayed. We will not cover classes and the more complex concepts of CSS here.

Using CSS in XML

Using CSS to style an XML document is very simple if you know how it works with HTML. The stylesheet is referenced differently: an XML document uses the xml-stylesheet processing instruction instead of the LINK element. Inline stylesheets are not possible. Let's look at an example:

<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/css" href="article.css"?>
<Article>
  <Authors>
    <Author>James Britt</Author>
    <Author>Teun Duynstee</Author>
  </Authors>
  <Title>A cool article</Title>
  <Intro>An introductory text here ... </Intro>
  <Body>The body text of the article comes here ... </Body>
  <Related>
    <Item type="URL" loc="http://www.asptoday.com/art2">Some other article</Item>
    <Item type="local" loc="2"/>
  </Related>
</Article>

This example refers to a cascading stylesheet with a relative URL. The type attribute contains a MIME type indicating the kind of stylesheet. With a different MIME type, the same syntax can associate an XSLT stylesheet with the document; more on that later.

If we leave the article.css stylesheet empty, the text of all elements is displayed as a single run of text flowing across the whole page. What we want is for the title to appear larger and for everything to be aligned, so the result looks more like an article.

The most important thing to realize is that the elements in our document have no style whatsoever. In HTML, some elements have properties set by default. For example, the P element (paragraph) and the H1 to H6 elements all have their display property set to 'block', which means that each of them starts on a new line. In XML, the browser assumes nothing about an element type, so every element is displayed inline unless the stylesheet says otherwise. So let's start styling the title of the article:

  • It should appear on its own line
  • It should be a little larger than the rest of the text
  • It should be centered and underlined
  • We want to use a sans-serif font

Converting this into a CSS statement, we would get:

Title {
  display:block;
  text-align:center;
  text-decoration:underline;
  font-size:14pt;
  font-family:helvetica, sans-serif;
}

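Doing this for all elements in the article document, we could come up with a stylesheet like the one below. Note that, unlike in HTML, element names in XML are case-sensitive, so the selectors must use exactly the same case as the document (Body, not BODY):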

Body {
  color:black;
  display:block;
  width:80%;
  margin-left:20%;
}
Intro {
  color:black;
  font-weight:bold;
  display:block;
  line-height:150%;
  width:80%;
  margin-left:20%;
}
Author {
  text-align:right;
  font-size:8pt;
  display:block;
  font-style:italic;
}
Title {
  display:block;
  text-align:center;
  text-decoration:underline;
  font-size:14pt;
  font-family:helvetica, sans-serif;
}
Related {
  display:none;
}

The advantage of this approach is that it is a standard and it works. The drawback is that CSS is limited to the visible style of the content, for example:

  • Reordering and sorting of elements is not possible
  • Generating text is hard. It can be done using the :before and :after pseudo-elements, but these are too awkward for anything beyond the most basic additions and, at the time of writing, are not implemented in most browsers.
  • Adding functionality, such as creating a link from certain content elements, is not possible.

Some documents are suitable for styling this way: their content is already in reading order and they need little functionality beyond the formatting of the content. Often, however, the data in XML documents needs more extensive processing before it can be displayed. In these cases, XSLT can be used.

Good points of CSS include:

  • Many web developers are familiar with the language
  • Good performance
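To make the contrast with the CSS limitations listed above concrete, here is a minimal XSLT sketch against the article document that does three things CSS cannot: it sorts the authors alphabetically, generates a label that is not in the source, and turns the loc attribute of a remote Item into a real link. The label text and the list markup are assumptions chosen for illustration. Elements without their own template fall back to XSLT's built-in rules, which simply output their text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="Authors">
    <!-- Generated text: a label that does not exist in the source -->
    <P>Written by:</P>
    <UL>
      <!-- Reordering: authors are output alphabetically, not in document order -->
      <xsl:apply-templates select="Author">
        <xsl:sort select="."/>
      </xsl:apply-templates>
    </UL>
  </xsl:template>

  <xsl:template match="Author">
    <LI><xsl:value-of select="."/></LI>
  </xsl:template>

  <!-- Added functionality: a remote Item becomes a hyperlink -->
  <xsl:template match="Item[@type='URL']">
    <A href="{@loc}"><xsl:value-of select="."/></A>
  </xsl:template>
</xsl:stylesheet>
```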

Using XSLT for Adding Style

We have seen quite a lot of XSLT in this chapter: it is a language for converting one XML-based document into another. HTML looks very much like an XML-based syntax, except that its rules for well-formedness are less strict. If we use XSLT to transform XML into an HTML page, the result must always be well-formed XML. This means that you cannot create just any HTML markup from XML with XSLT, but for any HTML document it is possible to write a well-formed equivalent that looks the same in a browser and can be generated from XML. Suppose you want the text "Text text text text" displayed with overlapping formatting: the second word bold, the third bold and italic, and the fourth italic only. The HTML you would normally use would look like this:

Text <B>text <I>text</B> text</I>

However, to be well-formed XML, it would have to be rewritten as:

Text <B>text <I>text</I></B><I> text</I>

Recently, the W3C specified XHTML. XHTML is a reformulation of HTML 4 as XML, so every XHTML document must be well-formed XML. DTDs for XHTML have been published; you can find the specification, including the associated DTDs, at www.w3.org/TR/xhtml1/. Any XHTML document can be generated from a source using XSLT. However, be careful: XHTML element and attribute names must be lowercase, so uppercase HTML will not validate against these DTDs. In most of the examples in this book, we use uppercase HTML elements (I think this makes the stylesheet elements and the literal HTML elements easier to distinguish), so the examples do not generate valid XHTML.

As you may remember, the XSLT output element lets us choose the html output method instead of xml. If you do this, you can be sure that XML specifics such as the XML declaration, XML-style processing instructions and closed empty elements like <BR/> will not confuse HTML browsers. However, you must also be aware that your output is then no longer well-formed XML, and therefore not valid XHTML either.
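A minimal sketch of the output element in use: the stylesheet itself must be well-formed XML, so the BR element is written closed as <BR/>. Because the stylesheet asks for the html output method, an XSLT 1.0 processor should serialize it as <BR> with no end tag and leave out the XML declaration. The encoding value is only an example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Serialize the result tree as HTML instead of XML -->
  <xsl:output method="html" encoding="iso-8859-1"/>
  <xsl:template match="/">
    <HTML><BODY>
      <!-- Closed here to keep the stylesheet well-formed;
           the html output method writes it out as <BR> -->
      <P>First line<BR/>Second line</P>
    </BODY></HTML>
  </xsl:template>
</xsl:stylesheet>
```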

So basically anything that can be shown in a browser can be the styled representation of an XML document produced with XSLT, and this was actually one of the main purposes of developing XSLT in the first place. Its use for transforming into formats other than HTML was only added later. Rather than repeat the details, we will look at some examples and common techniques.

Styling the Article

We'll use the same source documents that we used to show CSS styling of XML. The documents contain the text of an article and include some references to both remote articles and other local articles. A sample article looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<Article>
  <Authors>
    <Author>James Britt</Author>
    <Author>Teun Duynstee</Author>
  </Authors>
  <Title>A cool article</Title>
  <Intro>An introductory text here ... </Intro>
  <Body>The body text of the article comes here ... </Body>
  <Related>
    <Item type="URL" loc="http://www.asptoday.com/art2">Some other article</Item>
    <Item type="local" loc="2"/>
  </Related>
</Article>

Now we want to use XSLT to go beyond the styling that CSS made possible. We will include a link for each of the related articles. For remote references we display the title stored in our source document; for local references we look up the referenced article and include its title and authors in our styled document.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="Article">
<HTML><BODY>
<xsl:apply-templates select="Title"/>
<xsl:apply-templates select="Intro"/>
<xsl:apply-templates select="Body"/>
<xsl:apply-templates select="Authors"/>
<xsl:apply-templates select="Related"/>
</BODY></HTML>
</xsl:template>
<xsl:template match="Title"><H1><xsl:apply-templates/></H1></xsl:template>
<xsl:template match="Intro">
  <p style="width:80%;font-weight:bold"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="Body">
  <p style="width:80%"><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="Authors">
  <p>Author(s): <xsl:apply-templates select="Author"/></p>
</xsl:template>
<xsl:template match="Author">
  <xsl:apply-templates/>
  <xsl:if test="position() != last()">, </xsl:if>
</xsl:template>
<xsl:template match="Related">
  <p>Related items:<br/><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="Item">
  <xsl:if test="@type='URL'">
    <a href="{@loc}"><xsl:value-of select="."/></a>
  </xsl:if>
  <xsl:if test="@type='local'">
    <a href="art{@loc}.xml">
    <xsl:value-of select="document(concat('art', @loc, '.xml'))/Article/Title"/>
    </a>(
    <xsl:apply-templates select=
      "document(concat('art', @loc, '.xml'))/Article/Authors/Author"/>
)
  </xsl:if>
  <br/>
</xsl:template>
</xsl:stylesheet>

Except for the part that generates the HTML for the related articles, everything in the above stylesheet is fairly simple. The template that matches the document element, Article, outputs the parts of the article in the order we want, starting with the title and placing the authors and related articles at the end. Specifying how each of these parts should look is delegated to other templates. The templates for the Title, Intro and Body elements do nothing special; they just wrap a bit of formatting code around the content.

The Author template is a bit more interesting. It is designed to create a comma-separated list in the output. This is done by wrapping the literal comma in an xsl:if element. Its test, position() != last(), checks whether the current author is the last one in the current node list; if so, no comma is generated.
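The same position() and last() test extends naturally. As a sketch (the " and " wording is our own choice, not part of the original stylesheet), this variant of the Author template uses xsl:choose to put "and" before the last name and commas elsewhere, so the sample article would list "James Britt and Teun Duynstee", and three authors would come out as "A, B and C":

```xml
<xsl:template match="Author">
  <xsl:apply-templates/>
  <xsl:choose>
    <!-- Second-to-last author: join with "and" -->
    <xsl:when test="position() = last() - 1"> and </xsl:when>
    <!-- Any other author except the last: join with a comma -->
    <xsl:when test="position() != last()">, </xsl:when>
  </xsl:choose>
</xsl:template>
```

Because the Item template also applies the Author template to authors of referenced articles, this change would affect those lists as well.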

Then we have the related articles. The template for Item generates a link to these articles. There are two kinds of article references. Some are external, referring to some URL on the web. These have a text node as their content: the title of the article, which we want to show as the link text. The other type is a local reference, which refers to another article in the same directory, written in the same XML format as this one. Those articles contain their own title and we know where to find it, so there is no need to store the titles of related articles along with the reference.

The Item template really consists of two parts, one for remote links and one for local links. The remote one is simple: it generates the HTML code for a link, filling the href attribute from the loc attribute with an attribute value template ({@loc}). The content of the Item element becomes the content of the A element in HTML.

The local references are more complicated. How are we going to get hold of the title from these other files in the same directory? By using the document() function! It is used twice: once from the value-of element and once from apply-templates. In the first case, the processor opens the other document, finds the Title element at the indicated location and outputs its text to the result document. The second case (fetching the list of authors from the referenced document) uses the apply-templates element. Instead of passing a node set from the source document to the processor to let it find matching templates, we hand it a set of nodes from another document. The processor does exactly the same thing: it uses the Author template that already created the comma-separated author list for this article, but now to create an author list for the referenced article.
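One possible refinement, shown as a sketch rather than the code from the download: the local branch of the Item template calls document() with the same URL twice. Binding the referenced Article element to a variable (named ref here, an arbitrary choice) states the lookup once and keeps both uses consistent:

```xml
<xsl:if test="@type='local'">
  <!-- Look up the referenced article once; loc="2" resolves to art2.xml -->
  <xsl:variable name="ref"
       select="document(concat('art', @loc, '.xml'))/Article"/>
  <a href="art{@loc}.xml"><xsl:value-of select="$ref/Title"/></a>
  (<xsl:apply-templates select="$ref/Authors/Author"/>)
</xsl:if>
```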

Creating Internal Links on Shakespeare

The next example is an XSLT stylesheet that styles the play Macbeth (and other plays in the same format). Apart from creating a readable layout and highlighting the stage directions, we want to create a navigational structure that lets the reader jump from the beginning of an act to the beginning of the next or previous act. To do this, we have to introduce internal links in the HTML document.

An example XML document has the following structure:

<?xml version="1.0"?>
<!DOCTYPE PLAY SYSTEM "play.dtd">
<PLAY>
  <TITLE>The Tragedy of Macbeth</TITLE>
  <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>DUNCAN, king of Scotland.</PERSONA>
    <PGROUP>
      <PERSONA>MALCOLM</PERSONA>
      <PERSONA>DONALBAIN</PERSONA>
      <GRPDESCR>his sons.</GRPDESCR>
    </PGROUP>
  ...
  </PERSONAE>
  <ACT>
    <TITLE>ACT I</TITLE>
    <SCENE>
      <TITLE>SCENE I. A desert place.</TITLE>
      <STAGEDIR>Thunder and lightning. Enter three Witches</STAGEDIR>
      <SPEECH>
        <SPEAKER>First Witch</SPEAKER>
        <LINE>When shall we three meet again</LINE>
        <LINE>In thunder, lightning, or in rain?</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>

A SPEECH element always has a SPEAKER child element and at least one LINE child element. SPEECH and STAGEDIR elements are children of SCENE elements; SCENE elements are children of ACT elements, which are children of the root PLAY element. Phew!

The corresponding stylesheet is called play.xsl and is part of the code download. It is rather big, so it is not listed in full here. Most of it is dedicated to the visible layout of the content elements, but some of the templates deserve a closer look. For example, the SPEECH template renders each speech as rows of a table, with the speaker's name on the first row:

<xsl:template match="SPEECH">
  <TR><TD style="font-weight:bold">
    <xsl:apply-templates select="SPEAKER"/>
  </TD><TD><xsl:value-of select="LINE[1]"/></TD></TR>
  <xsl:apply-templates select="LINE"/>
</xsl:template>
...
<xsl:template match="LINE">
  <xsl:if test="position() > 1">
    <TR><TD></TD><TD>
      <xsl:value-of select="."/>
    </TD></TR>
  </xsl:if>
</xsl:template>

Note how the SPEECH template creates a table row holding both the SPEAKER and the first LINE element. To prevent the first line from showing up twice, the LINE template only creates output for LINE elements whose position in the node list selected by the SPEECH template is greater than 1.
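An alternative, sketched below and not taken from play.xsl, moves the filtering into the select expression of the SPEECH template. The predicate position() > 1 is evaluated against the LINE children of the speech, so the LINE template no longer needs its own test:

```xml
<xsl:template match="SPEECH">
  <TR><TD style="font-weight:bold">
    <xsl:apply-templates select="SPEAKER"/>
  </TD><TD><xsl:value-of select="LINE[1]"/></TD></TR>
  <!-- Only the second and later lines are passed on -->
  <xsl:apply-templates select="LINE[position() > 1]"/>
</xsl:template>

<xsl:template match="LINE">
  <TR><TD></TD><TD><xsl:value-of select="."/></TD></TR>
</xsl:template>
```

The trade-off is that LINE elements processed from anywhere else in the stylesheet now always produce a row.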

The most interesting part of the stylesheet is the part that generates internal links forward and backward. We will look at the template for ACT elements one part at a time. First, let's look at the part that generates the act title as an internal link target:

<A>
  <xsl:attribute name="name">
    <xsl:value-of select="generate-id()"/>
  </xsl:attribute>
  <xsl:value-of select="TITLE"/>
</A>

This fragment creates an A element containing the text of the TITLE child element, and generates a name attribute on it. The value of this attribute is produced by the generate-id() function, called without an argument. This causes the function to generate a unique identifier for the context node (the ACT element). What the identifier looks like is processor-specific, but within one transformation the function always returns the same string for the same node and different strings for different nodes.

The links forward and backward are also generated using the generate-id() function, but now by passing the next or previous ACT element to it:

<xsl:if test="following-sibling::ACT">
  <A>
    <xsl:attribute name="href">
      <xsl:text>#</xsl:text>
      <xsl:value-of select="generate-id(following-sibling::ACT[1])"/>
    </xsl:attribute>
    &gt;
  </A>
</xsl:if>

Using the if element, we make sure that the 'next' link is only generated if there is an ACT element to link to. If there is, an A element is generated with an href attribute holding an internal link. The # is hardcoded; the rest of the string is generated by the generate-id() function. We use the following-sibling axis, restricted to ACT elements, and select the first node from the resulting set, which is the next ACT element. When the processor later reaches that ACT element, it outputs the full text of the act and creates a link target at the act title by calling generate-id() on that same node. Because generate-id() returns the same string for the same node, the target name of the next act matches the href attribute of this link.
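Putting the pieces together, a complete ACT template might look like the sketch below. It is not the template from play.xsl; the "<" and ">" link texts and processing only the SCENE children afterwards are assumptions. It writes the name and href attributes with attribute value templates, a shorter equivalent of xsl:attribute. For the 'previous' link, note that preceding-sibling is a reverse axis, so ACT[1] selects the nearest earlier act, not the first act of the play.

```xml
<xsl:template match="ACT">
  <!-- Link target for this act; the id is derived from the ACT node itself -->
  <A name="{generate-id()}"><xsl:value-of select="TITLE"/></A>

  <!-- 'previous' link: [1] on the reverse preceding-sibling axis is the nearest ACT -->
  <xsl:if test="preceding-sibling::ACT">
    <A href="#{generate-id(preceding-sibling::ACT[1])}">&lt;</A>
  </xsl:if>

  <!-- 'next' link: the first following ACT sibling -->
  <xsl:if test="following-sibling::ACT">
    <A href="#{generate-id(following-sibling::ACT[1])}">&gt;</A>
  </xsl:if>

  <!-- Render the scenes; the act TITLE was already output above -->
  <xsl:apply-templates select="SCENE"/>
</xsl:template>
```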

A third technique worth noting is the use of a parameter when transforming a PERSONA element in the PERSONAE part. A PERSONA can appear as a direct child of the PERSONAE element or inside a PGROUP. If a PERSONA is inside a PGROUP, we want the name to appear indented. We use the same template for all PERSONA elements, but when it is called from within a PGROUP, we pass it the parameter indented with the value yes.

<xsl:template match="PERSONA">
  <xsl:param name="indented">no</xsl:param>
  <TR><TD>
    <xsl:if test="$indented = 'yes'">
      <xsl:attribute name="style">padding-left:20px</xsl:attribute>
    </xsl:if>
    <xsl:value-of select="."/>
  </TD></TR>
</xsl:template>
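The calling side is not shown above. A minimal sketch of a PGROUP template that passes the parameter might look like this. The rendering of GRPDESCR is hypothetical, since the original stylesheet is not listed. PERSONA elements that are direct children of PERSONAE are processed without xsl:with-param, so they get the default value no.

```xml
<xsl:template match="PGROUP">
  <!-- Every PERSONA in the group receives indented = 'yes' -->
  <xsl:apply-templates select="PERSONA">
    <xsl:with-param name="indented" select="'yes'"/>
  </xsl:apply-templates>
  <!-- Hypothetical rendering of the group description -->
  <TR><TD style="padding-left:20px;font-style:italic">
    <xsl:value-of select="GRPDESCR"/>
  </TD></TR>
</xsl:template>
```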

Client Side XSLT Styling

There is one more way to style XML documents with an XSLT stylesheet: the transformation can be performed by the browser. In this scenario, the web server sends a raw XML document (without layout) to the client, containing a processing instruction that tells the browser which stylesheet to use. This processing instruction uses the same syntax we saw for attaching a Cascading Stylesheet to an XML document, but with the MIME type text/xsl. For example:

<?xml version="1.0" ?>
<?xml-stylesheet href="transformation.xsl" type="text/xsl"?>
<CONTENT>
</CONTENT>

A web browser that supports XSLT will download the referenced stylesheet (transformation.xsl) and transform the XML document with it before showing the result to the user. This technique can move a large part of the processing load from the server to the client machine.

If you have little control over the browser used (as in most Internet scenarios), you will have to check on the server whether the user's browser supports XSLT, and transform the content to HTML on the server if it does not.

Summary

Now that you have come this far, you have seen all the basic techniques you need to start programming with XML in your Visual Basic applications. The next chapter will introduce linking XML documents to each other, but that technology is still very immature. So we have now seen all the subjects that are ready for use.

In this chapter, we have learned a lot:

XPath

  • We learned how to use XPath to query a very specific subset of nodes from a loaded XML document.
  • We learned to create sub-queries using predicates on our XPath expressions.
  • We looked at the built-in functions of XPath.
  • We covered the limited support of XPath in Internet Explorer 5.0 and the more complete support in the developer's preview MSXML 2.6.

XSLT

  • We learned how XSLT works and which processors can be used to try it out yourself.
  • We had a long and intensive look at all of the elements and functions supported in XSLT 1.0.
  • We looked at the level of implementation of XSLT in both IE5 (MSXML 2.0) and MSXML 2.6.
  • We looked at some examples that used XSLT to transform an XML source into another XML format, converting from one schema to another.

Styling

  • We learned how you can give style to an XML document using Cascading Stylesheets.
  • We learned how to use XSLT to transform an XML document into an HTML document to display it.
  • We looked at some uses of XSLT to add functionality (internal navigation, external content) to a document when transforming.
  • We saw how to use client side XSLT processing capabilities to let users browse XML documents without first transforming them on the server.
