The top-level settings are a set of elements that can only be used at the top level of an XSLT document, and hold settings that specify how the stylesheet should be used. They specify the behavior of the processor on a few points.
output
The output element is a bag of attributes that indicate settings about the style of output that is generated. The main setting is defined in the method attribute. The possible values are xml, html and text.
xml
If the method is set to xml, the output document will be an XML document. What this means depends largely on the other attributes of the output element:
- The version attribute specifies which version of XML should be used. At the time of writing, 1.0 was the only version of XML (XML 1.1 has since been published). This number will also appear in the XML declaration if one is generated. The default version is 1.0.
- The encoding attribute sets the preferred encoding for the destination document. If it is not specified, XSLT processors will use UTF-8 or UTF-16. If an XML declaration is generated, it will contain the encoding string specified.
- The indent attribute can be set to yes to allow the processor to include additional whitespace in the destination document. This can improve readability. The default setting is no.
- The cdata-section-elements attribute tells the processor when to use CDATA sections in the destination and when to escape special characters by using entity references. The value can hold a whitespace-separated list of element names. Text nodes whose parent element is in this list will be output as CDATA sections. All others will be escaped (characters like < will be replaced by entity references like &lt;).
- omit-xml-declaration can be set to yes to leave out the XML declaration. By default, XSLT will include one, reflecting the settings of encoding and version. Also, if the standalone attribute has any value, this value will show up in the XML declaration.
- With the doctype-system and doctype-public attributes, a document type declaration can be added to the destination document, so that it can be validated. If you use only doctype-system, the processor will include a <!DOCTYPE declaration just before the first element. The name in this declaration will be the name of the root element. The system identifier (the URL of the DTD) is the value of the doctype-system attribute. If you also specify a doctype-public attribute, the declaration will contain a PUBLIC identifier with the value of doctype-public, followed by the value of doctype-system as its URL. If only doctype-public is used, it will be ignored.
- Finally, the media-type attribute can be used to specify a MIME type for the result. By default this is text/xml, but some XML-based document types have their own registered MIME types.
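The following sketch shows how these attributes combine. This output element asks for an indented ISO-8859-1 document with both a public and a system document type identifier. It also asks for the text of script elements to be written as CDATA sections. The root element name book, the DTD file name book.dtd and the public identifier are hypothetical and were chosen only for illustration.

```xml
<xsl:output method="xml"
            version="1.0"
            encoding="ISO-8859-1"
            indent="yes"
            cdata-section-elements="script"
            doctype-public="-//Example//DTD Book//EN"
            doctype-system="book.dtd"/>
```

Suppose the result tree has a book root element containing a script element whose text is a < b. The destination document should then look roughly like this, although the exact layout of the declarations varies between processors:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE book PUBLIC "-//Example//DTD Book//EN" "book.dtd">
<book>
  <script><![CDATA[a < b]]></script>
</book>
```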
html
If the method attribute on the output element is set to html, the effects of some of the other attributes change a bit compared to the xml method.
- The version attribute now refers to the version of HTML, with a default value of 4.0. The processor will try to make the output conform to the HTML specification.
- Elements that HTML defines as empty will be output without a closing tag. Think of HTML elements like BR, HR, IMG, INPUT, LINK, META and PARAM.
- Textual content of the script and style elements will not be escaped. So if the XSLT document contains this literal fragment:
<script>if (a &gt; b) doSomething()</script>
This will be output as:
<script>if (a > b) doSomething()</script>
- If any non-ASCII characters are used, the processor may use HTML character entity references in the output (&euml; instead of ë).
- If the result contains a HEAD element, the processor should add a META element right after the HEAD start tag, declaring the character encoding that is used. Its content attribute also contains the value of media-type (the default is text/html).
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
...
text
If the method attribute is set to text, the output will be restricted to the string value of every text node in the result tree, without any markup. The media-type defaults to text/plain, but you can use other MIME types. Think of generating RTF documents from an XML source document. These have no XML markup, so the most appropriate method is text, with media-type set to application/rtf (or application/msword). The encoding attribute can still be used, but the default value is system dependent (on most Windows PCs it will be ISO-8859-1).
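Here is a minimal sketch of the text method. It assumes a hypothetical source document with BOOK elements that carry title and author attributes. The stylesheet writes one line per book. Because the text method produces no markup and no automatic whitespace, xsl:text supplies both the separator and the line break.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- plain text output: only the string values of text nodes are written -->
  <xsl:output method="text" media-type="text/plain" encoding="ISO-8859-1"/>

  <xsl:template match="/">
    <xsl:for-each select="//BOOK">
      <xsl:value-of select="@title"/>
      <!-- literal separator between the two values -->
      <xsl:text>;</xsl:text>
      <xsl:value-of select="@author"/>
      <!-- &#10; is a line feed; a bare newline here would be stripped as whitespace-only text -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Fed with <BOOKS><BOOK title="Emma" author="Austen"/><BOOK title="Ulysses" author="Joyce"/></BOOKS>, it should produce the two lines Emma;Austen and Ulysses;Joyce.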
Let's have a look at an example. The following stylesheet is used:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="yes"/>
<xsl:template match="/">
<HTML><BODY>
<TEST>
This is literal text with an ëxtended character
<BR/>
<TABLE>
<TR><TD>Cell data</TD>
<TD>Second
cell</TD></TR>
</TABLE>
</TEST>
</BODY></HTML>
</xsl:template>
</xsl:stylesheet>
We use this stylesheet on an arbitrary, valid XML document. Note that the output will always be the same literal XML tree, because the only template matches the root and contains nothing but literal result elements. We will now change only the output method and have a look at the result. First the result for the xml method:
<?xml version="1.0" encoding="utf-8"?>
<HTML>
<BODY>
<TEST>This is
literal text with an ëxtended character
<BR/>
<TABLE>
<TR>
<TD>Cell data</TD>
<TD>Second
cell</TD>
</TR>
</TABLE>
</TEST>
</BODY>
</HTML>
Note that every element starts on a new line. This is the result of the indent="yes" attribute. If this had not been specified, all content would be concatenated on one line. This XSLT processor has defaulted its output encoding to UTF-8. UTF-8 supports the extended character ë, so it is not escaped.
Setting the method to html would generate:
<HTML>
<BODY>
<TEST>This is
literal text with an &euml;xtended character
<BR>
<TABLE>
<TR>
<TD>Cell data</TD><TD>Second
cell</TD>
</TR>
</TABLE>
</TEST>
</BODY>
</HTML>
Note that the XML declaration has disappeared and the processor appears to have decided on a slightly different formatting around the TD elements.
The processor has been asked to indent the resulting document, but in html mode it may only add or remove whitespace in places where this cannot influence how a browser renders the document. Also, this processor has escaped the ë character using the preferred HTML entity &euml; (rather than a numeric character reference such as &#235;).
Using the text method, the result would be:
This is literal text with an ëxtended character
Cell dataSecond
cell
Only the string values of the text nodes have been printed, and the indent attribute has no effect. The encoding used for the output can represent ë, so the special character is no problem. Note that no whitespace appears between the values of the two TD elements. We will see more on whitespace in the next section.
strip-space and preserve-space
What exactly happens to the whitespace in a document and in the XSLT document itself? This is one of the subjects that often puzzle XML developers. Spaces, tabs and linefeeds seem to emerge and disappear at random. And then there are the XSLT elements to influence them: strip-space, preserve-space and the indent attribute on the output element. Let's take a closer look.
During a transformation, there are basically two moments when whitespace can appear or vanish:
- When parsing the source and stylesheet documents and constructing a tree
- When serializing the generated result tree to the destination document
Before any processing occurs, the XSLT processor loads the source and stylesheet into memory and strips unnecessary whitespace. It removes all text nodes that:
- Consist entirely of whitespace characters
- Are not inside an element whose nearest xml:space attribute is set to preserve
- Are not children of a whitespace-preserving element
For the stylesheet, the only whitespace-preserving parent element is xsl:text.
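Take a template that writes a person's first and last name, assuming hypothetical first and last attributes on a PERSON element. A space typed directly between the two value-of instructions would form a whitespace-only text node, and it would be stripped from the stylesheet. Wrapping the space in xsl:text keeps it:

```xml
<xsl:template match="PERSON">
  <xsl:value-of select="@first"/>
  <!-- a bare space here would be a whitespace-only text node and be stripped -->
  <xsl:text> </xsl:text>
  <xsl:value-of select="@last"/>
</xsl:template>
```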
For the source document, the list of whitespace-preserving elements can be set using the strip-space and preserve-space elements in the stylesheet. By default, all elements in the source document preserve whitespace. With the elements attribute of strip-space, you can specify which elements should not preserve whitespace. Adding elements to the list of elements that have their whitespace preserved is done with preserve-space.
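The elements attributes accept a whitespace-separated list of name tests: an element name, a namespace wildcard such as prefix:*, or * for all elements. If an element in the source matches more than one of these tests, the conflict is resolved following the rules for conflicts between matching templates (import precedence first, then priority).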
So if a stylesheet contained these whitespace elements:
<xsl:strip-space elements="*"/>
<xsl:preserve-space elements="PRE CODE"/>
the processor would strip all whitespace-only text nodes in the source document, except for those directly inside a PRE element or a CODE element.
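The xml:space attribute in the source can override strip-space. The sketch below uses a hypothetical DOC document. With strip-space set for all elements, the whitespace-only text nodes that are children of DOC are removed. The two inside CODE survive because CODE carries xml:space="preserve". The stylesheet simply counts the whitespace-only text nodes that are left.

```xml
<DOC>
  <PARA>text</PARA>
  <CODE xml:space="preserve">
    <LINE>a</LINE>
  </CODE>
</DOC>
```

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- strip whitespace-only text nodes everywhere, unless xml:space says otherwise -->
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <!-- count the whitespace-only text nodes that survived stripping -->
    <xsl:value-of select="count(//text()[normalize-space() = ''])"/>
  </xsl:template>
</xsl:stylesheet>
```

A conforming processor should print 2. Without the strip-space element, all five whitespace-only text nodes in the source would remain, and it should print 5.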
After stripping space from the source and stylesheet documents, the processing occurs. The generated tree of nodes is then serialized to a string or file. By default, no new whitespace is added to the result document, unless the output element has its indent attribute set to yes (which is the default for the html method).
attribute-set
On the document level, it is possible to define certain groups of attributes that you need to include in many elements together. By grouping them, the XSLT document can be smaller and easier to maintain:
<xsl:template match="chapter/heading">
  <font xsl:use-attribute-sets="title-style">
    <xsl:apply-templates/>
  </font>
</xsl:template>

<xsl:attribute-set name="title-style">
  <xsl:attribute name="size">3</xsl:attribute>
  <xsl:attribute name="face">Arial</xsl:attribute>
</xsl:attribute-set>
Here the attribute-set element defines a group of two attributes that are often used together. In the template for chapter headings, the attribute set is applied to a literal result element through the xsl:use-attribute-sets attribute. The same attribute, without the xsl: prefix (use-attribute-sets), can also be used on element, copy and attribute-set elements. Be careful not to let an attribute set use itself (directly or indirectly), as this would generate an error.
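The following sketch shows the other forms, using an alternative definition of title-style. Here a hypothetical base-style set is pulled into title-style through its own use-attribute-sets attribute. The xsl:element instruction, which takes use-attribute-sets without the xsl: prefix, then adds one more attribute of its own. The color value is an illustrative assumption.

```xml
<xsl:attribute-set name="base-style">
  <xsl:attribute name="face">Arial</xsl:attribute>
</xsl:attribute-set>

<!-- title-style first includes base-style, then adds size -->
<xsl:attribute-set name="title-style" use-attribute-sets="base-style">
  <xsl:attribute name="size">3</xsl:attribute>
</xsl:attribute-set>

<xsl:template match="chapter/heading">
  <xsl:element name="font" use-attribute-sets="title-style">
    <!-- attributes added in the body are processed after the set, so they can override it -->
    <xsl:attribute name="color">navy</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
```

A heading should then come out as <font face="Arial" size="3" color="navy">...</font>. Attribute order carries no meaning in XML.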
namespace-alias
The namespace-alias element is used in very special cases, especially when transforming a source document to an XSLT document. In this case, you want the destination document to hold the XSLT namespace and lots of literal XSLT elements, but you don't want these to interfere with the transformation process. See the problem? Any element in the XSLT namespace would be treated as an instruction by the processor itself, so you would be shooting yourself in the foot.
Using namespace-alias, you can use another namespace in the stylesheet, but have the declaration for that namespace show up in the destination document with another URI:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:template match="/">
    <axsl:stylesheet>
      <xsl:apply-templates/>
    </axsl:stylesheet>
  </xsl:template>
  ...
</xsl:stylesheet>
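Instead of declaring the literal XSLT output elements in their real namespace, they have a fake namespace in this document. In the destination document, that namespace is replaced by the URI bound to the result-prefix, which here is the real XSLT namespace. The processor used here keeps the axsl prefix, but other processors may use xsl instead. To a namespace-aware application, both forms are equivalent: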
<?xml version="1.0" encoding="utf-8"?>
<axsl:stylesheet xmlns:axsl="http://www.w3.org/1999/XSL/Transform">
...
</axsl:stylesheet>
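Here is a slightly fuller sketch. It assumes a hypothetical source document <FIELDS><FIELD>title</FIELD><FIELD>author</FIELD></FIELDS>, and it generates a new stylesheet with one value-of instruction per field. Attributes of literal result elements are attribute value templates, so {.} in the select attribute is replaced by the field name. A literal brace in the generated code would have to be written as {{ or }}.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:template match="/FIELDS">
    <!-- literal result elements: in the output they end up in the real XSLT namespace -->
    <axsl:stylesheet version="1.0">
      <axsl:template match="/">
        <xsl:for-each select="FIELD">
          <!-- {.} is evaluated now, producing select="//title" and select="//author" -->
          <axsl:value-of select="//{.}"/>
        </xsl:for-each>
      </axsl:template>
    </axsl:stylesheet>
  </xsl:template>
</xsl:stylesheet>
```

The generated stylesheet should contain a template matching "/" with the two instructions value-of select="//title" and value-of select="//author".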
key
The key element is a very special one. It will take a little time to discover its full potential. It is more or less analogous to creating an index on a table in a relational database. It allows you to access a set of nodes in a document directly with the key() function, using an identifier of that node that you specify. Let's describe an example. We could, using the key element, define that the key person-by-name gives us access to PERSON elements by passing the value of their name attribute. If the key is set up correctly, we would use key('person-by-name', 'Teun') to get a result set of PERSON elements that have their name attribute set to 'Teun'.
To set up this key, you would use the element like this:
<xsl:key name="person-by-name" match="PERSON" use="@name"/>
Try to see what each of the attributes name, match and use specifies. The name attribute is simple: it just serves to refer to a specific key, of which there may be many. The match attribute holds a pattern that nodes must match to be indexed by this key. This pattern is identical to the template match attribute. It is not a problem if the same node is indexed by multiple keys. For each node in the selected set, the XPath expression in the use attribute is evaluated.
The string value of the result of this expression is used to retrieve the indexed node. Multiple nodes can have the same result when evaluating use in their context. When the key function is called with this value, it will return a node set holding all nodes that had this result. The result of use can also be a node set. In this case, each of the nodes will be converted to a string, and each of these strings can be used to retrieve the selected node. Likewise, if the second argument of the key() function is a node set, the result holds the nodes found for the string value of any of its nodes.
Don't worry if you can't see the point of this yet. We will do an extensive example on this. Suppose we have this XML document:
<?xml version="1.0"?>
<FAMILY>
  <TRADITIONAL_NAMES>
    <NAME>Peter</NAME>
    <NAME>Mary</NAME>
  </TRADITIONAL_NAMES>
  <PERSON name="Peter">
    <CHILDREN>
      <PERSON name="Peter"/>
      <PERSON name="Archie"/>
    </CHILDREN>
  </PERSON>
</FAMILY>
We are transforming the XML source with an XSLT document that starts like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:key name="all-names" match="PERSON" use="@name"/>
  <xsl:key name="parents-names" match="PERSON[CHILDREN/PERSON]" use="@name"/>
  ...
If we now use the key() function, our results will be:
Expression Used in |
Result |
|
Both |
|
Only the |
|
Both |
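Working through the FAMILY document by hand, a few calls to key() should behave as the comments in this template describe. The template could complete the stylesheet started above. The text output method and the line breaks are our own choices, made so that the counts are easy to read.

```xml
<xsl:output method="text"/>

<xsl:template match="/">
  <!-- 2: all-names indexes every PERSON, and two of them are named Peter -->
  <xsl:value-of select="count(key('all-names', 'Peter'))"/>
  <xsl:text>&#10;</xsl:text>
  <!-- 1: only the outer Peter has CHILDREN/PERSON, so only he is in parents-names -->
  <xsl:value-of select="count(key('parents-names', 'Peter'))"/>
  <xsl:text>&#10;</xsl:text>
  <!-- 0: Archie is indexed by all-names, but he has no children -->
  <xsl:value-of select="count(key('parents-names', 'Archie'))"/>
  <xsl:text>&#10;</xsl:text>
  <!-- 2: a node-set argument is looked up string by string;
       'Peter' finds both Peters and 'Mary' finds nothing -->
  <xsl:value-of select="count(key('all-names', /FAMILY/TRADITIONAL_NAMES/NAME))"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>
```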
Now what are the cases where using a key is a good idea? Think of situations where XML elements often refer to each other using some sort of identifier, but without using the validation rules for IDs (because these are sometimes too rigid). The key construct can:
- Keep your code more readable.
- Help performance, depending on the implementation. The XSLT processor can keep a hash table in memory of all key values in the source document. If these references are used often, performance gains can be substantial.
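Here is a minimal sketch of that situation. It assumes a hypothetical shop document in which each ORDER refers to its CUSTOMER through a customer attribute. That attribute holds the customer's id attribute, and these are plain attributes, not DTD-declared IDs.

```xml
<SHOP>
  <CUSTOMER id="c1" name="Teun"/>
  <CUSTOMER id="c2" name="Mary"/>
  <ORDER customer="c2" item="Teapot"/>
  <ORDER customer="c1" item="Kettle"/>
</SHOP>
```

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- index every CUSTOMER by the value of its id attribute -->
  <xsl:key name="customer-by-id" match="CUSTOMER" use="@id"/>
  <xsl:template match="/">
    <xsl:for-each select="SHOP/ORDER">
      <xsl:value-of select="@item"/>
      <xsl:text> ordered by </xsl:text>
      <!-- follow the reference through the index instead of searching
           //CUSTOMER[@id = current()/@customer] for every order -->
      <xsl:value-of select="key('customer-by-id', @customer)/@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should print "Teapot ordered by Mary" and "Kettle ordered by Teun" on separate lines. The equivalent predicate search gives the same result. The key version states the intent more plainly and lets the processor use its index.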
Comments