Using XML Queries and Transformations

Built-In Functions

As we have seen, functions that perform complex operations are very handy, if not essential, when writing predicates. Some of them have already appeared in earlier samples. We will show some important functions here; all other built-in functions specified by the XPath recommendation are listed in Appendix C.

Node Set Functions

last()

The last() function returns the position of the last node in the current context, which equals the number of nodes in that context. For example, this expression selects the chapter elements (along the child axis, which is the default axis in the shorthand notation) that have exactly five paragraph children:

chapter[paragraph[last() = 5]]

position()

The position() function returns the position of the context node within the node set currently being evaluated. For example, this expression selects the chapter children that have a fifth paragraph child:

chapter[paragraph[position() = 5]]

Note that here we create a predicate to filter the results of the outer expression, and this predicate uses an XPath expression that also has a predicate. This recursive use of XPath expressions in predicates is a powerful way to create sub-queries.

count(node set)

The count() function returns the number of nodes in the node set passed to it. This seems identical to the last() function, but it isn't: last() reports the size of the current context, whereas count() works on whatever node set you pass to it. The two can be used to achieve similar results, but the syntax differs. This example selects the chapters with exactly five paragraph children (equivalent to the example for the last() function):

chapter[count(paragraph) = 5]

Whereas this selects the chapters with five or more paragraph children (identical to the example for the position() function):

chapter[count(paragraph) >= 5]
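
To make the difference between the two kinds of selection concrete, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a small XPath subset (positional steps such as paragraph[5], but neither count() nor nested predicates), so the outer filtering is done in Python. The sample document and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: chapters with 3, 5 and 6 paragraphs.
SAMPLE = """
<book>
  <chapter id="c1"><paragraph/><paragraph/><paragraph/></chapter>
  <chapter id="c2"><paragraph/><paragraph/><paragraph/><paragraph/><paragraph/></chapter>
  <chapter id="c3"><paragraph/><paragraph/><paragraph/><paragraph/><paragraph/><paragraph/></chapter>
</book>
"""
book = ET.fromstring(SAMPLE)


def chapters_with_exactly(book, n):
    """Mirrors chapter[count(paragraph) = n] and chapter[paragraph[last() = n]]."""
    return [c.get("id") for c in book.findall("chapter")
            if len(c.findall("paragraph")) == n]


def chapters_with_nth_paragraph(book, n):
    """Mirrors chapter[paragraph[position() = n]] and chapter[count(paragraph) >= n]."""
    return [c.get("id") for c in book.findall("chapter")
            if c.find(f"paragraph[{n}]") is not None]


exactly_five = chapters_with_exactly(book, 5)
has_fifth = chapters_with_nth_paragraph(book, 5)
print(exactly_five)  # ['c2']
print(has_fifth)     # ['c2', 'c3']
```

The chapter with six paragraphs appears only in the second result, which is exactly the difference between the "= 5" and ">= 5" forms.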

id(object)

The id() function returns the elements whose ID matches one of the values passed to it. If the object passed to the function is a node set, each node in the set is converted to its string value, and each string value is split into a whitespace-separated list of IDs. The function then returns all elements in the document that have one of these ID values.

If the passed object is anything else, it is converted to a string (as if by the string() function), and that string is treated the same way, as a whitespace-separated list of IDs. Each ID can, by definition, belong to only one element. For example:

id(//book[@publisher = 'WROX']/@authors)

This query returns all elements whose ID matches one of the values in the authors attributes of books that have their publisher attribute set to 'WROX'. This kind of query can be extremely powerful. However, it requires that the document be validated against a schema or DTD, because without validation the processor cannot know which attributes are IDs. For doing things like this with non-validated documents, see the section on using keys.
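
The lookup that id() performs can be sketched in plain Python. ElementTree does not validate documents, so this sketch simply assumes that the attribute named "id" is the one declared as an ID; the sample document and the function name xpath_id are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. ASSUMPTION: the attribute named "id" is the ID
# attribute; a real processor would learn this from the DTD or schema.
SAMPLE = """
<library>
  <author id="a1">Ann</author>
  <author id="a2">Bob</author>
  <author id="a3">Cy</author>
  <book publisher="WROX" authors="a1 a3"/>
  <book publisher="Other" authors="a2"/>
</library>
"""
root = ET.fromstring(SAMPLE)

# Build the ID index once: ID value -> element.
id_index = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def xpath_id(values):
    """Emulate id(): split each string on whitespace and look up every token."""
    found = []
    for value in values:
        for token in value.split():
            el = id_index.get(token)
            if el is not None and el not in found:
                found.append(el)
    return found


# Mirrors id(//book[@publisher = 'WROX']/@authors)
wrox_refs = [b.get("authors") for b in root.iter("book")
             if b.get("publisher") == "WROX"]
wrox_authors = [el.text for el in xpath_id(wrox_refs)]
print(wrox_authors)  # ['Ann', 'Cy']
```

Unknown ID values are silently skipped, which matches the way id() returns only the elements it can find.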

namespace-uri(node-set)

If your application has to act only on information in a specific namespace (this is in fact very probable as soon as you are building real applications), you will love the namespace-uri() function. It returns a string containing the URI of the namespace of the passed node set. Normally, the node set you pass will contain only one node. If you pass a node set containing multiple nodes, the function uses the first node in document order. If you omit the argument, the function uses the context node. So, if the node you pass is an element of type mydata:chapter, the function will look for the declaration of the mydata namespace and will return the value of the URI used there.

This next query will return all elements in the specified namespace:

//*[namespace-uri() = 'http://www.w3.org/1999/XSL/Transform']
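
As a rough illustration of what this query computes, the following Python sketch walks a small stylesheet with ElementTree, which stores each tag as "{namespace-uri}local-name". The sample stylesheet and the helper functions namespace_uri and local_name are assumptions made for this example, not part of any XPath API.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet mixing XSLT elements with literal result elements.
SAMPLE = f"""
<xsl:stylesheet xmlns:xsl="{XSLT_NS}" version="1.0">
  <xsl:template match="/">
    <html><body><xsl:value-of select="title"/></body></html>
  </xsl:template>
</xsl:stylesheet>
"""
root = ET.fromstring(SAMPLE)


def namespace_uri(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    if isinstance(tag, str) and tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def local_name(element):
    """Return the tag name without its '{uri}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


# Mirrors //*[namespace-uri() = 'http://www.w3.org/1999/XSL/Transform']
xslt_elements = [local_name(el) for el in root.iter()
                 if namespace_uri(el) == XSLT_NS]
print(xslt_elements)  # ['stylesheet', 'template', 'value-of']
```

The html and body elements are left out because they are in no namespace, so the comparison is made on the URI and not on the prefix.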

String Functions

For the handling of strings, several functions are included. We will not get into these in very much depth. Most are what you would expect from string handling functions. They cover concatenation, comparing and manipulating strings, and selecting a substring from a string. We will show just a few functions here; refer to Appendix C for the complete list.

string(object)

This function converts the passed object to a string. A Boolean value is converted to 'true' or 'false', and a number is converted to its string representation (i.e. the number 3 would be converted to the string "3"). If a node set is passed, the string value of the first node in document order is used.

starts-with(string, string)

This function checks whether the first string starts with the second string. It returns true if so, and false otherwise. For example, this query returns all employee elements that have a last-name attribute starting with 'A':

descendant::employee[starts-with(@last-name, "A")]

Note the use of the shorthand notation @last-name for attribute::last-name.

translate(string, string, string)

The translate() function takes a string and, character by character, replaces each character that occurs in the second string with the character at the same position in the third string. This is the only way to convert from lower to upper case in XPath 1.0. The following expression (with extra whitespace added for readability) translates the employee last names to upper case and then selects those employees whose last names begin with A:

descendant::employee[
    starts-with(
        translate(@last-name,
                  "abcdefghijklmnopqrstuvwxyz",
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
        "A"
    )
]

If the second string has more characters than the third string, the extra characters have no counterpart and are removed from the first string wherever they occur. If the third string has more characters than the second string, the extra characters are ignored.
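
The removal and truncation rules are easier to see in code. This is a minimal Python emulation of translate(), written for illustration only; the function names xpath_translate and starts_with_ignoring_case are hypothetical.

```python
def xpath_translate(text, source, target):
    """Emulate XPath 1.0 translate(text, source, target)."""
    mapping = {}
    for i, ch in enumerate(source):
        if ch in mapping:
            continue  # the first occurrence of a character in `source` wins
        # None marks a character with no counterpart: it will be removed.
        mapping[ch] = target[i] if i < len(target) else None
    out = []
    for ch in text:
        if ch not in mapping:
            out.append(ch)            # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])   # replaced character
        # else: the character is dropped
    return "".join(out)


LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def starts_with_ignoring_case(last_name, prefix):
    """Mirrors starts-with(translate(@last-name, LOWER, UPPER), prefix)."""
    return xpath_translate(last_name, LOWER, UPPER).startswith(prefix)


print(xpath_translate("bar", "abc", "ABC"))       # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
print(starts_with_ignoring_case("adams", "A"))    # True
```

In the second call the hyphen has no counterpart in the third string, so every hyphen is removed from the result.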

Number Functions

As for strings, a set of functions is available for number handling, but we will not list them all here. They are available in Appendix C. We will show a few of the most important and instructive examples.

number(object)

The number() function converts any passed value to a number. Its behaviour depends on the type of the passed parameter. Some possible situations:

  • If a string is passed, it is converted to the number that it represents (following the IEEE 754 standard).
  • If a Boolean value is passed, true is converted to 1, false to 0.
  • If a node set is passed, it is first converted to a string (as if using the string() function). Then the string is converted to a number.
  • The number function has no support for language-specific formats. The string value passed in should be of a language neutral format.
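
The conversion rules for strings and Booleans can be sketched in Python as follows. The function name xpath_number is hypothetical, node sets are left out, and the regular expression encodes the XPath 1.0 number syntax: an optional minus sign and plain decimal digits, with no exponent, thousands separator, or decimal comma.

```python
import math
import re

# XPath 1.0 number syntax: optional whitespace, optional minus sign,
# digits with an optional fraction (or a bare fraction), optional whitespace.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(value):
    """Emulate number() for Booleans, numbers and strings (node sets omitted)."""
    if isinstance(value, bool):          # must come before the int/float check
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        if _XPATH_NUMBER.fullmatch(value):
            return float(value.strip(" \t\r\n"))
        return math.nan                  # anything else is NaN, not an error
    raise TypeError("this sketch handles only bool, number and str")


print(xpath_number(" 12.5 "))  # 12.5
print(xpath_number(True))      # 1.0
print(xpath_number("1,5"))     # nan
```

The last call shows the point about language-specific formats: a decimal comma does not raise an error, it quietly yields NaN.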

sum(node set)

The sum() function returns the sum of the numerical values of all passed nodes. The numerical value is the result of the conversion of their string values. For example, this query selects the industry elements that have customer elements as children, whose totalturnover attributes sum to an amount larger than 1 million:

//industry[sum(customer/@totalturnover) > 1000000]
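
The same selection can be written out step by step in Python with ElementTree, which has no sum() of its own. The sample data and the helper turnover_sum are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
SAMPLE = """
<market>
  <industry name="steel">
    <customer totalturnover="700000"/>
    <customer totalturnover="450000"/>
  </industry>
  <industry name="paper">
    <customer totalturnover="300000"/>
  </industry>
</market>
"""
root = ET.fromstring(SAMPLE)


def turnover_sum(industry):
    """Mirrors sum(customer/@totalturnover) for one industry element."""
    # Customers without the attribute contribute nothing, as in XPath.
    # Note: a non-numeric value raises ValueError here, whereas XPath
    # would turn the whole sum into NaN.
    return sum(float(c.get("totalturnover"))
               for c in industry.findall("customer")
               if c.get("totalturnover") is not None)


big_industries = [i.get("name") for i in root.iter("industry")
                  if turnover_sum(i) > 1_000_000]
print(big_industries)  # ['steel']
```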

round(number)

The round() function is a typical number function. It rounds a floating point value to the nearest integer value. Other ways of making an integer from a floating point value are floor() and ceiling().
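
One detail worth knowing is what happens exactly halfway between two integers: XPath 1.0 round() picks the integer closer to positive infinity. The Python sketch below emulates that rule, since Python's own round() resolves ties differently; the name xpath_round is hypothetical.

```python
import math


def xpath_round(x):
    """Emulate XPath 1.0 round(): nearest integer, ties go toward +infinity."""
    if not math.isfinite(x):
        return x  # NaN and the infinities are returned unchanged
    lower = math.floor(x)
    return float(lower + 1) if x - lower >= 0.5 else float(lower)


print(xpath_round(2.5))   # 3.0
print(xpath_round(-2.5))  # -2.0
print(round(2.5))         # 2 (Python rounds ties to the even integer)
print(math.floor(2.5), math.ceil(2.5))  # 2 3, the floor() and ceiling() results
```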


Boolean Functions

The functions that handle Boolean values are not very special. The most useful one is the not() function, which converts a Boolean value to its opposite. Other than that, there are the boolean() function, which converts its argument to a Boolean value, the true() and false() functions, which always return true and false respectively, and the lang() function, which can be used to check the language of the content (if this is indicated with the xml:lang attribute).
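
Of these, lang() has the least obvious behaviour: it looks at the nearest xml:lang attribute on the node or its ancestors, ignores case, and also accepts sublanguages such as en-GB when asked for en. A minimal Python emulation, with a made-up sample document and a hypothetical helper named xpath_lang, might look like this:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document.
SAMPLE = """
<doc xml:lang="en">
  <para id="p1">Hello</para>
  <para id="p2" xml:lang="en-GB">Colour</para>
  <para id="p3" xml:lang="nl">Hallo</para>
</doc>
"""
root = ET.fromstring(SAMPLE)

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def xpath_lang(element, wanted):
    """Emulate lang(wanted) with `element` as the context node."""
    node = element
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:  # the nearest xml:lang decides
            value, wanted = value.lower(), wanted.lower()
            return value == wanted or value.startswith(wanted + "-")
        node = parent_of.get(node)
    return False  # no xml:lang anywhere up the tree


english = [p.get("id") for p in root.iter("para") if xpath_lang(p, "en")]
not_english = [p.get("id") for p in root.iter("para") if not xpath_lang(p, "en")]
print(english)      # ['p1', 'p2']
print(not_english)  # ['p3']
```

The second list corresponds to wrapping the test in not(), as in para[not(lang('en'))].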
