Using XML Queries and Transformations

XPath Query Syntax

Before we get into the syntax of an XPath query, we have to discuss the concept of a context node. An XPath query is not automatically evaluated over the whole document; it always has a starting point, the context node. This can be any node in the node tree that constitutes the document. From this "fixed point" you can issue queries like "give me all your children", which only make sense if a starting point is defined. The starting point may of course be the root node, in which case the query can reach the entire document.

This may seem a bit abstract now, but just remember: an XPath query is done from a certain starting point in the document.
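To make the idea of a context node more concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports a small subset of XPath. The library document and its element names are invented for illustration. The same relative query ("give me all your children") returns different results depending on the node it starts from.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<library>"
    "<shelf id='a'><book>XML Basics</book><book>XPath Notes</book></shelf>"
    "<shelf id='b'><book>XSLT Recipes</book></shelf>"
    "</library>"
)

def children_of(context):
    # "*" is a relative query: it matches the child elements of
    # whatever node the search starts from (the context node).
    return [child.tag for child in context.findall("*")]

# Context node = the top-level element.
root_children = children_of(doc)

# Context node = the second shelf.
shelf_b = doc.find("shelf[@id='b']")
shelf_b_children = children_of(shelf_b)

print(root_children)     # ['shelf', 'shelf']
print(shelf_b_children)  # ['book']
```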

Different Axes

Have a look at the following XPath query:

descendant::TABLE

This query translates to plain English as: "Get all TABLE elements among the descendants (children, children's children, etc.) of the context node". The first part of the query, descendant, is called the axis. The second part, TABLE, is called the node test; the two are separated by a double colon. The axis is the search direction: every node along the specified axis that satisfies the node test is included in the result set. Queries can become quite complex and can contain subqueries; we will look at that later. First, here are all the axes that can be used in a query:
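As a rough illustration of descendant::TABLE, the sketch below uses Python's xml.etree.ElementTree. That module does not accept the axis::nodetest syntax, so the abbreviated path .//TABLE is used instead; when evaluated from a context element it selects the same TABLE descendants. The page content and the id values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with one table nested inside another.
page = ET.fromstring(
    "<HTML><BODY>"
    "<TABLE id='outer'><TR><TD>"
    "<TABLE id='inner'><TR><TD>cell</TD></TR></TABLE>"
    "</TD></TR></TABLE>"
    "<P>no table here</P>"
    "</BODY></HTML>"
)

def descendant_tables(context):
    # ".//TABLE" walks every level below the context element and keeps
    # the elements named TABLE; the context element itself is not included.
    return [table.get("id") for table in context.findall(".//TABLE")]

from_root = descendant_tables(page)

outer = page.find(".//TABLE[@id='outer']")
from_outer = descendant_tables(outer)

paragraph = page.find(".//P")
from_paragraph = descendant_tables(paragraph)

print(from_root)       # ['outer', 'inner']
print(from_outer)      # ['inner']
print(from_paragraph)  # []
```

Note how the result shrinks as the context node moves deeper into the tree: the axis is always followed from the context node, never from the top of the document.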

Axis                 Description

child                All direct children of the context node. Excludes attributes.
descendant           All children, children's children, and so on. Excludes attributes.
parent               The direct parent (and only the direct parent) of the context node, if any.
ancestor             All ancestors of the context node. Always includes the root node (unless the root node is the context node).
following-sibling    All siblings of the context node that appear later in the document.
preceding-sibling    All siblings of the context node that appear earlier in the document.
following            All nodes that come after the context node in document order, excluding its descendants, attributes and namespace nodes.
preceding            All nodes that come before the context node in document order, excluding its ancestors, attributes and namespace nodes.
attribute            The attributes of the context node.
namespace            The namespace nodes of the context node. This includes an entry for the default namespace (if one is in scope) and for the implicitly declared XML namespace.
self                 Only the context node itself.
descendant-or-self   All descendants and the context node itself.
ancestor-or-self     All ancestors and the context node itself.

The ancestor, descendant, following, preceding and self axes partition the document: together these five axes contain every node of the tree (except attribute and namespace nodes), and they do not overlap. An ancestor is therefore never on the preceding axis, and a descendant is never on the following axis.
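The partitioning can be checked by hand. The sketch below does not use an XPath engine; it computes the five axes with plain tree traversal over xml.dom.minidom, following the definitions given above. The document and its element names are invented for the example, and the helper functions are hypothetical names, not part of any library.

```python
from xml.dom import minidom

# Hypothetical document with no whitespace, so every node is an element
# (plus the document node that acts as the root of the tree).
dom = minidom.parseString("<a><b><c/><d/></b><e><f/></e><g/></a>")

def document_order(node):
    """Yield a node followed by all of its descendants, in document order."""
    yield node
    for child in node.childNodes:
        yield from document_order(child)

def ancestor_axis(node):
    """Walk parentNode links up to the root; nearest ancestor first."""
    result = []
    parent = node.parentNode
    while parent is not None:
        result.append(parent)
        parent = parent.parentNode
    return result

all_nodes = list(document_order(dom))
context = dom.getElementsByTagName("e")[0]
position = next(i for i, n in enumerate(all_nodes) if n is context)

ancestors = ancestor_axis(context)
descendants = list(document_order(context))[1:]
ancestor_ids = {id(n) for n in ancestors}
descendant_ids = {id(n) for n in descendants}

# preceding: earlier in document order, minus the ancestors
preceding = [n for n in all_nodes[:position] if id(n) not in ancestor_ids]
# following: later in document order, minus the descendants
following = [n for n in all_nodes[position + 1:] if id(n) not in descendant_ids]

axes = {
    "ancestor": ancestors,
    "descendant": descendants,
    "preceding": preceding,
    "following": following,
    "self": [context],
}
names = {axis: [n.nodeName for n in nodes] for axis, nodes in axes.items()}
for axis, node_names in names.items():
    print(axis, node_names)
```

With e as the context node, each of the eight nodes in the tree (the document node plus seven elements) lands on exactly one of the five axes.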

Different Node Tests

The sample we showed before used a literal name (TABLE) as a node test. This is only one of the ways to specify what a selected node should look like. Other valid values are:

  • text() – which is true for all text nodes.
  • * – which is true for any node of the principal node type. Every axis has its own principal node type: for most axes it is 'element', but for the attribute axis it is 'attribute' and for the namespace axis it is 'namespace'.
  • comment() – which is true for all comment nodes.
  • processing-instruction() – which is true for all processing instruction nodes.
  • node() – which is true for all nodes.

These node type tests take no arguments, with one exception: processing-instruction() can be passed a literal. If an argument is passed, the node test is only true for a processing instruction whose name equals the argument.
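The node tests listed above can be mimicked with simple type checks. The following sketch applies each test to the child axis of a context node using xml.dom.minidom; it is an approximation written for illustration, not an XPath implementation. The sample document, the NODE_TESTS table and the select_children helper are all assumptions introduced here.

```python
from xml.dom import minidom, Node

# Hypothetical element with one child of each kind, plus a second
# processing instruction so the name argument has something to filter.
dom = minidom.parseString(
    "<note>hello<!-- a comment --><?render mode='fast'?>"
    "<?audit on?><b>bold</b></note>"
)
context = dom.documentElement

# Each node test is modelled as a predicate on a DOM node.
NODE_TESTS = {
    "text()": lambda n: n.nodeType == Node.TEXT_NODE,
    "comment()": lambda n: n.nodeType == Node.COMMENT_NODE,
    "processing-instruction()":
        lambda n: n.nodeType == Node.PROCESSING_INSTRUCTION_NODE,
    "node()": lambda n: True,
    # On the child axis the principal node type is 'element'.
    "*": lambda n: n.nodeType == Node.ELEMENT_NODE,
}

def select_children(context, test, pi_name=None):
    """Apply a node test to the children of the context node.

    pi_name is only meaningful together with processing-instruction().
    """
    matches = [n for n in context.childNodes if NODE_TESTS[test](n)]
    if pi_name is not None:
        matches = [n for n in matches if n.target == pi_name]
    return matches

texts = [n.data for n in select_children(context, "text()")]
comments = [n.data for n in select_children(context, "comment()")]
all_pis = [n.target for n in select_children(context, "processing-instruction()")]
audit_pis = [n.target for n in
             select_children(context, "processing-instruction()", "audit")]
elements = [n.tagName for n in select_children(context, "*")]
everything = select_children(context, "node()")

print(texts)            # ['hello']
print(all_pis)          # ['render', 'audit']
print(audit_pis)        # ['audit']
print(elements)         # ['b']
print(len(everything))  # 5
```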

The following are examples of XPath queries using different axes and node tests. This selects all descendant elements of the context node:

descendant::*

This selects the name attribute of the context node:

attribute::name

This selects the parent element of the context node:

parent::*

This selects all namespaces that are in scope for the context node:

namespace::*

This means that the result includes the default namespace (if one is declared), the xml namespace, any namespaces declared on the context node itself, and any namespaces declared on its ancestors that have not been overridden by a declaration closer to the context node. A namespace is overridden when one element binds a prefix to a certain URI and a descendant element declares the same prefix with another URI. In that case the first declaration is hidden from the element carrying the second declaration and from all of its descendants.
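The scoping rule for namespaces can be sketched by walking from the context node up through its ancestors and letting the nearest declaration of each prefix win. The code below does this with xml.dom.minidom rather than a real namespace axis; the in_scope_namespaces helper, the sample document and the urn: URIs are hypothetical and exist only for this illustration.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"

def in_scope_namespaces(element):
    """Return {prefix: uri} for the namespaces visible at an element.

    The default namespace is stored under the empty prefix "".
    """
    scope = {}
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        for name, uri in node.attributes.items():
            if name == "xmlns":
                prefix = ""
            elif name.startswith("xmlns:"):
                prefix = name[len("xmlns:"):]
            else:
                continue  # an ordinary attribute, not a declaration
            # The walk goes from the context node upwards, so the first
            # declaration seen for a prefix is the nearest one and wins.
            scope.setdefault(prefix, uri)
        node = node.parentNode
    # Drop prefixes whose nearest declaration is empty (undeclared).
    scope = {prefix: uri for prefix, uri in scope.items() if uri}
    # The xml prefix is always implicitly declared.
    scope["xml"] = XML_NS
    return scope

# Hypothetical document: <mid> redeclares the prefix p with another URI.
dom = minidom.parseString(
    "<root xmlns='urn:default' xmlns:p='urn:outer' xmlns:q='urn:q'>"
    "<mid xmlns:p='urn:inner'><leaf/></mid>"
    "<other/>"
    "</root>"
)

leaf_scope = in_scope_namespaces(dom.getElementsByTagName("leaf")[0])
other_scope = in_scope_namespaces(dom.getElementsByTagName("other")[0])

print(leaf_scope["p"])   # urn:inner
print(other_scope["p"])  # urn:outer
```

Below mid, the outer binding of p is hidden; for other, which is not a descendant of mid, it is still visible.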

Finally, this query selects all comment nodes that are direct children of the context node:

child::comment()
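For comparison, the sketch below reproduces four of the example queries (descendant::*, attribute::name, parent::* and child::comment()) with the DOM calls of xml.dom.minidom, which has no XPath support of its own. The catalog document is an assumption made up for this example, and the DOM calls are only rough equivalents of the queries.

```python
from xml.dom import minidom, Node

# Hypothetical document; the <item> element is the context node.
dom = minidom.parseString(
    "<catalog><item name='widget'><!-- on sale -->"
    "<price>9</price><tags><tag/></tags></item></catalog>"
)
context = dom.getElementsByTagName("item")[0]

# descendant::*  -> all elements below the context node, in document order
descendant_elements = [e.tagName for e in context.getElementsByTagName("*")]

# attribute::name  -> the attribute node called "name" (None if absent)
name_attribute = context.getAttributeNode("name")

# parent::*  -> the parent, provided it is an element
parent = context.parentNode
parent_elements = [parent] if parent.nodeType == Node.ELEMENT_NODE else []

# child::comment()  -> comment nodes among the direct children only
child_comments = [n.data for n in context.childNodes
                  if n.nodeType == Node.COMMENT_NODE]

print(descendant_elements)                    # ['price', 'tags', 'tag']
print(name_attribute.value)                   # widget
print([p.tagName for p in parent_elements])   # ['catalog']
print(child_comments)                         # [' on sale ']
```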
