Before we get into the syntax of an XPath query, we have to discuss the concept of a context node. An XPath query does not automatically run over the whole document; it always has a starting point, the context node. This can be any node in the node tree that constitutes the document. From this fixed point you can issue queries like "give me all your children". Such a query only makes sense if a starting point is defined. The starting point may of course be the root node, in which case the query can reach the entire document.
This may seem a bit abstract now, but just remember: an XPath query is done from a certain starting point in the document.
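To make the idea of a context node concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and the variable names are assumptions made up for illustration. ElementTree supports only a limited subset of XPath, so the sketch uses the path "*" (all child elements) as the "give me all your children" query and starts from the document element rather than the true root node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<library>"
    "<shelf><book/><book/></shelf>"
    "<shelf><book/></shelf>"
    "</library>"
)

# The same relative query, "give me all your child elements" ...
query = "*"

# ... evaluated with the document element as the context node,
from_library = [e.tag for e in doc.findall(query)]

# ... and evaluated with the first <shelf> as the context node.
first_shelf = doc.find("shelf")
from_shelf = [e.tag for e in first_shelf.findall(query)]

print(from_library)  # ['shelf', 'shelf']
print(from_shelf)    # ['book', 'book']
```

The query text is identical in both calls; only the starting point differs, and so does the result.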
Different Axes
Have a look at the following XPath query:
descendant::TABLE
This query would translate to plain English as: "Get the TABLE elements from all descendants (children, children's children, etc.) of the context node". The first part of this query, descendant, is called the axis of the query. The second part, TABLE, is called the node test.
The axis is the searching direction; if a node along the specified axis conforms to the node test, it is included in the result set. These patterns can be very complex and can have subqueries in them. We will look at that later. First, we will list all available axes that can be used in a query:
Axis | Description
child | All direct children of the context node. Excludes attributes.
descendant | All children, children's children, etc. Excludes attributes.
parent | The direct parent (and only the direct parent) of the context node (if any).
ancestor | All ancestors of the context node. Always includes the root node (unless the root node is the context node).
following-sibling | All siblings of the context node that appear later in the document.
preceding-sibling | All siblings of the context node that appear earlier in the document.
following | All nodes that come after the context node in document order, excluding its descendants.
preceding | All nodes that come before the context node in document order, excluding its ancestors.
attribute | Contains the attributes of the context node.
namespace | Contains the namespace nodes of the context node. This includes an entry for the default namespace and the implicitly declared XML namespace.
self | Only the context node itself.
descendant-or-self | All descendants and the context node itself.
ancestor-or-self | All ancestors and the context node itself.
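Several of these axes map directly onto tree navigation. The following is a rough sketch, not an XPath engine, that mimics the child, parent, ancestor, following-sibling and preceding-sibling axes with Python's standard xml.dom.minidom module. The sample document and all function names are assumptions chosen for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
dom = minidom.parseString("<r><a/><b><c/></b><d/></r>")
context = dom.getElementsByTagName("b")[0]  # the context node

def child(node):
    # Attributes are not part of childNodes, matching the child axis.
    return list(node.childNodes)

def parent(node):
    return [node.parentNode] if node.parentNode is not None else []

def ancestor(node):
    # Walk upwards until the root (document) node has been collected.
    found = []
    node = node.parentNode
    while node is not None:
        found.append(node)
        node = node.parentNode
    return found

def following_sibling(node):
    found = []
    node = node.nextSibling
    while node is not None:
        found.append(node)
        node = node.nextSibling
    return found

def preceding_sibling(node):
    found = []
    node = node.previousSibling
    while node is not None:
        found.append(node)
        node = node.previousSibling
    return found

def names(nodes):
    return [n.nodeName for n in nodes]

print(names(child(context)))              # ['c']
print(names(parent(context)))             # ['r']
print(names(ancestor(context)))           # ['r', '#document']
print(names(following_sibling(context)))  # ['d']
print(names(preceding_sibling(context)))  # ['a']
```

Note that the ancestor list ends with the document node (shown by minidom as #document), which plays the role of the XPath root node.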
The ancestor, descendant, following, preceding and self axes partition the document. This means that these five axes together contain all nodes of the tree (except attributes and namespaces), but do not overlap. An ancestor is therefore not on the preceding axis and a descendant is not on the following axis, as illustrated in the following diagram:
[Diagram: the ancestor, descendant, following, preceding and self axes relative to the context node]
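The partition property can also be checked mechanically. The sketch below is an illustration built on Python's standard xml.dom.minidom module, with a made-up sample document and made-up helper names. It computes the five axes for one context node and verifies that they are disjoint and together cover every node in the tree (attributes and namespaces are not visited).

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
dom = minidom.parseString("<r><a><x/></a><b><c/></b><d/></r>")
context = dom.getElementsByTagName("b")[0]

def document_order(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for kid in node.childNodes:
        yield from document_order(kid)

def ancestors(node):
    found = []
    node = node.parentNode
    while node is not None:
        found.append(node)
        node = node.parentNode
    return found

all_nodes = list(document_order(dom))  # starts at the root (document) node
pos = next(i for i, n in enumerate(all_nodes) if n is context)

anc = ancestors(context)
desc = list(document_order(context))[1:]  # drop the context node itself
excluded = {id(n) for n in anc + desc}

# preceding: earlier in document order, but not an ancestor.
preceding = [n for n in all_nodes[:pos] if id(n) not in excluded]
# following: later in document order, but not a descendant.
following = [n for n in all_nodes[pos + 1:] if id(n) not in excluded]

axes = {
    "ancestor": anc,
    "descendant": desc,
    "preceding": preceding,
    "following": following,
    "self": [context],
}
covered = [id(n) for nodes in axes.values() for n in nodes]

no_overlap = len(covered) == len(set(covered))
complete = set(covered) == {id(n) for n in all_nodes}

for axis, nodes in axes.items():
    print(axis, [n.nodeName for n in nodes])
print(no_overlap, complete)  # True True
```

For this sample, r and the document node end up on the ancestor axis, a and x on the preceding axis, c on the descendant axis and d on the following axis.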
Different Node Tests
The sample we showed before used a literal name (TABLE) as a node test. This is only one of the ways to specify what a selected node should look like. Other valid values are:
text(), which is true for all text nodes.
*, which is true for any node of the principal type. Every axis has its own principal node type. For most axes the principal node type is 'element', but for the attribute axis it is 'attribute' and for the namespace axis it is 'namespace'.
comment(), which is true for all comment nodes.
processing-instruction(), which is true for all processing instruction nodes.
node(), which is true for all nodes.
These node type tests take no arguments. Only processing-instruction() can be passed a literal; if an argument is passed, the node test is only true for a processing instruction that has a name equal to the argument.
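As an illustration of how these node tests filter the nodes found along an axis, the sketch below applies each test to the child axis of one element, using Python's standard xml.dom.minidom module. The sample document, the NODE_TESTS table and the pi_test helper are assumptions invented for this example; they imitate the XPath tests rather than evaluate real XPath.

```python
from xml.dom import minidom, Node

# Hypothetical sample: one text node, one comment, one processing
# instruction and one element, all children of <r>.
dom = minidom.parseString("<r>hello<!--note--><?target data?><e/></r>")
context = dom.documentElement

def pi_test(name=None):
    """Mimic processing-instruction() with an optional literal name."""
    def test(node):
        return (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and (name is None or node.target == name))
    return test

NODE_TESTS = {
    "text()": lambda n: n.nodeType == Node.TEXT_NODE,
    "comment()": lambda n: n.nodeType == Node.COMMENT_NODE,
    "processing-instruction()": pi_test(),
    "processing-instruction('other')": pi_test("other"),
    # On the child axis the principal node type is 'element'.
    "*": lambda n: n.nodeType == Node.ELEMENT_NODE,
    "node()": lambda n: True,
}

# Apply every node test to the nodes on the child axis of the context node.
results = {
    label: [n.nodeName for n in context.childNodes if test(n)]
    for label, test in NODE_TESTS.items()
}

for label, matched in results.items():
    print(f"child::{label} ->", matched)
```

With this sample, node() matches all four children, * matches only the element e, and processing-instruction('other') matches nothing because the only processing instruction is named target.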
The following are examples of XPath queries using different axes and node tests.
This selects all descendant elements from the context node:
descendant::*
This selects the name attribute from the context node:
attribute::name
This selects the parent of the context node, provided that the parent is an element:
parent::*
This selects all namespaces that are valid in the context node:
namespace::*
This means that it includes the default namespace (if one is declared), the xml namespace, any namespaces that are declared in the context node, and any namespaces declared in ancestors of the context node that have not been overruled by declarations in their descendants. A namespace is overruled when one element declares a prefix for a certain URI and a descendant element declares the same prefix with another URI. In this case, the first declaration is hidden from the element with the second declaration and from all of its descendants.
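The overruling rule can be sketched as a walk from the outermost ancestor down to the context node, letting later declarations replace earlier ones. This is a simplified illustration using Python's standard xml.dom.minidom module. The sample document and the in_scope_namespaces function are assumptions, and the sketch does not handle un-declaring the default namespace with xmlns="".

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample: prefix p is declared on <a> and overruled on <b>.
dom = minidom.parseString(
    '<a xmlns:p="urn:first" xmlns:q="urn:q">'
    '<b xmlns:p="urn:second"><c/></b>'
    '</a>'
)

def in_scope_namespaces(element):
    """Return {prefix: URI} for the namespaces visible at element."""
    # The xml prefix is always implicitly declared.
    scope = {"xml": XML_NS}

    # Collect the element and its element ancestors, innermost first.
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode

    # Apply declarations outermost first so inner ones overrule outer ones.
    for el in reversed(chain):
        for name, value in el.attributes.items():
            if name == "xmlns":
                scope[""] = value  # default namespace
            elif name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
    return scope

a = dom.documentElement
c = dom.getElementsByTagName("c")[0]

print(in_scope_namespaces(a)["p"])  # urn:first
print(in_scope_namespaces(c)["p"])  # urn:second
print(in_scope_namespaces(c)["q"])  # urn:q
```

Seen from c, the declaration of p on a is invisible because b redeclares the same prefix, while q is still inherited unchanged.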
Finally, this query selects all comment nodes that are direct children of the context node:
child::comment()