Extensible Markup Language (XML) Tutorial

Using a Document Type Definition (DTD)

Structured Data

Each element in an XML document has a relationship with other elements, which defines the structure of the data. Structuring data ensures that data is found in the correct place, and adds context to the document. This results in self-describing, organised information, separating content from style. Explicit rules state where a specific part of the document structure may exist. Structured data is easily processed by search engines, as they're able to index only the relevant elements.

The explicit rules that state where elements may exist are defined in a Document Type Definition (DTD). The DTD provides a formal definition of the document structure and the elements that may be used. An XML document is said to be valid if it has an associated DTD and its content conforms to the constraints expressed in that DTD. The document type declaration is part of the prolog, and must be placed before the root element of the document.

The following is the contents of an XML document called user.xml. If your browser has an XML parser, opening the file displays the parsed document; otherwise, you will just see the raw contents of the file.

user.xml

<?xml version="1.0" ?>
<!DOCTYPE user [
    <!ELEMENT user     (name,email)>
    <!ELEMENT name     (forename, surname)>
    <!ELEMENT forename (#PCDATA)>
    <!ELEMENT surname (#PCDATA)>
    <!ELEMENT email     (#PCDATA)>
]>
<user>
    <name>
        <forename>Gez</forename>
        <surname>Lemon</surname>
    </name>
    <email>[email protected]</email>
</user>

The above example is an XML document whose document type declaration (DOCTYPE) names "user" as the top-level element. Inside the document type declaration are the element declarations. The element declarations determine how often, and in what context, the elements appear in the document. The "user" element consists of the elements "name" and "email", each exactly once and in that order. The "name" element is defined as having the child elements "forename" and "surname", in that order.
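As a rough illustration of what these declarations rule out, the following hypothetical variant of user.xml is still well-formed, but a validating parser would be expected to reject it, because "email" appears before "name". The address user@example.com is a placeholder chosen for this sketch, not a value from the original document.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical counter-example: well-formed, but not valid against the DTD -->
<!DOCTYPE user [
    <!ELEMENT user     (name,email)>
    <!ELEMENT name     (forename, surname)>
    <!ELEMENT forename (#PCDATA)>
    <!ELEMENT surname  (#PCDATA)>
    <!ELEMENT email    (#PCDATA)>
]>
<user>
    <!-- The content model (name,email) requires name first, then email -->
    <email>user@example.com</email>
    <name>
        <forename>Gez</forename>
        <surname>Lemon</surname>
    </name>
</user>
```

Swapping the "email" and "name" elements back into the declared order would make the document valid again. A non-validating parser would accept both versions, since it only checks that a document is well-formed.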

Character Data (CDATA)

An XML document consists of markup and character data, where the character data is the text of the document. The markup provides information about the character data, and is differentiated from the character data using special characters. The special characters used to differentiate markup from character data are angle brackets ("<" and ">"), ampersands ("&"), and semicolons (";"). Data placed in a Character Data (CDATA) section is not parsed as markup by the XML parser, so it may contain these characters literally. A CDATA section starts with "<![CDATA[" and ends with "]]>". The following example uses a CDATA section to define a JavaScript section in an XHTML document.

<script type="text/javascript">
<![CDATA[
function someFunction()
{
    // Function definition
}
]]>
</script>
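To show why a CDATA section is useful here, the sketch below uses a hypothetical function, isInRange (not part of the original example), whose body contains "<" and "&". Inside the CDATA section the characters are written literally; without one, each has to be written as an entity reference.

```xml
<script type="text/javascript">
<![CDATA[
function isInRange(value)
{
    // "<" and "&&" are taken literally inside the CDATA section
    return 0 < value && value < 10;
}
]]>
</script>

<!-- The same script without a CDATA section -->
<script type="text/javascript">
function isInRange(value)
{
    // Special characters are written as entity references instead
    return 0 &lt; value &amp;&amp; value &lt; 10;
}
</script>
```

Both forms should give an XML parser the same character data. The CDATA version is simply easier to read and maintain when the script contains many comparison or logical operators.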

Parsed Character Data (#PCDATA)

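The "forename", "surname", and "email" elements are defined as elements that can contain Parsed Character Data (#PCDATA). PCDATA is text that the XML parser examines for markup, so any entity references in it are expanded. PCDATA may not contain the literal characters used to differentiate markup from character data: "<" and "&" must be written as the entity references "&lt;" and "&amp;".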
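As a small sketch of how this works in practice, the hypothetical fragment below follows the same element declarations as user.xml. The values are invented for illustration: an ampersand in the surname is written as an entity reference, and the parser would report the text as "Lemon & Sons".

```xml
<name>
    <forename>Gez</forename>
    <!-- "&amp;" is parsed and expanded to a literal "&" -->
    <surname>Lemon &amp; Sons</surname>
</name>
```

Writing a bare "&" in place of "&amp;" would make the document not well-formed. Nesting another element inside "surname" would keep it well-formed but not valid, because the declaration allows only #PCDATA.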

The DTD can be stored in an external file. In this case, the DOCTYPE declaration contains the name of the external file.

user.dtd

<!ELEMENT user (name,email)>
<!ELEMENT name (forename, surname)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

The XML document then specifies the location of the external DTD.

user.xml

<?xml version="1.0" ?>
<!DOCTYPE user SYSTEM "user.dtd">
<user>
    <name>
        <forename>Gez</forename>
        <surname>Lemon</surname>
    </name>
    <email>[email protected]</email>
</user>

About the author

Gez Lemon United Kingdom

I'm available for contract work. Please visit Juicify for details.
