The role namespaces play is most obvious when you validate an XML document against an XML Schema Definition (XSD) schema. In case you are not familiar with XSD, it is an XML-based grammar for defining the document structures and data types that you use in your documents. You can think of XSD as a more expressive, XML-based alternative to Document Type Definitions (DTDs). You don’t need to know much about XSD schemas for this article, and I’ll explain the little you do need to know.
Imagine you have an XML document that contains employee information for a human resources application:
<employees>
  <employee>
    <id>49982</id>
    <name>Bart Simpson</name>
    <hireDate>2000-07-04</hireDate>
  </employee>
  <employee>
    <id>12345</id>
    <name>Hugo Simpson</name>
    <hireDate>2000-05-29</hireDate>
  </employee>
</employees>
You might create a schema for this document that defines a data type for the employee element like this:
<xsd:complexType name="employeeType">
  <xsd:sequence>
    <xsd:element name="id" type="xsd:int"/>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="hireDate" type="xsd:date"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="employee" type="employeeType" maxOccurs="unbounded"/>
Without going into the details of XSD, the above snippet does two things. First, it defines a data type called employeeType that contains three elements: id, name, and hireDate. Second, it declares an element called employee of type employeeType. This is the XSD way of saying “the <employee> element will contain three elements in this order: <id>, <name>, and <hireDate>”.
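To make that rule concrete, here is a minimal Python sketch that hand-checks the same constraints the schema snippet expresses: child names, their order, and their data types. It is not an XSD validator and only approximates the XSD types (for example, it does not enforce the 32-bit range of xsd:int). The names EMPLOYEE_TYPE and check_employee are made up for this illustration, and the code assumes only the standard library's xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from datetime import date

DOC = """
<employees>
  <employee>
    <id>49982</id>
    <name>Bart Simpson</name>
    <hireDate>2000-07-04</hireDate>
  </employee>
  <employee>
    <id>12345</id>
    <name>Hugo Simpson</name>
    <hireDate>2000-05-29</hireDate>
  </employee>
</employees>
"""

# Hypothetical stand-in for employeeType: the expected child names, in order,
# each paired with a function that raises ValueError on badly typed text.
EMPLOYEE_TYPE = [("id", int), ("name", str), ("hireDate", date.fromisoformat)]


def check_employee(employee):
    """Return True if the children match EMPLOYEE_TYPE in name, order, and type."""
    children = list(employee)
    if [child.tag for child in children] != [name for name, _ in EMPLOYEE_TYPE]:
        return False
    try:
        for child, (_, parse) in zip(children, EMPLOYEE_TYPE):
            parse(child.text or "")
    except ValueError:
        return False
    return True


root = ET.fromstring(DOC)
results = [check_employee(e) for e in root.findall("employee")]
print(results)
```

A real schema-aware validator does far more than this, but the sketch shows the kind of question it answers for each <employee> element.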
You can use the above schema snippet to validate the employees document and it’ll work just fine (provided you have a schema-aware validator such as XML Spy V3.5). Now imagine that the payroll application wants to share this employee information and add to it. For example, the payroll application wants to keep track of each employee’s salary and the taxes being deducted (this is way oversimplified, but who wants to learn the 2001 payroll tax laws?):
<employees>
  <employee>
    <id>49982</id>
    <name>Bart Simpson</name>
    <hireDate>2000-07-04</hireDate>
    <salary>4000765.00</salary>
    <taxes>3980765.27</taxes>
  </employee>
  <employee>
    <id>12345</id>
    <name>Hugo Simpson</name>
    <hireDate>2000-05-29</hireDate>
    <salary>82000.00</salary>
    <taxes>16567.87</taxes>
  </employee>
</employees>
How should you handle this? Do you update the HR schema to reflect the new <salary> and <taxes> elements? That might seem like a good choice at first, but it results in two separate applications sharing the same schema document, which is likely to cause ownership and maintenance problems. It would be much better if you could separate the data types that belong to HR from the data types that belong to payroll, and give each application’s team control over its own data types with no risk of messing up the other team’s schema.
You can do that by simply placing those data types in different buckets when you define them. Those buckets are called namespaces. Let’s say you define a bucket, or namespace, called HRData and another one called payrollData. You can then put the payroll application team in charge of maintaining the data types in the payrollData namespace and the HR application team in charge of maintaining the types in the HRData namespace.
You will need a way to indicate that the <salary> and <taxes> elements belong to the payrollData namespace while all other elements belong to the HRData namespace. To do this, you prefix each element name with the namespace and a colon, like this:
<HRData:employees>
  <HRData:employee>
    <HRData:id>49982</HRData:id>
    <HRData:name>Bart Simpson</HRData:name>
    <HRData:hireDate>2000-07-04</HRData:hireDate>
    <payrollData:salary>4000765.00</payrollData:salary>
    <payrollData:taxes>3980765.27</payrollData:taxes>
  </HRData:employee>
  <HRData:employee>
    <HRData:id>12345</HRData:id>
    <HRData:name>Hugo Simpson</HRData:name>
    <HRData:hireDate>2000-05-29</HRData:hireDate>
    <payrollData:salary>82000.00</payrollData:salary>
    <payrollData:taxes>16567.87</payrollData:taxes>
  </HRData:employee>
</HRData:employees>
Don’t be fooled by the apparent complexity of this snippet. All I did was add the HRData and payrollData prefixes before each element name. I don’t know about you, but I’d rather keep the namespace prefixes as short as possible. To do this, you come up with a short prefix, possibly as short as one letter, and map that prefix to the real namespace name. For example, you might decide to use py for payrollData and hr for HRData:
<hr:employees xmlns:hr="HRData" xmlns:py="payrollData">
  <hr:employee>
    <hr:id>49982</hr:id>
    <hr:name>Bart Simpson</hr:name>
    <hr:hireDate>2000-07-04</hr:hireDate>
    <py:salary>4000765.00</py:salary>
    <py:taxes>3980765.27</py:taxes>
  </hr:employee>
  <hr:employee>
    <hr:id>12345</hr:id>
    <hr:name>Hugo Simpson</hr:name>
    <hr:hireDate>2000-05-29</hr:hireDate>
    <py:salary>82000.00</py:salary>
    <py:taxes>16567.87</py:taxes>
  </hr:employee>
</hr:employees>
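As a rough illustration of how a namespace-aware parser sees this document, the following Python sketch groups the children of the first <hr:employee> element by the namespace that owns them. It assumes only the standard library's xml.etree.ElementTree module, which reports every name in the form {namespace}localname. The helper namespace_of and the variable by_owner are hypothetical names introduced for this example, and the document is trimmed to one employee to keep it short.

```python
import xml.etree.ElementTree as ET

DOC = """
<hr:employees xmlns:hr="HRData" xmlns:py="payrollData">
  <hr:employee>
    <hr:id>49982</hr:id>
    <hr:name>Bart Simpson</hr:name>
    <hr:hireDate>2000-07-04</hr:hireDate>
    <py:salary>4000765.00</py:salary>
    <py:taxes>3980765.27</py:taxes>
  </hr:employee>
</hr:employees>
"""


def namespace_of(element):
    """Return the namespace name of an element, or "" if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(DOC)
employee = root[0]

# Map each namespace name to the local names of the elements it owns.
by_owner = {}
for child in employee:
    local_name = child.tag.split("}", 1)[-1]
    by_owner.setdefault(namespace_of(child), []).append(local_name)

print(root.tag)   # {HRData}employees
print(by_owner)
```

Notice that the parser hands back the real namespace names, HRData and payrollData, rather than the hr and py prefixes, so each team's elements can be picked out regardless of which prefixes a particular document happens to use.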
The syntax for defining a namespace-prefix mapping is xmlns:prefix="namespace", where prefix is the short prefix you’ll use in the document and namespace is the actual namespace name that the prefix refers to. Once you’ve defined the prefix, you can use it in your document instead of writing out the entire namespace name in front of each element name. When using namespaces, element and attribute names have two parts: the prefix (e.g., hr or py) and the local name (e.g., employee or salary). The two parts together form the qualified name, or QName (e.g., hr:employee or py:salary).
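The short Python sketch below pulls these ideas together, again assuming only the standard library's xml.etree.ElementTree module. It splits a QName into its two parts, shows that two different prefixes mapped to the same namespace yield the same element name, and shows that the parser rejects a prefix with no xmlns mapping. The helper split_qname and the alternative prefix h are made up for this example.

```python
import xml.etree.ElementTree as ET


def split_qname(qname):
    """Split a QName such as "hr:employee" into (prefix, local name)."""
    prefix, _, local_name = qname.rpartition(":")
    return prefix, local_name


print(split_qname("py:salary"))   # ('py', 'salary')

# Two different prefixes mapped to the same namespace name.
a = ET.fromstring('<hr:employee xmlns:hr="HRData"/>')
b = ET.fromstring('<h:employee xmlns:h="HRData"/>')
same_name = a.tag == b.tag
print(a.tag, same_name)           # {HRData}employee True

# A prefix with no xmlns mapping cannot be resolved, so parsing fails.
try:
    ET.fromstring("<py:salary>82000.00</py:salary>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)              # False
```

The prefix is only shorthand inside one document. What identifies an element is the namespace name plus the local name, which is why the xmlns mapping has to be present for the prefix to mean anything.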
Now you can easily create two different schemas, one that defines the HR types in the HRData namespace, and one that defines the payroll types in the payrollData namespace. The syntax you use to do this is part of XSD and is beyond the scope of this article.