An XML processor is more commonly called a parser, since it simply parses XML and provides the application with any information it needs. There are quite a number of XML parsers available, many of which are free.
The main reason for creating all of these rules about writing
well-formed XML documents is so that we can create a computer program to read
in the data, and easily tell markup from information.
According to the XML specification (http://www.w3.org/TR/1998/REC-xml-19980210#sec-intro):
"A software module called an XML processor is used to read XML documents and
provide access to their content and structure. It is assumed that an XML processor
is doing its work on behalf of another module, called the application."
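The division of labour described in this quotation can be sketched in a few lines of Python. This is only an illustration, not something taken from the specification: it assumes that Python's standard xml.parsers.expat module (a binding to the Expat parser) plays the role of the XML processor, while the collect_events function and the sample document are hypothetical names invented here to stand in for the application.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text and return the events the processor reported."""
    events = []  # the "application": it only records what it is told

    def on_start(name, attrs):
        events.append(("start", name, attrs))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Ignore whitespace-only text between elements.
        if data.strip():
            events.append(("text", data))

    # The "XML processor": it reads the markup and calls the handlers.
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent text as one chunk
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks the end of the input
    return events


# Hypothetical sample document, used purely for illustration.
sample = '<name nickname="Shiny John"><first>John</first></name>'
for event in collect_events(sample):
    print(event)
```

The application never looks at angle brackets or quotation marks itself; it only sees element names, attributes, and text that the processor has already separated from the markup.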
Some of the better known parsers are listed below.
Microsoft Internet Explorer Parser
Microsoft's first XML parser shipped with Internet Explorer 4 and implemented
an early draft of the XML specification. With the release of IE5, the XML implementation
was upgraded to reflect the XML 1.0 specification. The latest version of
the parser (March 2000 Technology Preview Release) is available for download
from http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp.
In this book we'll be mainly using the IE5 version.
James Clark's Expat
Expat is an XML 1.0 parser toolkit written in C. More information can be found
at http://www.jclark.com/xml/expat.html
and Expat can be downloaded from ftp://ftp.jclark.com/pub/xml/expat.zip.
It is free for both private and commercial use.
Vivid Creations ActiveDOM
Vivid Creations (http://www.vivid-creations.com)
offers several XML tools, including ActiveDOM. ActiveDOM contains a parser similar
to the Microsoft parser and, although it is a commercial product, a demonstration
version may be downloaded from the Vivid Creations web site.
DataChannel XJ Parser
DataChannel, a business solutions software company, worked with Microsoft to
produce an early XML parser written in Java. Their website (http://xdev.datachannel.com/directory/xml_parser.html)
provides a link to get their most recent version. However, they are no longer
doing parser development. They have opted instead to use the xml4j parser from
IBM.
IBM xml4j
IBM's AlphaWorks site (http://www.alphaworks.ibm.com)
offers a number of XML tools and applications, including the xml4j parser. This
is another parser written in Java, available for free, though there are some
licensing restrictions regarding its use.
Apache Xerces
The Apache Software Foundation's Xerces sub-project of the Apache XML Project
(http://xml.apache.org/)
has resulted in XML parsers in Java and C++, plus a Perl wrapper for the C++
parser. These tools are in beta and free to use, and the distribution of the code
is governed by the Apache Software License.
Errors in XML
As well as specifying how a parser should get the information out of an XML document,
the XML specification also states how a parser should deal with errors in XML. It
defines two types of errors: errors and fatal errors.
- An error is simply a violation of the rules in the specification, where the results are undefined; the XML processor is allowed to recover from the error and continue processing.
- Fatal errors are more serious: according to the specification a parser is not allowed to continue as normal when it encounters a fatal error. (It may, however, keep processing the XML document to search for further errors.) Any error which causes an XML document to cease being well-formed is a fatal error.
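A minimal sketch of what a fatal error looks like from the application's side is given here. It assumes Python's standard xml.etree.ElementTree module as the parser; the try_parse helper and both sample documents are hypothetical, written only for this illustration. Other parsers report fatal errors through their own mechanisms.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return ("ok", root tag) or ("fatal", line, column) for xml_text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stopped normal processing; no document tree is
        # handed back, only a description of where the problem lies.
        line, column = err.position
        return ("fatal", line, column)
    return ("ok", root.tag)


# Well-formed: every start-tag has a matching end-tag.
print(try_parse("<a><b></b></a>"))

# Not well-formed: <b> is never closed before </a>.
print(try_parse("<a><b></a>"))
```

In the second call the application receives no partial data at all, which matches the rule that a parser may not continue as normal after a fatal error.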
The reason for this drastic handling of non-well-formed XML is simple: it
would be extremely hard for parser writers to try to handle "well-formedness"
errors, whereas it is extremely simple to make XML well-formed. (HTML does not
hold documents to the same strict rules as XML does, and this is one of the reasons
why web browsers are so incompatible: they must deal with all of the errors they may
encounter, and try to figure out what the person who wrote the document was really
trying to code.)
But draconian error handling doesn't just benefit the parser writers; it also
benefits us when we're creating XML documents. If I write an XML document that
doesn't properly follow XML's syntax, I can find out right away and fix my mistake.
On the other hand, if the XML parser tried to recover from these errors, it might
misinterpret what I was trying to do, but I wouldn't know about it because no
error would be raised. In this case, bugs in my software would be much harder
to track down, instead of being caught right at the beginning when I was creating
my data.
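The contrast between a forgiving parser and a draconian one can be sketched as follows. The example assumes Python's standard html.parser module as a stand-in for lenient, browser-style parsing and xml.etree.ElementTree as the strict XML parser; the EventRecorder class, the two helper functions, and the sample markup are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class EventRecorder(HTMLParser):
    """Lenient parser that records what it sees and never complains."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def lenient_events(markup):
    """Feed markup to the forgiving parser and return its events."""
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


def is_well_formed(markup):
    """Return True only if the strict XML parser accepts markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


broken = "<p><b>bold</p>"  # <b> is opened but never closed
print(lenient_events(broken))
print(is_well_formed(broken))
```

The lenient parser reports the tags it saw without raising anything, so the missing end-tag goes unnoticed; the XML parser rejects the same text immediately, which is the early warning described above.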