XML and Web Services by design

This article was originally published in VSJ, which is now part of Developer Fusion.
XML and web services are generally believed to be the two most important “revolutionary” features that will change the way software is built. The revolution has been fast, but it’s not over just yet.

SQL Server 2005 has some remarkable new features and core support for XML. It’s not a “bolt on” afterthought any more – it is an intrinsic part of the design. There are two aspects to XML; the first is its basic role as a universal way of representing arbitrarily complex data, and the second is the group of platform-independent technologies such as web services based on SOAP or the XQuery language. SQL Server 2005 has basic XML support and uses several of the XML-based technologies.

The most basic facility is the new xml data type. This allows you to store XML data, documents or fragments in columns, variables or parameters. The XML data can be associated with an XML schema for validation and data typing. An XML data type without a schema is said to be untyped, and one that is validated is typed. Thus XML and all of the standard XML documents and custom documents can now be treated on an equal footing with ints and other built-in types. You can also execute SQL queries to return results as XML simply by adding the FOR XML clause to the SELECT. The TSQL OPENXML function allows access to XML data as if it were a relational rowset by presenting a rowset view of documents’ data. To speed up processing you can define an index on an xml column. Without an index any query on the XML instances stored in a column requires that the XML is parsed to perform the query and this can be time consuming. An XML index parses the XML data just once to create a row for each node specified in the data. Each row stores the tag name, node value, node type, path of node in parse tree and so on. Enough information to construct XML results for any query without having to parse each XML document stored in the xml column once again.

All of this makes it possible to publish, store and process XML data. It is worth keeping in mind that XML, being character based, isn’t the most efficient form of storage. There is also the almost hidden process of schema validation that can slow things down. For example, if you modify a typed xml column each document will be checked for validity against the schema. However, as long as you understand the potential cost of using XML rather than a native data format, these should be avoidable problems.

XML has its own technology for performing queries – XQuery – and you can make use of this within TSQL to query data stored in xml data types. The implementation is only a subset of the XQuery Working Draft (2004), simply because the final W3C standard isn’t complete yet. XQuery is based on another XML technology – XPath.

One problem caused by XQuery’s unfinished state is that it has no data modification capabilities. To solve this problem Microsoft has extended it with the addition of XML DML (Data Modification Language). The additions are simply “insert”, “delete” and “replace value of”, and so are hardly likely to cause upgrading problems if and when XQuery is extended to include modification. For the moment DML gives us the ability to work with xml data types without resorting to even more ad-hoc methods.

XML file as data table
Visual Studio has lots of XML facilities – here an XML file is displayed as a data table

Being able to work with XML is all well and good, but it begs the question of why you would want to? The main reason that is likely to crop up in practice is the need to work with data involved in web services, where XML in the form of a SOAP request/response packet is used to transfer data over the wire. As such, generating and processing XML structured data is likely to be increasingly important, and being able to work with it in native form rather than having to parse it into relational form and then back again is clearly an advantage. However, SQL Server 2005 also has the ability to implement an XML web service without the need for help from a third party such as IIS. This opens up a huge range of new possibilities.

The most obvious is that SQL Server 2005 can now provide data access to remote clients via web services. What is more, because of the platform independence of the SOAP protocol, those clients can be in any language on any platform. In addition, as other applications gain support for XML web services the data exchange, query and workflow can happen without the need for substantial custom programming. This makes it all the more essential that SQL Server can store and work with XML without having to convert it to and from a relational format.

In addition to being able to expose data as a web service to a range of clients, SQL Server also now has the ability to consume web services, i.e. to act as a client. This makes it possible to design systems where data is automatically collected and updated and where multiple SQL Server databases talk to each other using web services. Specifically, one SQL Server can send another SOAP messages to run T-SQL batch statements or stored procedures complete with parameter values.

How easy is it to create a SQL web service? The answer is that it’s pretty trivial and requires almost no programming. All you have to do is create an HTTP endpoint and define its properties and methods that are part of the service (you can also create T-SQL, TCP, mirroring and service broker endpoints). To create an endpoint you use the Create Endpoint statement, the simplest example of which is:

CREATE ENDPOINT endpointname
	STATE=STARTED
AS HTTP
(
PATH=path,
AUTHENICATION=(INTEGRATED
PORTS=(CLEAR),
SITE=url
)
FOR SOAP
(
WEBMETHOD 'methodname'
(NAME='functionname'),
		BATCHES = DISABLED,
		WSDL = DEFAULT,
		DATABASE = 'databasename',
		NAMESPACE = 'namespace'
	)
The first part of the instruction sets up the HTTP server – SITE = URL of server, PATH = Path to director, PORTS specifies whether or not to use encryption, and AUTHENTICATION specifies, in this case, Windows security. The second part (starting at FOR SOAP) specifies the details of the web method that the client can call. The methodname is the name the client uses to call the function defined by functionname acting on databasename using namespace. BATCHES specifies whether or not the service supports ad-hoc SQL queries and WSDL specifies if it supports WSDL or a custom implementation. Following the creation of the endpoint a client can submit a SOAP request to execute the function and receive back its results. As well as CREATE ENDPOINT there are also ALTER ENDPOINT and DROP ENDPOINT commands.

XML is also faster in .NET 2.0
XML is also faster in .NET 2.0 (Source Microsoft)

Web services are easy to create, but do keep in mind that SOAP is an XML-based technology and hence the same potential problems arise in sending large quantities of data using XML as in storing them. There are also security concerns about connecting a database server directly to the Internet. In particular, the BATCHES capability to allow ad-hoc SQL commands to be used is probably best reserved for special-purpose internal use.

As well as XML facilities in SQL Server, there are, of course, XML facilities in the CLR, which can be used from within SQL Server. You can write SQLCLR functions in managed code that pass and process XML. This gives you access to the XSL transformation functions which make processing XML easy and efficient.

The addition of XML to SQL Server as a native data type and the availability of relevant XML technologies such as web services are without doubt a good thing, but they also present challenges. If you see XML support as something that enables easy use of web services, and as an “interface” with the rest of the XML-using applications, then it’s a useful tool that won’t cause many problems.

There is, however, a deeper, more complex and more controversial aspect concerning the addition of XML to the relational database model. It is worth knowing that database theoreticians are still debating the impact of, and the right way to handle, the hierarchical data component that XML introduces. At a superficial level, XML breaks the clean, disciplined relational model and introduces the possibility of complicating data management. If the data fits the relational model, then for long-term storage and use it is almost certainly better to parse the XML into relational form. If, however, for most of the time you treat the XML as being “atomic” items that you can query to find individual values then you are probably safe. The wider and more complicated issue of how XML and the relational modelling of data structure might fit together more intimately is still a matter for research.


Dr Mike James, editor of VSJ, has over 20 years of programming experience, both as a developer and lecturer. He has written numerous books and articles on programming, and his PhD is in computer science.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“I invented the term Object-Oriented, and I can tell you I did not have C++ in mind.” - Alan Kay