Inside Open XML

Extending Open XML

This article was originally published on DNJ Online

It should already be apparent that you can manipulate Open XML documents extensively without the help of Office 2007. The .NET Framework makes it easy to reach inside OPC packages and XML documents and make changes. As we have seen, the use of relationship parts also means that many modifications, such as altering the order of slides in a presentation or changing the style of a document, can be made without getting to grips with application-specific markup languages such as WordprocessingML.
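
As a rough illustration of how little is needed to reach inside a package, the sketch below reads the package-level relationship part using nothing but a ZIP reader and an XML parser. It uses Python's standard library rather than the .NET Framework, purely to keep the example self-contained. The helper names (build_demo_package, list_relationships) and the tiny in-memory package are our own assumptions, not anything Office produces.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (.rels files).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def build_demo_package() -> bytes:
    """Build a tiny, hypothetical package in memory so the sketch runs without Office."""
    rels = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
        'Target="word/document.xml"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("_rels/.rels", rels)
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def list_relationships(package: bytes, rels_part: str = "_rels/.rels"):
    """Return (Id, Type, Target) for each relationship in the given .rels part."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(rels_part))
    return [
        (r.get("Id"), r.get("Type"), r.get("Target"))
        for r in root.findall(f"{{{RELS_NS}}}Relationship")
    ]


if __name__ == "__main__":
    for rel_id, rel_type, target in list_relationships(build_demo_package()):
        print(rel_id, target)
```

The same function can be pointed at other relationship parts in a real document by passing a different rels_part name. Changing a Target attribute and writing the part back is the kind of modification that needs no knowledge of the application markup.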

What is also apparent is that consumer applications need only concern themselves with the parts within the package that are relevant to them, leaving other parts untouched. An image editor, for example, could reach inside the package shown in Figure 1 and edit image1.jpeg without affecting, or indeed having to know anything about, any of the other parts.
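
A minimal sketch of that idea follows, again in Python for brevity. It rewrites a package with one part replaced and copies every other part without interpreting it. The function name replace_part and the part name word/media/image1.jpeg used in the usage note are assumptions for illustration; the actual location of the image depends on the package.

```python
import io
import zipfile


def replace_part(package: bytes, part_name: str, new_data: bytes) -> bytes:
    """Return a copy of the package with one part replaced.

    Every other part is copied without being parsed or modified.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError(f"no such part: {part_name}")
        for name in src.namelist():
            # Swap in the new bytes for the target part, copy the rest as-is.
            data = new_data if name == part_name else src.read(name)
            dst.writestr(name, data)
    return out.getvalue()
```

An image editor could call replace_part(package_bytes, "word/media/image1.jpeg", edited_jpeg_bytes) and save the result. Because the part keeps its name, the relationships and content types that refer to it remain valid.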

Open XML also supports Custom XML Data Storage parts which can contain XML data conforming to any schema you want. A part could contain data about a contract, for example, such as buyer and seller names and addresses, property details and so forth. Once the appropriate relationships have been set up within the package, this data can be dropped into the document where required and any changes to the custom data will be automatically reflected in the document. Furthermore, documents can contain more than one Custom XML Data Storage part, each tailored to the needs of a different application. This opens up many opportunities for document manipulation and workflow control.
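
To make the contract example concrete, here is a hedged sketch of how an external application might read such a part. The namespace urn:example:contract, the element names and the part name customXml/item1.xml are all hypothetical choices for this illustration. The relationships and content-type entries a real package needs are left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical namespace and part name, chosen only for this illustration.
CONTRACT_NS = "urn:example:contract"
CUSTOM_PART = "customXml/item1.xml"

# Sample custom data following our made-up contract schema.
SAMPLE_CONTRACT = (
    f'<contract xmlns="{CONTRACT_NS}">'
    "<buyer>A. Buyer</buyer>"
    "<seller>B. Seller</seller>"
    "</contract>"
)


def read_contract(package: bytes, part_name: str = CUSTOM_PART) -> dict:
    """Return the child elements of the contract part as a {local name: text} dict."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(part_name))
    # Strip the "{namespace}" prefix ElementTree puts on each tag.
    return {child.tag.split("}", 1)[-1]: child.text for child in root}
```

A workflow tool could use a function like this to pull the buyer and seller out of a document without touching the document body at all.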

Support for Open XML

The primary purpose of Open XML is to create a file format that is based on industry standards (namely ZIP and XML) and capable of accurately storing the full complexity of Word, Excel and PowerPoint documents created by any version since Office 2000. To bring the new format to those earlier versions, Microsoft has produced the Compatibility Pack.

Once this is installed, users of Office XP or Office 2003 will actually see the new Open XML file formats listed in their Save, Save As and Open dialog boxes. Office 2000 users can load Open XML documents by double-clicking them but have to convert saved files by right-clicking them in Windows Explorer and then selecting the appropriate Open XML file format from the Save As option.

The Compatibility Pack is a free download from the Microsoft Web site and also works with Word Viewer 2003 and Excel Viewer 2003. The new PowerPoint Viewer 2007 allows you to view any presentation produced by PowerPoint 97 onward. The viewers are also free but do not allow you to edit or save documents.

Office 2007 will open documents created by Office 97 through to Office 2003 in a ‘compatibility mode’, from which they can be saved either in their original file format or as Open XML. If you want to convert a large number of files automatically, you need the Office File Converter, which comes as part of the Office Migration Planning Manager (OMPM). You can download this from the Microsoft TechNet Web site.

It is early days but a number of third parties have already announced their support for Open XML. Corel has said that WordPerfect Office will support Open XML sometime this year, while Novell has announced that its version of the open source application OpenOffice.org will support Open XML, and that it will submit the relevant code back to the OpenOffice.org project.

Novell’s announcement is particularly interesting as the native file format for OpenOffice.org is Open Document Format for Office Applications (usually referred to as ODF or OpenDocument). This is an OASIS standard that was approved as an ISO/IEC standard in May 2006, and is also supported by Sun Microsystems’ StarOffice suite.

ODF, like Open XML, aims to encapsulate documents, spreadsheets and presentations, and does so by combining XML and binary files in a ZIP archive. However, ODF content markup is much more like XHTML and does not attempt to capture all the information created by older versions of Microsoft Office. ODF does support custom metadata, but doesn’t have anything analogous to the relationship parts of Open XML.
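
Because both formats are ZIP archives, a program can tell them apart by looking at a few well-known entries. The sketch below is one way this could be done in Python. It relies on an Open XML package containing a [Content_Types].xml entry and an ODF package containing a mimetype entry plus META-INF/manifest.xml. The function name sniff_format and its return values are our own.

```python
import io
import zipfile


def sniff_format(data: bytes) -> str:
    """Classify a ZIP-based office file as 'open-xml', 'odf' or 'unknown'."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "unknown"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    # OPC packages (Open XML) always carry a content-types entry.
    if "[Content_Types].xml" in names:
        return "open-xml"
    # ODF packages carry a mimetype entry and a manifest.
    if "mimetype" in names and "META-INF/manifest.xml" in names:
        return "odf"
    return "unknown"
```

This is only a heuristic based on entry names. It does not validate the content of either kind of package.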

There is an open source project at http://odf-converter.sourceforge.net that is developing an Open XML/ODF Translator Add-in for Office which allows Microsoft Office XP, 2003 and 2007 applications to open and save ODF files. It also promises to be able to handle batch conversions.

There is much, and often heated, debate over the relative merits of Open XML and ODF, but it is clear that the aims of the two specifications are different. Certainly, there is room for multiple standards: witness the JPEG, PNG and CGM image formats, which are all ISO/IEC standards. As we said earlier, Ecma has submitted Open XML for consideration as an ISO/IEC standard, which would put it on an equal footing with ODF. You can submit comments on this to the BSI at http://www.microsoft.co.uk/openxml/.

Politics aside, Open XML opens up many new possibilities for office automation and, as we shall see over the page, is well supported by the .NET Framework.
