Opening the package in Open XML


Working with Documents

This article was originally published on DNJ Online

There are a couple of different approaches to working with Open XML. You can create documents programmatically from scratch, or you can use an existing document as a template and add programmatic content to it as needed. The second approach is often more convenient, but it is still worth seeing how the raw approach works. Here is how to create a document from scratch, omitting the section that builds the actual XML to be saved:

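using System;
using System.IO;
using System.IO.Packaging;   // WindowsBase.dll in .NET Framework 3.0 and later
using System.Xml;

public static void CreateDoc(string docPath)
{
    // content type of the main document part, and the relationship
    // type that links it to the package root
    const string docxContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
    const string docRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // create the package, overwriting any existing file
    using (Package thePackage = Package.Open(docPath,
        FileMode.Create, FileAccess.ReadWrite))
    {
        // create the part for document.xml
        Uri docUri = new Uri("/word/document.xml", UriKind.Relative);
        PackagePart docPart = thePackage.CreatePart(docUri,
            docxContentType);

        // build the XML document
        XmlDocument xmlDoc = new XmlDocument();
        // code to build the doc goes here...

        // save the doc into the part, closing the part stream afterwards
        using (Stream partStream = docPart.GetStream(FileMode.Create,
            FileAccess.Write))
        {
            xmlDoc.Save(partStream);
        }

        // create the package-level relationship to the main part
        thePackage.CreateRelationship(docUri,
            TargetMode.Internal, docRelationshipType, "rID1");
        thePackage.Flush();
    }
}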

This example assumes that the main document part requires no further relationships, which in practice is unlikely. If you type “Hello world” into Word 2007 and save the document, Word creates at least seven additional XML files: two for document properties, which are related to the root package, and five which are related to document.xml and handle font and style tables, themes and other settings. All this makes the idea of starting with a template more attractive.
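The template approach might be sketched as follows. This is not code from the article: the class and method names (TemplateFiller, FillTemplate) are hypothetical, and the sketch assumes the template is an ordinary .docx whose main part contains a w:body element. It copies the template, locates the main document part by following the root officeDocument relationship rather than assuming a path, and appends one paragraph of text.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;

// Hypothetical helper illustrating the template approach.
public static class TemplateFiller
{
    // Relationship type from the package root to the main document part
    const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    // WordprocessingML namespace used in document.xml
    const string WNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Copies the template to outputPath, then appends a paragraph
    // containing 'text' to the body of the copy.
    public static void FillTemplate(string templatePath, string outputPath, string text)
    {
        File.Copy(templatePath, outputPath, true);

        using (Package package = Package.Open(outputPath, FileMode.Open, FileAccess.ReadWrite))
        {
            // Follow the root relationship instead of assuming /word/document.xml
            PackageRelationship rel =
                package.GetRelationshipsByType(OfficeDocumentRelType).First();
            Uri partUri = PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri);
            PackagePart docPart = package.GetPart(partUri);

            // Keep whitespace-only text (e.g. a run containing a single space) intact
            XmlDocument xmlDoc = new XmlDocument { PreserveWhitespace = true };
            using (Stream input = docPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                xmlDoc.Load(input);
            }

            XmlNamespaceManager ns = new XmlNamespaceManager(xmlDoc.NameTable);
            ns.AddNamespace("w", WNs);
            XmlNode body = xmlDoc.SelectSingleNode("/w:document/w:body", ns);

            // Build <w:p><w:r><w:t>text</w:t></w:r></w:p>
            XmlElement p = xmlDoc.CreateElement("w", "p", WNs);
            XmlElement r = xmlDoc.CreateElement("w", "r", WNs);
            XmlElement t = xmlDoc.CreateElement("w", "t", WNs);
            t.InnerText = text;
            r.AppendChild(t);
            p.AppendChild(r);

            // Section properties (w:sectPr), if present, must stay the last child of w:body
            XmlNode sectPr = body.SelectSingleNode("w:sectPr", ns);
            if (sectPr != null)
                body.InsertBefore(p, sectPr);
            else
                body.AppendChild(p);

            // FileMode.Create replaces the part's previous content
            using (Stream output = docPart.GetStream(FileMode.Create, FileAccess.Write))
            {
                xmlDoc.Save(output);
            }
        }
    }
}
```

Following the relationship instead of hard-coding /word/document.xml keeps the code working for any package whose relationships are correct, whatever its folder layout. Setting PreserveWhitespace protects text runs that consist only of spaces; the trade-off is that the saved XML is not re-indented.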

The Packaging API makes it easy to iterate through the parts and relationships within a document. Once you have opened a Package with Package.Open(), its GetParts() method returns a collection of all the PackageParts. You can explore the relationships for each part by calling PackagePart.GetRelationships(), which returns a collection of PackageRelationship objects. Each PackageRelationship has a TargetUri property: for an internal relationship this identifies one of the parts, relative to the relationship's source, while an external relationship such as a hyperlink points outside the package. The Package class also has a GetRelationships() method, which returns the relationships for the root of the package.
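As a rough sketch of that iteration (PackageWalker and DescribeRelationships are hypothetical names, not part of the API), the routine below lists every relationship in an open package. It resolves internal targets against their source with PackUriHelper.ResolvePartUri, and skips relationship parts, which cannot have relationships of their own.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Packaging;

// Hypothetical helper: lists every relationship in an open package.
public static class PackageWalker
{
    // One line per relationship, formatted as "source -> target (type)".
    public static List<string> DescribeRelationships(Package package)
    {
        List<string> lines = new List<string>();

        // Relationships of the package root, whose source URI is "/"
        foreach (PackageRelationship rel in package.GetRelationships())
            lines.Add(Describe(rel));

        foreach (PackagePart part in package.GetParts())
        {
            // Relationship parts (.rels) cannot have relationships of their own
            if (PackUriHelper.IsRelationshipPartUri(part.Uri))
                continue;
            foreach (PackageRelationship rel in part.GetRelationships())
                lines.Add(Describe(rel));
        }
        return lines;
    }

    static string Describe(PackageRelationship rel)
    {
        // Internal targets are stored relative to their source; resolve them
        // to absolute part URIs. External targets (e.g. hyperlinks) are left as-is.
        Uri target = rel.TargetMode == TargetMode.Internal
            ? PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri)
            : rel.TargetUri;
        return rel.SourceUri + " -> " + target + " (" + rel.RelationshipType + ")";
    }
}
```

Because targets are resolved through the relationships themselves, the output reflects what the .rels files say, independent of where the parts happen to sit in the ZIP folder tree.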

If you rename an Open XML document with a .zip extension and extract its contents, you will notice a tidy arrangement of folders and subfolders. This arrangement is only a convention: the set of relationships for each part is entirely controlled by what is listed in its .rels file, not by where the files sit in the folder tree.

Our panel ‘Exploring the package’ (see below) shows an application which lets you explore the contents of an Open XML document without having to change the file type to ZIP. There is also an Update part button that writes edited XML back to the document. An application like this is easy to write and is a good way to learn the essentials of the Packaging API.

One thing you will notice is that Office 2007 does not include line breaks in the XML it outputs. This saves a little space but makes the XML hard to read without additional formatting. We also found that a System.Windows.Forms TextBox has trouble displaying what it sees as a single very long line; the solution was to use a RichTextBox instead. Saving a part through XmlDocument, as the Update part code does, adds basic indentation as a side effect, making the XML easier to follow.
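If you would rather format the XML only for display, a helper along these lines could re-indent it before it goes into the text box. It is a sketch with a hypothetical name (XmlDisplay.FormatXml) built on XmlWriterSettings.Indent; the part itself is left untouched. It omits the XML declaration, because a writer targeting a string would report an encoding (UTF-16) that no longer matches the part.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical display helper: re-indents single-line XML.
public static class XmlDisplay
{
    // Returns the XML with one element per line, for display only.
    public static string FormatXml(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlWriterSettings settings = new XmlWriterSettings
        {
            Indent = true,              // newline and indentation between elements
            IndentChars = "  ",
            OmitXmlDeclaration = true   // avoid a misleading encoding="utf-16"
        };

        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            doc.Save(writer);
        }
        return sb.ToString();
    }
}
```

A typical call would be theForm.txtPart.Text = XmlDisplay.FormatXml(rawXml); where rawXml is the text read from the part.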

Real-world usage

Developers who have struggled with parsing or generating RTF (Rich Text Format), or with Office binary formats, will find Open XML a joy in comparison. It enables many new scenarios for server-side manipulation of Office documents. Most of the work is in understanding the specification, not in the Packaging API itself. Microsoft has published code snippets for common tasks, which you can find by searching Microsoft’s Web site for ‘2007 Open XML code snippets’. It is also worth downloading the Open XML specifications themselves: despite their notorious length, they are well presented, with many XML samples.


Exploring the package

Open XML Packaging

The utility shown here lets you explore and update the contents of an Open XML document. When you select a document, a simple routine lists all its Parts by URI (the Package has already been opened in the same way as shown in the main article):

// add the root
theForm.lstParts.Items.Add(@"/");
// iterate through the parts
foreach (System.IO.Packaging.PackagePart part in
    thePackage.GetParts())
{
    theForm.lstParts.Items.Add(part.Uri.ToString());
}

When you select a Part in the list, another routine displays its content, if the part contains XML, and lists the relationships for that Part. There is not enough space to show the code here, but getting the relationships is a matter of calling GetRelationships either on the Package, if the root URI (/) is selected, or on the PackagePart. Here is how to retrieve a PackagePart from a string URI:

Uri theUri = new Uri(uriString,UriKind.Relative);
PackagePart thePart = thePackage.GetPart(theUri);
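The relationship-listing code the panel has no room for might look something like this minimal sketch. PartRelationships.ForUri is a hypothetical name, and it assumes the list holds "/" for the root followed by part URIs, as in the listing routine above. It returns an empty list for relationship parts rather than calling GetRelationships on them, for the reason given in the note that follows.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Packaging;

// Hypothetical helper for the explorer's relationship list.
public static class PartRelationships
{
    // "/" means the package root; any other string is treated as a part URI.
    public static List<PackageRelationship> ForUri(Package package, string uriString)
    {
        List<PackageRelationship> result = new List<PackageRelationship>();
        PackageRelationshipCollection rels;

        if (uriString == "/")
        {
            rels = package.GetRelationships();
        }
        else
        {
            Uri partUri = new Uri(uriString, UriKind.Relative);
            // Relationship parts (.rels) have no relationships; skip them
            if (PackUriHelper.IsRelationshipPartUri(partUri))
                return result;
            rels = package.GetPart(partUri).GetRelationships();
        }

        result.AddRange(rels);
        return result;
    }
}
```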

Note that relationship parts (the .rels files) cannot have relationships of their own, and calling GetRelationships on such a part throws an InvalidOperationException. Retrieving the XML content from a part is easily done:

using (StreamReader sr = new StreamReader(thePart.GetStream(
    FileMode.Open, FileAccess.Read)))
{
    theForm.txtPart.Text = sr.ReadToEnd();
}

If you want to work with this as XML, you will probably want to instantiate an XmlDocument object, but that is not necessary if you just want to display the text. Saving the XML back is equally easy:

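XmlDocument docXML = new XmlDocument();
try
{
    // parse first, so invalid XML never overwrites the part
    docXML.LoadXml(theForm.txtPart.Text);
    // save the XML back to its part (the package must be open for writing)
    using (Stream partStream = thePart.GetStream(
        FileMode.Create, FileAccess.Write))
    {
        docXML.Save(partStream);
    }
}
catch (XmlException exc)
{
    MessageBox.Show("Error parsing XML: " + exc.Message);
}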
