Library tutorials & articles

Read and write Open XML files (MS Office 2007)

Introduction

Introduction

With Office 2007, Microsoft decided to change default application formats from old, proprietary, closed formats (DOC, XLS, PPT) to new, open and standardized XML formats (DOCX, XLSX and PPTX). New formats share some similarities with old Office XML formats (WordML, SpreadsheetML) and some similarities with competing OpenOffice.org OpenDocument formats, but there are many differences. Since new formats will be default in Office 2007 and Microsoft Office is the most predominant office suite, these formats are destined to be popular and you will probably have to deal with them sooner or later.

This article will explain basics of Open XML file format and specifically XLSX format, the new format for Excel 2007. Presented is a demo application which writes / reads tabular data to / from XLSX files. Application is written in C# using Visual Studio 2005. Created XLSX files can be opened using Excel 2007 Beta (we used build 12.0.3820.1003).

Microsoft Open XML format

Every Open XML file is essentially a ZIP archive containing many other files. Office-specific data is stored in multiple XML files inside that archive. This is in direct contrast with old WordML and SpreadsheetML formats which were single, non-compressed XML files. Although more complex, new approach offers few benefits:

  • You don’t need to process entire file in order to extract specific data.
  • Images and multimedia are now encoded in native format, not as text streams.
  • Files are smaller as a result of compression and native multimedia storage.
In Microsoft’s terminology, Open XML ZIP file is called a package. Files inside that package are called parts. It is important to know that every part has a defined content type and there are no default type presumptions based on the file extension. Content type can describe anything; application XML, user XML, images, sounds, video or any other binary objects. Every part must be connected to some other part using a relationship. Inside package are special XML files with “.rels” extension which define relationship between parts. There is also a start part (sometimes called “root”, which is a bit misleading because graph containing all parts doesn’t have to be a tree structure), so entire structure looks like in a Picture 1.
  

Picture 1: Parts and relations inside XLSX file.

To cut a long story short, in order to read the data from an Open XML file you need to:
1)    Open package as a ZIP archive – any standard ZIP library will do.
2)    Find parts that contain data you want to read – you can navigate through relationship graph (more complex) or you can presume that certain parts have defined name and path (Microsoft can change that in the future).
3)    Read parts you are interested in – using standard XML library (if they are XML) or some other method (if they are images, sounds or of some other type).

On the other side, if you want to create a new Open XML file, you need to:
1)    Create/get all necessary parts – by using some standard XML library (if they are XML), by copying them or by using some other method.
2)    Create all relationships – create “.rels” files.
3)    Create content types – create “[Content_Types].xml” file.
4)    Package everything into a ZIP file with appropriate extension (DOCX, XLSX or PPTX) – any standard ZIP library will do.
The whole story about packages, parts, content types and relations is the same for all Open XML documents (regardless of they originating application) and Microsoft refers to it as Open Packaging Conventions.

Comments

  1. 03 Jan 2009 at 10:06
    ilginç
  2. 10 Sep 2008 at 04:55
    Problem to create a MS Excel or Open office Calc file............ Both S Excel or Open office Calc are using the open xml file format. So we can unzip that archive and read the content including there. But the problem is creating MS Excel or Open office Calc file. I have the all files which want to contain inside the file. But how to create a suitable archive file. Can we use any stranded Zip library such as WinZip, WinRar, 7Zip, etc. How to create a single archive file (MS Excel or Open office) by using xml files. (Bundle the xml file is only the problem)
  3. 19 Oct 2007 at 09:03

    Not a bad post.  I was looking for a way to create Excel 2007 files from code and came accross the article.  However, seeing how much work was involved - I thought it would be much easier to create an Excel 2003 file and let my Office 2007 using colleagues open the file in compatibility mode.

     Fortunately, I came accross another article, which explains how to use the OfficeOpenXml package provided by Microsoft.  Using this namespace is a breeze!

     For more info - check here: http://www.codeproject.com/office/ExcelPackage.asp

     

  4. 24 May 2007 at 10:22
    Hallo I just installed Office 2007 and wanted to find out, how the xlsx-Format works. First I just unzipped an Excel file and tried to make sense of the structure on my own. But I failed. Your article helped my to understand the key concept behind the file structure. Thank you very much
  5. 08 Feb 2007 at 20:01

    At work I've been tasked with opening an xlsx file and extracting the information.

    Many thanks for this article which has made understanding the new file format easier.

  6. 14 Dec 2006 at 02:20

    Excellent article.  It will save me a least a couple of days of digging this information out from msdn.

  7. 01 Jan 1999 at 00:00

    This thread is for discussions of Read and write Open XML files (MS Office 2007).

Leave a comment

Sign in or Join us (it's free).

Zeljko Svedic Zeljko Svedic (Željko Švedic) works as a lead developer at GemBox Software. He primarily works on GemBox.Spreadsheet, a .NET component for reading and writing native XLS files. He got interested in...

Related discussion

Related podcasts

  • Vote of No Confidence

    In this episode Ward Bell (from IdeaBlade) and Jeremy Miller discuss the Vote of No Confidence on Microsoft's Entity Framework and what it means for the Alt.NET and .NET communities.This episode does not have a sponsor, so if you are interested in sponsoring the podcast please contact us.Ward and...

Events coming up

  • Dec 9

    GL.net Group Meeting - December 2009

    Gloucester, United Kingdom

    The beginning of this year holiday season will belong to mocks. Ronnie and Stephen will take us for a tour around exciting world of unit testing.

Want to stay in touch with what's going on? Follow us on twitter!