Read and write Open XML files (MS Office 2007)

Page 2 of 3
  1. Introduction
  2. Excel 2007 Open XML Specifics
  3. Implementation

Excel 2007 Open XML Specifics

Excel 2007 extends on the basis of Open Packaging Conventions by adding its own application-specific XML types. Reference schemas for all XML files used in Office can be downloaded from MSDN, but note some things are still open to change until the final Excel 2007 release.

We just want to write / read worksheet data, so we need to look in folder “\xl\worksheets” inside XLSX file, where all the worksheets are located. For every worksheet there is a separate XML file; “sheet1.xml”, “sheet2.xml” and so on. When you open such file you will notice that all of the sheet data is inside <sheetData> element. For every row there is a <row> element, for every cell there is a element. Finally, value of the cell is stored in a element.
However, real world XML is never simple as schoolbook XML. You will notice that numbers get encoded as numbers inside element:

<c r="A1">
   <v>100</v>
</c>

However, string value (like “John”), also gets encoded as number:  

<c r="B1" t="s"> 
<v>0</v>
</c>

That is because MS Excel uses internal table of unique strings (for performance reasons). Zero is an index of that string in an internal table of strings and attribute t="s" tells us that underlying type is a string, not a number. So where is the table of unique strings located? It is in “\xl\sharedStrings.xml” XML file, and contains all strings used in entire workbook, not just specific worksheet.

This approach is used for many other things; cell styles, borders, charts, number formats etc. In fact, that becomes the major programming problem when working with XLSX files – updating and maintaining various tables of some unique Excel objects. In this article we will just write / read data values, but if you require some complex formatting you should probably be better using some commercial component which does all tedious work for you.

You might also like...

Comments

About the author

Zeljko Svedic Croatia (Hrvatska)

Zeljko Svedic (Željko Švedic) works as a lead developer at GemBox Software. He primarily works on GemBox.Spreadsheet, a .NET component for reading and writing native XLS files. He got interested...

Interested in writing for us? Find out more.

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Every language has an optimization operator. In C++ that operator is ‘//’”