Library tutorials & articles

Read and write Open XML files (MS Office 2007)

Page 2 of 3
  1. Introduction
  2. Excel 2007 Open XML Specifics
  3. Implementation

Excel 2007 Open XML Specifics

Excel 2007 extends on the basis of Open Packaging Conventions by adding its own application-specific XML types. Reference schemas for all XML files used in Office can be downloaded from MSDN, but note some things are still open to change until the final Excel 2007 release.

We just want to write / read worksheet data, so we need to look in folder “\xl\worksheets” inside XLSX file, where all the worksheets are located. For every worksheet there is a separate XML file; “sheet1.xml”, “sheet2.xml” and so on. When you open such file you will notice that all of the sheet data is inside <sheetData> element. For every row there is a <row> element, for every cell there is a element. Finally, value of the cell is stored in a element.
However, real world XML is never simple as schoolbook XML. You will notice that numbers get encoded as numbers inside element:

<c r="A1">
   <v>100</v>
</c>

However, string value (like “John”), also gets encoded as number:  

<c r="B1" t="s"> 
<v>0</v>
</c>

That is because MS Excel uses internal table of unique strings (for performance reasons). Zero is an index of that string in an internal table of strings and attribute t="s" tells us that underlying type is a string, not a number. So where is the table of unique strings located? It is in “\xl\sharedStrings.xml” XML file, and contains all strings used in entire workbook, not just specific worksheet.

This approach is used for many other things; cell styles, borders, charts, number formats etc. In fact, that becomes the major programming problem when working with XLSX files – updating and maintaining various tables of some unique Excel objects. In this article we will just write / read data values, but if you require some complex formatting you should probably be better using some commercial component which does all tedious work for you.

Comments

  1. 03 Jan 2009 at 10:06
    ilginç
  2. 10 Sep 2008 at 04:55
    Problem to create a MS Excel or Open office Calc file............ Both S Excel or Open office Calc are using the open xml file format. So we can unzip that archive and read the content including there. But the problem is creating MS Excel or Open office Calc file. I have the all files which want to contain inside the file. But how to create a suitable archive file. Can we use any stranded Zip library such as WinZip, WinRar, 7Zip, etc. How to create a single archive file (MS Excel or Open office) by using xml files. (Bundle the xml file is only the problem)
  3. 19 Oct 2007 at 09:03

    Not a bad post.  I was looking for a way to create Excel 2007 files from code and came accross the article.  However, seeing how much work was involved - I thought it would be much easier to create an Excel 2003 file and let my Office 2007 using colleagues open the file in compatibility mode.

     Fortunately, I came accross another article, which explains how to use the OfficeOpenXml package provided by Microsoft.  Using this namespace is a breeze!

     For more info - check here: http://www.codeproject.com/office/ExcelPackage.asp

     

  4. 24 May 2007 at 10:22
    Hallo I just installed Office 2007 and wanted to find out, how the xlsx-Format works. First I just unzipped an Excel file and tried to make sense of the structure on my own. But I failed. Your article helped my to understand the key concept behind the file structure. Thank you very much
  5. 08 Feb 2007 at 20:01

    At work I've been tasked with opening an xlsx file and extracting the information.

    Many thanks for this article which has made understanding the new file format easier.

  6. 14 Dec 2006 at 02:20

    Excellent article.  It will save me a least a couple of days of digging this information out from msdn.

  7. 01 Jan 1999 at 00:00

    This thread is for discussions of Read and write Open XML files (MS Office 2007).

Leave a comment

Sign in or Join us (it's free).

Zeljko Svedic Zeljko Svedic (Željko Švedic) works as a lead developer at GemBox Software. He primarily works on GemBox.Spreadsheet, a .NET component for reading and writing native XLS files. He got interested in...

Related discussion

Related podcasts

  • Vote of No Confidence

    In this episode Ward Bell (from IdeaBlade) and Jeremy Miller discuss the Vote of No Confidence on Microsoft's Entity Framework and what it means for the Alt.NET and .NET communities.This episode does not have a sponsor, so if you are interested in sponsoring the podcast please contact us.Ward and...

Events coming up

  • Dec 9

    GL.net Group Meeting - December 2009

    Gloucester, United Kingdom

    The beginning of this year holiday season will belong to mocks. Ronnie and Stephen will take us for a tour around exciting world of unit testing.

Want to stay in touch with what's going on? Follow us on twitter!