Excel 2007 extends on the basis of Open Packaging Conventions by adding its own application-specific XML types. Reference schemas for all XML files used in Office can be downloaded from MSDN, but note some things are still open to change until the final Excel 2007 release.
We just want to write / read worksheet data, so we need to look in folder “\xl\worksheets” inside XLSX file, where all the worksheets are located. For every worksheet there is a separate XML file; “sheet1.xml”, “sheet2.xml” and so on. When you open such file you will notice that all of the sheet data is inside <sheetData>
element. For every row there is a <row> element, for every cell there is a element. Finally, value of the cell is stored in a element.
However, real world XML is never simple as schoolbook XML. You will notice that numbers get encoded as numbers inside element:
<c r="A1">
<v>100</v>
</c>
However, string value (like “John”), also gets encoded as number:
<c r="B1" t="s">
<v>0</v>
</c>
That is because MS Excel uses internal table of unique strings (for performance reasons). Zero is an index of that string in an internal table of strings and attribute t="s"
tells us that underlying type is a string, not a number. So where is the table of unique strings located? It is in “\xl\sharedStrings.xml” XML file, and contains all strings used in entire workbook, not just specific worksheet.
This approach is used for many other things; cell styles, borders, charts, number formats etc. In fact, that becomes the major programming problem when working with XLSX files – updating and maintaining various tables of some unique Excel objects. In this article we will just write / read data values, but if you require some complex formatting you should probably be better using some commercial component which does all tedious work for you.
Comments