Read and write Open XML files (MS Office 2007)

Implementation

Our demo is a Windows Forms application (see Picture 2), written in C# using Visual Studio 2005. Since there is no support for ZIP files in .NET Framework 2.0 (only for ZIP algorithm), our demo is using an open-source ZIP library called SharpZipLib. For demonstration purposes we will extract entire ZIP files to TEMP folder, so we can examine contents of that folder and files while debugging demo application. In real world application you may want to avoid extracting to temporary folder and just read to / write from ZIP file directly.

For XML processing, the choice is simple. For reading XML files we use XmlTextReader class and for writing we use XmlTextWriter class. Both come with .NET Framework, but you can also use any other XML processing library.



Picture 2: Demo application in action.

Data reading

We want to read a simple “In.xlsx” file (in the “Input” folder) and copy its contents to the DataTable. That file contains a list of people with their first and last names (text values) and their IDs (number values). When “Read input .xlsx file” button in clicked, the following code is executed:

  private void buttonReadInput_Click(object sender, EventArgs e)
{
// Get the input file name from the text box.
string fileName = this.textBoxInput.Text;
// Delete contents of the temporary directory.
ExcelRW.DeleteDirectoryContents(ExcelRWForm.tempDir);
// Unzip input XLSX file to the temporary directory.
ExcelRW.UnzipFile(fileName, ExcelRWForm.tempDir);
// Open XML file with table of all unique strings used in the workbook..
FileStream fs = newFileStream(ExcelRWForm.tempDir + @"\xl\sharedStrings.xml", FileMode.Open, FileAccess.Read);
// ..and call helper method that parses that XML and returns an array of strings.
ArrayList stringTable = ExcelRW.ReadStringTable(fs);
// Open XML file with worksheet data..
fs = newFileStream(ExcelRWForm.tempDir + @"\xl\worksheets\sheet1.xml", FileMode.Open, FileAccess.Read);
// ..and call helper method that parses that XML and fills DataTable with values.
ExcelRW.ReadWorksheet(fs, stringTable, this.data);
}

Nothing unusual happens here. XLSX file is unzipped to the TEMP folder and then necessary XML parts (now files) are processed. File “sharedStrings.xml” contains global table of unique strings while file “sheet1.xml” contains data for the first sheet. Helper methods are pretty straightforward XML reading code -- you can download demo application code to examine them in more detail.

If everything is OK, after the button click all data will show up in the DataGridView.
  
Data writing

Now we want to write data from a DataTable to the “Out.xlsx” file in the “Output” folder. You can change some data or add some new rows in the DataGridView. When “Write output .xlsx file” button is clicked, the following code is executed:

private void buttonWriteOutput_Click(object sender, EventArgs e)
{
// Get the output file name from the text box.
string fileName = this.textBoxOutput.Text;
// Delete contents of the temporary directory.
ExcelRW.DeleteDirectoryContents(ExcelRWForm.tempDir);
// Unzip template XLSX file to the temporary directory.
ExcelRW.UnzipFile(ExcelRWForm.templateFile, ExcelRWForm.tempDir);
// We will need two string tables; a lookup Hashtable for fast searching and
// an ordinary ArrayList where items are sorted by their index.
Hashtable lookupTable;
// Call helper methods which creates both tables from input data.
ArrayList stringTable = ExcelRW.CreateStringTables(this.data, out lookupTable);
// Create XML file..
FileStream fs = newFileStream(ExcelRWForm.tempDir + @"\xl\sharedStrings.xml", FileMode.Create);
// ..and fill it with unique strings used in the workbook
ExcelRW.WriteStringTable(fs, stringTable);
// Create XML file..
fs = newFileStream(ExcelRWForm.tempDir + @"\xl\worksheets\sheet1.xml", FileMode.Create);
// ..and fill it with rows and columns of the DataTable.
ExcelRW.WriteWorksheet(fs, this.data, lookupTable);
// ZIP temporary directory to the XLSX file.
ExcelRW.ZipDirectory(ExcelRWForm.tempDir, fileName);
// If checkbox is checked, show XLSX file in Excel 2007.
if (this.checkBoxOpenFile.Checked)
System.Diagnostics.Process.Start(fileName);
}

This time code is a bit more complicated. In order not to generate all necessary parts needed for XLSX file, we decide to use a template file. We extract template file to the temporary folder and then just change XML parts containing shared string table and worksheet data. All other parts, relationships and content types stay the same -- so we don’t need to generate any of that. Note that we use two string tables; a lookup Hashtable for fast searching and an ordinary ArrayList where items are sorted by their index. We could pull it out only with ArrayList but then we would need to search entire ArrayList every time we add a new string (to check if it is already there). CreateStringTables() helper method builds both string tables, WriteStringTable() helper method writes string table XML and WriteWorksheet() helper method writes worksheet data XML.

Again, download demo application code to examine helper methods in more detail.

Download links

You can download the latest version of the demo application (together with the C# source code) from here: http://www.gemboxsoftware.com/Excel2007/ASPNet/DemoApp.htm
Current version works with Beta version of Microsoft Excel 2007. If necessary, we will be providing an update for the final Excel 2007 version as soon as we get a hold on it.
  
Alternative ways

As always in programming, there is more than one method to achieve the same thing.

You could use Office automation to start an instance of Excel 2007 (or any other Office application) and then use interop calls to create a document and save it. However, using automation has some drawback I have already written about.

Next version of .NET Framework (codename WinFX) will have support for Open Packaging Conventions (package handling and navigating the relationships) but it seems there will be no support for accessing application specific data so you will still need to process XML parts manually.
As another option, you could use some third party component which will come with support for Open XML format. This will probably cost you some money but has advantage that usually more than one format (for example; XLS, XLSX, CSV) are supported within the same API, so your application will be able to target different file formats using the same code.

You might also like...

Comments

About the author

Zeljko Svedic Croatia (Hrvatska)

Zeljko Svedic (Željko Švedic) works as a lead developer at GemBox Software. He primarily works on GemBox.Spreadsheet, a .NET component for reading and writing native XLS files. He got interested...

Interested in writing for us? Find out more.

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Engineers are all basically high-functioning autistics who have no idea how normal people do stuff.” - Cory Doctorow