Library tutorials & articles

Read and write Open XML files (MS Office 2007)

Implementation

Our demo is a Windows Forms application (see Picture 2), written in C# using Visual Studio 2005. Since there is no support for ZIP files in .NET Framework 2.0 (only for ZIP algorithm), our demo is using an open-source ZIP library called SharpZipLib. For demonstration purposes we will extract entire ZIP files to TEMP folder, so we can examine contents of that folder and files while debugging demo application. In real world application you may want to avoid extracting to temporary folder and just read to / write from ZIP file directly.

For XML processing, the choice is simple. For reading XML files we use XmlTextReader class and for writing we use XmlTextWriter class. Both come with .NET Framework, but you can also use any other XML processing library.



Picture 2: Demo application in action.

Data reading

We want to read a simple “In.xlsx” file (in the “Input” folder) and copy its contents to the DataTable. That file contains a list of people with their first and last names (text values) and their IDs (number values). When “Read input .xlsx file” button in clicked, the following code is executed:

  private void buttonReadInput_Click(object sender, EventArgs e)
{
// Get the input file name from the text box.
string fileName = this.textBoxInput.Text;
// Delete contents of the temporary directory.
ExcelRW.DeleteDirectoryContents(ExcelRWForm.tempDir);
// Unzip input XLSX file to the temporary directory.
ExcelRW.UnzipFile(fileName, ExcelRWForm.tempDir);
// Open XML file with table of all unique strings used in the workbook..
FileStream fs = newFileStream(ExcelRWForm.tempDir + @"\xl\sharedStrings.xml", FileMode.Open, FileAccess.Read);
// ..and call helper method that parses that XML and returns an array of strings.
ArrayList stringTable = ExcelRW.ReadStringTable(fs);
// Open XML file with worksheet data..
fs = newFileStream(ExcelRWForm.tempDir + @"\xl\worksheets\sheet1.xml", FileMode.Open, FileAccess.Read);
// ..and call helper method that parses that XML and fills DataTable with values.
ExcelRW.ReadWorksheet(fs, stringTable, this.data);
}

Nothing unusual happens here. XLSX file is unzipped to the TEMP folder and then necessary XML parts (now files) are processed. File “sharedStrings.xml” contains global table of unique strings while file “sheet1.xml” contains data for the first sheet. Helper methods are pretty straightforward XML reading code -- you can download demo application code to examine them in more detail.

If everything is OK, after the button click all data will show up in the DataGridView.
  
Data writing

Now we want to write data from a DataTable to the “Out.xlsx” file in the “Output” folder. You can change some data or add some new rows in the DataGridView. When “Write output .xlsx file” button is clicked, the following code is executed:

private void buttonWriteOutput_Click(object sender, EventArgs e)
{
// Get the output file name from the text box.
string fileName = this.textBoxOutput.Text;
// Delete contents of the temporary directory.
ExcelRW.DeleteDirectoryContents(ExcelRWForm.tempDir);
// Unzip template XLSX file to the temporary directory.
ExcelRW.UnzipFile(ExcelRWForm.templateFile, ExcelRWForm.tempDir);
// We will need two string tables; a lookup Hashtable for fast searching and
// an ordinary ArrayList where items are sorted by their index.
Hashtable lookupTable;
// Call helper methods which creates both tables from input data.
ArrayList stringTable = ExcelRW.CreateStringTables(this.data, out lookupTable);
// Create XML file..
FileStream fs = newFileStream(ExcelRWForm.tempDir + @"\xl\sharedStrings.xml", FileMode.Create);
// ..and fill it with unique strings used in the workbook
ExcelRW.WriteStringTable(fs, stringTable);
// Create XML file..
fs = newFileStream(ExcelRWForm.tempDir + @"\xl\worksheets\sheet1.xml", FileMode.Create);
// ..and fill it with rows and columns of the DataTable.
ExcelRW.WriteWorksheet(fs, this.data, lookupTable);
// ZIP temporary directory to the XLSX file.
ExcelRW.ZipDirectory(ExcelRWForm.tempDir, fileName);
// If checkbox is checked, show XLSX file in Excel 2007.
if (this.checkBoxOpenFile.Checked)
System.Diagnostics.Process.Start(fileName);
}

This time code is a bit more complicated. In order not to generate all necessary parts needed for XLSX file, we decide to use a template file. We extract template file to the temporary folder and then just change XML parts containing shared string table and worksheet data. All other parts, relationships and content types stay the same -- so we don’t need to generate any of that. Note that we use two string tables; a lookup Hashtable for fast searching and an ordinary ArrayList where items are sorted by their index. We could pull it out only with ArrayList but then we would need to search entire ArrayList every time we add a new string (to check if it is already there). CreateStringTables() helper method builds both string tables, WriteStringTable() helper method writes string table XML and WriteWorksheet() helper method writes worksheet data XML.

Again, download demo application code to examine helper methods in more detail.

Download links

You can download the latest version of the demo application (together with the C# source code) from here: http://www.gemboxsoftware.com/Excel2007/ASPNet/DemoApp.htm
Current version works with Beta version of Microsoft Excel 2007. If necessary, we will be providing an update for the final Excel 2007 version as soon as we get a hold on it.
  
Alternative ways

As always in programming, there is more than one method to achieve the same thing.

You could use Office automation to start an instance of Excel 2007 (or any other Office application) and then use interop calls to create a document and save it. However, using automation has some drawback I have already written about.

Next version of .NET Framework (codename WinFX) will have support for Open Packaging Conventions (package handling and navigating the relationships) but it seems there will be no support for accessing application specific data so you will still need to process XML parts manually.
As another option, you could use some third party component which will come with support for Open XML format. This will probably cost you some money but has advantage that usually more than one format (for example; XLS, XLSX, CSV) are supported within the same API, so your application will be able to target different file formats using the same code.

Comments

  1. 03 Jan 2009 at 10:06
    ilginç
  2. 10 Sep 2008 at 04:55
    Problem to create a MS Excel or Open office Calc file............ Both S Excel or Open office Calc are using the open xml file format. So we can unzip that archive and read the content including there. But the problem is creating MS Excel or Open office Calc file. I have the all files which want to contain inside the file. But how to create a suitable archive file. Can we use any stranded Zip library such as WinZip, WinRar, 7Zip, etc. How to create a single archive file (MS Excel or Open office) by using xml files. (Bundle the xml file is only the problem)
  3. 19 Oct 2007 at 09:03

    Not a bad post.  I was looking for a way to create Excel 2007 files from code and came accross the article.  However, seeing how much work was involved - I thought it would be much easier to create an Excel 2003 file and let my Office 2007 using colleagues open the file in compatibility mode.

     Fortunately, I came accross another article, which explains how to use the OfficeOpenXml package provided by Microsoft.  Using this namespace is a breeze!

     For more info - check here: http://www.codeproject.com/office/ExcelPackage.asp

     

  4. 24 May 2007 at 10:22
    Hallo I just installed Office 2007 and wanted to find out, how the xlsx-Format works. First I just unzipped an Excel file and tried to make sense of the structure on my own. But I failed. Your article helped my to understand the key concept behind the file structure. Thank you very much
  5. 08 Feb 2007 at 20:01

    At work I've been tasked with opening an xlsx file and extracting the information.

    Many thanks for this article which has made understanding the new file format easier.

  6. 14 Dec 2006 at 02:20

    Excellent article.  It will save me a least a couple of days of digging this information out from msdn.

  7. 01 Jan 1999 at 00:00

    This thread is for discussions of Read and write Open XML files (MS Office 2007).

Leave a comment

Sign in or Join us (it's free).

Zeljko Svedic Zeljko Svedic (Željko Švedic) works as a lead developer at GemBox Software. He primarily works on GemBox.Spreadsheet, a .NET component for reading and writing native XLS files. He got interested in...
AddThis

Related discussion

Related podcasts

  • Vote of No Confidence

    In this episode Ward Bell (from IdeaBlade) and Jeremy Miller discuss the Vote of No Confidence on Microsoft's Entity Framework and what it means for the Alt.NET and .NET communities.This episode does not have a sponsor, so if you are interested in sponsoring the podcast please contact us.Ward and...

Events coming up

  • Nov 18

    15 Minutes of Fame

    Dresher, United States

    This is a yearly tradition. We select 10 of the favorite speakers from monthly meetings, code camps, and hands on labs. Each one does a 15 minute talk on their favorite .NET technology. This is our 10th anniversary so we plan a gala event with special prizes and refreshments.

We'd love to hear what you think! Submit ideas or give us feedback