All .NET developers love objects, for the simple reason that we have to. After all, .NET is exclusively the preserve of the OO developer, so if you’re not writing “classy” code then something is seriously wrong. Unfortunately, though, objects are like snowmen: they live happily for a brief period of time before disappearing into the spring sunshine.
In many situations the transient nature of objects is not a cause for concern. You write code to instantiate the object; you use it for a few seconds or minutes; and then you casually discard it, leaving it to the tender mercies of the garbage collector.
There are cases, though, where this approach simply doesn’t work, such as when managing session state in an ASP.NET Web Farm; storing users’ Profile information (again in ASP.NET); moving an object between application tiers via a .NET remoting or WCF call; or when dealing with long-running workflows in Windows Workflow Foundation. In all of these cases, you have to take a snapshot of the object and then do something with it, such as transmit it over the wire or persist it into a store. Then, at some future point in time, you will instantiate a new object and then populate its state using the information that you saved earlier.
This notion of taking a snapshot of an object’s state is well documented in a number of design patterns, such as the Memento pattern as catalogued by the Gang of Four (GoF). Similarly, in his book “Patterns of Enterprise Application Architecture”, Martin Fowler identifies a structural pattern, named Serialized LOB (large object), which relates to storing object graphs in a single column in a database. He then goes on to describe the use of Serialized LOB in further patterns, such as Database Session State, which is a pattern that will be familiar to anyone who’s used ASP.NET’s SQL Server-based session store.
So in this article, I’ll take you on a guided tour of the many serialization features that exist in the .NET Framework that provide support for Serialized LOB.
A quick introduction to Serialized LOB
Serialized LOB comes into its own when you want to store an object graph into a persistent store, but either the database schema becomes too complex to flatten the object into multiple columns, or you want to store heterogeneous objects in a table. As an example of this latter case, Figure 1 shows the table used by Windows Workflow Foundation’s SQL persistence service to hold the state of persisted workflows.
Figure 1: Using Serialized LOB in Windows Workflow Foundation
Note how the entire state of the Workflow is held in a single image column (highlighted in the diagram) in an opaque binary format. This enables the persistence service to store any type of workflow in the table, with the proviso that the workflow and all of its parts can be serialized. Of course, this opaque binary format makes it impossible to search within the data, but in the case of WF this is not a problem: the workflow runtime has no need to do anything other than load the persisted workflow at some later point in time. This constraint is something that you need to bear in mind, though, if you intend to use Serialized LOB in your applications and you want to be able to search within the serialised object data.
“Surely, storing an object’s state is easy?”
Serialisation support has been baked into the .NET Framework since the first release. To begin with you mark the type that you want to serialise with the SerializableAttribute, and then you use a formatter to perform the serialisation.
The code in Listing 1 highlights the simplicity of this approach. It demonstrates the most basic approach to serialisation, emitting a stream of binary for the object. To be precise, the binary formatter writes all the information that it will need to re-create the object at a later time. Thus, it stores the full assembly and type name along with a serialised representation of the public and private fields of the type. Figure 2 shows an example of how the BinaryFormatter emits its output, highlighting the fact that it preserves type fidelity by storing the full information for the type and its assembly version, as well as the actual data for the fields.
Figure 2: Unpicking the output stream
The flip side of serialising objects is being able to recreate them. This de-serialisation process is as trivial as the original serialisation, involving calling the BinaryFormatter’s Deserialize method, passing in the stream that you want to read from. This method will return a reference to the newly created object.
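To make the round trip concrete, here is a minimal sketch of serialising a Pattern into a byte array (the shape you would hand to a binary database column) and reading it back. The LobHelper class and its ToBytes/FromBytes methods are my own illustrative names rather than anything in the Framework, and the sketch assumes you are running on the .NET Framework, where BinaryFormatter is available.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Pattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

// Hypothetical helper: not part of the .NET Framework
public static class LobHelper
{
    // Serialise any [Serializable] object graph into a byte array
    public static byte[] ToBytes(object graph)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, graph);
            return ms.ToArray();
        }
    }

    // Re-create the object graph from bytes written by ToBytes
    public static T FromBytes<T>(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream(data))
        {
            return (T)new BinaryFormatter().Deserialize(ms);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Pattern original = new Pattern { Title = "Singleton" };
        byte[] blob = LobHelper.ToBytes(original);
        Pattern copy = LobHelper.FromBytes<Pattern>(blob);

        Console.WriteLine(copy.Title);                      // Singleton
        Console.WriteLine(ReferenceEquals(original, copy)); // False
    }
}
```

Note that Deserialize returns object, so a cast is needed, and that the copy is a brand new instance rather than the original object.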
“What about using text for the output format?”
When discussing Serialized LOB, Fowler introduces the notion of using character-based representations to hold the data, specifically thinking of the use of XML. The .NET Framework has full support for serialising objects to and from XML and other textual formats through the use of the following classes:
- SoapFormatter – This formatter, from the System.Runtime.Serialization.Formatters.Soap namespace, emits SOAP representations. However, the SoapFormatter should now be considered obsolete and you should use the XmlSerializer instead if you wish to produce XML that conforms to SOAP 1.1.
- XmlSerializer – This powerful class can be used to produce XML representations of your objects, although it can only serialize public fields and properties. You use attributes to control how the object’s fields and properties are mapped as elements or attributes, or whether they are ignored. One advantage of the XmlSerializer is the tooling support provided by XSD.EXE, which can be used to create schemas from your .NET types or, more significantly, create .NET types from schema.
- DataContractSerializer – Introduced in .NET Framework 3.0, the DataContractSerializer is primarily designed to work with WCF data contracts. However, it can be used as a general purpose serialisation mechanism for types that are marked with the SerializableAttribute.
- NetDataContractSerializer – A specialised form of XmlObjectSerializer that emits .NET type information into the XML, this serialisation mechanism is again intended for use with WCF data contracts.
- DataContractJsonSerializer – JavaScript Object Notation (JSON) is a fashionable way to send serialised representations of objects between browsers and servers when working with AJAX, as it tends to be lighter in weight than full XML.
- XamlWriter/XamlReader – Many of you will be familiar with eXtensible Application Markup Language (XAML), from using it with WPF or WF. It is yet another serialisation mechanism that you can use.
Listing 2 shows the default output of serialising our simple Pattern object using the various xxxDataContractSerializers and XamlWriter, so you can compare and contrast the differences.
One thing to note from Listing 2 is that only the NetDataContractSerializer preserves the full type information in the generated output.
I won’t be discussing the text-based serialisation mechanisms in much detail in this article, as in general you’ll tend to use the BinaryFormatter when implementing Serialized LOB. Suffice to say that, with the exception of XamlWriter, each of the above serialisation mechanisms is highly configurable through the use of attributes.
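If you want to experiment with the text-based options, the following sketch pushes the simple Pattern type through DataContractSerializer and DataContractJsonSerializer. It relies on both classes deriving from XmlObjectSerializer; the ToText helper and the TextSerialisationDemo class are illustrative names of my own, and I am assuming a project that references System.Runtime.Serialization.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

[Serializable]
public class Pattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

public static class TextSerialisationDemo
{
    // Hypothetical helper: writes the graph with the supplied serialiser
    // and returns the result as a string
    public static string ToText(XmlObjectSerializer serializer, object graph)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            serializer.WriteObject(ms, graph);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        Pattern p = new Pattern { Title = "Singleton" };

        // XML output, using the private fields of the [Serializable] type
        Console.WriteLine(ToText(new DataContractSerializer(typeof(Pattern)), p));

        // JSON output for the same object
        Console.WriteLine(ToText(new DataContractJsonSerializer(typeof(Pattern)), p));
    }
}
```

Swapping in a NetDataContractSerializer is a one-line change, because it shares the same base class.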
Serializing object graphs
You’ll have noticed that Listing 1 contains a Pattern type, because we love talking about design patterns! Now each of the GoF Design Patterns is great in and of itself, but their real power comes into play when you examine the interrelationships between them. This is so important that the GoF even put a funky relationship diagram on the inside back cover of their book.
Figure 3: The interactive GoF relationship program
Figure 3 shows a prototypical training application that allows students to interact with this relationship diagram, before drilling down further into the selected pattern. From this figure you can see that the Composite pattern is one of the most interconnected, using five patterns (Decorator, Visitor, Iterator, Builder and Flyweight) and being used by three (Command, Chain of Responsibility and Interpreter).
The simplified class diagram for the prototype is shown in Figure 4.
Figure 4: The simplified Pattern class diagram
As you can see, each Pattern object maintains a list of the patterns that it uses in its PatternsUsed property, of type List<PatternConnector>. Each PatternConnector object maintains the text that describes the usage of the target pattern, such as “single instance” or “creating composites”, along with a reference to the target pattern itself.
As far as the application is concerned, it retrieves the patterns for display in the UI from a factory method as a List<Pattern>. So what happens if I want to use Serialized LOB to store the List<Pattern> from the application?
The beauty of the BinaryFormatter is that it will happily serialise an object graph ensuring that each object is only written once to the output stream, no matter how complex the graph.
The important thing to note with Listing 3 is that it is broadly speaking identical to Listing 1, with the serialisation code remaining unchanged when dealing with an object graph as opposed to a single object.
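You can check the "written only once" behaviour for yourself with a small experiment. The sketch below uses a cut-down Pattern type with public fields (a simplification of my own, not the class from Figure 4) in which two patterns both refer to the same Flyweight instance, and then confirms that the shared reference survives the round trip.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Simplified stand-in for the article's Pattern type
[Serializable]
public class Pattern
{
    public string Title;
    public List<Pattern> PatternsUsed = new List<Pattern>();
}

public static class GraphDemo
{
    public static List<Pattern> RoundTrip(List<Pattern> graph)
    {
        BinaryFormatter bf = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, graph);
            ms.Position = 0; // rewind before reading
            return (List<Pattern>)bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Pattern flyweight = new Pattern { Title = "Flyweight" };
        Pattern composite = new Pattern { Title = "Composite" };
        Pattern strategy = new Pattern { Title = "Strategy" };

        // Two different patterns share one Flyweight object
        composite.PatternsUsed.Add(flyweight);
        strategy.PatternsUsed.Add(flyweight);

        List<Pattern> copy = RoundTrip(
            new List<Pattern> { composite, strategy, flyweight });

        // The shared object is still a single instance after the round trip
        Console.WriteLine(ReferenceEquals(
            copy[0].PatternsUsed[0], copy[1].PatternsUsed[0])); // True
        Console.WriteLine(ReferenceEquals(
            copy[0].PatternsUsed[0], copy[2]));                 // True
    }
}
```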
Object graphs and text-based serialisation
One thing to note is that, with the exception of NetDataContractSerializer, the text serialisation mechanisms tend to “blow the graph apart”, duplicating objects rather than preserving the references. For example, Listing 4 contains a snippet of the output from XamlWriter when serialising the Composite pattern object as shown in Figure 3.
You can clearly see in Listing 4 that the Pattern object for Flyweight has been duplicated into the output stream. Some of the textual serialisation mechanisms allow you to overcome this. For example, DataContractSerializer can be constructed with its preserveObjectReferences argument set to true (the value is then exposed through the read-only PreserveObjectReferences property), which forces it to emit reference links in the output XML. This is not enabled by default, for the simple reason that DataContractSerializer is designed to be used to create interoperable messages for WCF communication. Just remember that if you do decide to serialise your objects for persistent storage using a textual serialisation mechanism, then you will need to make sure that the option that you select preserves object references within the graph.
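As a rough illustration of the difference that reference preservation makes, the sketch below serialises a list containing the same Pattern instance twice, first without and then with reference preservation. It assumes .NET Framework 4.5 or later, where the option can be supplied through a DataContractSerializerSettings object; on earlier versions you would pass the equivalent Boolean to one of the longer constructor overloads. The ReferenceDemo class and the cut-down Pattern type are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Simplified stand-in for the article's Pattern type
[Serializable]
public class Pattern
{
    public string Title;
}

public static class ReferenceDemo
{
    // Serialise and immediately de-serialise the list as XML
    public static List<Pattern> RoundTrip(List<Pattern> graph, bool preserveReferences)
    {
        DataContractSerializerSettings settings = new DataContractSerializerSettings();
        settings.PreserveObjectReferences = preserveReferences;

        DataContractSerializer serializer =
            new DataContractSerializer(typeof(List<Pattern>), settings);

        using (MemoryStream ms = new MemoryStream())
        {
            serializer.WriteObject(ms, graph);
            ms.Position = 0; // rewind before reading
            return (List<Pattern>)serializer.ReadObject(ms);
        }
    }

    public static void Main()
    {
        Pattern flyweight = new Pattern { Title = "Flyweight" };
        List<Pattern> graph = new List<Pattern> { flyweight, flyweight };

        List<Pattern> exploded = RoundTrip(graph, false);
        List<Pattern> preserved = RoundTrip(graph, true);

        Console.WriteLine(ReferenceEquals(exploded[0], exploded[1]));   // False
        Console.WriteLine(ReferenceEquals(preserved[0], preserved[1])); // True
    }
}
```

Bear in mind that the reference links are written using Microsoft-specific attributes, so XML produced this way is best treated as a private storage format.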
Hiding fields from serialisation
Something else that does crop up is the need to control whether a field is serialised or not. By default, the BinaryFormatter will attempt to serialise all private and public fields, which can be a problem if a field is transient in nature, or is of a non-serialisable type.
Let’s consider the situation where the Pattern type maintains a reference to the onscreen WPF control that is used to display the pattern to the user in the application in Figure 3 (although I’ll leave it up to you to consider whether maintaining a UI-technology specific reference in a simple type is a good idea!).
The Pattern class might look like this:
public class Pattern
{
private PatternDisplayControl uiControl;
// property name is illustrative
public PatternDisplayControl UIControl { ... }
// rest of class elided for clarity
...
}
Clearly, we shouldn’t attempt to store the UI control in the database when implementing Serialized LOB, as WPF controls are most definitely affiliated with a single thread in a single app domain. Therefore, you would mark the field with the NonSerializedAttribute, as shown below, and add code to the application to reconnect the Pattern object to a pertinent UI control as required:
public class Pattern
{
[NonSerialized]
private PatternDisplayControl uiControl;
// property name is illustrative
public PatternDisplayControl UIControl { ... }
// rest of class elided for clarity
...
}
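The effect of NonSerializedAttribute is easy to see in a small test harness. In this sketch PatternDisplayControl is a plain stand-in class of my own (not a real WPF control) that is deliberately not marked as serialisable, and UIControl is an illustrative property name.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Stand-in for a UI control: deliberately NOT marked [Serializable]
public class PatternDisplayControl
{
}

[Serializable]
public class Pattern
{
    private string title;

    // Skipped by the formatter, so its type need not be serialisable
    [NonSerialized]
    private PatternDisplayControl uiControl;

    public string Title
    {
        get { return title; }
        set { title = value; }
    }

    public PatternDisplayControl UIControl
    {
        get { return uiControl; }
        set { uiControl = value; }
    }
}

public static class NonSerializedDemo
{
    public static Pattern RoundTrip(Pattern p)
    {
        BinaryFormatter bf = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, p);
            ms.Position = 0; // rewind before reading
            return (Pattern)bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Pattern p = new Pattern
        {
            Title = "Singleton",
            UIControl = new PatternDisplayControl()
        };

        Pattern copy = RoundTrip(p);

        Console.WriteLine(copy.Title);             // Singleton
        Console.WriteLine(copy.UIControl == null); // True

        // The application reconnects the object to a suitable control
        copy.UIControl = new PatternDisplayControl();
    }
}
```

If you remove the attribute from the field, the call to Serialize fails with a SerializationException, because PatternDisplayControl is not marked as serialisable.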
Versioning
One of the great features of the Windows Workflow Foundation is its ability to manage long-running workflows by persisting them into a database when they’re idle. This enables a workflow to survive machine restarts; promotes scalability by allowing a farm of workflow servers to load and run workflows as needed; and allows the server to conserve resources by not holding an idle workflow in memory.
But what happens if the workflow type is modified between a workflow object being persisted and then being reloaded?
Fortunately, the .NET Framework’s serialisation engine is quite comfortable in handling versioning as long as you create your types so that they are versioning aware. So let’s look at how you can influence versioning in your application.
The default versioning behaviour
The first thing to be aware of is how the system works under the covers. Inside the bowels of the serialisation mechanism used by the BinaryFormatter is an internal type, ObjectReader, which is responsible for locating a type based on the type information contained in the stream.
Internally, ObjectReader uses either the Assembly.Load or Assembly.LoadWithPartialName methods to locate the assembly that contains the serialised Type. When using Assembly.Load, the entire assembly name will be used, including the version number; when using Assembly.LoadWithPartialName, the ObjectReader will either retrieve the exact matching version or it will use the assembly with the latest version number.
Now the default behaviour in .NET Framework v1.x was for ObjectReader to use Assembly.Load, which tended to cause quite a lot of developers to worry about using BinaryFormatter and serialisation in general, due to the fact that merely rebuilding an assembly, and thus changing its version number, appeared to be enough to stop de-serialisation dead in its tracks. In .NET Framework version 2.0 and beyond, however, the default behaviour is changed so that ObjectReader uses Assembly.LoadWithPartialName by default. This means that merely rebuilding an assembly will not prevent it from de-serialising existing data (although changing the type itself might).
You do have control of exactly how types should be resolved when de-serialising, no matter which version of the .NET Framework you’re using. The first thing that you can do is to set the AssemblyFormat property on the BinaryFormatter as follows:
BinaryFormatter bf = new BinaryFormatter();
bf.AssemblyFormat = FormatterAssemblyStyle.Simple;
This tells ObjectReader to use the simple name matching mechanism of Assembly.LoadWithPartialName. A more fine-grained alternative is to implement a serialisation binder.
Binding to types during de-serialisation
You can derive a class from the abstract SerializationBinder if you want complete control over how the BinaryFormatter resolves the Type to load when de-serialising. You’re required to override one method, BindToType, in which you determine which type should be used for the de-serialisation given an incoming assembly and type name.
Listing 5 shows an example of a custom SerializationBinder. What you’ll notice is that you can create any type resolution mechanism that you can dream up, including using table- or configuration-driven binding. You’ll also notice that you can completely change the type that is used in the de-serialisation process. In this example, the data serialised for Pattern objects will now be loaded into new and improved AdvancedPattern objects for version 2 of the application.
Of course, you have to tell the BinaryFormatter to use your new SerializationBinder, which involves just one extra line of code:
BinaryFormatter bf = new BinaryFormatter();
bf.Binder = new V1ToV2Binder();
bf.Deserialize( ... );
Of course, this binding merely supports type resolution during de-serialisation. You still have to deal with issues such as fields that might have been added, removed or renamed between versions. Let’s look at how you do that now.
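To see a binder working end to end, the following sketch writes a Pattern to a stream and then reads the same data back as an AdvancedPattern. To keep it runnable as a single program both types live in the same assembly, which is a simplification on my part; PatternToAdvancedBinder is an illustrative name. The redirection works here because both types store their data in a field with the same name.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Pattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

// The "version 2" type: the field name matches the one in Pattern
[Serializable]
public class AdvancedPattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

public class PatternToAdvancedBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        // Redirect the old type to its replacement
        if (typeName == typeof(Pattern).FullName)
        {
            return typeof(AdvancedPattern);
        }
        // Everything else is resolved from the names in the stream
        return Type.GetType(
            String.Format("{0}, {1}", typeName, assemblyName));
    }
}

public static class BinderDemo
{
    public static object ReadWithBinder(Pattern source)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, source);
            ms.Position = 0; // rewind before reading

            BinaryFormatter reader = new BinaryFormatter();
            reader.Binder = new PatternToAdvancedBinder();
            return reader.Deserialize(ms);
        }
    }

    public static void Main()
    {
        object result = ReadWithBinder(new Pattern { Title = "Composite" });

        Console.WriteLine(result.GetType().Name);          // AdvancedPattern
        Console.WriteLine(((AdvancedPattern)result).Title); // Composite
    }
}
```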
ISerializable
In version 1.x of the .NET Framework adding or removing fields caused major headaches with serialisation: to be accurate, the de-serialisation mechanism would simply throw exceptions if unexpected fields were found in the stream. Ultimately, the way to handle versioning in 1.x was to implement the ISerializable interface and manually insert values into, or read values from, the stream. What was somewhat quirky about implementing ISerializable is that not only did you have to implement the interface, but you also had to add a special constructor to your type.
Listing 6 shows an example of implementing ISerializable for the Pattern type. As you can see, implementing the interface allows you to control precisely what goes into the stream. Therefore, you can add a version number to the stream in the GetObjectData method, which is called during the serialisation process, and then read this value and perform version-specific processing during the de-serialisation process. One important thing to note about the extra constructor that has been added is that it can be marked as private: the CLR will locate it and use it regardless of its accessibility level when de-serialising the objects.
Implementing ISerializable is fiddly. You have to remember to call the base class’s GetObjectData method, assuming the base class itself implements ISerializable. However, it does enable very fine-grained control of the (de-)serialisation process. For example, the StreamingContext that is passed through lets you determine why (de-)serialisation is occurring, enabling you to determine whether you want to store/load fields or not depending on the target destination of the serialised object.
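As a sketch of how the streaming context might influence what you write, the Pattern below only stores its notes when the formatter has been told that the data is heading for a persistent store. The ReviewNotes member and the "HasNotes" entry are hypothetical additions of mine, used purely for illustration; the context itself is supplied through the BinaryFormatter constructor that takes a surrogate selector and a StreamingContext.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Pattern : ISerializable
{
    private string title;
    private string reviewNotes; // hypothetical field, for illustration only

    public Pattern() { }

    public string Title
    {
        get { return title; }
        set { title = value; }
    }

    public string ReviewNotes
    {
        get { return reviewNotes; }
        set { reviewNotes = value; }
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        bool persisting =
            (context.State & StreamingContextStates.Persistence) != 0;

        info.AddValue("Version", 1);
        info.AddValue("Title", title);
        // Record whether the notes were written, then write them
        // only when the target is a persistent store
        info.AddValue("HasNotes", persisting);
        if (persisting)
        {
            info.AddValue("ReviewNotes", reviewNotes);
        }
    }

    // Special constructor used during de-serialisation
    private Pattern(SerializationInfo info, StreamingContext context)
    {
        int version = info.GetInt32("Version");
        if (version >= 1)
        {
            title = info.GetString("Title");
            if (info.GetBoolean("HasNotes"))
            {
                reviewNotes = info.GetString("ReviewNotes");
            }
        }
    }
}

public static class ContextDemo
{
    public static Pattern RoundTrip(Pattern p, StreamingContextStates state)
    {
        BinaryFormatter bf =
            new BinaryFormatter(null, new StreamingContext(state));
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, p);
            ms.Position = 0; // rewind before reading
            return (Pattern)bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        Pattern p = new Pattern { Title = "Memento", ReviewNotes = "check wording" };

        Pattern stored = RoundTrip(p, StreamingContextStates.Persistence);
        Pattern sent = RoundTrip(p, StreamingContextStates.CrossProcess);

        Console.WriteLine(stored.ReviewNotes);       // check wording
        Console.WriteLine(sent.ReviewNotes == null); // True
    }
}
```

A BinaryFormatter created with its default constructor uses StreamingContextStates.All, which includes the Persistence flag, so you only see a difference when you supply a narrower context yourself.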
Fortunately, with .NET Framework 2.0 and above, you often don’t need to implement ISerializable, due to the inclusion of new Version Tolerant Serialization (VTS) features. This is a great boon when implementing the Serialized LOB pattern in .NET, as it makes it relatively straightforward to deal with versioning.
Version Tolerant Serialisation
The BinaryFormatter in .NET Framework 2.0 and beyond supports the following features:
- Tolerance of extra data in the stream. Simply put, when de-serialising, the BinaryFormatter will ignore any additional field data that it finds in the stream. This enables an older version of an application to read data that was written by a newer version, but when implementing Serialized LOB you probably won’t ever need this unless you find yourself having to revert to a previous version of the application for some reason (such as finding a major security bug).
- Support for optional fields. This feature enables you to add new fields to your types as you build new versions, and mark them as optional. Then, when de-serialising objects that were stored with a previous version of the application you can hook into the process and set a sensible default value for the field.
Optional fields are a great way to add support for new features in a type, but still allow the BinaryFormatter to de-serialise content from previous versions of the type. When adding a new field to the type you simply mark the field with the OptionalFieldAttribute. When you apply this attribute you should also specify a version number. For example, Listing 7 shows an updated version of the Pattern type with support for a new optional field containing a description of the pattern.
This will happily enable the BinaryFormatter to de-serialise a stream that was written with a previous version, and therefore doesn’t contain a value for the description. However, this might leave a de-serialised object in an inconsistent state (i.e. having a null value for description). To help overcome this problem, you can hook into the serialisation/de-serialisation process in up to four places, and without the trouble of implementing ISerializable.
The serialisation callback methods
The four points at which you can hook into the serialisation/de-serialisation process are shown in Figure 5.
Figure 5: Hooking into the serialisation/de-serialisation process
The actual mechanism for hooking into the (de-)serialisation process is to add methods that are decorated with the OnSerializingAttribute, OnSerializedAttribute, OnDeserializingAttribute and OnDeserializedAttribute. The methods must all have the same signature, which returns void and takes a single StreamingContext parameter, as shown in Listing 8, which highlights one of the main uses of these hook points: populating the default value of an optional field. By setting the default value in a method marked with OnDeserializingAttribute, you can be sure that if the value is present in the stream, it will be read into the field after the default value has been set, thus ensuring that the correct value is present.
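It is awkward to demonstrate two versions of the same type in a single program, so the sketch below cheats slightly: version 1 and version 2 of the type are modelled as two classes, PatternV1 and PatternV2 (hypothetical names), and a small binder redirects the old data to the new class. In a real application these would be the same type in successive versions of the assembly, and no binder would be needed.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Stands in for version 1 of the type: no description field
[Serializable]
public class PatternV1
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

// Stands in for version 2 of the type
[Serializable]
public class PatternV2
{
    private string title;

    [OptionalField(VersionAdded = 2)]
    private string description;

    public string Title
    {
        get { return title; }
        set { title = value; }
    }

    public string Description
    {
        get { return description; }
        set { description = value; }
    }

    // Runs before any field data is read from the stream
    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        description = "None available";
    }
}

// Only needed because the two versions are separate classes here
public class V1ToV2Binder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        if (typeName == typeof(PatternV1).FullName)
        {
            return typeof(PatternV2);
        }
        return Type.GetType(
            String.Format("{0}, {1}", typeName, assemblyName));
    }
}

public static class OptionalFieldDemo
{
    public static PatternV2 ReadAsV2(object source)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, source);
            ms.Position = 0; // rewind before reading

            BinaryFormatter reader = new BinaryFormatter();
            reader.Binder = new V1ToV2Binder();
            return (PatternV2)reader.Deserialize(ms);
        }
    }

    public static void Main()
    {
        // Old data: no description in the stream, so the default remains
        PatternV2 fromOld = ReadAsV2(new PatternV1 { Title = "Visitor" });
        Console.WriteLine(fromOld.Description); // None available

        // New data: the stored value overwrites the default
        PatternV2 fromNew = ReadAsV2(
            new PatternV2 { Title = "Visitor", Description = "Adds operations" });
        Console.WriteLine(fromNew.Description); // Adds operations
    }
}
```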
Note that if you wanted to change the behaviour so that the default value of an optional field is based on the values of other fields, you would use a method decorated with OnDeserializedAttribute, as shown below, which sets the value of the description to include the title field if description has not already been set:
[OnDeserialized]
void OnDeserialised(
StreamingContext context)
{
if( description == null )
description = String.Format(
"No description for the {0} pattern",
title );
}
There are a few things to bear in mind when hooking the (de-)serialisation process. The first is that you have no access to the stream itself, just the StreamingContext. This means that you can see why the object is being serialised to/de-serialised, but you have no mechanism for manipulating the actual stream content; if you find yourself needing to do that, then you must implement ISerializable.
The second thing to note is that the method marked with OnDeserializing is called prior to the special constructor that you add when implementing ISerializable. This might seem strange, but it just goes to reinforce the fact that when the CLR is de-serialising objects, it allocates the storage and creates the object before calling the special constructor. Therefore, you can set the default values for fields before the data is read from the stream.
Another great thing to note is that the serialisation mechanism will call the relevant decorated methods on any base classes for your type without you having to do so explicitly. This is significantly different from the implementation of ISerializable, where you must call the base class’ implementation of GetObjectData if it is present. You must bear in mind, though, that each class can only have one method for each of the four hooks.
Finally, note that the method names are irrelevant: it is the presence of the attributes that is important.
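If you want to convince yourself of the order in which the four hooks fire, a throwaway type that records each call is enough. In this sketch the static Log list and the method names are arbitrary choices of mine; static fields are not part of an object's serialised state, so the log itself never ends up in the stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Pattern
{
    // Static, so it is not written to the stream
    public static readonly List<string> Log = new List<string>();

    private string title = "Observer";

    public string Title
    {
        get { return title; }
    }

    [OnSerializing]
    private void BeforeWrite(StreamingContext context)
    {
        Log.Add("OnSerializing");
    }

    [OnSerialized]
    private void AfterWrite(StreamingContext context)
    {
        Log.Add("OnSerialized");
    }

    [OnDeserializing]
    private void BeforeRead(StreamingContext context)
    {
        Log.Add("OnDeserializing");
    }

    [OnDeserialized]
    private void AfterRead(StreamingContext context)
    {
        Log.Add("OnDeserialized");
    }
}

public static class CallbackDemo
{
    public static Pattern RoundTrip(Pattern p)
    {
        BinaryFormatter bf = new BinaryFormatter();
        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, p);
            ms.Position = 0; // rewind before reading
            return (Pattern)bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        RoundTrip(new Pattern());

        // OnSerializing, OnSerialized, OnDeserializing, OnDeserialized
        Console.WriteLine(String.Join(", ", Pattern.Log.ToArray()));
    }
}
```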
Guidelines for supporting VTS
There are a few recommendations that you should follow when using VTS, as listed below:
- DON’T remove or rename a field from a type, as this can cause the de-serialisation mechanism to choke when reading old data. Similarly, you shouldn’t decorate an existing field with the NonSerializedAttribute for the same reason.
- DO mark any new fields with the OptionalFieldAttribute, and hook into the de-serialisation process using the OnDeserializing/OnDeserialized callback mechanism to set a sensible default value. If removing the NonSerializedAttribute from a field, mark it with the OptionalFieldAttribute as it should be considered to be a new field as far as serialisation is concerned.
- AVOID implementing ISerializable unless you absolutely need the control over the stream.
- ALWAYS set the VersionAdded property for the OptionalFieldAttribute, even though it’s not currently used by the CLR.
- If you need to de-serialise into a different type, use a SerializationBinder-derived class to tell the ObjectReader which new type of object to use.
“The developer forgot to mark it as Serializable”
You can’t use the BinaryFormatter to serialise objects unless the type is marked with SerializableAttribute. This means that if you are trying to implement Serialized LOB, the type itself, and the types of all the fields that it serialises, must be marked with SerializableAttribute. Of course, if you own the source code it’s relatively easy to change it and mark the types correctly. But what happens if you don’t own the source?
Well, the word “can’t” in the first sentence of this section is a little strong. There is a mechanism that enables you to add serialisation support to other types by supplying a serialisation surrogate.
As its name suggests, a serialisation surrogate will step in and perform serialisation on behalf of a type that is not marked with the SerializableAttribute. Listing 9 contains an implementation of a serialisation surrogate that could be used with the Pattern class shown in Figure 4.
What you will immediately realise is that the surrogate, being an external class to the original type, would normally only access the type through its public properties. This might make it difficult to serialise the entire state of the object. Other than that, implementing a surrogate is very similar to implementing ISerializable: you have to write code to write or read values from the stream.
Of course, you have to tell the BinaryFormatter that you actually want to use the surrogate. This is done via the SurrogateSelector property, as shown here:
BinaryFormatter bf = new BinaryFormatter();
StreamingContext sc = new StreamingContext(StreamingContextStates.All);
SurrogateSelector selector = new SurrogateSelector();
selector.AddSurrogate(typeof(Pattern), sc, new PatternSurrogate());
bf.SurrogateSelector = selector;
Where possible, you should avoid using surrogate selectors, although they can help you out of a hole when working with third party types that are not marked with the SerializableAttribute.
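For completeness, here is a compact end-to-end sketch of a surrogate rescuing a type that has not been marked as serialisable. LegacyPattern and LegacyPatternSurrogate are illustrative names of my own, standing in for a third party type and the surrogate that you would write for it.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Stand-in for a third party type: NOT marked [Serializable]
public class LegacyPattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

public class LegacyPatternSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj,
        SerializationInfo info, StreamingContext context)
    {
        LegacyPattern p = (LegacyPattern)obj;
        info.AddValue("Version", 1);
        info.AddValue("Title", p.Title);
    }

    public object SetObjectData(object obj,
        SerializationInfo info, StreamingContext context,
        ISurrogateSelector selector)
    {
        // obj has been allocated by the formatter but not initialised
        LegacyPattern p = (LegacyPattern)obj;
        if (info.GetInt32("Version") >= 1)
        {
            p.Title = info.GetString("Title");
        }
        return p;
    }
}

public static class SurrogateDemo
{
    public static LegacyPattern RoundTrip(LegacyPattern p)
    {
        SurrogateSelector selector = new SurrogateSelector();
        selector.AddSurrogate(
            typeof(LegacyPattern),
            new StreamingContext(StreamingContextStates.All),
            new LegacyPatternSurrogate());

        BinaryFormatter bf = new BinaryFormatter();
        bf.SurrogateSelector = selector;

        using (MemoryStream ms = new MemoryStream())
        {
            bf.Serialize(ms, p);
            ms.Position = 0; // rewind before reading
            return (LegacyPattern)bf.Deserialize(ms);
        }
    }

    public static void Main()
    {
        LegacyPattern copy = RoundTrip(new LegacyPattern { Title = "Adapter" });
        Console.WriteLine(copy.Title); // Adapter
    }
}
```

The same selector has to be in place when reading the data back, so in practice you would wrap the formatter set-up in a single factory method that both the saving and loading code share.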
Conclusion
This article has focused on looking at the technologies in the .NET Framework that enable you to implement the Serialized LOB pattern. This pattern is focused on how you persist objects (including object graphs), typically using an opaque binary format, within a single column of a database.
As you have seen, your first choice for implementing this pattern is the BinaryFormatter, although you can use one of the many textual serialisation mechanisms if you prefer to store the data as text.
The key thing to remember with .NET serialisation is that you should prepare your types to be serialised with the SerializableAttribute. You should also note that since .NET Framework 2.0, serialisation has been vastly improved making it relatively easy to implement Serialized LOB without the fear that a simple version number change might break all your existing persisted objects.
Dave Wheeler is a freelance consultant, who specialises in the various UI technologies within Microsoft .NET. He also writes and delivers courses for DevelopMentor, and is a regular speaker at Bearpark’s annual Software Architect and DevWeek conferences. You can contact him at [email protected].
Listing 1: Serialising a simple object
[Serializable]
public class Pattern
{
    private string title;
    public string Title
    {
        get { return title; }
        set { title = value; }
    }
}

// Snippet demonstrating
// serialization of a simple object
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
...
Pattern p = new Pattern() { Title = "Singleton" };
MemoryStream ms = new MemoryStream();
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize( ms, p );
...
Listing 2: Output of four different text-based serialisation mechanisms
<!-- DataContractSerializer -->
<Pattern xmlns="http://schemas.datacontract.org/2004/07/GoFPatternRelationships"
         xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <title>Singleton</title>
</Pattern>

<!-- NetDataContractSerializer -->
<Pattern z:Id="1" z:Type="GoFPatternRelationships.Pattern"
         z:Assembly="GoFPatternRelationships, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null"
         xmlns="..." xmlns:i="..." xmlns:z="...">
  <title z:Id="2">Singleton</title>
</Pattern>

// DataContractJsonSerializer
{"title":"Singleton"}

<!-- XamlWriter.Save -->
<Pattern Title="Singleton"
         xmlns="clr-namespace:GoFPatternRelationships;assembly=GoFPatternRelationships" />
Listing 3: Serialising an object graph
// Snippet demonstrating
// serialization of an object graph
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
...
List<Pattern> patternList =
PatternFactory.GetPatterns();
MemoryStream ms = new MemoryStream();
BinaryFormatter bf =
new BinaryFormatter();
bf.Serialize( ms, patternList );
...
Listing 4: XamlWriter exploding the object graph
<Pattern Title="Composite">
  <Pattern.PatternsUsed>
    <UsesList Capacity="8">
      <PatternConnector Description="sharing composites">
        <PatternConnector.Pattern>
          <Pattern Title="Flyweight"> ... </Pattern>
        </PatternConnector.Pattern>
      </PatternConnector>
      <PatternConnector Description="adding responsibilities to objects">
        <PatternConnector.Pattern>
          <Pattern Title="Decorator">
            <Pattern.PatternsUsed>
              <UsesList Capacity="4">
                <PatternConnector Description="changing skin vs. guts">
                  <PatternConnector.Pattern>
                    <Pattern Title="Strategy">
                      <Pattern.PatternsUsed>
                        <UsesList Capacity="4">
                          <PatternConnector Description="sharing strategies">
                            <PatternConnector.Pattern>
                              <Pattern Title="Flyweight"> ... </Pattern>
                              <!-- rest of graph elided -->
                              ...
Listing 5: A custom SerializationBinder
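public class V1ToV2Binder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        // Match only the type called Pattern, not every type
        // whose name happens to end in "Pattern"
        if (typeName == "Pattern" || typeName.EndsWith(".Pattern"))
        {
            // Use a different type for Pattern, replacing only the
            // trailing type name and leaving the namespace untouched
            typeName = typeName.Substring(
                0, typeName.Length - "Pattern".Length) + "AdvancedPattern";
        }
        // Change to the new version of
        // the assembly
        return Type.GetType(
            String.Format("{0}, {1}", typeName,
                assemblyName.Replace("1.0", "2.0")));
    }
}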
Listing 6: Implementing versioning with ISerializable
[Serializable]
public class Pattern : ISerializable
{
    public Pattern()
    {
        PatternsUsed = new UsesList();
    }

    private string title;
    public string Title { ... }

    private List<Pattern> connectorList;
    public List<Pattern> PatternsUsed { ... }

    public void GetObjectData(
        SerializationInfo info, StreamingContext context)
    {
        info.AddValue( "Version", 1 );
        info.AddValue( "Title", title );
        info.AddValue( "UsesList", connectorList );
    }

    private Pattern(
        SerializationInfo info, StreamingContext context )
    {
        int version = info.GetInt32( "Version" );
        if( version >= 1 )
        {
            title = info.GetString( "Title" );
            connectorList = info.GetValue( "UsesList",
                typeof( List<Pattern> ) ) as List<Pattern>;
        }
    }
}
Listing 7: Using an OptionalFieldAttribute
[Serializable]
public class Pattern
{
private string title;
public string Title { ... }
private List<Pattern> connectorList;
public List<Pattern> PatternsUsed
{ ... }
[OptionalField( VersionAdded=2 )]
private string description;
public string Description { ... }
}
Listing 8: Hooking into the de-serialisation process
[Serializable]
public class Pattern
{
[OptionalField( VersionAdded=2 )]
private string description;
public string Description { ... }
[OnDeserializing]
void OnDeserialising(
StreamingContext context)
{
description = "None available";
}
// rest of class elided for clarity
...
}
Listing 9: A serialisation surrogate
public class PatternSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj,
        SerializationInfo info, StreamingContext context)
    {
        Pattern p = obj as Pattern;
        info.AddValue("Version", 1);
        info.AddValue("Title", p.Title);
        info.AddValue("UsedList", p.PatternsUsed);
    }

    public object SetObjectData(object obj,
        SerializationInfo info, StreamingContext context,
        ISurrogateSelector selector)
    {
        Pattern p = obj as Pattern;
        int version = info.GetInt32("Version");
        if (version >= 1)
        {
            p.Title = info.GetString("Title");
            p.PatternsUsed = info.GetValue("UsedList",
                typeof( List<Pattern> ) ) as List<Pattern>;
        }
        return p;
    }
}