Iterators, composites and visitors

This article was originally published in VSJ, which is now part of Developer Fusion.
As a developer, you’ll probably find yourself iterating through collections on a regular basis. You will also find that you are often constructing trees of objects, especially in a world dominated by XML. It will therefore come as no surprise to you that there are a number of design patterns closely related to iteration and trees: namely Iterator, Composite and Visitor.

In this article, you will see how easy it is to implement a custom iterator in C#; how you can use the Composite pattern to simplify code that handles hierarchical collections; and how the Visitor pattern can be used to add functionality to your types.

So let’s begin by looking at the Iterator pattern.

The Iterator pattern

The core concept of the Iterator pattern, as defined by the Gang of Four, is to encapsulate the iteration logic inside an object, such that it can iterate over an aggregate object (or collection) without revealing the internal implementation of the aggregate/collection.

To demonstrate this, we need an initial aggregate to work with. So for this section of the article, let’s imagine that we are working on a war game simulator and we want to model a platoon of infantry, which broadly speaking consists of an officer, a senior non-commissioned officer (NCO) and approximately thirty soldiers. The soldiers tend to be organised into sections (or squads), each of which is led by another NCO.

Figure 1 shows the initial class diagram to reflect this structure.

Figure 1
Figure 1: Structure of a platoon

With this basic code in mind, let’s think about how we might iterate through all the soldiers in a platoon.

Iterator interfaces in .NET

In .NET, an iterator will typically implement the IEnumerator or IEnumerator<T> interface, as shown in Figure 2.

Figure 2
Figure 2: The Iterator interfaces for .NET

IEnumerator has methods to move through the aggregate and to reset the iterator’s state, and a property to return the current item during iteration. The key to understanding Iterator pattern in .NET is that IEnumerator imposes no constraints on how you implement an iterator. For example, the PlatoonIterator shown below is perfectly valid, if a little inefficient.

public class PlatoonIterator :
	IEnumerator<Soldier> {
	private IEnumerator<Soldier> impl;
	public PlatoonIterator (
		Platoon platoon ) {
		CreateEnumerator( platoon );
	}

	private void CreateEnumerator(
		Platoon platoon) {
		List<Soldier> soldiers =
			new List<Soldier>();
		soldiers.Add(
			platoon.CommandingOfficer );
		soldiers.Add( platoon.Sergeant );
		foreach( Section s in
			platoon.Sections ) {
		soldiers.Add( s.SectionCommander );
			soldiers.AddRange( s.Soldiers );
		}
		impl = soldiers.GetEnumerator();
	}

	public Soldier Current {
		get { return impl.Current; }
	}
	public void Dispose() {
		impl.Dispose();
	}
	object IEnumerator.Current {
		get { return Current; }
	}
	public bool MoveNext() {
		return impl.MoveNext();
	}
	public void Reset() {
		impl.Reset();
	}
}
This implementation creates its own List<Soldier>, populates it with all of the Soldier objects from the Platoon, and then simply uses the fact that List<T> already provides iteration support to implement the IEnumerator functionality.

“for each”: the client perspective

The principal .NET languages all provide some form of a “for each” syntax to enable client code to easily consume an IEnumerator implementation; in fact the above listing used it internally in the PlatoonIterator to process through the sections in the platoon. Therefore, it might be good if you could write the following (although, as you will see, there are better ways to code this type of functionality):
void DisplayAllSoldiers(Platoon platoon)
	{
	foreach (Soldier s in new
		PlatoonIterator( platoon ) ) {
		Console.WriteLine("{1} {0}",
			s.Name, s.Rank);
	}
}
Sadly, this code won’t compile as things currently stand, for the simple reason that the PlatoonIterator is not considered to be enumerable by the C# compiler.

IEnumerable: supporting “for each”

To be enumerable, a type must either:
  • expose a public method named GetEnumerator that returns an IEnumerator reference, or
  • implement the IEnumerable interface, which defines one method: GetEnumerator
Implementing this for the PlatoonIterator is trivial, as shown below:
public class PlatoonIterator :
		IEnumerable<Soldier>,
			IEnumerator<Soldier>
{
	public IEnumerator<Soldier>
		GetEnumerator()
	{
		return this;
	}
IEnumerator IEnumerable.GetEnumerator()
	{
		return this;
	}

	// remainder of code as before
	...
}
With this code in place, the compiler will now be happy with the following line of code:
foreach( Soldier s in new
	PlatoonIterator( platoon ) { ... }
Broadly speaking, the compiler will translate the above code into the following:
PlatoonIterator iterator =
	new PlatoonIterator( platoon );
IEnumerator enumerator =
	iterator.GetEnumerator();
Soldier s;
while( enumerator.MoveNext() )
{
	s = enumerator.Current;
	// body of foreach loop
	...
}
The interesting thing about this code is that it reveals that an iterator will initially be positioned before the first element, and that a call to MoveNext() will be performed prior to calling Current. This makes sense, as potentially the aggregate being enumerated might contain no objects.

Internal or external iterators?

PlatoonIterator is an external iterator, in that the iterator is in a separate class from the aggregate type over which it can iterate. This is perfectly fine and sensible, but it is equally valid (and in many cases preferable) for the aggregate to control its own iteration by internalising an iterator. You can achieve this by either exposing properties or methods that the client calls, or by directly implementing IEnumerable on the aggregate class.

Let’s internalise iteration over the Platoon now, but this time we’ll make the code even easier by using C# 2.0’s yield return syntax to make the code even simpler. The following code shows you just how easy this can be.

public class Platoon {
// other code elided for clarity
	public IEnumerable<Soldier>
		GetSoldiers() {
		yield return CommandingOfficer;
		yield return Sergeant;

foreach (Section section in Sections) {
	yield return section.SectionCommander;
foreach (Soldier s in section.Soldiers) {
				yield return s;
			}
		}
	}
}
The idea behind yield return is that it allows you to expose an iterator without having to write complex state-management code. The GetSoldiers() method, which must return IEnumerable or IEnumerable<T> in order to be able to use the yield return technique, simply returns all of the Soldier objects in the Platoon by yielding up each in turn.

You’ll also note that the GetSoldiers() method doesn’t return anything, even though the return type is defined as IEnumerable<Soldier>. The reason for this is that the C# compiler generates a full implementation of an iterator for you, as shown in Figure 3, which is instantiated and used whenever the GetSoldiers() method is called.

Figure 3
Figure 3: The disassembled code

The client can now iterate through the Platoon using standard “for each” syntax, as shown below:

void DisplayAllSoldiers(Platoon p)
{
	foreach (Soldier s in p.GetSoldiers())
	{
		Debug.WriteLine("{1} {0}",
				s.Name, s.Rank);
	}
}
Implementing IEnumerator and IEnumerator<T> using yield return is considerably easier than having to write all of the methods yourself, and makes writing iterators to work over composite trees much simpler, as you’ll see in a few moments.

The iteration gotcha

There are significant implementation differences between the two code listings even ignoring the use of yield return. Specifically, the PlatoonIterator works by caching its own list containing the references to all of the Soldier objects, whereas the enumerator in the second listing does not. This means that if you were to alter the number of Soldiers in a platoon, or even change its commanding officer, the iterator in the first listing wouldn’t notice those changes whereas that in the second one might, depending on when you made the change.

Depending on the nature of the iterator, it might consider this to be intolerable and consequently throw an exception at you.

Therefore, the most important rule of iteration in .NET is that you cannot alter the aggregate over which you iterating, nor should you alter the objects that are returned. The safe approach is to consider iteration to be a read-only process.

Selective iteration

There’s just one final aspect of iteration that I want to mention before we move onto the Composite pattern: selective iteration. Adding a simple predicate to the GetSoldiers() method makes it easy to return only Soldier objects with a specific rank, as shown below:
public delegate bool Predicate<T>( T t );

public class Platoon {
	public IEnumerable<Soldier>
	GetSoldiers(Predicate<Soldier> test )
	{
		if( test( CommandingOfficer ) )
			yield return CommandingOfficer;
		if( test( Sergeant ) )
			yield return Sergeant;
foreach (Section section in Sections) {
	if( test( section.SectionCommander ) )
	yield return section.SectionCommander;
foreach (Soldier s in section.Soldiers) {
				if( test( s ) )
					yield return s;
			}
		}
	}
	// rest of class elided
}

// Client code
void DisplayAllOfficers(Platoon p)
{
	foreach (Soldier s in p.GetSoldiers(
	(s) => s.Rank >=
		Rank.CommissionedOfficer ))
	{
		Debug.WriteLine("{1} {0}",
			s.Name, s.Rank);
	}
}
You will undoubtedly have noticed the use of a lambda expression in the client code here. This is intentional, as I wanted to reinforce the idea that LINQ, and especially LINQ to Objects, almost entirely relies on the Iterator pattern. The Enumerable<T> class contains nested Iterator types that use selective iteration to perform their work, which largely obviates the need for you to write your own selective iterator.

The Composite pattern

The military composes larger formations out of smaller ones. For example, there are typically three sections in a platoon, three or four platoons in a company, three or four companies in a battalion and so on. Of course, a formation might also contain “loose” soldiers, such as the CO or the company signaller. Figure 4 reflects this structure.

Figure 4
Figure 4: A composite fighting force

However, you want to create a formation that doesn’t conform to the normal force structure, such as a raiding party or rapid deployment task force. Consequently, to make the simulation more flexible, it might be preferable to remove specific aggregate types such as Platoon and Section, and replace them with a more flexible tree-like structure. Figure 4 certainly implies that trees would work well as a representation for a formation. This makes for an interesting situation. For example, you might want to write client code to iterate across the tree and perform a common operation, such as displaying the readiness state of all of the elements. In other words, there are going to be occasions when it makes sense to treat all of the items in the hierarchy the same. In fact, merely iterating through such a hierarchy is an example where it might be nice to treat the leaves (individual Soldier elements) and the nodes (the Formation elements) in the tree identically. The Composite pattern encapsulates this notion of composing objects into trees, and then treating the leaves and nodes through a common interface, or as a common type.

Implementing the Composite pattern

The start point for implementing Composite is to ensure that all of the objects that will be composed into a tree must derive from the same base class (or implement a common interface), as shown in Figure 5. This will allow the client to interact with all items in the composite through a base class reference, thus interacting with them identically.

Figure 5
Figure 5: Architecting types to support the Composite pattern

In this example, the base CombatElement type defines two categories of operations: those that relate to composing or iterating items in the tree, and those that reflect the military nature of the element. Add(), Remove() and GetElements() all fall into the first category, whereas Move(), Fight() and the Name property are in the second. I’ve chosen to do this in a single abstract base class for brevity in this article, but you could just as easily separate the methods out into interfaces if you prefer.

You can see the implementation of CombatElement below:

public abstract class CombatElement
{
	public virtual void Add(
		CombatElement element) {
	throw new NotImplementedException();
	}

	public virtual void
		Remove(CombatElement element) {
	throw new NotImplementedException();
	}

	public virtual
		IEnumerable<CombatElement>
			GetElements() {
	throw new NotImplementedException();
	}

	public string Name { get; set; }

	public abstract void Fight( ... );
	}

	public abstract void Move( ... );
	}
}
You might be wondering why I chose to implement the Add(), Remove() and GetElements() methods by throwing exceptions, but made the Fight() and Move() methods abstract. The answer is simple. I didn’t want all deriving classes to have to implement Add(), Remove() and GetElements() unless they are “node” types, such as Formation. “Leaf” types, such as Soldier, can safely ignore them, but any attempt by client code to call them will correctly yield an exception.

Fight() and Move(), on the other hand, need to be implemented by all deriving types and are thus marked as abstract.

The implementation for the two derived classes, Soldier and Formation, is as follows:

public class Soldier : CombatElement
{
	public Rank Rank { get; set; }

	public override void Fight( ... ) {
		// implementation elided
	}

	public override void Move( ... ) {
		// implementation elided
	}

	public override
		IEnumerable<CombatElement>
			GetElements() {
	 yield return this;
	}
}

public class Formation : CombatElement {
	private List<CombatElement> _elements;

	public Formation() {
	_elements = new List<CombatElement>();
	}

	public override void Add(
		CombatElement element) {
		_elements.Add(element);
	}

	public override void
		Remove(CombatElement element) {
		_elements.Remove(element);
	}

	public override void Fight( ... ) {
		// implementation elided
	}

	public override void Move( ... ) {
		// implementation elided
	}

	public override
		IEnumerable<CombatElement>
			GetElements() {
	// yield up this current object first
		yield return this;

	// iterate through all child elements
		foreach (CombatElement fe in
			_elements) {
// + iterate through each of its elements
			foreach (CombatElement feInner
					in fe.GetElements())
				yield return feInner;
		}
	}
}
As you can see, the Formation class implements its node-like behaviour by storing all of its children in an internal collection and overriding the Add() and Remove() methods to insert items into, and remove them from, this collection. The Soldier class, on the other hand, simply ignores these methods.

Iterating through the Composite

The GetElements() method defined on CombatElement provides support for iterating through the Composite. In the Soldier’s case, this is a trivial implementation that simply yields up the individual Soldier object. The Formation class has a little more work to do, namely:
  • It must yield up itself first
  • It then iterates through its collection of child elements, yielding up each of those children’s children. Note that it is not enough to simply call each element’s GetElements() method. You actually have to enumerate through the results and yield each result up to the caller.
Having implemented iteration across the entire tree (and you just have to love how easy this is with yield return), you can then write simplified code along these lines:
public void TestComposite() {
	CombatElement company =
		CreateCompany();
	DisplayAllSoldiers(company);
}

private void DisplayAllSoldiers(
		CombatElement CombatElement)
{
	foreach (CombatElement fe in
		CombatElement.GetElements() )
	{
		Debug.WriteLine(fe.Name);
	}
}

CombatElement CreateCompany() {
	CombatElement fe = new Formation()
		{ Name = "Alpha Company" } ;

	fe.Add(new Soldier() { Name = "Joe",
		Rank = Rank.Captain });
	fe.Add( new Soldier() { Name = "Jack",
		Rank = Rank.WarrantOfficer } );

	CombatElement platoon =
		new Formation();
	platoon.Name = "1st Platoon";
	fe.Add(platoon);
	platoon.Add(
		new Soldier() {
			Name = "Adam", Rank =
				Rank.Lieutenant });
	platoon.Add(
		new Soldier() {
			Name = "Arthur", Rank =
				Rank.Sergeant });

	CombatElement section =
		new Formation();
	section.Name = "1st Section";
	platoon.Add(section);
	section.Add(
		new Soldier() {
			Name = "Brian", Rank =
				Rank.Corporal });
	section.Add(
		new Soldier() {
	Name = "Bill", Rank = Rank.Private });

	...
	return fe;
}
You can now see that iterating through the composite and calling the appropriate methods or properties on each element has become very simple. However, there are a couple of downsides, namely:
  • The client can only interact with items in the composite via the base class/common interface members.
  • Attempting to call one of those members might yield an exception (e.g. attempting to add a child element to a Soldier.)
  • The client might need to check the type of, and cast returned elements, in order to code around points 1 and 2 safely.
You should therefore only consider using Composite where building trees is a significant part of the objects’ usage pattern. The key benefit, though, is that performing a common operation on a Composite, or a node or leaf within it, is massively simplified.

Now that we can iterate through our composite, let’s look at the final pattern: Visitor.

The Visitor pattern

Building composites is fun, but they can cause another problem: how do you add functionality to the elements in the composite over time? Specifically, how do you add functionality that varies by the concrete type of each element?

For example, let’s imagine that a new requirement appears in the application: the user wants to be able to display a formatted label for each element. For a Soldier object, this consists of the rank and the name of the soldier, for example “Captain John Williams”. For a Formation, however, it might be comprised of the entire unit designation, as in the following example: “1st Section / 2nd Platoon / Alpha Company”.

Now, as an experienced developer, you’re doubtless thinking that the solution would be to add a new virtual member to the base CombatElement class to support this requirement. However, this poses the usual problems of changing a base class: potential naming conflicts with members that already exist on the derived types; extended testing, and so on. This approach also precludes a client from adding functionality to the elements unless it has access to the base element source.

You’ll face these issues every time a new requirement comes in, and it is the Visitor pattern helps to mitigate this issue. So how does it work? Visitor is remarkably simple to implement. You start by defining an interface that contains a method for each concrete type that exists in the Composite. Thus, for the military simulation, this would look as follows:

public interface IVisitor {
	void VisitSoldier( Soldier soldier );
	void VisitFormation( Formation
		formation );
}
Next, you define an abstract method on the base element class that accepts a Visitor, as shown below:
public abstract class CombatElement {
	public abstract void Accept(
		IVisitor visitor );
	// rest of class elided
}
You would now implement the Accept() method on the concrete types. You have a number of choices for this, but one of the more common is to internalise any iteration and call Accept() on all child elements. For example, the implementation that I chose looks as follows:
public class Soldier : CombatElement {
	public override void Accept(
		IVisitor visitor ) {
		visitor.VisitSoldier( this );
	}
	// rest of class elided
}

public class Formation : CombatElement {
	public override void Accept(
		IVisitor visitor) {
		visitor.VisitFormation(this);

	// internalise iteration to simplify
	// client code
	foreach (CombatElement fe in Elements)
		fe.Accept(visitor);
}
That’s it. You can see the basic structure of the Visitor pattern in Figure 6.

Figure 6
Figure 6: The Visitor pattern

The client can now define new, type-specific functionality for each of the different CombatElement types by simply creating a new class that implements the IVisitor interface:

public class DumpTreeVisitor : IVisitor {
	public void VisitSoldier(Soldier s) {
		Debug.WriteLine( s.Rank + " " +
			s.Name );
	}

	public void VisitFormation(
		Formation f) {
		StringBuilder sb =
			new StringBuilder();
		sb.Append( f.Name );

		CombatElement fe = f;

		// Walk up the tree to find the
		// parent formations
		while( fe.Parent != null ) {
			sb.Append( " / " );
			fe = fe.Parent;
			sb.Append(fe.Name);
		}
		Debug.WriteLine( sb.ToString() );
	}
}
With this code in place, all the client needs to do to dump the entire tree is to write the following code:
...
CombatElement root = CreateCompany();
DumpTreeVisitor visitor =
	new DumpTreeVisitor();
root.Accept( visitor );
Adding further operations to the elements, such as a Retreat() method (technically known as an advance to the rear), would now simply be a matter of defining another visitor, for example:
public class RetreatVisitor : IVisitor {
public void VisitSoldier( Soldier s ) {
		// Code to run away
		...
	}
	public void VisitFormation(
		Formation f ) {
		// Code for an organised fighting
		// disengagement
		...
	}
}
Unsurprisingly, using this new functionality is as simple as writing:
...
CombatElement root = CreateCompany();
					// From Listing 6
RetreatVisitor visitor =
	new RetreatVisitor();
root.Accept( visitor );
The ease with which you can add functionality to the types used within the composite truly reflects the power of the Visitor pattern.

Double dispatch

The Visitor pattern relies on a technique known as “double dispatch”. Let’s take a quick look at this now.

A traditional virtual method, such as ToString(), works using single dispatch. Simply put, the method that is invoked depends on just one thing: the concrete type on which ToString() is being called. With the Visitor pattern, the method that is invoked ultimately depends on two things, namely:

  • the concrete type on which Accept() is being invoked, and
  • the concrete type that is implementing the IVisitor interface
Figure 7 demonstrates double dispatch, as utilised by Visitor. Note how the specific block of code that is ultimately called depends on both the actual type of the element and the type of the visitor.

Figure 7
Figure 7: Double dispatch in action

Issues with Visitor

You need to think about a number of things when you use Visitor. Obviously, the first limitation is that Visitor objects themselves can only access the public members of the elements.

More significantly, adding a new type of CombatElement might well require you to modify the IVisitor interface, and thus modify (and test) all of your visitor implementations.

You also need to consider how you implement the Accept() method. I chose to internalise the iteration within the Formation class so that it calls Accept() on each of its children. This is by no means the only option. Another alternative would be to externalise the iteration logic into a separate class, as shown below:

public static class AcceptHelper {
	public static void AcceptAll(
CombatElement root, IVisitor visitor ) {
		foreach( CombatElement ce in
			root.GetElements() ) {
			ce.Accept( visitor );
		}
	}
}
What you should avoid doing, however, is having the Visitor perform the iteration itself, as you will find yourself repeating that code in every Visitor you create.

Conclusion

This article has examined three design patterns. Iterator receives excellent support in .NET through the IEnumerator interface, and the combination of IEnumerable, “for each” and C#’s yield syntax. There is no doubt that every .NET developer utilises this pattern every day. Composite pattern enables you to simplify code that works with trees of objects, by allowing you to treat a leaf and a node the same. This works well with naturally tree-like structures, so you will find an example of this pattern in types such as the XmlNode that is used by XmlReader. Finally, we looked at the Visitor pattern. Although less commonly used than the other patterns, it is a powerful tool when used in conjunction with Composite and Iterator. The real benefit is that it enables you to extend the functionality of a type without having to alter its base class, yet get an effect that is very similar to overriding virtual methods.


Dave Wheeler is a freelance consultant, who specialises in the various UI technologies within Microsoft .NET. He also writes and delivers courses for DevelopMentor, including their new Essential Silverlight 2 course.

Dave is presenting a number of sessions at the DevWeek 2009 conference at the end of March, including some on the subject of patterns.

You might also like...

Comments

About the author

Dave Wheeler United Kingdom

Dave Wheeler is a freelance instructor and consultant who specialises in .NET application development. He’s a moderator on Microsoft’s ASP.NET and Silverlight forums and is a regular speaker at ...

Interested in writing for us? Find out more.

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone” - Bjarne Stroustrup