LINQ to Objects for the .NET developer

Introduction

This article was originally published in VSJ, which is now part of Developer Fusion.

LINQ may look like SQL but this just a convenience so that LINQ to SQL can be used by people too lazy to learn new syntax. You need to think of LINQ as an attempt to bring “querying” type operations into the mainstream of the .NET language of your choice. The point is that LINQ isn’t really about database, this just happens to be a really good use of it. It’s a general purpose and extensible approach to working with structured data. In a more general setting you can think of it as yet another aspect of functional programming that is making its way into .NET. In this article I’m going to demonstrate how LINQ works at the deepest level. When you understand how LINQ is implemented you will be better able to use it, how to extend it and you can’t help but admire its simplicity.

A Query is basically an operation where you specify a subset of the data that you actually want to work with. In many ways you can think of “querying” as the point where software creation becomes complicated and the real world enters the design problem. Put simply – data is messy, usually not organised in a way that suits and the task is always more difficult and fragile than you could possibly imagine. LINQ can’t do anything about the inherent complexity of querying data, but it does deliver it in a uniform and integrated format.

The key idea is that LINQ is a set of classes and methods that will work with any class as a data source as long as it implements the IEnumerable interface. As, in a sense, this is the foundation on which the rest of LINQ is built, this is a good place to start. In fact it is probably a good idea to take one step further back and look at the whole idea of enumerators.

Implementing an enumerator

The basic idea of an enumerator is to supply each item in a collection of data items one-by-one – and usually in no specified order. In .NET an enumerator is a class that provides a number of methods in addition to the basic one-by-one enumeration of the items. To be specific an enumerator has to supply:

  • Reset – initialises the enumeration so that it starts over again
  • Current – returns the current item
  • MoveNext – updates the index to the next item

Of course this all implies that there is a numeric index to the current item which starts off set to –1 to indicate that it “points” before the start of the collection. Calling Current repeatedly is allowed, but if the index is invalid then you are supposed to throw an exception. MoveNext returns true if the result is a valid index and false if the resulting index isn’t pointing to a valid item. Put together these three methods make up the IEnumerator interface and any class that supports enumeration does so by implementing this interface.

You don’t have to use a separate class to implement IEnumerator, you can do the job in the same class that implements the inner workings of the data collection if that’s convenient. It is more usual to create the enumerator as a separate class and write a constructor that creates an instance of the enumerator ready to be use. To see the simplest possible example of an enumerator, let’s create everything in a single class. Our example data collection is also going to be a little strange in that no collection of data ever exists. Instead when the collection is instantiated the constructor is supplied with the size of the collection and from then on a random number generator is used each time a data item is required. This is clearly not very useful for anything other than testing hence the name of the class:

class TestCollection:IEnumerator

Generating random data also has the advantage that the example doesn’t use any of the existing collection data types which all supply enumerators and hence tend to confuse the issue. TestCollection doesn’t make use of anything prebuilt in the .NET framework to implement its enumerator. To make it work you need to add:

using System.Collections;

To get us started we need some private variables, something to hold the instance of the random number generator, something to store the size of the collection and, of course an index to the current position in the collection:

{
    private Random rnd;
    private int Size;
    private int loc = -1;

We need the constructor to set everything up ready for the collection to be enumerated:

public testcol(int s)
{
    rnd = new Random();
    Size = s;
}

Now we have to implement the methods of the IEnumerator interface. The reset method is simply:

void IEnumerator.Reset()
{
    this.loc = -1;
}

You don’t really need the “this” but it helps to emphasise the fact that the enumerator works with the instance.

The Current method takes the form of a read only property:

object IEnumerator.Current
{
    get
    {
    	if (this.loc > -1)
    		return this.rnd.Next(500);
    	else
    		return -1;
    	}
}

Notice that Current returns an object rather than an int. This is how it has to be as the interface defined Current in this way? More about this problem later. In principle we should test to make sure loc is sensible, i.e. actually indexes an element of the collection and throws an exception if it doesn’t. In this case we simply return –1. In most cases whatever is using the enumerator usually stops enumeration when the MoveNext method returns false to indicate that there is no next item:

bool IEnumerator.MoveNext()
{
    loc++;
    if (loc < this.Size) return true;
    else return false;
}

Now we have the complete enumerator and it’s very easy to see how it works, but we can’t as yet make use of it. The reason is that any class offering an enumerator has to implement the IEnumerable interface. This has just a single method GetEnumerator which returns the instance of the class that provides the methods of the IEnumerator. If you think about it for a moment it is obvious that IEnumerable has to be implemented by the data collection class that holds the items to be enumerated so we have to add it to the class definition:

class TestCollection:IEnumerator,IEnumerable
{

The single method that we have to implement is trivial as the instance of the TestCollection class concerned provides its own enumerators, i.e. it is the enumerator to be returned:

    IEnumerator IEnumerable.GetEnumerator()
    {
    	return this;
    }
}

Now we can write some code that makes use of our data collection with enumeration. The simplest thing to try out is a foreach loop:

TestCollection col = new TestCollection(5);
foreach (object o in col)
{
    MessageBox.Show(o.ToString());
}

Notice that the iterator o has to be defined as an object. We can use an int in the foreach loop as it supports implicit casting. For example:

foreach (int o in col)
{
    MessageBox.Show(o.ToString());
}

…works perfectly unless you happen to return something that can’t be cast to an int at runtime with the result that an exception is raised. If you want to do better then you need to change the enumeration interfaces to their generic forms, which is what we have to do anyway to use LINQ. Separating the enumerator Before moving on it is worth commenting on why we usually implement the enumerator as a separate class. Consider what happens if we try to use the current enumerator in a nested foreach loop. The same enumerator would be returned by GetEnumerator and hence the nested loops would interfere with each other. If you want to write nested loops or allow multiple enumerations to happen concurrently you need to implement the enumerator as a separate class and you need to create an instance of that class each time an enumerator is called for by GetEnumerator. As nesting of queries is very common we need to do the job properly before moving on to consider LINQ.

All we need to do is separate out the enumeration methods into a new class. This class has to keep track of where it is in the enumeration and it needs to keep track of which instance of the collection it is enumerating:

class TestCollectionEnnumerator :
    IEnumerator
{
    private TestCollection m_Instance;
    private int loc = -1;

If you create the enumeration class as an inner class of the data collection, i.e. TestCollection, it will have access to all of the data collection’s variables and methods and this makes it easier for the two classes to work together. The only new method we need is a constructor that initialises the enumerator with the current instance of the data collection:

public TestCollectionEnnumerator(
    TestCollection Instance)
{
    m_Instance = Instance;
}

The other enumeration methods have to now use m_Instance instead of “this” to make sure they work with the correct instance of the data:

void IEnumerator.Reset()
{
    loc = -1;
}

object IEnumerator.Current
{
    get
    {
    	if (loc > -1)
    		return m_Instance.rnd.Next(500);
    	else return -1;
    }
}

bool IEnumerator.MoveNext()
{
    loc++;
    if (loc < m_Instance.Size)
    	return true;
    else return false;
}

We also need to change the data collection class so that its GetEnumerator actually creates a new instance of the enumerator:

class TestCollection:IEnumerable
{
    private Random rnd;
    private int Size;
    IEnumerator
    	IEnumerable.GetEnumerator()
    {
    	return new
    		TestCollectionEnnumerator(this);
}

Notice how the use of “this” makes the final connection between the instance and any number of enumerators. Now our example code is slightly more complicated but you can use it within nested for each loops and, as we shall see, nested LINQ queries.

Generic enumeration

It is generally better to use a generic form of the interfaces so add to the start of the program:

using System.Collections.Generic;

The generic form of the IEnumerator interface inherits from the non-generic IEnumerator interface. Note that when an interface inherits from another interface, for example IA:IB then when you add IA to a class it’s exactly the same as writing :IA,IB and you have to implement the methods of IA and IB. The generic form of the IEnumerator interface defines a single generic version of Current – after all this is the only method that needs to use the data type. To complicate things a little it also inherits the Dispose method from IDisposable, but we can ignore this at the moment by adding a null implementation. The generic form of IEnumerable interface simply adds a generic form of GetEnumerator.

So to make our class use the generic Interfaces all we have to do is change its definition to:

class TestCollection:IEnumerable<int>

…and add to it a second generic GetEnumerator method:

IEnumerator<int>
    IEnumerable<int>.GetEnumerator()
{
    return new
    	TestCollectionEnnumerator(this);
}

The enumerator class also has to implement the generic interface and has to have two new methods added to it:

class TestCollectionEnnumerator :
    IEnumerator<int>
{
// ...rest of class definition

New methods:

int IEnumerator<int>.Current
{
    get
    {
    	if (loc > -1)
    		return m_Instance.rnd.Next(500);
    	else
    		return -1;
    }
}
void IDisposable.Dispose()
{
}

Notice that you have to keep the existing non-generic implementations and that the new generic methods are almost identical apart from the use of the int data type. Now you can write:

foreach (int o in col)
{
    MessageBox.Show(o.ToString());
}

…and if you try to cast to something that is unreasonable, e.g. string, then you will see a compile time rather than runtime error.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“In order to understand recursion, one must first understand recursion.”