Sophisticated generics

This article was originally published in VSJ, which is now part of Developer Fusion.

Back in the April issue of VSJ I explained some of the problems inherent in the strongly typed implementation of generics in C#. It all boiled down to the simple fact that a generic needs to be vague about the type it is designed to operate on – otherwise it wouldn’t be generic – but this vagueness makes it almost impossible to use operators and methods that belong to the generic type used in practice.

My objection was that in practical terms this shortcoming makes it difficult (but not impossible) to write a generic method that adds together two generic types. This left some readers with the idea that I am in some way not a fan of generics, and that they should be avoided. In fact, my position is quite the reverse. I’m constantly amazed at how the careful use of generics, especially the new generic classes in the .NET 2.0 framework, can make a very difficult algorithm very easy to implement, and enable us either to avoid a great deal of work or having to cut corners without them.

Here we are not concentrating on generics as a way of implementing type-general algorithms, but on implementing complex data structures. This is the most common way that generics enter day-to-day programming, and if this isn’t how you think about generics then you might like to consider the following example.

The task

The starting point of this project was a simple utility that polled a list of resources to see if they were available – in this specific case the resources were servers and being “available” was that they responded to a ping – but you can easily see that this is a general situation. The utility polls at a fixed interval, but only writes event data if a server’s status changes. The EventLog is used to record the status via a custom log file.

This is all very simple, but now we come to the problem of processing and analysing the log. The difficulty is that the analysis program doesn’t know the number of monitored servers it is going to encounter when it reads through the log. Each time it reads a record it has to check to see if it has seen that particular server before. If it has then it simply adds the data to the existing list of records for that server. If it hasn’t then it has to initialise a new list and start recording data for that server.

You can see that writing a program that scans the EventLog is a fairly standard sort of task that could be solved using an array of structs or something similar. The factors that make the task more difficult are that the number of arrays that you need, and the number of records each array has to store, isn’t known before you start the task – indeed it isn’t known until you finish the task.

There are a number of standard fixes to reduce this need for a dynamic data structures back to elementary data structures. For example, you can create a static data structure that is big with respect to the typical task and just refuse to process a log with more than so many servers or so many records. A slightly more flexible method is to pre-scan the log to discover the parameters needed to create a static data structure of the correct size. But pre-scanning doesn’t work if the log can grow while it is being analysed.

Generics to the rescue

There is no avoiding the fact that to do this job correctly we need a fairly sophisticated dynamic data structure, and this is where generics come in. Specifically it’s where the generic classes in System.Collections come in. The new generic Dictionary class provides the solution to the problem of dealing with a variable number of servers. The generic Dictionary uses a hash algorithm to store and retrieve values of any type based on a key of any type. The key has to be unique, and it’s easy to check that a key already does or does not exist. Thus the Dictionary is the ideal dynamic data structure to use to store an unknown number of items that have a unique key.

In this case we can use the server’s URL or IP address as a unique key. So we could create a Dictionary that can store the DateTime giving the time the server first came on using its address, as a string, as the key:

Dictionay<string,DateTime> records=
    new Dictionary<string,DateTime>();

We can add new item using:

records.Add(Address, timedata);

…where Address is a string variable giving the URL/IP address and timedata is a DateTime variable storing the time. This is efficient because a hash algorithm is used and storing and retrieving values scales linearly with the number of items stored.

Already we have implemented a very sophisticated data structure with a few lines of code. Without the help of the Dictionary class most programmers would be seeking a simple and dirty solution to the problem. But this still doesn’t quite do what is required. As the key has to be unique, we can’t store multiple on/off times for each server – just one key and one value. Of course the solution is that the value part of the key/value pair can itself be a dynamic data type. In this case what we need is a list of DateTime values that can expand as more items are read in. The List class is a generic version of the ArrayList class. So now our data structure is:

Dictionary<string, List<DateTime>>
    records = new Dictionary<string,
    List<DateTime>>();

This is beginning to look more complicated, but it’s easy to understand as long as you think of each key being associated with a list of DateTime objects.

There is one last complication – the recorded times come in two types – on times and off times. We need to store the on or off state along with each time. The solution to this is to extend the data structure one more time by defining a custom struct to hold all of the information and then define a List of structs. The struct in this case is just:

public struct ontime
{
    public DateTime Time;
    public Boolean On;
};

…and our dynamic data structure is:

Dictionary<string,List<ontime>>
    records = new
    Dictionary<string,List<ontime>>();

So now we have a Dictionary keyed on the server’s address with the value being a list of ontime structs.

If you have had to implement dynamic data structures of this sort in past projects without the help of suitable classes you will realise what a huge amount of infrastructure has just been created for you. It’s almost too good to be true!

In practice

Let’s see how we make use of this fairly sophisticated dynamic data structure. First we need the definition of the struct and the Dictionary:

public struct ontime
{
    public DateTime Time;
    public Boolean On;
};

Dictionary<string,List<ontime>>
    records = new
    Dictionary<string,List<ontime>>();

Reading records from the event log is very easy using an instance of the EventLog class. All we have to do is provide the name of the event log we want to read – “SiteStatus” in this case:

EventLog eventLog1 = new EventLog();
eventLog1.Log = “SiteStatus”;

Now we have access to a collection of EventLogEntry objects which we can enumerate with a foreach loop:

foreach(EventLogEntry entry in
    eventLog1.Entries)
{

The data about the server is packed as a comma separated string within the EventLogEntry’s Message string. The simplest way of processing this is to use the Split method to produce an array of strings:

   string[] data =
    	entry.Message.Split(‘,’);
    string Address = data[0];
    string Status = data[1];
    string RoundTripTime = data[2];

This raw data can now be used to load a struct with the data we actually want to work with:

   ontime timedata;
    timedata.On = (data[1] ==
    	“Success”);
    timedata.Time =
    	entry.TimeGenerated;

Now we are ready to store the struct in the Dictionary. There are two possible cases to consider depending on whether or not the server key is already in the Dictionary. If it isn’t then we have to add it and a suitable List structure as its corresponding value:

   if (!records.ContainsKey(Address))
    {
    	records.Add(Address, new
    		List<ontime>());
    	records[Address].Add(timedata);
    }

An alternative is just to use the Add method without testing using “ContainsKey” and catch the exception that is thrown when a duplicate key is encountered. Take care when using the Item property to add new key value pairs, because in this case a duplicate key simply overwrites any existing key value pairs. Also notice that we now have two Add methods associated with our data structure. The first, as in records.Add adds a new key value pair to the Dictionary, the second, as in records[Address].Add adds an ontime struct to the List. Finally if the server key is already in the dictionary we simply add the new struct to the List:

   else
    {
    	records[Address].Add(timedata);
    }
}

The program now successfully scans through the event log and builds the dynamic data structure as required. You can see an example of this in Figure 1 with just two server key value pairs and with the second key value expanded to show some of the stucts stored in the List.

Figure 1
Figure 1: The dynamic data structure in action

To see how the data structure makes things easier after it has been fully constructed let’s process it. The first thing to do is to make sure that the structs in each list are in order.

This is fairly easy because a List has a generic Sort method based on quicksort. The only problem is that it needs a rule for comparing ontime structs and this means writing an instance of the Comparison delegate. First, however, we need to step theough each of the servers in the Dictionary:

foreach (KeyValuePair<string,
    List<ontime>>
    keyval in records)
{

Notice that the items in the Dictionary are treated as single KeyValuePair objects as defined by the specific types. Now we can sort the List using a custom rule to compare two ontime structs:

keyval.Value.Sort(compareOntime);

The compareOntime method is very simple:

private static int compareOntime(
    ontime a, ontime b)
{
    return DateTime.Compare(
    	a.Time, b.Time);
}

All we have to do is use the Compare method of the DateTime fields. In general the comparison can be as complicated as you desire as long as it returns 1 if a>b, 0 if a=b and -1 if a<b.

Now we have the List sorted into order we can scan through it and calculate the total time each server has been on. First we enter the name of the server into a Textbox:

   textBox1.Text+=keyval.Key+”:”;

A foreach can be used to scan through the List but first we need some variables to store the results:

   TimeSpan TotalOnTime =
    	TimeSpan.Zero;
    Boolean OnState=false;
    DateTime FirstOn=new DateTime(0);
    DateTime FirstOff=new DateTime(0);
    foreach (ontime entry in
    	keyval.Value)
    {

The logic is simply to test for the first On event and take the time difference from the first Off event following it:

   	if (entry.On & !OnState)
    	{
    		FirstOn = entry.Time;
    		OnState = true;
    	}
    	if (!entry.On & OnState)
    	{
    		FirstOff = entry.Time;
    		TotalOnTime += FirstOff
    			- FirstOn;
    	}
    }
    TextBox1.Text +=
    	TotalOnTime.ToString() +
    	Environment.NewLine;
}

Processing the data is remarkably easy and we are not beset by the problems of not being able to use simple operations and methods on generic types as described in my earlier article. The reason is simply because we aren’t using operations on the generic types! The generics provide the dynamic data structure that we are using, but the actual data is stored in a struct of a type that is known and well defined at compile time.


Dr. Mike James’ programming career has spanned many languages, starting with Fortran. The author of Foundations of Programming, he has always been interested in the latest developments and the synergy between different languages.


Generics don’t add

The problem referred to in the previous article is that a simple generic method such as:

T Sum<T>(T a,T b)
{
    return a+b;
}

…doesn’t work. The compiler complains that it can’t add variables of type t. This is very reasonable but restricts the sort of algorithms you can express in a type independent way. The cause of the difficulty is that the compiler doesn’t know what type a and b are until the type parameters are supplied by the user. There are many ways around the problem but none are truly satisfactory (see the original article for details). This difficulty doesn’t mean that you can make use of generic classes and methods and use addition, or other appropriate operations, on classes and methods after they have been typed as illustrated by the example in this article.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” - Martin Fowler