Real world refactoring

This article was originally published in VSJ, which is now part of Developer Fusion.
Martin Fowler coined the term refactoring in his classic book by the same title. In the book, he defines refactoring patterns for how to perform changes to code to improve it. However, he doesn’t fully address why and when you might want to embark on refactoring missions. That’s the realm of this article. First, I discuss the building block concept of composed method, then I show an example situation that warrants refactoring.

Building Block: Composed Method

The concept of composed method comes from an unexpected place: the book Smalltalk Best Practice Patterns by Kent Beck. I can hear you saying to yourself “what in the world does Smalltalk have to do with the Enterprise Application I’m building?” Well, it turns out that it has a lot to do with it. The Smalltalkers were the first developers to utilise object-oriented programming for real, and they figured out a lot of cool stuff. In fact, lots of the ideas that appear as revelations were already figured out by the Smalltalkers in the early 1980s.

Composed method is one of those core concepts that we utilise too little today. Kent Beck defines composed method as the following:

Divide your programs into methods that perform one identifiable task. Keep all of the operations in a method at the same level of abstraction. This will naturally result in programs with many small methods, each a few lines long.

Composed method has lots of benefits:

  • Small methods are easier to test.
  • Building large public methods as a grouping of much smaller private methods allows you to create “outlines” of behavior, where each coarse grained outline item is a private method that performs the work. Your public methods become much more readable because they provide a high-level overview of the work performed.
  • Discovering reusable code becomes easier. If you have a method that performs 5 discrete tasks, the chances of another problem needing those exact same tasks in that exact order are small. However, if you have 5 small methods, you can call them in a unique way because each is isolated from the others.
  • Debugging is easier if you have methods composed of many smaller methods. You never find yourself slogging through hundreds of lines of code looking for a bug. Small methods isolate bugs into a few lines of code.
  • Having lots of small methods discourages the rampant use of local variables within large methods. It therefore forces you to think about the state of your object more clearly, creating fewer accidental coupling points.
Let’s look at an example of refactoring towards composed method. Consider this mundane method:
private void performUpdate()
{
	DataRow[] updateRows = dsProducts.
		Tables["product"].Select("", "",
		DataViewRowState.ModifiedCurrent);
// process each row for updating
	foreach (DataRow row in updateRows)
	{
		Product p = new Product();
p.Name = row["ProductName"].ToString();
		p.Discontinued = Convert.
			ToBoolean(row["Discounted"]);
		p.QuantityPerUnit =
			row["QuantitPerUnit"].ToString();
		p.UnitPrice =
			Convert.ToInt32(row["UnitPrice"]);
		p.UnitsInStock =
Convert.ToInt32(row["UnitsInStock"]);
// check values for validity
		IList updateErrors = p.validate();
		if (updateErrors.Count != 0)
		{
			throw new
				UpdateException("Upate error");
		}
	}
// resolve updates to adapter
	daProduct.Update(dsProducts.
		Tables["products"]);
}
This code represents a very simple method to update information in a DataSet, presumably in response to a button click or other similar invocation. While this is a short method, it violates the composed method style of coding mentioned above. First, it has several sections, delineated with comments. Comments that serve no other purpose than to create sections with a method are a code smell: it’s a cry for help to make those actual methods. Second, it has more code in it that I would like to see in a single method. For my projects, my rule of thumb states that no method should be longer than about 8 lines of code, and this one clearly violates that. Third, and smelliest, this code does more than one thing. Methods should do one thing and one thing only. Let’s take a pass at refactoring this code to see if we can make it better.
void performUpdate()
{
	foreach (DataRow row in UpdatedRows)
	{
		p = populateProductFromRow(row);
		validateProduct(p);
	}
	daProduct.Update(dsProducts.
		Tables["products"]);
}
private DataRow[] UpdatedRows
{
	get
	{
		return dsProducts.Tables[
			"product"].Select("", "",
			DataViewRowState.ModifiedCurrent);
	}
}
private void validateProduct(Product p)
{
	IList updateErrors = p.validate();
	if (updateErrors.Count != 0)
	{
		throw new UpdateException(
			"Upate error");
	}
}
private Product populateProductFromRow(
		DataRow row)
{
	Product p = new Product();
	p.Name = row["ProductName"].
		ToString();
	p.Discontinued = Convert.ToBoolean(
		row["Discounted"]);
	p.QuantityPerUnit =
		row["QuantitPerUnit"].ToString();
	p.UnitPrice = Convert.ToInt32(
		row["UnitPrice"]);
	p.UnitsInStock = Convert.ToInt32(
		row["UnitsInStock"]);
	return p;
}
This version refactors the original into four different methods. The performUpdate method now reads like an outline of the steps performed. Not coincidentally, the new methods match the comments from the previous version pretty closely. The UpdatedRows property now returns exactly the view of the tables from the dataset, allowing iteration over the data rows. The validateProduct method handles the validation chores, and finally the populateProductFromRow method returns a fully instantiated Product object, extracted from the row in the dataset.

Using composed method, I’ve enhanced the readability of the code. If that’s all the benefit, it might not seem worth the effort to perform the refactoring. However, now that I’ve got the code broken up into little pieces, I can see what assets I actually have. It turns out that I have a few pieces of reusable code that I didn’t realise I had before. Notice for example how much of the code deals with pretty generic database stuff. In fact, only a few places exist where I specify a particular table name or other value. That suggests that I have reusable code with a few replaceable values. Let’s see how aggressively I can refactor methods from this form class to a parent form class, making the code more general as I go.

After a couple of rounds of refactoring, I end up with these two classes. The first, DbFormBase, is a class used as the super class of any form that needs to be able to perform updates:

public abstract class DbFormBase : System.Windows.Forms.Form
{
	protected void performUpdate()
	{
		foreach (DataRow row in UpdatedRows)
		{
			IEntity e = populateEntityFromRow(row);
			validateEntity(e);
		}
		DataAdpater.Update(EntityDataSet.Tables[TableNam]);
	}
	protected abstract DataRow[] UpdatedRows
	{
		get;
	}
	protected abstract SqlDataAdapter DataAdpater
	{
		get;
	}
	protected abstract DataSet EntityDataSet
	{	
		get;
	}
	protected abstract String TableName
	{
		get;
	}
	protected abstract IEntity populateEntityFromRow(DataRow row);
	private void validateEntity(IEntity e)
	{
		IList updateErrors = e.validate();
		if (updateErrors.Count != 0)
		{
			throw new UpdateException("Upate error");
		}
	}
}
In this form class, I create several abstract properties to force child classes to give me specific details about fields that tie the child to a particular dataset and adapter. Specifically, if the child class can tell me what rows have been updated, the DataAdapter, the DataSet, and the TableName, all the rest of the code is boilerplate. To make the entity type itself more generic, I created an interface named IEntity, which contains two methods: validateEntity, which is responsible for validations, and populateEntityFromRow, which takes care of creating an object of the specified type from a DataRow. By allowing Product to implement this interface, I can generically perform the validation in the validateEntity method. Now, even the performUpdate method appears in the base class. All that’s left in my child class (the original form) pertains directly to the subject of that form, namely product information:
protected override SqlDataAdapter
	DataAdpater {
	get
	{
		return daProduct;
	}
}
protected override DataSet EntityDataSet
{
	get
	{
	return dsProducts;
	}
}
protected override String TableName
{
	get
	{
		return "products"
	}
}
protected override DataRow[] UpdatedRows
{
	get
	{
		return dsProducts.Tables[
			"products"].Select("", "",
			DataViewRowState.ModifiedCurrent);
	}
}
protected override IEntity
	populateEntityFromRow(DataRow row)
{
	Product p = new Product();
	p.Name = row["ProductName"].
		ToString();
	p.Discontinued = Convert.ToBoolean(
		row["Discontinued"]);
	p.QuantityPerUnit =
		row["QuantityPerUnit"].ToString();
	p.UnitPrice = Convert.ToInt32(
		row["UnitPrice"]);
	p.UnitsInStock = Convert.ToInt32(
		row["UnitsInStock"]);
	return p;
}
By refactoring the original code using composed method, the small building blocks of reusable code fell out. That allowed me to further refactor the code to extract the generic parts, leaving abstract placeholders for specific information, to create the beginnings of a small persistence framework. Now any time I need a form that has persistence, I can subclass the DbFormBase class, add the required methods imposed via the abstract methods and properties from both DbFormBase and IEntity, and I have update capabilities.

Discovering reusable assets you didn’t know you had is one of the surprising benefits of composed method. Grady Booch, one of the pioneers of object-oriented thinking, says that you know you’re making progress when you start removing more code than you add, because you discover reusable assets that previously rewrote over and over. Targeted refactoring leads you towards reusable assets, improving the quality of your code by making sure you have less of it.

Structural Duplication

Another target of refactoring on real projects is structural duplication. In the book The Pragmatic Programmer, Dave Thomas and Andy Hunt define the DRY principle: Don’t Repeat Yourself. This is a surprisingly hard thing to avoid because duplication sneaks up in all sorts of places in projects. In fact, I devote an entire chapter in my book The Productive Programmer to examples of how to find and eliminate duplication. One particular aspect of duplication is structural, and that’s what I’m attacking with this refactoring example.

Here’s the problem. Let’s say you have an Employee class that defines some simple properties: name, salary, and hire year.

public class Employee
{
	public string Name { get; set; }
	public int HireYear { get; set; }
	public double Salary { get; set; }
}
Furthermore, one of the requirements of your application is the ability to sort on any one of the properties. The simple way to do this uses the IComparer interface built into .NET. Here are a couple of IComparer implementations, for name and salary:
public class NameComparer :
	IComparer<Employee>
{
	public int Compare(Employee x,
		Employee y)
	{
		return x.Name.CompareTo(y.Name);
	}
}
public class SlryComparer :
	IComparer<Employee>
{
	public int Compare(Employee x,
		Employee y)
	{
		return (int)(x.Salary - y.Salary);
	}
}
Nothing terribly complex yet. But duplication has snuck in the back door: look at Figure 1. It has the two methods overlaid on top of one another. The details of the comparisons are different, so you can’t distinguish them. But notice how much is exactly the same between the two classes. One of my colleagues Pat Farley came up with the perfect description of this: the code has the same whitespace, but different values.

Figure 1
Figure 1: Same whitespace, different values

How do you fix this? The easiest way entails using reflection to create a generic comparer for all possible properties of your class, as shown in the GenericComparer class:

public class GenericEmployeeComparer :
	IComparer<Employee>
{
	private PropertyInfo _pi = null;

	public GenericEmployeeComparer(
		string propertyName)
	{
		Type type = typeof(Employee);
		MemberInfo[] members =
			type.GetMember(propertyName,
			MemberTypes.Property,
			BindingFlags.Public |
			BindingFlags.Instance);
		if (members.Length > 0)
		{
			MemberInfo member = members[0];
			if (member is PropertyInfo)
				_pi = (PropertyInfo)member;
		}
	}
	public int Compare(Employee x,
		Employee y)
	{
		if (_pi == null)
			throw new ArgumentException(
		"Invalid property name specified.");

		object val1 = _pi.GetValue(
			x, null);
		object val2 = _pi.GetValue(
			y, null);
		if (val1 is IComparable)
		{
			IComparable c1 =
				(IComparable)val1;
			IComparable c2 =
				(IComparable)val2;
			return c1.CompareTo(c2);
		}
		return 0;
	}
}
This class uses reflection to access any property of the class and perform a comparison on it. Here is a set of unit tests that demonstrate the generic comparer in action:
public class TestComparisons
{
	private List<Employee> emps;

	[SetUp]
	public void SetUp()
	{
		emps = new List<Employee>();
emps.Add(new Employee { Name = "Homer",
		HireYear = 1980, Salary = 40000 });
emps.Add(new Employee { Name = "Lenny",
		HireYear = 1972, Salary = 80000 });
	emps.Add(new Employee { Name = "Carl",
		HireYear = 1975, Salary = 75000 });
	}

	[Test]
	public void generic_comparer()
	{
		var gc = new
			GenericEmployeeComparer("Name");
		emps.Sort(gc);
		Assert.AreEqual("Carl",
			emps[0].Name);
		Assert.AreEqual("Homer",
			emps[1].Name);
		Assert.AreEqual("Lenny",
			emps[2].Name);
		gc = new GenericEmployeeComparer(
			"HireYear");
		emps.Sort(gc);
		Assert.AreEqual("Lenny",
			emps[0].Name);
		Assert.AreEqual("Carl",
			emps[1].Name);
		Assert.AreEqual("Homer",
			emps[2].Name);
		gc = new GenericEmployeeComparer(
			"Salary");
		emps.Sort(gc);
		Assert.AreEqual("Homer",
			emps[0].Name);
		Assert.AreEqual("Carl",
			emps[1].Name);
		Assert.AreEqual("Lenny",
			emps[2].Name);
	}
Every property accessed by the generic comparer must implement the IComparable interface, but that shouldn’t cause problems because all the built-in .NET types implement this interface. Any custom classes you create that end up as properties will need to implement this interface, but it’s one of the most commonly implemented interfaces.

Should you bother going to this much trouble to eliminate the duplication between two classes? It depends on your tolerance for duplication. I believe that duplication is the single most insidious diminishing force in software development, so I would probably go ahead and do this for just two properties. Others who aren’t as offended by duplication might take a softer stance. But for virtually everyone, there is a threshold that justifies the extra complexity of using reflection. What if your Employee had 10 properties? 50 properties? At some point, creating tons of almost identical classes becomes a huge annoying drag on your project.

Summary

This article illustrates a couple of aspects of the real-world application of refactoring. Lots more examples abound from this topic. Understanding how refactorings work is an important first step. But learning when and why to apply them follows hard behind. Design patterns are similar – you can read about patterns forever, but until you start utilising them, you don’t really understand them fully. Refactoring follows the same learning curve.


Neal Ford is an application architect at Thoughtworks, and is author of several books including The Productive Programmer (published by O’Reilly, ISBN 978-0596519780).

Neal will be presenting a number of sessions, including one on real-world refactoring, at DevWeek 2009 in London at the end of March.

You might also like...

Comments

About the author

Neal Ford United Kingdom

Neal Ford is an application architect at Thoughtworks and is author of several books including The Productive Programmer

Interested in writing for us? Find out more.

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“My definition of an expert in any field is a person who knows enough about what's really going on to be scared.” - P. J. Plauger