Fixing .NET memory problems

This article was originally published in VSJ, which is now part of Developer Fusion.
The Microsoft .NET Framework and Common Language Runtime (CLR) mark a significant change in how developers build applications targeting the Windows platform. Developers can free themselves from the error-prone tedium of managing an application's memory by using the features of the .NET Framework to do it automatically.

But nothing comes for free in writing applications. It's time-consuming to automatically identify memory that is no longer needed, collect that memory and return it to the free memory heap. Applications that use memory poorly add to the problem, forcing the system to work harder and more often to reclaim memory. Over time, poor application memory management can also result in subtle, difficult-to-find errors that slow application performance while reducing scalability and reliability.

These types of memory problems are new and unfamiliar to most developers. Many assume that they have no control over how memory is used, allocated, and de-allocated, and pay no attention to how design and implementation decisions affect memory usage. But there are solutions, and it's simply a matter of learning where the problems can manifest themselves and why they occur. Even in a managed world, developers can avoid pitfalls and improve the performance of their applications by understanding how .NET memory management works and how their applications use memory.

Working with the garbage collector

The .NET garbage collector works by starting with roots, base object references that are not embedded inside another object. For example, static and global pointers are application roots. The garbage collector starts with these roots, and traces references to other objects on the heap. It creates a graph of all objects that are reachable from these root components.

Any object that is not in this graph is considered to be no longer in use, and its memory can be returned to the free heap. The garbage collector accomplishes this by walking through the heap and identifying objects that are not part of the graph. It marks these addresses and keeps track of them until it has walked through the entire heap, or some defined portion of it.
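Reachability can be observed directly with System.WeakReference, which tracks an object without acting as a root. This small sketch (not from the original article) shows that an object referenced from a root survives a collection, while one referenced only weakly does not:

```csharp
using System;

class ReachabilityDemo
{
	// A static field is a root: anything it references stays in the graph.
	static object rooted;

	static WeakReference MakeUnrooted()
	{
		// The new object is referenced only by the WeakReference,
		// which does not keep it in the garbage collector's graph.
		return new WeakReference( new object() );
	}

	static void Main()
	{
		rooted = new object();
		WeakReference weakToRooted = new WeakReference( rooted );
		WeakReference weakToUnrooted = MakeUnrooted();

		GC.Collect();
		GC.WaitForPendingFinalizers();

		Console.WriteLine( weakToRooted.IsAlive );   // True: still reachable from a root
		Console.WriteLine( weakToUnrooted.IsAlive ); // False: memory was reclaimed

		GC.KeepAlive( rooted );
	}
}
```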

During this process, the garbage collector also compacts the heap so that fragmentation doesn't prevent an allocation due to the lack of a large enough memory block. Additionally, this compaction leaves free memory at the top of the heap, where it can be reallocated simply by moving the heap pointer.

For efficiency reasons, the garbage collector also uses a concept called generations in reclaiming memory. There are a total of three generations, labelled 0, 1, and 2. When objects are first allocated at the beginning of application execution, the garbage collector refers to this part of the heap as generation 0. Newly created objects always go into generation 0. These "young" objects have not yet been examined by the garbage collector.

As more objects are added, the heap fills and a garbage collection must occur. The garbage collector checks generation 0 first, so it can reclaim more memory, and more quickly, than if it treated all heap memory the same. Any object that is still referenced when a collection occurs survives it; these older objects are promoted to generation 1.

Likewise, if an object survives a generation 1 garbage collection, it is promoted to generation 2. When a collection occurs, the three generations of heap memory – generations 0, 1, and 2 – are checked in succession. If checking generation 0 reclaims enough memory, garbage collection ceases. If not, the garbage collector then checks generation 1, and finally generation 2. In practice, generation 2 objects are long-lived, and are often not collected until the application finishes and exits.
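The generation mechanism can be watched at work with GC.GetGeneration, which reports the generation an object currently belongs to. A brief sketch (forcing collections only for illustration; production code should rarely call GC.Collect):

```csharp
using System;

class GenerationDemo
{
	static void Main()
	{
		object obj = new object();
		// Newly allocated objects start in generation 0.
		Console.WriteLine( GC.GetGeneration( obj ) ); // 0

		// Surviving a collection promotes the object to generation 1...
		GC.Collect();
		Console.WriteLine( GC.GetGeneration( obj ) ); // 1

		// ...and surviving another promotes it to generation 2.
		GC.Collect();
		Console.WriteLine( GC.GetGeneration( obj ) ); // 2

		// Generation 2 is the oldest; the object stays there.
		GC.Collect();
		Console.WriteLine( GC.GetGeneration( obj ) ); // 2

		GC.KeepAlive( obj );
	}
}
```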

Garbage collection and optimisation

Garbage collection itself is a computationally expensive process: de-allocating memory this way takes significantly more time than manually freeing it back onto the heap. And because this manipulation of memory, though very real, is indirect and largely hidden from the developer, it is easy to forget that it happens at all. Developers still have to worry about memory management, but the rules have changed. Rather than concentrating on the tactical mechanics of allocating, initialising, casting and freeing memory blocks of specific size and location, developers can focus on overarching strategies for using memory management to improve application performance and reliability.

One of the most common problems is creating too many objects. Because allocating new memory with the .NET Framework is quite fast, it's easy to forget that a single line of code could trigger a lot of allocations. The problem occurs when it comes time to collect these objects. Since garbage collection involves a performance penalty, collecting a large number of unnecessary objects exacerbates the problem.

This problem typically occurs when instructions generated from code create temporary objects to perform their actions. Many .NET classes create temporary objects for their return values, for temporary strings and for associated classes such as enumerators that serve a necessary but short-lived purpose. A developer can't simply use any instruction to perform a particular action, because that construct might produce undesirable side effects.

As a simple example, consider an exercise to concatenate two strings. It might seem simple to apply the "+" operator to perform this action. However, the "+" operator causes several new string objects to be created every time text is added to the string. Instead, using the System.Text.StringBuilder class often promotes faster string concatenation without creating new objects. This type of problem can be even worse in cases where a single instruction can create many temporary objects, all of which must be garbage-collected when their work is completed.
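The difference is easy to see in a loop. A minimal sketch of the two approaches described above: "+" allocates a fresh string on every iteration (strings are immutable), while StringBuilder reuses one internal buffer and materialises a single string at the end:

```csharp
using System;
using System.Text;

class ConcatDemo
{
	static void Main()
	{
		// Using "+": each iteration builds a brand-new string,
		// leaving the previous one behind as garbage to collect.
		string s = "";
		for( int i = 0; i < 1000; i++ )
			s += "x";                 // ~1000 temporary strings

		// Using StringBuilder: one buffer is grown and reused,
		// and only the final ToString() creates a string object.
		StringBuilder sb = new StringBuilder();
		for( int i = 0; i < 1000; i++ )
			sb.Append( "x" );
		string t = sb.ToString();

		Console.WriteLine( s == t );  // True: same result, far fewer allocations
	}
}
```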

Here's another example. I have a sample application that begins by loading a large number of XML documents into memory. Loading an XML document takes a significant amount of memory, and it is especially inefficient if the application reads only a single field from that document. This will produce a large number of temporary objects.

If you apply memory analysis, however, a different picture emerges. A memory analysis tool such as Compuware's DevPartner Studio shows that the largest use of temporary space in the application comes from an unexpected place: the seemingly minor Region.ToString() method is being called nearly a million times, allocating nearly 24 megabytes of temporary space (Figure 1).

Figure 1: Memory analysis shows that the method Region.ToString() is called an excessive number of times, creating hundreds of thousands of temporary objects

By looking at the call graph, we can see that this function is called from the line of code that adds the Region into the regionListBox control:

regionListBox.Items.Add( rs );

It seems that WinForms.ListBox calls ToString() quite a lot! Fortunately, this is very easy to optimise. Replacing the method with:
private string tostring_cache = null;
public override string ToString()
	{
	if( tostring_cache == null )
		tostring_cache = name + " (" + twoLetter + ")";
	return tostring_cache;
	}
This caches the string the first time it is built and eliminates the performance hit, as a subsequent run of the memory analysis demonstrates.

Another problem is object leaks. An object leak occurs when a reference is created accidentally, or is not released when it should be, so the object remains reachable after the application has finished with it and can never be collected. An example of such a leak is caching an object reference in a static member variable and forgetting to release it at the end of a request. The memory will remain allocated until the application completes and the heap is returned to the operating system.
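The static-cache leak described above can be sketched as follows. The class and method names here are hypothetical, invented for illustration; the point is that a static collection is a root, so anything left in it can never be reclaimed:

```csharp
using System;
using System.Collections.Generic;

class RequestProcessor
{
	// A static collection is a root: everything it references
	// survives every garbage collection.
	static List<byte[]> cache = new List<byte[]>();

	public static void HandleRequest()
	{
		byte[] buffer = new byte[64 * 1024];
		cache.Add( buffer );       // cached for the duration of the request...
		// ... work with the buffer ...
		// BUG: the reference is never removed, so the buffer
		// stays reachable forever -- an object leak.
	}

	public static void HandleRequestFixed()
	{
		byte[] buffer = new byte[64 * 1024];
		cache.Add( buffer );
		try
		{
			// ... work with the buffer ...
		}
		finally
		{
			cache.Remove( buffer ); // release the reference when done
		}
	}

	static void Main()
	{
		HandleRequest();
		HandleRequestFixed();
		Console.WriteLine( cache.Count ); // 1: only the leaked buffer remains
	}
}
```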

This also leads to the issue of inappropriately long-lived objects. Because garbage collection is automatic, it's easy to forget that memory is still managed according to predefined rules. If an object is kept around long enough to be promoted to generation 2 of collection, it might not be collected until the application exits.

Why is this bad? Because the number of objects stored in the heap is likely to keep growing while the application is running. This causes two problems. First, more heap memory extends the amount of time required to garbage-collect, slowing down the application. Second, memory is not an infinite resource. If the application runs long enough, it will generate an out-of-memory error.

Lastly, if you create an object that refers to many other objects, it can cause other problems. For example, it will force the garbage collector to follow all of the pointers between the objects, lengthening the time needed to complete the process. The results are particularly bad if this is a long-lived object structure, because the garbage collector goes through this process for every collection if the object has been modified.

Strategies for memory management

All of this means that managing memory in the .NET Framework requires a deep level of understanding not only of your application, but also of how the Framework performs its actions. And even then it's not possible to make the one best decision in all circumstances. Instead, application development becomes a matter of continuously weighing strategies for implementing features, balancing factors such as efficiency, ease of implementation and maintainability.

What kinds of strategies are available to help developers manage memory for more efficient applications? The best strategy is two-fold – to understand how the .NET Framework manages memory, and to obtain a precise picture of how your application uses memory. You can then apply both types of information to design, implement, and modify your application to optimise memory use.

The problem is that you need information on how memory is being used, and how memory usage changes as you modify your code. What is needed to examine .NET Framework memory accurately is an interactive, real-time memory analysis capability that can track individual objects in memory over time.

DevPartner Studio provides three fundamental views of .NET memory – RAM (memory) footprint, temporary objects and memory leaks. You can take snapshots of all these views, in order to examine the state of memory at an instant of your choosing. It also lets you force a garbage collection, so that you can observe the effects of memory reclamation as well as determine if an application has an object leak.
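Even without a dedicated tool, the effect of forcing a collection can be approximated with GC.GetTotalMemory, whose boolean argument forces a full collection before the heap size is measured. A small sketch (the 10 MB figure is arbitrary) that distinguishes still-referenced memory from reclaimable memory:

```csharp
using System;

class LeakCheck
{
	// Simulated leak: a static field is a root that keeps memory alive.
	static byte[] held;

	static void Main()
	{
		long before = GC.GetTotalMemory( true );  // true = collect first

		held = new byte[10_000_000];              // allocation that stays referenced

		// Forcing a collection shows which memory survives:
		// still-referenced objects are not reclaimed.
		long after = GC.GetTotalMemory( true );
		Console.WriteLine( after - before > 9_000_000 );    // True: ~10 MB survived

		held = null;                              // drop the reference...
		long released = GC.GetTotalMemory( true );
		Console.WriteLine( released - before < 1_000_000 ); // True: memory reclaimed
	}
}
```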

Taking a RAM footprint snapshot shows you who allocated the memory, what objects it comprises and which components are holding references to it, thus preventing it from being freed. In the case of the sample application shown in Figure 2, the snapshot shows that String objects are using by far the most memory.

Figure 2: The RAM footprint shows dynamically how much memory is allocated and by whom. This figure shows the working set being allocated at application launch.

You can use analysis of temporary objects – the second fundamental view of .NET memory – to look for unusual or inefficient behaviour that creates large numbers of temporary objects or large-sized temporary objects. These problems tend to be easy to fix, in that they typically require changing the construct that is creating the temporary objects, or changing the times those objects are being created.

DevPartner Studio lets you see the objects that allocate the most memory, along with the methods that use the most memory. Further, you can drill down to examine the methods: how many times they are called, whom they are calling and who is calling them (see Figure 3). The call graph displays this information visually, so you can see at a glance how these calls occur, giving you a precise path of when and why each method is called.

Figure 3: The detailed temporary object view lets you see how objects and methods are being called, and whom they are calling

Using this approach, we can make other design decisions about the sample application. At present, its Region class opens and reads each XML document in its entirety at startup. There are a couple of alternatives to consider. Perhaps this isn't the most efficient representation of the data; it might be that we can process each document into a single string and simply hold that in memory. Alternatively, perhaps it makes more sense not to load the data from disk until it is first accessed.

The only problem with the latter alternative, of course, is that the data is first accessed right now, at startup. But we already know this is slow. It might be worthwhile having an index.xml file with the region names in it, and then only loading the individual regions on demand. Here's how we do it now:

public RegionList( string xmldbDirectory )
	{
	DirectoryInfo dbdir =
		new DirectoryInfo( xmldbDirectory );
	FileInfo[] files =
		dbdir.GetFiles( "*.xml" );
	foreach( FileInfo f in files )
		summaries.Add( new Region(
			f.FullName ) );
	}
Here's how it could be done by loading an index first, building the index file on the first run and reusing it thereafter:
public RegionList( string xmldbDirectory )
	{
	XmlDocument index = new XmlDocument();
	string indexFileName = Path.Combine(
		xmldbDirectory, "index.xml" );
	if( !File.Exists( indexFileName ) )
		{
		// First run: build the index from the full documents
		StringBuilder sb =
			new StringBuilder( "<index>\n" );
		DirectoryInfo dbdir =
			new DirectoryInfo( xmldbDirectory );
		FileInfo[] files =
			dbdir.GetFiles( "*.xml" );
		foreach( FileInfo f in files )
			{
			Region r = new Region( f.FullName );
			sb.Append( "\t<region name=\"" );
			sb.Append( r.Name );
			sb.Append( "\">" );
			sb.Append( r.BackingFileName );
			sb.Append( "</region>\n" );
			}
		sb.Append( "</index>\n" );
		index.LoadXml( sb.ToString() );
		index.Save( indexFileName );
		}
	else
		index.Load( indexFileName );
	// Create lightweight Region summaries from the index;
	// the backing document is loaded only on demand
	foreach( XmlElement xe in
		index.DocumentElement.ChildNodes )
		{
		summaries.Add( new Region(
			xe.InnerText,
			xe.GetAttribute( "name" ) ) );
		}
	}
The second algorithm is more code, but running it under memory analysis demonstrates that it uses considerably less memory at startup.

Making the most of .NET memory

Moving to .NET doesn't mean you don't have memory management issues. The problem is that these types of issues are unfamiliar to most developers, consequently making .NET development more difficult and error-prone than its memory management model suggests. Until developers can apply the principles of .NET memory management to their advantage, applications have a greater potential for poor performance, lack of scalability and memory errors.

Part of the solution is for developers to gain a comprehensive understanding of how the .NET Framework manages memory. But understanding alone is not enough. A strategy that works well for one application might not apply to others. It's critically important to understand how memory is used in individual applications, both at the summary level and for individual objects. It's also essential to study memory usage and garbage collection over time, dynamically viewing the changes in memory use and the effects of garbage collection at different times.


Peter Varhol is a product manager for Microsoft developer solutions at Compuware Corporation. He has graduate degrees in computer science and mathematics, and has published a number of articles on software development and related topics.
