Tree structures in ASP.NET and SQL Server

Storing Trees in SQL Server

In order to store our data structure in SQL Server, we're obviously going to need similar properties to that of the TreeNode object. We'll call the table dfTree, and add the fields shown in the diagram below. In addition, the id field's Identity property is also set.

A diagram displaying the structure of the table, with these fields: id (int, primary key), parentId (int), name (varchar), depth (int), lineage (varchar).

Each row in this table represents a TreeNode object. We use the parentId field to indicate the parent node as the ParentID property does in our TreeNode class - but this will either be NULL (indicating we are at the root of the tree), or be the id of another node in the tree which is our parent.

Ignoring the additional depth and lineage fields for now, it's clearly straightforward to fetch - say a node and all its direct children through a simple LEFT JOIN: (note we need the left join to ensure we get the parent back too)

SELECT p.name AS parentName, c.id AS childId, c.name AS childName FROM dfTree p LEFT JOIN dfTree c ON p.id=c.parentId WHERE p.id=@NODEID

Which could return something like this:

parentName childId childName
NULL 1 ParentNode
ParentNode 2 ChildA
ParentNode 3 ChildB

However, what happens if we want to include all the children of ChildA and ChildB in the resultset too? The number of joins required increase linearly with the depth we want to go to for fetching children. Plus there's no easy way to, say, fetch all subchildren of a node, without caring what their depth is - that would require what would effectively be an infinite inner join. We obviously need another way to select nodes in the tree based on something other than (or in addition to) the parentId field. This is where the depth and lineage fields come in. The depth field represents the number of nodes between the root of the tree, and ourselves (the root has depth 0). The lineage field contains a list of ID's of nodes that we pass through in order to get from the root to the current node, delimited by a '/' character.

So, in order to have a tree like this:

A diagram displaying the structure of the table, with these fields: id (int, primary key), parentId (int), name (varchar), depth (int), lineage (varchar).

Our table would look something like this:

id parentId name depth lineage
1 NULL Root Node 0 /1/
2 1 Child A 1 /1/2/
3 1 Child B 1 /1/3/
4 1 Child C 1 /1/4/
5 2 Child D 2 /1/2/5/

These two fields now make it much easier to fetch specific sets of nodes based on where they are in the tree. For example, all children of the node labelled "Child A" will have a lineage field that starts with "/1/2/". No more inner joins, and a much more elegant way to fetch rows. There is a downside though - although manually entering the depth and lineage fields is straightforward for such a small tree, what happens when we have 100 nodes in the tree? Or we decide we want to move one portion of the tree to somewhere elsewhere in its structure? Suddenly this becomes a much more daunting task!

Fortunately, this isn't a major obstacle - because with a bit of work we can get SQL Server to automatically maintain these fields for us. We essentially have three options here:

  • Use a "computed column" so that the depth and lineage are calculated on-the-fly whenever we query the table. For both columns, this would then be a simple matter of defining two functions in SQL Server. One of which would perform a join with the rows parent, and added one to its depth. The other would perform a join with the rows parent, and add its own id plus '/' to the lineage column.
  • Alternatively, store these values in two "real" columns, and use another SQL Server feature known as triggers to recalculate these whenever a row is modified. If you haven't come across triggers before, they are essentially stored procedures that are automatically executed when rows in a table are modified as a result of an UPDATE, INSERT or DELETE query.
  • Some combination of the two

I've chosen to implement this with triggers here, on the assumption that we're going to be using SELECT far more than either UPDATE and INSERT - so we don't particularly want to recompute the values each time we perform a query - and we can afford to waste some space storing these two additional columns. This is based solely on my somewhat limited knowledge as to how SQL Server works - so if you know a good reason why I should be using a different method, then please do drop me a line!

You might also like...

Comments

About the author

James Crowley

James Crowley United Kingdom

James first started this website when learning Visual Basic back in 1999 whilst studying his GCSEs. The site grew steadily over the years while being run as a hobby - to a regular monthly audien...

Interested in writing for us? Find out more.

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“PHP is a minor evil perpetrated and created by incompetent amateurs, whereas Perl is a great and insidious evil perpetrated by skilled but perverted professionals.” - Jon Ribbens