In order to store our data structure in SQL Server, we're obviously going to need similar properties to that of the TreeNode
object. We'll call the table dfTree
, and add the fields shown in the diagram below. In addition, the id
field's Identity
property is also set.
Each row in this table represents a TreeNode
object. We use the parentId
field to indicate the parent node as the ParentID
property does in our TreeNode
class - but this will either be NULL
(indicating we are at the root of the tree), or be the id of another node in the tree which is our parent.
Ignoring the additional depth
and lineage
fields for now, it's clearly straightforward to fetch - say a node and all its direct children through a simple LEFT JOIN
: (note we need the left join to ensure we get the parent back too)
SELECT p.name AS parentName, c.id AS childId, c.name AS childName FROM dfTree p LEFT JOIN dfTree c ON p.id=c.parentId WHERE p.id=@NODEID
Which could return something like this:
parentName | childId | childName |
---|---|---|
NULL | 1 | ParentNode |
ParentNode | 2 | ChildA |
ParentNode | 3 | ChildB |
However, what happens if we want to include all the children of ChildA and ChildB in the resultset too? The number of joins required increase linearly with the depth we want to go to for fetching children. Plus there's no easy way to, say, fetch all subchildren of a node, without caring what their depth is - that would require what would effectively be an infinite inner join. We obviously need another way to select nodes in the tree based on something other than (or in addition to) the parentId
field. This is where the depth
and lineage
fields come in. The depth
field represents the number of nodes between the root of the tree, and ourselves (the root has depth 0). The lineage
field contains a list of ID's of nodes that we pass through in order to get from the root to the current node, delimited by a '/' character.
So, in order to have a tree like this:
Our table would look something like this:
id | parentId | name | depth | lineage |
---|---|---|---|---|
1 | NULL | Root Node | 0 | /1/ |
2 | 1 | Child A | 1 | /1/2/ |
3 | 1 | Child B | 1 | /1/3/ |
4 | 1 | Child C | 1 | /1/4/ |
5 | 2 | Child D | 2 | /1/2/5/ |
These two fields now make it much easier to fetch specific sets of nodes based on where they are in the tree. For example, all children of the node labelled "Child A" will have a lineage field that starts with "/1/2/
". No more inner joins, and a much more elegant way to fetch rows. There is a downside though - although manually entering the depth and lineage fields is straightforward for such a small tree, what happens when we have 100 nodes in the tree? Or we decide we want to move one portion of the tree to somewhere elsewhere in its structure? Suddenly this becomes a much more daunting task!
Fortunately, this isn't a major obstacle - because with a bit of work we can get SQL Server to automatically maintain these fields for us. We essentially have three options here:
- Use a "computed column" so that the depth and lineage are calculated on-the-fly whenever we query the table. For both columns, this would then be a simple matter of defining two functions in SQL Server. One of which would perform a join with the rows parent, and added one to its depth. The other would perform a join with the rows parent, and add its own id plus '/' to the lineage column.
- Alternatively, store these values in two "real" columns, and use another SQL Server feature known as triggers to recalculate these whenever a row is modified. If you haven't come across triggers before, they are essentially stored procedures that are automatically executed when rows in a table are modified as a result of an
UPDATE
,INSERT
orDELETE
query. - Some combination of the two
I've chosen to implement this with triggers here, on the assumption that we're going to be using SELECT
far more than either UPDATE
and INSERT
- so we don't particularly want to recompute the values each time we perform a query - and we can afford to waste some space storing these two additional columns. This is based solely on my somewhat limited knowledge as to how SQL Server works - so if you know a good reason why I should be using a different method, then please do drop me a line!
Comments