Friday, April 25, 2008

SQL Server 2008 - The hierarchyid data type

Hierarchical data in SQL Server are almost always stored using the Parent/Child approach. This approach requires that a relationship occurs between two columns of the same table. Also, in some scenarios the XML data type can be used for this purpose. Examples of hierarchical data are typically an organizational structure of employees, or folders in a computer file system. Tables for such scenarios can be depicted through the following example:

SQL Server 2008 introduces a new CLR data type called hierarchyid to make storing and querying hierarchical data much easier. Unlike the parent/child approach which focused on maintaining relationships among records, this approach stores data by keeping track of the hierarchical level of each record and its position within its given level. The data stored in a hierarchyid field looks similar to how directory structures in a file system are organized, except using numbers such like the sketch below.
The entity instance at the helm of the hierarchy is recorded as the root (/). Subordinates of the root are recorded with their position numbers, such as /1/ for the first subordinate and /2/ for the second and so on. If by any chance
  • a subordinate is to be inserted between /1/ and /2/, it would be recorded as /1.1/ (since 1.1 naturally appears between 1 and 2, yet remember that this is not a decimal numbering system),
  • and if a subordinate is to be inserted before /1/ it would be added as /0/,
  • and to insert a subordinate in between /1/ and /1.1/ would result in the value /1.0/

But, all the above values are in the same level of the hierarchy (i.e. direct children of the root). Hierarchyid comes with several functions that you could use for various kinds of manipulations. It is also optimized to handle tree structures more efficiently, than the traditional parent/child approach.

Populating a column with hierarchyid values does not automatically relate the records of the table to each other, since there is no relationship defined between the records. It is up to the application to perform the placement of the records such that the hierarchy is properly formed. Take for instance the process of drawing an organizational chart for a business: The logical structure of the hierarchy is first drawn, then the employees are added to the sections of the hierarchy. Each section of the chart may not always be filled at once, since new employees may join the various ranks of the business at later times, neither is there a rule stating that the employees should be added to each section of the diagram in a particular order, such as a top-down method. Similarly the hierarchyid data type also possesses its own hierarchical structure defined within it. It is up to the application to pick the appropriate node, retrieve its value and then use it to insert into the table.

Getting down to business

Let us put the hierarchyid data type into use by creating a table of employees. The Employees table will be a simple one similar to the one used in Books Online:

1. Creating the table


CREATE TABLE [dbo].[Employees]

( [EmployeeID] hierarchyid NOT NULL, -- Primary Key of the HierarchyID data type
[EmployeeCode] char(4) NOT NULL, -- Business Key that uniquely identifies employees

[Name] varchar(20) NOT NULL, -- Name of the employee

[Title] varchar(20) NULL, -- Employee's title

PRIMARY KEY CLUSTERED ([EmployeeID] ASC),

CONSTRAINT unqEmployeeCode UNIQUE ([EmployeeCode])

)

GO

Note that the EmployeeID field has been set as the primary key and the EmployeeCode field has been assigned a unique constraint. The EmployeeCode is just a business key used to identify the entity.

2. Populating the Table

Let us start off the population by inserting the record of the top employee. This employee would be considered the root, hence let us add this record first (not that it requires to be done so). In situations where there is no single top employee, but three or four such people, we could forget about the root node, and go about adding nodes from the next level onwards. Let us take into account the following diagram which depicts our employees' organizational structure:


Since the root employee, let us start off by inserting employee E013's record:

INSERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)

VALUES (hierarchyid::GetRoot(), 'E013', 'R. Weisz', 'Manager')

GO


We call the static GetRoot() function to return a hierarchyid value representing "root".

SELECT EmployeeID.ToString() AS 'EmployeeID_String', *

FROM Employees

GO

/* Results:

EmployeeID_String EmployeeID EmployeeCode Name Title

------------------ ---------- ------------- -------- -------

/ 0x E013 R. Weisz Manager

(1 row(s) affected)

*/

Since the hierarchyid value is not easily understandable, we need to cast it into a string by using the ToString() function. Note that it displays the value as "root" (/). Convert and Cast functions work as well.

Our next steps would be to insert the subordinate records. This process requires no specific order of insertion. Yet there are several options we need to look at before performing the inserts. Subordinates or child nodes can be placed (relative to the parent):

In order to add a child node we need to use the GetDescendant() method to generate a hierarchyid value from the specified parent node. The GetDescendant() method accepts two parameters: child1 and child2 which represent the two child nodes between which the new node is to be inserted. Therefore,

  • if child1 and child2 are both NULL, no children exist; hence generate an id for a new child
  • if child1 is NULL and child2 is NOT NULL, generate an id for a new child before child2
  • if child1 is NOT NULL and child2 is NULL, generate an id for a new child after child1
  • if child1 and child2 are both NOT NULL, generate an id for a new child between child1 and child2.

DECLARE @ParentID hierarchyid

SELECT @ParentID = hierarchyid::GetRoot()

INSERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)

VALUES (@ParentID.GetDescendant(NULL, NULL), 'E292', 'N. Portman', 'Executive')

GO

/*

Results:

EmployeeID_String EmployeeCode Name Title

-------------------- ------------ ------------- -----------

/ E013 R. Weisz Manager

/1/ E292 N. Portman Executive

(2 row(s) affected)

*/

The @ParentID variable is used to obtain a reference to the parent node, which in this case is the root node. If the parent is not the root node, then the reference should be obtained from the appropriate record in the table. Similarly let us also add code to insert the rest of E013's children in the following sequence:

E094 as a child before an existing child node (before E292)

E732 as a child after an existing child node (after E292)

E256 as a child between two existing child nodes (between E292 and E732)

DECLARE @ParentID hierarchyid, @ChildID hierarchyid

SELECT @ParentID = hierarchyid::GetRoot() -- Retrieve Parent ID

SELECT @ChildID = EmployeeID FROM Employees

WHERE EmployeeCode = 'E292' -- Retrieve the only existing Child's ID
-- Insert Employee E094 before E292
INSERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)VALUES (@ParentID.GetDescendant(NULL, @ChildID), 'E094', 'N. Jones', 'Executive')

/*

EmployeeID_String EmployeeID EmployeeCode Name Title

------------------- ------------ ------------ ------------ ----------

/ 0x E013 R. Weisz Manager

/0/ 0x48 E094 N. Jones Executive

/1/ 0x58 E292 N. Portman Executive

(3 row(s) affected)

*/

-- Insert Employee E732 after E292

INSERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)

VALUES (@ParentID.GetDescendant(@ChildID, NULL), 'E732', 'E. Cuthbert', 'Executive')


/*

EmployeeID_String EmployeeID EmployeeCode Name Title

------------------- ------------ ------------ ------------ ----------

/ 0x E013 R. Weisz Manager

/0/ 0x48 E094 N. Jones Executive

/1/ 0x58 E292 N. Portman Executive

/2/ 0x68 E732 E. Cuthbert Executive

(4 row(s) affected)

*/


DECLARE @ChildID2 hierarchyidSELECT @ChildID2 = EmployeeID FROM Employees WHERE EmployeeCode = 'E732' -- Retrieve the ID of Employee E732

-- Insert Employee E256 between E292 and E732

INSERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)VALUES (@ParentID.GetDescendant(@ChildID, @ChildID2), 'E256', 'A. Hathaway', 'Specialist')


/*

EmployeeID_String EmployeeID EmployeeCode Name Title

------------------- ------------ ------------ ------------ ----------

/ 0x E013 R. Weisz Manager

/0/ 0x48 E094 N. Jones Executive

/1/ 0x58 E292 N. Portman Executive

/1.1/ 0x62C0 E256 A. Hathaway Specialist

/2/ 0x68 E732 E. Cuthbert Executive

(5 row(s) affected)

*/


Similarly we could also add subordinates to the second level of employees that we just added. Much more simpler would be a stored procedure such as the one below:

CREATE PROC InsertEmployee

(

@ParentCode char(4),

@EmployeeCode char(4),

@Name varchar(20),

@Title varchar(20)

)

AS

BEGIN

DECLARE @ParentID hierarchyid, @ChildID1 hierarchyid, @ChildID2 hierarchyid, @ChildID hierarchyid

IF @ParentCode IS NOT NULL

SELECT @ParentID = EmployeeID FROM Employees WHERE EmployeeCode = @ParentCode

ELSE

SET @ParentID = hierarchyid::GetRoot()

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

SELECT @ChildID2 = MIN(EmployeeID) FROM Employees

WHERE EmployeeCode > @EmployeeCode AND EmployeeID.GetAncestor(1) = @ParentID

SELECT @ChildID1 = MAX(EmployeeID) FROM Employees

WHERE EmployeeCode < @EmployeeCode AND EmployeeID.GetAncestor(1) = @ParentID

INERT INTO Employees (EmployeeID, EmployeeCode, Name, Title)

VALUES (@ParentID.GetDescendant(@ChildID1, @ChildID2), @EmployeeCode, @Name, @Title)

COMMIT TRANSACTION

END

GO

TRUNCATE TABLE InsertEmployee

EXEC InsertEmployee NULL, 'E013', 'R. Weisz', 'Manager'

EXEC InsertEmployee 'E013', 'E256', 'A. Hathaway', 'Specialist'

EXEC InsertEmployee 'E013', 'E094', 'N. Jones', 'Executive'

EXEC InsertEmployee 'E013', 'E792', 'E. Cuthbert', 'Executive'

EXEC InsertEmployee 'E013', 'E292', 'N. Portman', 'Executive'

EXEC InsertEmployee 'E094', 'E049', 'S. Johansson', 'Associate'

EXEC InsertEmployee 'E256', 'E148', 'K. Knightley', 'Intern'

EXEC InsertEmployee 'E148', 'E940', 'H. Swank', 'Special Intern'

The above stored procedure accepts the parent code to which the record has to be added under, the employee code, name and title of the new employee. This stored procedure however does not add a record for the root node, since it assumes that multiple employees could be at the top level of the employee hierarchy. It also adds the new employee in ascending order of the employee code.
Selecting Subordinates and Levels
The following two methods show how descendants of a particular node are retrieved.

DECLARE @CurEmployeeID hierarchyid

SELECT @CurEmployeeID = EmployeeID FROM Employees WHERE EmployeeCode = 'E256'

-- Code A

SELECT EmployeeID.ToString() AS 'EmployeeID_String', *

FROM Employees

WHERE @CurEmployeeID.IsDescendant(EmployeeID) = 1

/*

EmployeeID_String EmployeeCode Name Title

-------------------- ------------ -------------- ---------------

/1/1/ E256 A. Hathaway Specialist

/1/1/1/ E148 K. Knightley Intern

/1/1/1/1/ E940 H. Swank Special Intern

(3 row(s) affected)

*/

-- Code B

SELECT EmployeeID.ToString() AS 'EmployeeID_String', *

FROM Employees

WHERE EmployeeID.GetAncestor(1) = @CurEmployeeID

/*

EmployeeID_String EmployeeCode Name Title

-------------------- ------------ -------------- ---------------

/1/1/1/ E148 K. Knightley Intern

(1 row(s) affected)

*/


In the above code section, Code A uses the IsDescendant() function which returns true if the specified node is a descendant of the current node, hence the above code returns the sub-hierarchy of records. Code B makes use of the GetAncestor() function which gets the ancestor of the specified node at a given level. Therefore it returns the record of the node which has the current node as its ancestor at Level 1.


The following code uses the GetLevel() function to display the level of each node in the hierarchy.


SELECT EmployeeID.ToString() AS 'EmployeeID_String', EmployeeID.GetLevel() AS 'Level', *

FROM Employees

/*

EmployeeID_String Level EmployeeID EmployeeCode Name Title

-------------------- ------ ------------ ------------ -------------- ---------------

/1/ 1 0x58 E013 R. Weisz Manager

/1/0/ 2 0x5A40 E094 N. Jones Executive

/1/0/1/ 3 0x5A56 E049 S. Johansson Associate

/1/1/ 2 0x5AC0 E256 A. Hathaway Specialist

/1/1/1/ 3 0x5AD6 E148 K. Knightley Intern

/1/1/1/1/ 4 0x5AD6B0 E940 H. Swank Special Intern

/1/1.1/ 2 0x5B16 E292 N. Portman Executive

/1/2/ 2 0x5B40 E792 E. Cuthbert Executive

(8 row(s) affected)

*/

Reordering Nodes
Then, there are situations when a record is required to be shifted under a new parent node. To perform this action we use the Reparent() function, which requires two parameters: the old parent hierarchyid and the new one. The following code shows how this can be done:



DECLARE @CurEmployeeID hierarchyid, @OldParentID hierarchyid, @NewParentID hierarchyid

SELECT @CurEmployeeID = EmployeeID

FROM Employees

WHERE EmployeeCode = 'E049'

SELECT @OldParentID = EmployeeID.GetAncestor(1) -- Retrieve the Current Parent ID

FROM Employees WHERE EmployeeID = @CurEmployeeID

SELECT @NewParentID = EmployeeID -- Retrieve the New (Target) Parent ID

FROM Employees WHERE EmployeeCode = 'E292'

UPDATE EmployeesSET EmployeeID = @CurEmployeeID.Reparent(@OldParentID, @NewParentID)WHERE EmployeeID = @CurEmployeeID

/*

EmployeeID_String EmployeeID EmployeeCode Name Title

-------------------- ------------- ------------ ------------- ---------------

/1/ 0x58 E013 R. Weisz Manager

/1/0/ 0x5A40 E094 N. Jones Executive

/1/1 0x5AC0 E256 A. Hathaway Specialist

/1/1/1/ 0x5AD6 E148 K. Knightley Intern

/1/1/1/1/ 0x5AD6B0 E940 H. Swank Special Intern

/1/1.1/ 0x5B16 E292 N. Portman Executive

/1/1.1/1/ 0x5B16B0 E049 S. Johansson Associate

/1/2/ 0x5B40 E792 E. Cuthbert Executive

(8 row(s) affected)

*/



Indexing HierarchyID



HierarchyID supports two types of indexing:

  • Depth-first
  • Breadth-first










Depth-first index




(Image taken from SQL Server 2008 February CTP Books Online)


A depth-first index stores nodes of a sub-hierarchy next to each other, thereby boosting queries that request for data under a particular hierarchy. An example would be a query requesting for "All employees reporting to A. Hathaway including their subordinates all the way down."
The following code shows how to create a depth-first index on the Employees table:


CREATE CLUSTERED INDEX Employees_DepthFirst

ON Employees (EmployeeID)



Breadth-first index




Image taken from SQL Server 2008 February CTP Books Online)


A breadth-first index stores the immediate subordinate nodes of a particular node together, thereby boosting queries that request for immediate children of a particular node. An example would be a query requesting for "All employees who directly report to A. Hathaway."
The following code shows how to create a breadth-first index on the Employees table. To create a breadth-first index, you require the level of each node. This could be done by creating a calculated column using the GetLevel() function.
ALTER TABLE Employees

ADD [Level] AS EmployeeID.GetLevel()

GO

CREATE UNIQUE INDEX Employees_BreadthFirst

ON Employees ([Level], [EmployeeID])

GO



The option of using either or both index types on your hierarchyid table depends on the types of queries and the frequency of running them.
A little more facts about hierarchyid





  • Supposing that x and y are two hierarchyid values; x <>

  • Arbitrary deletions are possible in a table with hierarchyid, but it could leave orphaned sub-trees. Hence, it is required that the application performs all required validations.


  • Hierarchyid data types are extremely compact in size.


Though drawbacks such as the need for application based consistency maintenance is present, the hierarchyid data type provides improved querying over the parent/child and XML methods in many cases, and requires a little getting used to since it is a CLR data type. Code in this article was written and tested on SQL Server 2008 CTP - February 2008. Refer SQL Server 2008 Books Online for more information and tutorials.

No comments: