SQL World: Working with CTE (Common Table Expressions)

Introduced in SQL Server 2005, the common table expression (CTE) is a temporary named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. You can also use a CTE in a CREATE VIEW statement, as part of the view’s SELECT query. In addition, as of SQL Server 2008, you can add a CTE to the new MERGE statement.

SQL Server supports two types of CTEs—recursive and nonrecursive. In this article, I explain how to create both types. The examples I provide are based on a local instance of SQL Server 2008 and retrieve data from the AdventureWorks2008 sample database.

Working with Common Table Expressions

You define CTEs by adding a WITH clause directly before your SELECT, INSERT, UPDATE, DELETE, or MERGE statement. The WITH clause can include one or more CTEs, as shown in the following syntax:

[WITH <common_table_expression> [,...]]

<common_table_expression>::=

cte_name [(column_name [,...])]

AS (cte_query)

...which can be represented like this...

As you can see, if you include more than one CTE in your WITH clause, you must separate them with commas. In addition, for each CTE, you must provide a name, the AS keyword, and a SELECT statement. You can also provide column names (separated by commas), as long as the number of names match the number of columns returned by the result set.

The SELECT statement in your CTE query must follow the same requirements as those used for creating a view. For details about those requirements, see the topic “CREATE VIEW (Transact-SQL)” in SQL Server Books Online. For more details about CTEs in general, see the topic “WITH common_table_expression (Transact-SQL).”

After you define your WITH clause with the necessary CTEs, you can then reference those CTEs as you would any other table. However, you can reference a CTE only within the execution scope of the statement that immediately follows the WITH clause. After you’ve run your statement, the CTE result set is not available to other statements.

Creating a Nonrecursive Common Table Expression

A nonrecursive CTE is one that does not reference itself within the CTE. Nonrecursive CTEs tend to be simpler than recursive CTEs, which is why I’m starting with this type. In the following example, I create a CTE named cteTotalSales:

WITH

cteTotalSales (SalesPersonID, NetSales)

(

SELECT SalesPersonID, ROUND(SUM(SubTotal), 2)

FROM Sales.SalesOrderHeader

WHERE SalesPersonID IS NOT NULL

GROUP BY SalesPersonID

)

SELECT

sp.FirstName + ' ' + sp.LastName AS FullName,

sp.City + ', ' + StateProvinceName AS Location,

ts.NetSales

FROM Sales.vSalesPerson AS sp

INNER JOIN cteTotalSales AS ts

ON sp.BusinessEntityID = ts.SalesPersonID

ORDER BY ts.NetSales DESC

After I specify the CTE name, I provide two column names, SalesPersonID and NetSales, which are enclosed in parentheses and separated by a comma. That means the result set returned by the CTE query must return two columns.

Next, I provide the AS keyword, then a set of parentheses that enclose the CTE query. In this case, the SELECT statement returns the total sales for each sales person (total sales grouped by salesperson ID). As you can see, the CTE query can include Transact-SQL functions, GROUP BY clauses, or any elements that the SELECT statement in a view definition can include.

I can now reference cteTotalSales in the statement that immediately follows. For this example, I create a SELECT statement that joins the Sales.vSalesPerson view to cteTotalSales, based on the salesperson ID. I then pull the names and locations from the view and the net sales from the CTE. The following table shows the results returned by this statement.

As you saw earlier in the syntax, you can include multiple CTEs in a WITH clause. The following WITH clause includes two CTEs, one named cteTotalSales and one named cteTargetDiff:

WITH

cteTotalSales (SalesPersonID, NetSales)

(

SELECT SalesPersonID, ROUND(SUM(SubTotal), 2)

FROM Sales.SalesOrderHeader

WHERE SalesPersonID IS NOT NULL

AND OrderDate BETWEEN '2003-01-01 00:00:00.000'

AND '2003-12-31 23:59:59.000'

GROUP BY SalesPersonID

cteTargetDiff (SalesPersonID, SalesQuota, QuotaDiff)

(

SELECT ts.SalesPersonID,

CASE

WHEN sp.SalesQuota IS NULL THEN 0

ELSE sp.SalesQuota

END,

CASE

WHEN sp.SalesQuota IS NULL THEN ts.NetSales

ELSE ts.NetSales - sp.SalesQuota

END

FROM cteTotalSales AS ts

INNER JOIN Sales.SalesPerson AS sp

ON ts.SalesPersonID = sp.BusinessEntityID

)

SELECT

sp.FirstName + ' ' + sp.LastName AS FullName,

sp.City,

ts.NetSales,

td.SalesQuota,

td.QuotaDiff

FROM Sales.vSalesPerson AS sp

INNER JOIN cteTotalSales AS ts

ON sp.BusinessEntityID = ts.SalesPersonID

INNER JOIN cteTargetDiff AS td

ON sp.BusinessEntityID = td.SalesPersonID

ORDER BY ts.NetSales DESC

The first CTE—cteTotalSales—is similar to the one in the preceding example, except that the WHERE clause has been further qualified to include sales only from 2003. After I define cteTotalSales, I add a comma, and then define cteTargetDiff, which calculates the difference between the sales total and the sales quota.

The new CTE definition specifies three columns for the result set: SalesPersonID, SalesQuota, and QuotaDiff. As you would expect, the CTE query returns three columns. The first is the salesperson ID. The second is the sales quota. However, because a sales quota is not defined for some salespeople I use a CASE statement. If the value is null, that value is set to 0, otherwise the actual SalesQuota value is used.

The final column returned is the difference between the net sales and sales quota. Again, I use a CASE statement. If the SalesQuota value is null, then the NetSales value is used, otherwise the sales quota is subtracted from the net sales to arrive at the difference.

Something interesting to note about the second CTE query is that I’ve joined the Sales.SalesPerson table to the first CTE—cteTotalSales—so I could calculate the difference between total sales and the sales quota. Whenever you define multiple CTEs in a single WITH clause, you can reference preceding CTEs (but not the other way around).

Once I’ve defined my CTEs, I can reference them in the first statement that follows the CTE, as you saw in the previous example. In this case, I join the Sales.vSalesPerson view to cteTotalSales and then join to cteTargetDiff, all based on the salesperson ID. My SELECT list then includes columns from all three sources. The statement returns the results shown in the following table.

As you can see, sales data is provided for all salespeople, including the city in which they reside, their net sales, their sales quota, and the calculated difference between the two figures. In this case, everyone well exceeds the quota, where a quota has been defined.

Creating a Recursive Common Table Expression

A recursive CTE is one that references itself within that CTE. The recursive CTE is useful when working with hierarchical data because the CTE continues to execute until the query returns the entire hierarchy.

A typical example of hierarchical data is a table that includes a list of employees. For each employee, the table provides a reference to that person’s manager. That reference is itself an employee ID within the same table. You can use a recursive CTE to display the hierarchy of employee data, as it would appear within the organizational chart.

Note that a CTE created incorrectly could enter an infinite loop. To prevent this, you can include the MAXRECURSION hint in the OPTION clause of the primary SELECT, INSERT, UPDATE, DELETE, or MERGE statement. For information about using query hints, see the topic “Query Hints (Transact-SQL)” in SQL Server Books Online.

To demonstrate how the recursive CTE works, I used the following Transact-SQL statements to create and populate the Employees table in the AdventureWorks2008 database:

IF OBJECT_ID('Employees', 'U') IS NOT NULL

DROP TABLE dbo.Employees

CREATE TABLE dbo.Employees

(

EmployeeID int NOT NULL PRIMARY KEY,

FirstName varchar(50) NOT NULL,

LastName varchar(50) NOT NULL,

ManagerID int NULL

)

INSERT INTO Employees VALUES (101, 'Ken', 'Sánchez', NULL)

INSERT INTO Employees VALUES (102, 'Terri', 'Duffy', 101)

INSERT INTO Employees VALUES (103, 'Roberto', 'Tamburello', 101)

INSERT INTO Employees VALUES (104, 'Rob', 'Walters', 102)

INSERT INTO Employees VALUES (105, 'Gail', 'Erickson', 102)

INSERT INTO Employees VALUES (106, 'Jossef', 'Goldberg', 103)

INSERT INTO Employees VALUES (107, 'Dylan', 'Miller', 103)

INSERT INTO Employees VALUES (108, 'Diane', 'Margheim', 105)

INSERT INTO Employees VALUES (109, 'Gigi', 'Matthew', 105)

INSERT INTO Employees VALUES (110, 'Michael', 'Raheem', 106)

As you might realize, the AdventureWorks2008 database already includes the HumanResources.Employee table. However, that table now uses the hierarchyid data type to store hierarchical data, which would introduce unnecessary complexity when trying to demonstrate a recursive CTE. For that reason, I created my own table. However, if you want to try out a recursive CTE without creating and populating a new table, you can use the AdventureWorks sample database that shipped with SQL Server 2005. The HumanResources.Employee table in that database stores the data in a way similar to the table I create above.

After I created the Employees table, I created the following SELECT statement, which is preceded by a WITH clause that includes a CTE named cteReports:

WITH

cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)

(

SELECT EmployeeID, FirstName, LastName, ManagerID, 1

FROM Employees

WHERE ManagerID IS NULL

UNION ALL

SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,

r.EmpLevel + 1

FROM Employees e

INNER JOIN cteReports r

ON e.ManagerID = r.EmpID

)

SELECT

FirstName + ' ' + LastName AS FullName,

EmpLevel,

(SELECT FirstName + ' ' + LastName FROM Employees

WHERE EmployeeID = cteReports.MgrID) AS Manager

FROM cteReports

ORDER BY EmpLevel, MgrID

As you can see, the CTE returns five columns: EmpID, FirstName, LastName, MgrID, and EmpLevel. The EmpLevel column refers to the level in the hierarchy in which the employees fit. The highest level of the hierarchy is 1, the next level is 2, followed by 3, and so on.

The CTE query is itself made up of two SELECT statements, connected with the UNION ALL operator. A recursive CTE query must contain at least two members (statements), connected by the UNION ALL, UNION, INTERSECT, or EXCEPT operator. In this example, the first SELECT statement is the anchor member, and the second statement is the recursive member. All anchor members must precede the recursive members, and only the recursive members can reference the CTE itself. In addition, all members must return the same number of columns with corresponding data types.

Now lets look closer at the statements themselves. The first statement, the anchor member, retrieves the employee ID, first name, last name, and manager ID from the Employees table, where the manager ID is null. This would be the employee at the top of the hierarchy, which means this person reports to no one. Consequently, the manager ID value is null. To reflect that this person is at the top of the hierarchy, I assign a value of 1 to the EmpLevel column.

The second statement in the CTE query—the recursive member—also retrieves the employee ID, first name, last name, and manager ID for employees in the Employees table. However, notice that I join the Employees table to the CTE itself. In addition, the join is based on the manager ID in the Employees table and the employee ID in the CTE. By doing this, the CTE will loop through the Employees table until it returns the entire hierarchy.

One other item to notice about the second statement is that, for the EmpLevel column, I add the value 1 to the EmpLevel value as it appears in the CTE. That way, each time the statement loops through the hierarchy, the next correct level is applied to the employees at the level.

After I define my WITH clause, I create a SELECT statement that retrieves the data from the CTE. Note, however, that for the Manager column, I retrieve the first and last name of the employee associated with the manager ID in the CTE. This allows me to display the full name of the manager for each employee. The following table shows the result set returned by the SELECT statement and its CTE.

As you can see, the CTE, whether recursive or nonrecursive, can be a useful tool when you need to generate temporary result sets that can be accessed in a SELECT, INSERT, UPDATE, DELETE, or MERGE statement. In a sense, a CTE is like a derived table: it’s not stored as an object and is valid only during the execution of the primary statement. However, unlike the derived table, a CTE can be referenced multiple times within a query and it can be self-referencing. And best of all, CTEs are relatively easy to implement.

SQL World

Pages

Tuesday, April 3, 2012

Working with CTE (Common Table Expressions)

No comments:

Post a Comment