Introduced in SQL Server 2005, the common table expression
(CTE) is a temporary named result set that you can reference within a SELECT,
INSERT, UPDATE, or DELETE statement. You can also use a CTE in a CREATE VIEW
statement, as part of the view’s SELECT query. In addition, as of SQL Server
2008, you can add a CTE to the new MERGE statement.
SQL Server supports two types of CTEs—recursive and
nonrecursive. In this article, I explain how to create both types. The examples
I provide are based on a local instance of SQL Server 2008 and retrieve data
from the AdventureWorks2008 sample database.
Working with Common Table Expressions
You define CTEs by adding a WITH clause directly before your
SELECT, INSERT, UPDATE, DELETE, or MERGE statement. The WITH clause can include
one or more CTEs, as shown in the following syntax:
[WITH <common_table_expression> [,...]]
<common_table_expression>::=
cte_name [(column_name [,...])]
AS (cte_query)
...which can be represented like this...
As you can see, if you include more than one CTE in your
WITH clause, you must separate them with commas. In addition, for each CTE, you
must provide a name, the AS keyword, and a SELECT statement. You can also
provide column names (separated by commas), as long as the number of names
match the number of columns returned by the result set.
The SELECT statement in your CTE query must follow the same
requirements as those used for creating a view. For details about those
requirements, see the topic “CREATE VIEW (Transact-SQL)” in SQL Server Books Online.
For more details about CTEs in general, see the topic “WITH
common_table_expression (Transact-SQL).”
After you define your WITH clause with the necessary CTEs,
you can then reference those CTEs as you would any other table. However, you
can reference a CTE only within the execution scope of the statement that
immediately follows the WITH clause. After you’ve run your statement, the CTE
result set is not available to other statements.
Creating a Nonrecursive Common Table
Expression
A nonrecursive CTE is one that does not reference itself
within the CTE. Nonrecursive CTEs tend to be simpler than recursive CTEs, which
is why I’m starting with this type. In the following example, I create a CTE
named cteTotalSales:
WITH
cteTotalSales (SalesPersonID, NetSales)
AS
(
SELECT
SalesPersonID, ROUND(SUM(SubTotal), 2)
FROM Sales.SalesOrderHeader
WHERE
SalesPersonID IS NOT
NULL
GROUP BY SalesPersonID
)
SELECT
sp.FirstName + ' ' + sp.LastName AS FullName,
sp.City + ', ' + StateProvinceName AS
Location,
ts.NetSales
FROM Sales.vSalesPerson AS sp
INNER JOIN cteTotalSales AS
ts
ON sp.BusinessEntityID =
ts.SalesPersonID
ORDER BY
ts.NetSales DESC
After I specify the CTE name, I provide two column names,
SalesPersonID and NetSales, which are enclosed in parentheses and separated by
a comma. That means the result set returned by the CTE query must return two
columns.
Next, I provide the AS keyword, then a set of parentheses
that enclose the CTE query. In this case, the SELECT statement returns the
total sales for each sales person (total sales grouped by salesperson ID). As
you can see, the CTE query can include Transact-SQL functions, GROUP BY
clauses, or any elements that the SELECT statement in a view definition can
include.
I can now reference cteTotalSales in the statement that
immediately follows. For this example, I create a SELECT statement that joins
the Sales.vSalesPerson view to cteTotalSales, based on the salesperson ID. I
then pull the names and locations from the view and the net sales from the CTE.
The following table shows the results returned by this statement.
As you saw earlier in the syntax, you can include multiple
CTEs in a WITH clause. The following WITH clause includes two CTEs, one named
cteTotalSales and one named cteTargetDiff:
WITH
cteTotalSales (SalesPersonID,
NetSales)
AS
(
SELECT
SalesPersonID, ROUND(SUM(SubTotal), 2)
FROM Sales.SalesOrderHeader
WHERE
SalesPersonID IS NOT
NULL
AND
OrderDate BETWEEN '2003-01-01
00:00:00.000'
AND '2003-12-31 23:59:59.000'
GROUP BY SalesPersonID
),
cteTargetDiff (SalesPersonID,
SalesQuota, QuotaDiff)
AS
(
SELECT ts.SalesPersonID,
CASE
WHEN sp.SalesQuota IS NULL THEN 0
ELSE sp.SalesQuota
END,
CASE
WHEN sp.SalesQuota IS NULL THEN ts.NetSales
ELSE ts.NetSales - sp.SalesQuota
END
FROM
cteTotalSales AS ts
INNER
JOIN Sales.SalesPerson
AS sp
ON
ts.SalesPersonID =
sp.BusinessEntityID
)
SELECT
sp.FirstName + ' ' + sp.LastName AS FullName,
sp.City,
ts.NetSales,
td.SalesQuota,
td.QuotaDiff
FROM Sales.vSalesPerson AS sp
INNER JOIN cteTotalSales AS
ts
ON sp.BusinessEntityID =
ts.SalesPersonID
INNER JOIN cteTargetDiff AS
td
ON sp.BusinessEntityID =
td.SalesPersonID
ORDER BY
ts.NetSales DESC
The first CTE—cteTotalSales—is similar to the one in the
preceding example, except that the WHERE clause has been further qualified to
include sales only from 2003. After I define cteTotalSales, I add a comma, and
then define cteTargetDiff, which calculates the difference between the sales
total and the sales quota.
The new CTE definition specifies three columns for the
result set: SalesPersonID, SalesQuota, and QuotaDiff. As you would expect, the
CTE query returns three columns. The first is the salesperson ID. The second is
the sales quota. However, because a sales quota is not defined for some
salespeople I use a CASE statement. If the value is null, that value is set to
0, otherwise the actual SalesQuota value is used.
The final column returned is the difference between the net
sales and sales quota. Again, I use a CASE statement. If the SalesQuota value
is null, then the NetSales value is used, otherwise the sales quota is
subtracted from the net sales to arrive at the difference.
Something interesting to note about the second CTE query is
that I’ve joined the Sales.SalesPerson table to the first CTE—cteTotalSales—so
I could calculate the difference between total sales and the sales quota.
Whenever you define multiple CTEs in a single WITH clause, you can reference
preceding CTEs (but not the other way around).
Once I’ve defined my CTEs, I can reference them in the first
statement that follows the CTE, as you saw in the previous example. In this
case, I join the Sales.vSalesPerson view to cteTotalSales and then join to
cteTargetDiff, all based on the salesperson ID. My SELECT list then includes
columns from all three sources. The statement returns the results shown in the
following table.
As you can see, sales data is provided for all salespeople,
including the city in which they reside, their net sales, their sales quota,
and the calculated difference between the two figures. In this case, everyone
well exceeds the quota, where a quota has been defined.
Creating a Recursive Common Table
Expression
A recursive CTE is one that references itself within that
CTE. The recursive CTE is useful when working with hierarchical data because
the CTE continues to execute until the query returns the entire hierarchy.
A typical example of hierarchical data is a table that
includes a list of employees. For each employee, the table provides a reference
to that person’s manager. That reference is itself an employee ID within the
same table. You can use a recursive CTE to display the hierarchy of employee
data, as it would appear within the organizational chart.
Note that a CTE created incorrectly could enter an infinite
loop. To prevent this, you can include the MAXRECURSION hint in the OPTION
clause of the primary SELECT, INSERT, UPDATE, DELETE, or MERGE statement. For
information about using query hints, see the topic “Query Hints (Transact-SQL)”
in SQL Server Books Online.
To demonstrate how the recursive CTE works, I used the
following Transact-SQL statements to create and populate the Employees table in
the AdventureWorks2008 database:
IF OBJECT_ID('Employees', 'U') IS NOT NULL
DROP TABLE
dbo.Employees
GO
CREATE TABLE dbo.Employees
(
EmployeeID int NOT NULL PRIMARY KEY,
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL,
ManagerID int NULL
)
GO
INSERT INTO Employees VALUES (101, 'Ken', 'Sánchez', NULL)
INSERT INTO Employees VALUES (102, 'Terri', 'Duffy', 101)
INSERT INTO Employees VALUES (103, 'Roberto', 'Tamburello', 101)
INSERT INTO Employees VALUES (104, 'Rob', 'Walters', 102)
INSERT INTO Employees VALUES (105, 'Gail', 'Erickson', 102)
INSERT INTO Employees VALUES (106, 'Jossef', 'Goldberg', 103)
INSERT INTO Employees VALUES (107, 'Dylan', 'Miller', 103)
INSERT INTO Employees VALUES (108, 'Diane', 'Margheim', 105)
INSERT INTO Employees VALUES (109, 'Gigi', 'Matthew', 105)
INSERT INTO Employees VALUES (110, 'Michael', 'Raheem', 106)
As you might realize, the AdventureWorks2008 database
already includes the HumanResources.Employee table. However, that table now
uses the hierarchyid data type to store hierarchical data, which would
introduce unnecessary complexity when trying to demonstrate a recursive CTE.
For that reason, I created my own table. However, if you want to try out a
recursive CTE without creating and populating a new table, you can use the
AdventureWorks sample database that shipped with SQL Server 2005. The
HumanResources.Employee table in that database stores the data in a way similar
to the table I create above.
After I created the Employees table, I created the following
SELECT statement, which is preceded by a WITH clause that includes a CTE named
cteReports:
WITH
cteReports (EmpID, FirstName, LastName, MgrID, EmpLevel)
AS
(
SELECT
EmployeeID, FirstName,
LastName, ManagerID,
1
FROM
Employees
WHERE
ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.FirstName, e.LastName, e.ManagerID,
r.EmpLevel + 1
FROM
Employees e
INNER
JOIN cteReports r
ON e.ManagerID = r.EmpID
)
SELECT
FirstName + ' ' + LastName AS FullName,
EmpLevel,
(SELECT FirstName + ' ' + LastName FROM Employees
WHERE
EmployeeID = cteReports.MgrID) AS Manager
FROM cteReports
ORDER BY
EmpLevel, MgrID
As you can see, the CTE returns five columns: EmpID,
FirstName, LastName, MgrID, and EmpLevel. The EmpLevel column refers to the
level in the hierarchy in which the employees fit. The highest level of the
hierarchy is 1, the next level is 2, followed by 3, and so on.
The CTE query is itself made up of two SELECT statements,
connected with the UNION ALL operator. A recursive CTE query must contain at
least two members (statements), connected by the UNION ALL, UNION, INTERSECT,
or EXCEPT operator. In this example, the first SELECT statement is the anchor
member, and the second statement is the recursive member. All anchor members
must precede the recursive members, and only the recursive members can reference
the CTE itself. In addition, all members must return the same number of columns
with corresponding data types.
Now lets look closer at the statements themselves. The first
statement, the anchor member, retrieves the employee ID, first name, last name,
and manager ID from the Employees table, where the manager ID is null. This
would be the employee at the top of the hierarchy, which means this person
reports to no one. Consequently, the manager ID value is null. To reflect that
this person is at the top of the hierarchy, I assign a value of 1 to the
EmpLevel column.
The second statement in the CTE query—the recursive
member—also retrieves the employee ID, first name, last name, and manager ID
for employees in the Employees table. However, notice that I join the Employees
table to the CTE itself. In addition, the join is based on the manager ID in
the Employees table and the employee ID in the CTE. By doing this, the CTE will
loop through the Employees table until it returns the entire hierarchy.
One other item to notice about the second statement is that,
for the EmpLevel column, I add the value 1 to the EmpLevel value as it appears
in the CTE. That way, each time the statement loops through the hierarchy, the
next correct level is applied to the employees at the level.
After I define my WITH clause, I create a SELECT statement
that retrieves the data from the CTE. Note, however, that for the Manager
column, I retrieve the first and last name of the employee associated with the
manager ID in the CTE. This allows me to display the full name of the manager
for each employee. The following table shows the result set returned by the
SELECT statement and its CTE.
As you can see, the CTE, whether recursive or nonrecursive,
can be a useful tool when you need to generate temporary result sets that can
be accessed in a SELECT, INSERT, UPDATE, DELETE, or MERGE statement. In a
sense, a CTE is like a derived table: it’s not stored as an object and is valid
only during the execution of the primary statement. However, unlike the derived
table, a CTE can be referenced multiple times within a query and it can be
self-referencing. And best of all, CTEs are relatively easy to implement.
No comments:
Post a Comment