Recursive Common Table Expressions - The Recursive CTE

I've seen a lot of posts online on how recursive CTEs work. Almost all of those examples use the same set of data and the same scenario – organizational hierarchy for adventure works employees or something similar. It's not a bad example but I wanted to post something a little different. In this post we will demonstrate exactly how the recursive CTE works from simple to practical examples none of which use the organizational hierarchy examples you will find everywhere else.

First, how do we define a recursive CTE (Common Table Expression)?

A recursive CTE is a CTE that is self-referencing. The CTE is made up of an Anchor member query and a Recursive member query and recursion is made possible using a UNION ALL between the two members. The Recursive member query will always reference the 

The following example is the simplest way of demonstrating the way a recursive CTE works. All we are doing is taking a starting number (0) and adding 5 to it until we hit 50. The anchor query SELECTs the @StartingNumber variable, in this case 0. The recursive member query SELECTs FirstNumber and adds 5 to it (FirstNumber is the alias we defined for @StartingNumber in the anchor query). The anchor query returns the first result and then the recursive member uses that result to get the next result. The anchor result is 0, the recursive member adds 5 to that result for a total of 5.   

A way to visualize this is imagine you have a temp table associated with your anchor query. The anchor query writes the first result to this temp table and then the recursive member uses that result to produce the next row. The new row is inserted into the temp table and the recursive member keeps on processing. So 0 is the first row + 5 = 5. Now 5 is the next row, + 5 = 10. Now 10 is the next row, + 5 = 15. Now 15 is the next row, etc.

With this behavior in mind you can imagine how an incorrectly constructed query could run an infinite number of times. In our example below we have a WHERE clause in the recursive member that will limit our results to a max value of 50. */

DECLARE @StartingNumber int = 0
DECLARE @EndingNumber int = 50;

WITH CTEazyE AS (
-- Anchor member

   SELECT
      @StartingNumber AS FirstNumber -- FirstNumber is the alias we will reference in the recursive query

-- The union is required for recursion

   UNION ALL

--Recursive member

   SELECT
      FirstNumber + 5 -- Adds 5 to the FirstNumber value
   FROM CTEazyE
   WHERE FirstNumber < @EndingNumber ) -- Applying a filter to stop recursion.

SELECT *
FROM CTEazyE

Go ahead and make changes to the query above so you can see the different results. If you are comfortable with execution plans have a look at the execution plan, this will better explain the behavior of how the recursion works. Remember that temp table I mentioned earlier, that is going to be your table spool in the execution plan.

Now let's look at another example this time with dates. In the following query we want to return each month (date) for the last 12 months. The following query uses a start date, in this example 12 months ago today. It then adds one month to the anchor date to get the next date value. If your starting value is (today's date - 12 months), add 1 month to that and you should have (today's date - 11 months). It will keep doing this until it reaches the current date.

DECLARE @StartDate date = DATEADD(mm, -12, GETDATE())
DECLARE @Enddate date = GETDATE()

;WITH CTEazyE AS (
   SELECT
      @StartDate AS StartDate      

   UNION ALL

   SELECT
      DATEADD(mm,1,StartDate)
   FROM CTEazyE
   WHERE StartDate < @Enddate )          

SELECT *
FROM CTEazyE

Let's look at one more example. Say a developer approaches you and says they need to delete a row from a production table. You know this table has other tables referencing it with foreign keys but you don't know which ones off the top of your head. Of those tables referencing this main table it's also possible there are other tables referencing the tables that are referencing this main table. You don't use cascade delete so you need to delete records manually in each of these tables. You need to produce a Foreign Key chain for all tables involved in this delete.

In the script below we use the @PTable variable to define the target parent table--this is the table you want to delete the row from. The anchor query returns the object_id and table name of the target parent table. We define the rest of the columns and populate these columns with NULL values. These NULL fields will get actual values from the recursive member query each execution that follows the anchor. The key here is the INNER JOIN, we are joining the referenced_object_id field from the recursive query to the ObjectID of the anchor query. The referenced_object_id is the parent table in a foreign key. So each time we produce a new referenced_object_id we are going to get all objects that reference the that object id.

In the example below we are creating a few temp tables with FK constraints. Table 2 references table 1 and table 3 references table 2. This query will allow us to see the foreign key chain using our recursive CTE.   

USE <databaseName>;


CREATE TABLE t1 (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY, Blah NVARCHAR(MAX));
CREATE TABLE t2 (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY, t1ID INT NOT NULL, Blah NVARCHAR(MAX), CONSTRAINT FK_t2 FOREIGN KEY (t1ID) REFERENCES t1 (ID) );
CREATE TABLE t3 (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY, t2ID INT NOT NULL, Blah NVARCHAR(MAX), CONSTRAINT FK_t3 FOREIGN KEY (t2ID) REFERENCES t2 (ID) );

GO

DECLARE @PTable nvarchar(250) = 't1';

WITH CTEazyE AS (
   SELECT
      [object_id] AS ObjectID
      ,name AS TableName
      ,CAST(NULL AS sysname) AS ForeignKeyName
      ,NULL AS ReferencedTableID
      ,CAST(NULL AS sysname) AS ReferencedTableName
   FROM sys.tables
   WHERE name = @PTable

   UNION ALL

   SELECT
      fk.parent_object_id AS ID
     ,OBJECT_NAME(fk.parent_object_id) AS name
     ,fk.name
     ,ct.ObjectID AS ReferencedTableID
     ,ct.TableName AS ReferencedTableName
FROM sys.foreign_keys fk
INNER JOIN CTEazyE ct ON fk.referenced_object_id = ct.ObjectID )

SELECT DISTINCT
   ObjectID
   ,TableName
   ,ForeignKeyName
   ,ReferencedTableID
   ,ReferencedTableName
FROM CTEazyE
ORDER BY ReferencedTableName;

DROP TABLE t3;
DROP TABLE t2;
DROP TABLE t1;

In the results you should see the first row as the parent table and then t2 which references the parent table. Since t3 references t2 you will also see t3 in the results. The recursive CTE will keep going until it doesn't find any more constraints between tables. You can try this out on a table you know to see more meaningful results.

Recursive CTEs are a nice tool but need to be used with caution, like any recursive type of query.