Calculate Year Over Year Variance - Subtract From Previous Row Using LAG

Say you want to summarize a value by year and find the variance between each year (row). Imagine you have a table that tracks company sales. You need to summarize sales totals by year and then compare the values year over year and find the variance. Here is a simple way to do this using the LAG function.

* Note * If your data does not have contiguous dates you will need to write different queries to fill in the missing dates. This is known as the missing dates problem and solutions to that can be found HERE

IF OBJECT_ID('tempdb..#SalesData') IS NOT NULL
DROP TABLE #SalesData
DECLARE @StartDate date = '2013-01-01';

WITH rCTE AS (
   SELECT
      CAST(2679 AS decimal(7,2)) AS SalesAmount
     ,@StartDate AS SalesDate
     ,CAST('nadia' as varchar) AS SalesPerson
     ,0 AS TransID

   UNION ALL

   SELECT
      CAST(SalesAmount + (SELECT CAST(CRYPT_GEN_RANDOM(1) AS INT) + 3 - CAST(CRYPT_GEN_RANDOM(1) AS INT)) AS decimal(7,2))
    ,DATEADD(dd, + 3, SalesDate)
    ,REVERSE(SalesPerson)
    ,TransID + 1
   FROM rCTE
   WHERE TransID < 500 )

SELECT
   SalesAmount
  ,SalesDate
  ,SalesPerson
INTO #SalesData
FROM rCTE
WHERE SalesAmount > 1
OPTION (MAXRECURSION 600)

--Let's take a look at the data

SELECT * FROM #SalesData

Now that we have our sales data table, we are going to summarize the data by year and load this to a temp table. Notice the ORDER BY, this is important since we will be using the LAG function next.

SELECT DISTINCT
   YEAR(SalesDate) SalesYear
  ,SUM(SalesAmount) OVER (PARTITION BY YEAR(SalesDate)) AS TotalSalesByYear
INTO #SalesTotals
FROM #SalesData
ORDER BY SalesYear ASC;

After we load our temp table with the summarized annual sales, now we are going to use the LAG function to get the previous row for TotalSalesByYear --this will effectively create a new column for the previous year's sales. You can see in the LAG function below we use the TotalSalesByYear value and then you see a comma and the number 1. The number is the offset or how many rows back you want to look from the current row. You can also specify a default value after the offset, this is the value to return if that offset value returns NULL. By excluding that default value, the default value will return NULL, which is what we want so we don't get a divide by zero error when calculating the variance. Here is a link for LAG usage.

WITH CTEazyE AS (
   SELECT
      SalesYear
     ,TotalSalesByYear
     ,LAG(TotalSalesByYear, 1) OVER(ORDER BY SalesYear ASC) AS PreviousYearSales
   FROM #SalesTotals )    

-- Now we calculate the variance year over year.

SELECT
     SalesYear
    ,TotalSalesByYear
    ,PreviousYearSales
    ,(TotalSalesByYear - PreviousYearSales) AS VarianceAmount
    ,100 * (TotalSalesByYear - PreviousYearSales) / PreviousYearSales AS VariancePercent
FROM CTEazyE

There are other ways to calculate this but the intent of this post was to introduce you to the LAG (LEAD) function. Mind you, the LAG function is a SQL 2012 + feature.