Calculate Year Over Year Variance - Subtract From Previous Row Using LAG

Say you want to summarize a value by year and find the variance between each year (row). Imagine you have a table that tracks company sales. You need to summarize sales totals by year and then compare the values year over year and find the variance. Here is a simple way to do this using the LAG function.

* Note * If your data does not have contiguous dates you will need to write different queries to fill in the missing dates. This is known as the missing dates problem and solutions to that can be found HERE

IF OBJECT_ID('tempdb..#SalesData') IS NOT NULL
DROP TABLE #SalesData
DECLARE @StartDate date = '2013-01-01';

WITH rCTE AS (
   SELECT
      CAST(2679 AS decimal(7,2)) AS SalesAmount
     ,@StartDate AS SalesDate
     ,CAST('nadia' as varchar) AS SalesPerson
     ,0 AS TransID

   UNION ALL

   SELECT
      CAST(SalesAmount + (SELECT CAST(CRYPT_GEN_RANDOM(1) AS INT) + 3 - CAST(CRYPT_GEN_RANDOM(1) AS INT)) AS decimal(7,2))
    ,DATEADD(dd, + 3, SalesDate)
    ,REVERSE(SalesPerson)
    ,TransID + 1
   FROM rCTE
   WHERE TransID < 500 )

SELECT
   SalesAmount
  ,SalesDate
  ,SalesPerson
INTO #SalesData
FROM rCTE
WHERE SalesAmount > 1
OPTION (MAXRECURSION 600)

--Let's take a look at the data

SELECT * FROM #SalesData

Now that we have our sales data table, we are going to summarize the data by year and load this to a temp table. Notice the ORDER BY, this is important since we will be using the LAG function next.

SELECT DISTINCT
   YEAR(SalesDate) SalesYear
  ,SUM(SalesAmount) OVER (PARTITION BY YEAR(SalesDate)) AS TotalSalesByYear
INTO #SalesTotals
FROM #SalesData
ORDER BY SalesYear ASC;

After we load our temp table with the summarized annual sales, now we are going to use the LAG function to get the previous row for TotalSalesByYear --this will effectively create a new column for the previous year's sales. You can see in the LAG function below we use the TotalSalesByYear value and then you see a comma and the number 1. The number is the offset or how many rows back you want to look from the current row. You can also specify a default value after the offset, this is the value to return if that offset value returns NULL. By excluding that default value, the default value will return NULL, which is what we want so we don't get a divide by zero error when calculating the variance. Here is a link for LAG usage.

WITH CTEazyE AS (
   SELECT
      SalesYear
     ,TotalSalesByYear
     ,LAG(TotalSalesByYear, 1) OVER(ORDER BY SalesYear ASC) AS PreviousYearSales
   FROM #SalesTotals )    

-- Now we calculate the variance year over year.

SELECT
     SalesYear
    ,TotalSalesByYear
    ,PreviousYearSales
    ,(TotalSalesByYear - PreviousYearSales) AS VarianceAmount
    ,100 * (TotalSalesByYear - PreviousYearSales) / PreviousYearSales AS VariancePercent
FROM CTEazyE

There are other ways to calculate this but the intent of this post was to introduce you to the LAG (LEAD) function. Mind you, the LAG function is a SQL 2012 + feature. 

Running Totals Using the OVER() Clause with ORDER BY

Window functions and the OVER clause have been around since 2005, but back in 2005 you couldn't use ORDER BY for aggregate functions in the OVER clause. As a matter of fact it wasn't until SQL 2012 that they introduced the ability to use ORDER BY with aggregates in your OVER() clause. Here's how you can use the ORDER BY in the OVER() clause to get running totals.

-- Generate some test data

DECLARE @StartDate date = '2014-01-01'
DECLARE @EndDate date = '2014-12-31';
WITH rCTE AS (
   SELECT
      CAST((500 * RAND()) AS decimal(7,2)) AS SalesAmount
     ,@StartDate AS SalesDate
     ,0 AS Level

   UNION ALL

   SELECT
      CAST(SalesAmount + ((SELECT CHECKSUM(NewId()) % 5) * RAND()) AS decimal(7,2))
     ,DATEADD(dd, + 7, SalesDate)
     ,Level + 1
FROM rCTE
WHERE Level < 52)

SELECT SalesAmount, SalesDate
INTO #SalesData
FROM rCTE

In the first query below we use 3 different OVER clauses to partition the data different ways. In the first OVER clause we are partitioning by the month. Since our sales dates are by day and not by month we use the DATEPART function to get the month of the SalesDate and partitioning on that. This will give us the total sales for each month.

Next we want to get running totals by month. In the second partition we are again using the DATEPART function but this time we are using it with year instead of month and we are adding an ORDER BY using the DATEPART month. What this is doing is getting a total sales amount and then ordering by the month to give us running totals by month--the key here is the ORDER BY, this is what gives us running totals.

The last partition is the same as the second partition except it doesn't include an ORDER BY, so it shows us a total for the entire year.

We use DISTINCT in the SELECT to eliminate duplicate rows. The first query also has a WHERE clause to limit the year to 2014 only.

-- Running totals by month for a single year

SELECT DISTINCT
   DATEPART(YEAR, SalesDate) AS SalesYear
  ,DATEPART(MONTH, SalesDate) AS SalesMonth
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(MM,SalesDate)) AS [Totals by Month]
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(YY,SalesDate) ORDER BY DATEPART(MM,SalesDate)) AS [Running Totals by Month]
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(YY,SalesDate)) AS [Total for the Year]
FROM #SalesData
WHERE DATEPART(YEAR, SalesDate) = 2014

Now we'll look at the same query but this time remove the year form the WHERE clause so we have months over multiple years. Since we are expanding to multiple years, if we want to get a simple total by month we would need to include both year and month in the PARTITION. For running totals, nothing changes from the first query.

-- Running totals by year and month over multiple years

SELECT DISTINCT
   DATEPART(YEAR, SalesDate) AS SalesYear
  ,DATEPART(MONTH, SalesDate) AS SalesMonth
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(YEAR,SalesDate), DATEPART(MONTH,SalesDate)) AS [Totals by Month]
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(YEAR,SalesDate) ORDER BY DATEPART(MM,SalesDate)) AS [Running Totals by Month]
  ,SUM(SalesAmount) OVER (PARTITION BY DATEPART(YEAR,SalesDate)) AS [Total for the Year]
FROM #SalesData

An easy way to get running totals in SQL 2012 +.