LEAD

The LEAD window function that returns the value from a specified offset location relative to the current row. In datasets where data is logged by a predefined sequence like time or dates, it can be used to compare current value to next data value in the sequence. Put simply, it returns the next value of a column from a table.

The LEAD() function takes 3 parameters:

  • return_value: This is the column we wish to return the lead value
  • lead_value: An integer to specify the lead magnitude. i.e. 1 for lead by 1 row, 2 for lead by 2 rows
  • default_value: A place-holder for next values outside of the range of the lead.

The syntax looks like this:

LEAD(return_column, lag_value, default_value )

LEAD by Example ;)

In the example below, I will query a customer database to return the the total_spend. I then will apply the LEAD() function with various lead values to demonstrate its outcome

SELECT 
    customer_id,
    total_spent,
    LEAD(total_spent, 1, 0) OVER() as lead_1, 
    LEAD(total_spent, 2, 0) OVER() as lead_2,
    LEAD(total_spent, 4, 0) OVER() as lead_4
FROM customers
WHERE total_spent IS NOT NULL
LIMIT 20;
 customer_id | total_spent | lead_1  | lead_2  | lead_4
-------------+-------------+---------+---------+---------
          1 |     2340.50 | 1567.89 |  890.25 | 1234.60
          2 |     1567.89 |  890.25 | 3421.75 | 4567.30
          3 |      890.25 | 3421.75 | 1234.60 |  678.90
          4 |     3421.75 | 1234.60 | 4567.30 |
          5 |     1234.60 | 4567.30 |  678.90 | 2876.45
          6 |     4567.30 |  678.90 |         | 1456.78
          7 |      678.90 |         | 2876.45 | 3210.55
          8 |             | 2876.45 | 1456.78 |  987.65
          9 |     2876.45 | 1456.78 | 3210.55 | 1765.40
         10 |     1456.78 | 3210.55 |  987.65 |
         11 |     3210.55 |  987.65 | 1765.40 | 2345.80
         12 |      987.65 | 1765.40 |         | 3987.20
         13 |     1765.40 |         | 2345.80 | 1123.45
         14 |             | 2345.80 | 3987.20 | 2654.75
         15 |     2345.80 | 3987.20 | 1123.45 |  876.30
         16 |     3987.20 | 1123.45 | 2654.75 | 4321.90
         17 |     1123.45 | 2654.75 |  876.30 | 1567.85
         18 |     2654.75 |  876.30 | 4321.90 |
         19 |      876.30 | 4321.90 | 1567.85 | 2890.40
         20 |     4321.90 | 1567.85 |         | 1678.55
    (20 rows)

One observation that we can make is that, each lead value is derived from an offset of the total_spent. That is lead_1 takes the value of the current row plus 1. Same as lead 2 and lead 4. One other way to see this is to observe the NULL values shift across the lead outcomes.

LEAD() with PARTITION

The case example above is a simplistic one, without any partition or even order by. But of course, as with any window function, we can leverage partitions and order by clauses to get more specific outcomes. In the example below, we look at LEAD() with a partition and order by condition. Specifically, we are interested in determining the next best spending by each state for each customer. Here is the query to achieve this

SELECT
    customer_id,
    state,
    total_spent,
    LEAD(customer_id, 1, NULL) OVER (PARTITION BY state ORDER BY total_spent DESC) as second_highest_spend_customer,
    LEAD(total_spent, 1, NULL) OVER (PARTITION BY state ORDER BY total_spent DESC) as second_highest_spend
FROM customers
WHERE total_spent IS NOT NULL
LIMIT 20;
customer_id  | state | total_spent | second_highest_spend_customer | second_highest_spend
-------------+-------+-------------+-------------------------------+----------------------
          31 | AZ    |     2198.45 |                             5 |              1234.60
           5 | AZ    |     1234.60 |                               |
          16 | CA    |     3987.20 |                            33 |              3654.20
          33 | CA    |     3654.20 |                            43 |              1987.45
          43 | CA    |     1987.45 |                            32 |              1876.35
          32 | CA    |     1876.35 |                             2 |              1567.89
           2 | CA    |     1567.89 |                            10 |              1456.78
          10 | CA    |     1456.78 |                            34 |              1234.75
          34 | CA    |     1234.75 |                               |
          39 | CO    |     2143.80 |                            19 |               876.30
(10 rows)

In this case, notice that the data is ranked by states with customers who spent the highest. At each row, we have the next best spending and the id of the customer that is spend it. Using such a technique could allow you to compute the margin by spend to bender understand the value of customers.

In the next section, we look at the LAG(), the direct opposite to the LEAD function