LEAD
The
The
- return_value: This is the column we wish to return the lead value
- lead_value: An integer to specify the lead magnitude. i.e. 1 for lead by 1 row, 2 for lead by 2 rows
- default_value: A place-holder for next values outside of the range of the lead.
The syntax looks like this:
LEAD(return_column, lag_value, default_value )
LEAD by Example ;)
In the example below, I will query a customer database to return the the
SELECT
customer_id,
total_spent,
LEAD(total_spent, 1, 0) OVER() as lead_1,
LEAD(total_spent, 2, 0) OVER() as lead_2,
LEAD(total_spent, 4, 0) OVER() as lead_4
FROM customers
WHERE total_spent IS NOT NULL
LIMIT 20;
customer_id | total_spent | lead_1 | lead_2 | lead_4 -------------+-------------+---------+---------+--------- 1 | 2340.50 | 1567.89 | 890.25 | 1234.60 2 | 1567.89 | 890.25 | 3421.75 | 4567.30 3 | 890.25 | 3421.75 | 1234.60 | 678.90 4 | 3421.75 | 1234.60 | 4567.30 | 5 | 1234.60 | 4567.30 | 678.90 | 2876.45 6 | 4567.30 | 678.90 | | 1456.78 7 | 678.90 | | 2876.45 | 3210.55 8 | | 2876.45 | 1456.78 | 987.65 9 | 2876.45 | 1456.78 | 3210.55 | 1765.40 10 | 1456.78 | 3210.55 | 987.65 | 11 | 3210.55 | 987.65 | 1765.40 | 2345.80 12 | 987.65 | 1765.40 | | 3987.20 13 | 1765.40 | | 2345.80 | 1123.45 14 | | 2345.80 | 3987.20 | 2654.75 15 | 2345.80 | 3987.20 | 1123.45 | 876.30 16 | 3987.20 | 1123.45 | 2654.75 | 4321.90 17 | 1123.45 | 2654.75 | 876.30 | 1567.85 18 | 2654.75 | 876.30 | 4321.90 | 19 | 876.30 | 4321.90 | 1567.85 | 2890.40 20 | 4321.90 | 1567.85 | | 1678.55 (20 rows)
One observation that we can make is that, each lead value is derived from an offset of the total_spent. That is
LEAD() with PARTITION
The case example above is a simplistic one, without any partition or even order by. But of course, as with any window function, we can leverage partitions and order by clauses to get more specific outcomes. In the example below, we look at LEAD() with a partition and order by condition. Specifically, we are interested in determining the next best spending by each state for each customer. Here is the query to achieve this
SELECT
customer_id,
state,
total_spent,
LEAD(customer_id, 1, NULL) OVER (PARTITION BY state ORDER BY total_spent DESC) as second_highest_spend_customer,
LEAD(total_spent, 1, NULL) OVER (PARTITION BY state ORDER BY total_spent DESC) as second_highest_spend
FROM customers
WHERE total_spent IS NOT NULL
LIMIT 20;
customer_id | state | total_spent | second_highest_spend_customer | second_highest_spend -------------+-------+-------------+-------------------------------+---------------------- 31 | AZ | 2198.45 | 5 | 1234.60 5 | AZ | 1234.60 | | 16 | CA | 3987.20 | 33 | 3654.20 33 | CA | 3654.20 | 43 | 1987.45 43 | CA | 1987.45 | 32 | 1876.35 32 | CA | 1876.35 | 2 | 1567.89 2 | CA | 1567.89 | 10 | 1456.78 10 | CA | 1456.78 | 34 | 1234.75 34 | CA | 1234.75 | | 39 | CO | 2143.80 | 19 | 876.30 (10 rows)
In this case, notice that the data is ranked by states with customers who spent the highest. At each row, we have the next best spending and the id of the customer that is spend it. Using such a technique could allow you to compute the margin by spend to bender understand the value of customers.
In the next section, we look at the