Differencing a Time Series
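Differencing replaces each observation with its change from an earlier observation. It is a common operation in time series analysis and forecasting, particularly when assessing the stationarity and autocorrelation of time series data, because it can remove a trend from the series.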
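For reference, the lag-1 and lag-$n$ differences of a series $y_t$ can be written as below. The $\nabla$ notation is a common convention and is not taken from the original text.

$$
\nabla y_t = y_t - y_{t-1}, \qquad \nabla_n y_t = y_t - y_{t-n}
$$

The first $n$ observations have no value $n$ steps earlier to subtract, so their differences are undefined. pandas marks them as NaN.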
Generating time series data
import numpy as np
import pandas as pd
index = pd.date_range('2022-01-01', '2022-01-14')
data = np.random.randint(2, 20, size=14)
ts_data = pd.DataFrame(data=data, index=index, columns=['sales'])
ts_data
| date | sales |
|---|---|
| 2022-01-01 | 6 |
| 2022-01-02 | 4 |
| 2022-01-03 | 10 |
| 2022-01-04 | 5 |
| 2022-01-05 | 7 |
| 2022-01-06 | 8 |
| 2022-01-07 | 15 |
| 2022-01-08 | 19 |
| 2022-01-09 | 14 |
| 2022-01-10 | 7 |
| 2022-01-11 | 13 |
| 2022-01-12 | 3 |
| 2022-01-13 | 17 |
| 2022-01-14 | 7 |
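Because `np.random.randint(2, 20, size=14)` draws new integers between 2 and 19 on every run, rerunning the cell above will not reproduce the table. A minimal sketch, assuming you want to follow along with the exact numbers shown, builds the same frame from the table's values:

```python
import pandas as pd

# Sales values copied from the table above, so the results are reproducible
sales = [6, 4, 10, 5, 7, 8, 15, 19, 14, 7, 13, 3, 17, 7]
index = pd.date_range('2022-01-01', '2022-01-14')  # 14 daily timestamps
ts_data = pd.DataFrame({'sales': sales}, index=index)

print(ts_data.head(3))
```

Another option is to call `np.random.seed(...)` before generating the data, although the seed that produced the table above is not known.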
Implementing Differencing
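By default, `diff()` computes the lag-1 difference, subtracting each value's predecessor from it. Its `periods` argument accepts the number of lags $n$, so `diff(periods=7)` on daily data gives the change from the same weekday one week earlier.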
ts_data['diff_lag1'] = ts_data.sales.diff()
ts_data
| date | sales | diff_lag1 |
|---|---|---|
| 2022-01-01 | 6 | NaN |
| 2022-01-02 | 4 | -2.0 |
| 2022-01-03 | 10 | 6.0 |
| 2022-01-04 | 5 | -5.0 |
| 2022-01-05 | 7 | 2.0 |
| 2022-01-06 | 8 | 1.0 |
| 2022-01-07 | 15 | 7.0 |
| 2022-01-08 | 19 | 4.0 |
| 2022-01-09 | 14 | -5.0 |
| 2022-01-10 | 7 | -7.0 |
| 2022-01-11 | 13 | 6.0 |
| 2022-01-12 | 3 | -10.0 |
| 2022-01-13 | 17 | 14.0 |
| 2022-01-14 | 7 | -10.0 |
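The sketch below illustrates the `periods` argument with a lag-2 difference on the same sales values, and checks that `diff(periods=n)` matches subtracting the series shifted by `n` steps with `shift()`. The column name `diff_lag2` and the variable `diffed` are illustrative choices, not from the original.

```python
import pandas as pd

sales = [6, 4, 10, 5, 7, 8, 15, 19, 14, 7, 13, 3, 17, 7]
ts_data = pd.DataFrame(
    {'sales': sales}, index=pd.date_range('2022-01-01', '2022-01-14')
)

# Lag-1 difference (the default) and a lag-2 difference via periods
ts_data['diff_lag1'] = ts_data['sales'].diff()
ts_data['diff_lag2'] = ts_data['sales'].diff(periods=2)

# diff(periods=n) equals the series minus itself shifted by n steps;
# Series.equals treats NaN in matching positions as equal
manual_lag2 = ts_data['sales'] - ts_data['sales'].shift(2)
assert manual_lag2.equals(ts_data['sales'].diff(periods=2))

# Drop the leading NaN before any further analysis of the differenced series
diffed = ts_data['diff_lag1'].dropna()
print(ts_data)
```

The `shift()` form spells out the definition; in practice `diff()` is the more readable choice.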