Differencing a Time Series

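Differencing is a common operation in time series analysis and forecasting. It replaces each observation with its change from an earlier observation, which helps remove trends and is a standard step when evaluating the stationarity and autocorrelation of a time series.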
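
In symbols, the lag-$n$ difference of a series $y$ is $y_t - y_{t-n}$. The first $n$ positions have no earlier value to subtract, so they are left undefined. The following is a minimal NumPy sketch of that definition. The helper name difference is hypothetical and used only for illustration. The final assertion cross-checks it against pandas' own Series.diff.

```python
import numpy as np
import pandas as pd


def difference(values, lag=1):
    """Hypothetical helper: return y[t] - y[t - lag] as a float array.

    The first `lag` positions have no earlier value to subtract, so they
    are filled with NaN, matching what pandas' Series.diff produces.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    y = np.asarray(values, dtype=float)
    out = np.full_like(y, np.nan)
    out[lag:] = y[lag:] - y[:-lag]  # subtract the value `lag` steps earlier
    return out


sales = [6, 4, 10, 5, 7]
lag1 = difference(sales)         # nan, -2.0, 6.0, -5.0, 2.0
lag2 = difference(sales, lag=2)  # nan, nan, 4.0, 1.0, -3.0
print(lag1)
print(lag2)

# Cross-check against pandas' built-in Series.diff (NaNs compared as equal)
assert np.allclose(lag2, pd.Series(sales).diff(2).to_numpy(), equal_nan=True)
```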

Generating Time Series Data

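import numpy as np
import pandas as pd

# 14 consecutive daily timestamps
index = pd.date_range('2022-01-01', '2022-01-14')
# random integer sales in [2, 20); no seed is set, so the values differ on every run
data = np.random.randint(2, 20, size=14)

ts_data = pd.DataFrame(data=data, index=index, columns=['sales'])
ts_data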
            sales
2022-01-01      6
2022-01-02      4
2022-01-03     10
2022-01-04      5
2022-01-05      7
2022-01-06      8
2022-01-07     15
2022-01-08     19
2022-01-09     14
2022-01-10      7
2022-01-11     13
2022-01-12      3
2022-01-13     17
2022-01-14      7
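
np.random.randint is called without a seed, so rerunning the cell above produces different sales figures than the ones printed. The following is a minimal sketch of a reproducible variant. It assumes NumPy's Generator API (np.random.default_rng, available since NumPy 1.17) in place of the legacy np.random.randint. The seed 42 is arbitrary, and the numbers it yields will not match the table above.

```python
import numpy as np
import pandas as pd

# A seeded Generator makes the "random" sales identical on every run.
# Assumption: seed 42 is an arbitrary illustrative choice.
rng = np.random.default_rng(42)

index = pd.date_range('2022-01-01', '2022-01-14')  # 14 daily timestamps
data = rng.integers(2, 20, size=14)                # integers in [2, 20), high is exclusive

ts_data = pd.DataFrame(data=data, index=index, columns=['sales'])
print(ts_data)
```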

Implementing Differencing

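By default, diff() computes the difference at lag 1, that is, each value minus the value immediately before it. Passing an integer $n$ as its periods argument, as in diff(periods=n), computes the difference at lag $n$ instead.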

# diff() with the default periods=1: each day's sales minus the previous day's sales
ts_data['diff_lag1'] = ts_data.sales.diff()
ts_data
            sales  diff_lag1
2022-01-01      6        NaN
2022-01-02      4       -2.0
2022-01-03     10        6.0
2022-01-04      5       -5.0
2022-01-05      7        2.0
2022-01-06      8        1.0
2022-01-07     15        7.0
2022-01-08     19        4.0
2022-01-09     14       -5.0
2022-01-10      7       -7.0
2022-01-11     13        6.0
2022-01-12      3      -10.0
2022-01-13     17       14.0
2022-01-14      7      -10.0
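
The following sketch uses the sales values from the table above and covers two follow-ups to the lag-1 example. First, passing periods=n gives a lag-$n$ difference, which equals subtracting a copy of the series shifted by $n$ rows. Second, calling diff() repeatedly gives higher-order differences. The lag of 7 is an illustrative assumption, not part of the original example. For daily data, it compares each day with the same weekday one week earlier. The last lines show that lag-1 differencing can be undone with a cumulative sum plus the first observation, which is how forecasts made on a differenced series are mapped back to the original scale.

```python
import pandas as pd

# The sales values shown in the table above
index = pd.date_range('2022-01-01', '2022-01-14')
sales = pd.Series([6, 4, 10, 5, 7, 8, 15, 19, 14, 7, 13, 3, 17, 7],
                  index=index, name='sales')

# Lag-n difference: sales[t] - sales[t - n]; the first n rows become NaN.
# Assumption: n = 7 is an illustrative weekly lag.
diff_lag7 = sales.diff(periods=7)

# Equivalent to subtracting a copy shifted forward by n rows.
assert diff_lag7.equals(sales - sales.shift(7))

# Second-order differencing: difference the lag-1 differences once more.
# Note this is not the same as diff(periods=2).
diff2 = sales.diff().diff()

# Undo lag-1 differencing: first value plus the running sum of the differences.
restored = sales.iloc[0] + sales.diff().fillna(0).cumsum()
assert (restored == sales).all()

print(pd.DataFrame({'sales': sales, 'diff_lag7': diff_lag7, 'diff2': diff2}))
```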