ADF Stationarity Test
To determine whether a time series is stationary, we run an Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root. A series with a unit root is non-stationary: shocks to it never die out, so its mean and variance drift over time instead of staying fixed. If the test rejects the null hypothesis, we have evidence that the series has no unit root and can treat it as stationary.
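To make the idea of a unit root concrete, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller regression, $\Delta y_t = \alpha + \gamma y_{t-1} + \varepsilon_t$, written with NumPy only. The function name `dickey_fuller_t_stat` and the synthetic series are assumptions made for illustration; they are not part of `statsmodels` or of the stock data set, and the lagged-difference terms that make the test "augmented" are left out.

```python
import numpy as np

def dickey_fuller_t_stat(y):
    """t-statistic of gamma in: diff(y)[t] = alpha + gamma * y[t-1] + error.

    Illustrative helper (hypothetical name); no augmentation lags are included.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                    # y[t] - y[t-1]
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)      # OLS estimates (alpha, gamma)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])                # gamma / standard error

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # stationary example: white noise
walk = np.cumsum(noise)        # non-stationary example: random walk (unit root)

t_noise = dickey_fuller_t_stat(noise)
t_walk = dickey_fuller_t_stat(walk)
print(f"white noise t = {t_noise:.2f}, random walk t = {t_walk:.2f}")
```

Under the null hypothesis $\gamma = 0$, so a strongly negative statistic (as for the white noise) is evidence against a unit root. The statistic has to be compared with Dickey-Fuller critical values (roughly $-2.86$ at the 5% level for large samples with a constant), not with the usual t-table; `adfuller` does this for us and also adds lagged differences to absorb autocorrelation.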
from statsmodels.tsa.stattools import adfuller
import pandas as pd
import numpy as np
stock_data = pd.read_csv('stock_price.csv', parse_dates=['ts'], index_col=['ts'])
stock_data.head()
| ts | closed_price |
|---|---|
| 2012-03-15 | 57.24 |
| 2012-03-16 | 57.14 |
| 2012-03-19 | 56.55 |
| 2012-03-20 | 55.79 |
| 2012-03-21 | 55.99 |
Running the ADF Test
# adfuller expects a 1-D series, so pass the price column explicitly
test_statistic, p_value = adfuller(stock_data['closed_price'])[:2]
test_statistic, p_value
(-0.7170386326564675, 0.8422866474115657)
Here the p-value is $>0.05$, so we cannot reject the null hypothesis of a unit root and we treat the time series as non-stationary. A transformation of the original data, such as differencing, is therefore necessary.
stock_data['diff_lag1'] = stock_data['closed_price'].diff()
stock_data.head()
| ts | closed_price | diff_lag1 |
|---|---|---|
| 2012-03-15 | 57.24 | NaN |
| 2012-03-16 | 57.14 | -0.10 |
| 2012-03-19 | 56.55 | -0.59 |
| 2012-03-20 | 55.79 | -0.76 |
| 2012-03-21 | 55.99 | 0.20 |
test_statistic, p_value = adfuller(stock_data['diff_lag1'].dropna())[:2]
test_statistic, p_value
Here the p-value is $<0.05$, so we reject the null hypothesis of a unit root: the first-differenced series (differencing at $lag=1$) is stationary.