3. Time Series Analysis#
3.1. Introduction#
Time series analysis is a powerful statistical tool used to model and predict future values from a series of data points ordered in time. It is highly beneficial in numerous fields, including finance, economics, climate studies, signal processing, and any domain where understanding change over time is essential. This notebook explores the key aspects and models of time series analysis: stationarity and trends, autoregressive models, moving average models, seasonal decomposition, and forecasting techniques.
3.2. Stationarity and Trends#
3.2.1. Theory#
A time series is stationary if its statistical properties, such as the mean, variance, and autocorrelation, are constant over time. Detecting and enforcing stationarity is often a preliminary step in time series analysis, as many models assume the series is stationary.
Trends are long-term increases or decreases in the data, which make a series non-stationary. Trends can be deterministic, stochastic, or a combination of both.
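To make the distinction concrete, here is a small sketch simulating both kinds of trend: a deterministic linear trend plus stationary noise, and a stochastic trend in the form of a random walk (the slope and noise scale are arbitrary illustrative choices):
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
t = np.arange(200)
# Deterministic trend: a fixed linear function of time plus stationary noise
deterministic = 0.5 * t + np.random.normal(size=200)
# Stochastic trend: a random walk, where random shocks accumulate over time
stochastic = np.cumsum(np.random.normal(size=200))
plt.plot(t, deterministic, label='Deterministic trend')
plt.plot(t, stochastic, label='Stochastic trend (random walk)')
plt.legend()
plt.show()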
To test for stationarity, we employ statistical tests like the Augmented Dickey-Fuller (ADF) Test. If the series is found to be non-stationary, techniques such as differencing, detrending, or transformation are used to convert it into a stationary series.
3.2.2. Examples#
Here’s a Python example demonstrating how to identify and transform non-stationary series:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
# Simulated time series data with trend
np.random.seed(0)
time = np.arange(100)
data = 0.5 * time + np.random.normal(size=100)
# Visualize the time series
plt.plot(time, data)
plt.title('Non-Stationary Time Series with Trend')
plt.show()
# Apply the Augmented Dickey-Fuller test
# Null hypothesis: the series has a unit root (i.e. it is non-stationary);
# a p-value above 0.05 means we cannot reject the null
result = adfuller(data)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
# Differencing to remove trend and achieve stationarity
data_diff = np.diff(data)
plt.plot(time[1:], data_diff)
plt.title('Stationary Time Series After Differencing')
plt.show()
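To confirm that differencing achieved stationarity, we can rerun the ADF test on the differenced series; the p-value should now be small enough to reject the unit-root null. A brief follow-up reusing the data_diff array from above:
result_diff = adfuller(data_diff)
print('ADF Statistic (differenced):', result_diff[0])
print('p-value (differenced):', result_diff[1])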
3.3. Autoregressive Models#
3.3.1. Theory#
Autoregressive (AR) models predict future values as a linear combination of past values. The AR(p) model is defined by the equation:
$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \epsilon_t$$
where $X_t$ is the value at time $t$, $c$ is a constant, $\phi_i$ are the parameters of the model, and $\epsilon_t$ is white noise. Here, $p$ is the order of the autoregression, which is the number of lagged observations included.
3.3.2. Examples#
Let’s construct and fit an autoregressive model using Python:
from statsmodels.tsa.ar_model import AutoReg
# Generate an AR(1) process with phi = 0.5
np.random.seed(0)
n = 100
phi = 0.5
data = np.zeros(n)
data[0] = np.random.normal()
for t in range(1, n):
    data[t] = phi * data[t-1] + np.random.normal()
# Fit the AR model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# Summarize the model
print(model_fit.summary())
# Predict the next 10 values
predictions = model_fit.predict(start=n, end=n+9)
print("Predictions:", predictions)
3.4. Moving Average Models#
3.4.1. Theory#
Moving Average (MA) models use past forecast errors in a regression-like model. An MA(q) model is defined as:
$$X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$$
where $\mu$ is the mean of the series, $\epsilon_t$ represents white noise, and $\theta_i$ are the parameters of the model.
3.4.2. Examples#
Here's how we can work with moving average models in Python:
from statsmodels.tsa.arima.model import ARIMA
# Generate an MA(1) process with theta = 0.5
np.random.seed(0)
size = 100
theta = 0.5
epsilon = np.random.normal(size=size)
data = np.zeros(size)
data[0] = epsilon[0]
for t in range(1, size):
    data[t] = epsilon[t] + theta * epsilon[t-1]
# Fit the MA model
model = ARIMA(data, order=(0,0,1))
model_fit = model.fit()
# Summarize the model
print(model_fit.summary())
# Forecast the next 10 values
forecast = model_fit.forecast(steps=10)
print("Forecast:", forecast)
3.5. Seasonal Decomposition#
3.5.1. Theory#
Seasonal decomposition separates a time series into trend, seasonal, and residual components. Widely used methods include classical decomposition, which estimates the trend with moving averages, and Seasonal-Trend decomposition using Loess (STL). Decomposing a series helps in understanding the underlying patterns of time series data.
3.5.2. Examples#
Here is how you can decompose a time series into its components using Python:
from statsmodels.tsa.seasonal import seasonal_decompose
# Example series: linear trend + weekly seasonality + noise
np.random.seed(0)
time = pd.date_range('2021-01-01', periods=365, freq='D')
trend = 10 + 0.1 * np.arange(365)
seasonal = np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(scale=0.5, size=365)
data = pd.Series(trend + seasonal + noise, index=time)
# Classical additive decomposition with a weekly period
result = seasonal_decompose(data, model='additive', period=7)
# Plot the decomposition
result.plot()
plt.show()
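Since the theory above mentions STL specifically, here is a minimal sketch of the same decomposition using the STL class from statsmodels, which estimates the trend and seasonal components with Loess (reusing the data series from the example):
from statsmodels.tsa.seasonal import STL
# STL decomposition with a weekly seasonal period
stl_result = STL(data, period=7).fit()
stl_result.plot()
plt.show()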
3.6. Forecasting Techniques#
3.6.1. Theory#
Forecasting involves predicting future values based on historical data. Several techniques exist, such as ARIMA models, Exponential Smoothing, and Machine Learning-based approaches.
ARIMA combines autoregression, differencing, and moving averages. Its parameters (p, d, q) indicate the order of the autoregressive part, the degree of differencing, and the order of the moving-average part, respectively.
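Before turning to ARIMA, here is a brief sketch of one alternative mentioned above, exponential smoothing, via statsmodels' Holt-Winters implementation (the series and the additive-trend choice are illustrative assumptions; the smoothing parameters are left to the optimizer):
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# A small synthetic series with a linear trend, for illustration only
np.random.seed(0)
series = pd.Series(0.3 * np.arange(80) + np.random.normal(size=80))
# Holt's linear-trend method; smoothing parameters are estimated automatically
es_fit = ExponentialSmoothing(series, trend='add').fit()
print(es_fit.forecast(steps=5))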
3.6.2. Examples#
Let’s use an ARIMA model to forecast future values:
from statsmodels.tsa.arima.model import ARIMA
# Generate synthetic time series data (a random walk)
np.random.seed(0)
data = np.cumsum(np.random.normal(size=100))
# Fit an ARIMA model
model = ARIMA(data, order=(1,1,1))
model_fit = model.fit()
# Summarize the model
print(model_fit.summary())
# Forecast the next 10 values
forecast = model_fit.forecast(steps=10)
print("Forecast:", forecast)
# Plot the forecast
plt.plot(np.arange(100), data, label='Original')
plt.plot(np.arange(100, 110), forecast, label='Forecast', color='red')
plt.legend()
plt.show()
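Point forecasts are more useful with uncertainty estimates attached. Here is a minimal sketch of obtaining 95% confidence intervals from the fitted model above via the results object's get_forecast method:
# get_forecast returns point forecasts together with uncertainty information
forecast_result = model_fit.get_forecast(steps=10)
conf_int = forecast_result.conf_int(alpha=0.05)  # 95% interval bounds
print("Lower bounds:", conf_int[:, 0])
print("Upper bounds:", conf_int[:, 1])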