Time Series Analysis Using Python
Oct 2, 2023
Time series analysis is a method of studying how a variable changes over time. It can be used for forecasting, anomaly detection, trend analysis, and more. In this post, I will walk through a basic time series analysis workflow in Python.
The main steps are:
- Import the necessary libraries, such as pandas, numpy, matplotlib, statsmodels, and scikit-learn.
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
- Load the data into a pandas DataFrame and check its shape, columns, and summary statistics.
# Step 1: Load the data
data = pd.read_csv('time_series_data.csv', parse_dates=['Date'])  # parse dates so the Date column plots on a proper time axis
print("Data Shape:", data.shape)
print("Columns:", data.columns)
print("Summary Statistics:")
print(data.describe())
- Plot the data to visualize the time series and identify any patterns or outliers (a rolling mean/std check is sketched after the plotting code below).
# Step 2: Plot the data
plt.figure(figsize=(12, 6))
plt.plot(data['Date'], data['Value'])
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Time Series Data')
plt.show()
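A quick visual companion to the plot above is a pair of rolling statistics: if the rolling mean or rolling standard deviation drifts noticeably, the series is probably not stationary. A minimal sketch, where the window of 12 is an assumption you would adjust to your data's frequency:
# Rolling mean and std as an informal stationarity check (window=12 is an assumption)
rolling_mean = data['Value'].rolling(window=12).mean()
rolling_std = data['Value'].rolling(window=12).std()
plt.figure(figsize=(12, 6))
plt.plot(data['Date'], data['Value'], label='Original')
plt.plot(data['Date'], rolling_mean, label='Rolling Mean')
plt.plot(data['Date'], rolling_std, label='Rolling Std')
plt.legend()
plt.show()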
- Test the stationarity of the data using the Augmented Dickey-Fuller test or the KPSS test (a KPSS example is sketched after the ADF snippet below). Stationarity means that the mean, variance, and autocorrelation of the data do not change over time.
# Step 3: Test for stationarity (ADF test)
result = adfuller(data['Value'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
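The ADF and KPSS tests have opposite null hypotheses (ADF assumes a unit root, KPSS assumes stationarity), so running both gives a more reliable verdict. A minimal KPSS sketch, assuming the same Value column:
# KPSS test (null hypothesis: the series is stationary)
from statsmodels.tsa.stattools import kpss
kpss_stat, kpss_pvalue, _, _ = kpss(data['Value'], regression='c', nlags='auto')
print('KPSS Statistic:', kpss_stat)
print('p-value:', kpss_pvalue)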
- If the data is not stationary, apply a transformation such as differencing, a log transform, or seasonal adjustment to make it stationary (a log-transform and seasonal-decomposition sketch follows the differencing code below).
# Step 4: If not stationary, apply differencing
data['Differenced'] = data['Value'].diff()  # first value is NaN; drop it before modeling
# Step 5: Plot differenced data
plt.figure(figsize=(12, 6))
plt.plot(data['Date'], data['Differenced'])
plt.xlabel('Date')
plt.ylabel('Differenced Value')
plt.title('Differenced Time Series Data')
plt.show()
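Differencing is only one option. If the variance grows with the level of the series, a log transform can stabilize it, and a seasonal decomposition helps you see whether a seasonal adjustment is needed. A minimal sketch, assuming strictly positive values and a seasonal period of 12 (e.g. monthly data); adjust the period to your data:
# Optional: log transform (requires strictly positive values)
data['Log'] = np.log(data['Value'])

# Optional: decompose into trend, seasonal, and residual components (period=12 is an assumption)
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(data['Value'], model='additive', period=12)
decomposition.plot()
plt.show()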
- Choose an appropriate model for the data, such as ARIMA, SARIMA, VAR, or LSTM. The model should capture the autocorrelation and seasonality of the data (a SARIMA sketch follows the ARIMA fit below).
# Step 6: Choose an appropriate model (ARIMA in this example)
plot_acf(data['Differenced'].dropna(), lags=30)
plot_pacf(data['Differenced'].dropna(), lags=30)
plt.show()
# Step 7: Fit ARIMA model
model = ARIMA(data['Value'], order=(1, 1, 1))
results = model.fit()
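If the ACF/PACF plots show spikes at seasonal lags, a seasonal ARIMA (SARIMA) usually fits better than a plain ARIMA. A minimal sketch using statsmodels' SARIMAX; the orders and the seasonal period of 12 are placeholders, not tuned values:
# Seasonal ARIMA via SARIMAX (orders and seasonal period are placeholders)
from statsmodels.tsa.statespace.sarimax import SARIMAX
sarima_model = SARIMAX(data['Value'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
sarima_results = sarima_model.fit(disp=False)
print(sarima_results.summary())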
- Fit the model to the data and check its performance using metrics such as AIC, BIC, RMSE, or MAPE (a MAPE calculation is sketched after the snippet below).
# Step 8: Model evaluation
aic = results.aic
bic = results.bic
rmse = np.sqrt(mean_squared_error(data['Value'].iloc[1:], results.fittedvalues.iloc[1:]))  # skip the first point, distorted by differencing
print('AIC:', aic)
print('BIC:', bic)
print('RMSE:', rmse)
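MAPE, mentioned above, can be computed directly from the fitted values; note that it is undefined when actual values are zero. A minimal sketch:
# MAPE over the in-sample fit (undefined if any actual value is zero)
actual = data['Value'].iloc[1:]
fitted = results.fittedvalues.iloc[1:]
mape = np.mean(np.abs((actual - fitted) / actual)) * 100
print('MAPE:', mape)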
- Use the model to make predictions for future values and plot them along with the actual data.
# Step 9: Use the model to make predictions
forecast_periods = 10
forecast_result = results.get_forecast(steps=forecast_periods)
forecast = forecast_result.predicted_mean
conf_int = forecast_result.conf_int()  # DataFrame with lower/upper bounds
# Step 10: Plot predictions along with actual data
plt.figure(figsize=(12, 6))
plt.plot(data['Date'], data['Value'], label='Actual')
# Build future dates for the forecast horizon (assumes a regular frequency pandas can infer)
future_dates = pd.date_range(start=data['Date'].iloc[-1],
                             periods=forecast_periods + 1,
                             freq=pd.infer_freq(pd.DatetimeIndex(data['Date'])))[1:]
plt.plot(future_dates, forecast, label='Forecast', color='red')
plt.fill_between(future_dates,
                 conf_int.iloc[:, 0], conf_int.iloc[:, 1],
                 color='pink', alpha=0.3, label='95% CI')
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Model Predictions')
plt.legend()
plt.show()
- Evaluate the accuracy and reliability of the predictions using confidence intervals and error analysis (a simple hold-out backtest is sketched after the snippet below).
# Step 11: Evaluate prediction accuracy
# Gap between the last observed value and the first forecast (a rough sanity check, not a true error)
prediction_error = data['Value'].iloc[-1] - forecast.iloc[0]
# Width of the first forecast's 95% confidence interval
prediction_interval = conf_int.iloc[0, 1] - conf_int.iloc[0, 0]
print('Prediction Error:', prediction_error)
print('Prediction Interval:', prediction_interval)
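Comparing the first forecast with the last observed value is only a rough sanity check. A more honest evaluation holds out the final observations, refits on the rest, and measures error on the held-out portion. A minimal sketch, reusing the same (1, 1, 1) order:
# Simple hold-out backtest: fit on all but the last 10 points, forecast them, compare
holdout = 10
train, test = data['Value'].iloc[:-holdout], data['Value'].iloc[-holdout:]
backtest_results = ARIMA(train, order=(1, 1, 1)).fit()
backtest_forecast = backtest_results.forecast(steps=holdout)
backtest_rmse = np.sqrt(mean_squared_error(test, backtest_forecast))
print('Hold-out RMSE:', backtest_rmse)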