The ARMA model consists of two parts: an Auto-Regressive (AR) part and a Moving-Average (MA) part. It is a powerful tool for predicting stationary time series. Today, we're going to apply it to the stock price of Apple Inc. We will perform the prediction mainly in four parts:

• Decomposition
• ARMA: fit and predict
• Evaluation
• Conclusion & concerns

After the whole process, some open concerns remain.

Here we extract the historical data of AAPL from 2007-01-01 to 2009-01-01, a window that exactly covers the outbreak of the 2008 subprime crisis. We only consider adjusted close prices and volumes.

            Adj_Close    Volume
Date
2008-12-24  11.017744   67.8335
2008-12-26  11.117505   77.0812
2008-12-29  11.221153  171.5000
2008-12-30  11.179694  241.9004
2008-12-31  11.057907  151.8853
Adj_Close    float64
Volume       float64
dtype: object

As you can see, I here divide the volume by a suitably large constant so that the two variables are on similar scales.
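The scaling step is trivial. A sketch with a toy frame standing in for the downloaded data (the divisor `SCALE = 1e6` is my inference from the printed tail, not a constant the post states):

```python
import pandas as pd

# Toy frame standing in for the downloaded AAPL data.
df = pd.DataFrame({
    "Adj_Close": [11.017744, 11.117505, 11.221153],
    "Volume": [67_833_500, 77_081_200, 171_500_000],
})

SCALE = 1e6  # assumed "properly large" divisor; inferred, not from the post
df["Volume"] = df["Volume"] / SCALE
print(df.describe())
```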

        Adj_Close      Volume
count  504.000000  504.000000
mean    17.510924  264.179629
std      4.551519  111.138634
min     10.428248   67.833500
25%     12.748336  188.703025
50%     17.160162  240.869300
75%     21.891029  308.259350
max     25.889884  843.242400

## Decomposition

First we take the logarithm of the series. The transform is fully reversible, and for many time series it stabilizes the variance and thus improves stationarity.

Then we decompose the series into three parts: trend, seasonality and residual.

The charts are plotted below.

As we can see, the seasonality is periodically significant. We may also notice the trend behavior in late 2008: a continuous slump in the price and a lagged surge in the volume, which illustrates the market panic well. The seasonality is certainly stationary, but the residual remains to be tested. Now let's define a function to test the stationarity of the residuals, using the Dickey-Fuller test to examine the existence of a unit root.

Now let's use our test function.

Results of Dickey-Fuller Test:
Test Statistic                -5.663466e+00
p-value                        9.265660e-07
#Lags Used                     1.600000e+01
Number of Observations Used    4.270000e+02
Critical Value (1%)           -3.445758e+00
Critical Value (5%)           -2.868333e+00
Critical Value (10%)          -2.570388e+00
dtype: float64

Since the test statistic is well below the 1% critical value (and the p-value is also very small), we reject the null, i.e. we take the residual of the adjusted close price to be stationary at the 1% significance level.

Results of Dickey-Fuller Test:
Test Statistic                -1.165042e+01
p-value                        2.043672e-21
#Lags Used                     0.000000e+00
Number of Observations Used    4.430000e+02
Critical Value (1%)           -3.445198e+00
Critical Value (5%)           -2.868086e+00
Critical Value (10%)          -2.570257e+00
dtype: float64

Similar to that of the adjusted close price, the residual of volume is considered stationary at the 1% significance level.

## ARMA: fit and predict

Now with stationary time series, we can start forecasting. There are two situations:

• A strictly stationary series with no dependence on past values.
• A series with significant dependence on past values. In this case we need a statistical model like ARMA to forecast the data.

Here we are of course in the latter situation, and thus we need to specify the parameters of the model:

• Number of AR (Auto-Regressive) terms ($p$): AR terms are just lags of the dependent variable. For instance if $p$ is $5$, the predictors for $x_t$ will be $x_{t-1}, x_{t-2}, \ldots, x_{t-5}$.
• Number of MA (Moving Average) terms ($q$): MA terms are lagged forecast errors in the prediction equation. For instance if $q$ is $5$, the predictors for $x_t$ will be $e_{t-1}, e_{t-2}, \ldots, e_{t-5}$, where $e_i$ is the forecast error at the $i$-th instant, i.e. the difference between the fitted value and the actual value.
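Putting the two parts together, the ARMA($p$, $q$) model for a series $x_t$ can be written as

$x_t = c + \sum_{i=1}^{p} \varphi_i x_{t-i} + \sum_{j=1}^{q} \theta_j e_{t-j} + e_t$

where $e_t$ is white noise, the $\varphi_i$ are the AR coefficients and the $\theta_j$ are the MA coefficients.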

An important concern here is how to determine the values of $p$ and $q$. Below we plot the ACF and PACF charts to determine them. The rules of thumb are:

• $p$: The lag value where the PACF chart crosses the upper confidence interval for the first time.
• $q$: The lag value where the ACF chart crosses the upper confidence interval for the first time.

So first, let's plot the ACF and PACF charts.

From the charts it is clear that for $Adj\ Close$ we have $p=2$, $q=14$, and for $Volume$ we have $p=2$, $q=5$. Then we can load the ARIMA models for prediction. However, there is actually a built-in method for choosing the order, which is much more direct.

/Users/Allen_Frostline/anaconda3/lib/python3.6/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)
...

(1, 0)

... and yes, it is unreliable and horribly slow. Just forget about it. Let's continue with the ARMA model.

/Users/Allen_Frostline/anaconda3/lib/python3.6/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
"Check mle_retvals", ConvergenceWarning)

## Evaluation

Just as above, I use the adjusted R-squared as the evaluation metric. The "adjusted" means the R-squared is penalized for having too many explanatory variables, which in a way reduces the tendency to overfit.
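For reference, a minimal implementation of the metric (the helper name `adjusted_r2` is mine; $n$ is the sample size and $k$ the number of explanatory variables):

```python
import numpy as np

def adjusted_r2(y_true, y_pred, k):
    """Adjusted R-squared: R2 penalized by the number of regressors k."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

score = adjusted_r2([3.0, 1.0, 2.0, 4.0], [2.8, 1.2, 2.1, 3.9], k=1)  # ≈ 0.97
```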

The results seem good, but never forget that these are only in-sample predictions; the picture could be completely different when we use the fitted model to predict out-of-sample values, e.g. the adjusted close or volume on 2009-01-02 (a Friday).

array([ 0.97023813,  0.97742186,  0.97331353,  0.97333901,  0.981861  ])
array([ 1.00925408,  1.01171945,  1.00450219,  0.99949544,  0.99683326])

## Conclusion & concerns

One concern (which is really obvious) is that here, when back-computing the predicted values, I used the formula

$Y_{total} = Y_{trend} \cdot Y_{seasonality} \cdot Y_{residual}$

but only the residual is predicted using our ARMA model. The other two parts, of course, should also be predicted before we can finally call it a "prediction"; otherwise we are only partially predicting the series. For the seasonality we may simply translate the pattern forward, as it is very stable. However, for the trend component, neither translation nor ARMA is going to work well. My suggestion is to resort to deep learning, e.g. RNN models (like the LSTM in another post of mine). Machine learning applied to time-series prediction has indeed been promising and stunning in recent years.

The other concern, which is actually far more important, is that we lack out-of-sample prediction here, i.e. proper cross validation. Why is cross validation so important? Because without it we could always add more explanatory variables to achieve better evaluation scores, which is exactly how overfitting happens. What we should do to eliminate this is to use historical data from a later time range, say from 2009-01-01 to 2009-06-01, to predict the adjusted close prices and volumes day by day. The adjusted R-squared on that range is then the "test score" of the model, while the earlier R-squared implicitly becomes the "train score".
