# Spread Trading: XOP and DRIP

In this strategy we do spread trading based on the `M`-day (adjusted) returns of two highly related ETFs (exchange-traded funds). The intuition is to hedge the one-sided risk of buying and holding one ETF whose returns are expected to increase, by holding an opposite position in another ETF whose returns are expected to decrease. Once we establish that the two ETFs' returns are highly correlated, we can trade and profit from this kind of pair trading.

Apart from `M`, we define trading thresholds `g` and `j`, together with a stop-loss threshold `s`. The total capital limit `K` is assumed to be twice `N`, the 15-day rolling median volume of the less liquid ETF. Specifically, we first calculate the series of daily minimums of the two (adjusted) volume series, and then take the 15-day rolling median of that series as `N`. Apart from the capital limit, we also define the daily position value (if any) based on `N`, namely `N/100`.
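The sizing rules above can be sketched in pandas roughly as follows. This is only an illustration on made-up numbers: the helper `sizing` and the inputs `vol_x` and `vol_y` are hypothetical names, not taken from the notebook, and the two volume series are assumed to be aligned on the same dates.

```python
import pandas as pd

def sizing(vol_x: pd.Series, vol_y: pd.Series, window: int = 15) -> pd.DataFrame:
    """Return N, K and the daily position value for two aligned volume series."""
    # Daily minimum of the two volume series (the less liquid ETF on each day).
    daily_min = pd.concat([vol_x, vol_y], axis=1).min(axis=1)
    # N: rolling median of that minimum; NaN until a full window is available.
    n = daily_min.rolling(window).median()
    # K is twice N, and the daily position value is N/100.
    return pd.DataFrame({"N": n, "K": 2 * n, "position_value": n / 100})

# Toy data (assumption): 20 days of made-up volumes.
vol_x = pd.Series(range(100, 120), dtype=float)
vol_y = pd.Series(range(119, 99, -1), dtype=float)
out = sizing(vol_x, vol_y)
print(out.dropna().head())
```

Note that the first `window - 1` rows have no `N`, so they cannot be traded under this sizing rule.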

Specifically, for each trading day, we follow the workflow below.

Apart from this process, we also keep track of our risk exposure with a stop-loss threshold, and we only trade within a calendar month: we start trading at the start of a new month and close any open position at the end of the month.
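One possible way to mark the monthly trading window is to flag the first and last available trading day of each month. The sketch below is an assumption about how this could be done, not the notebook's actual code; `month_flags` is a hypothetical helper, and the toy calendar uses plain business days without exchange holidays.

```python
import pandas as pd

def month_flags(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Flag the first and last available trading day of each calendar month."""
    # Encode year and month as one integer so the comparison works across years.
    month_id = pd.Series(index.year * 12 + index.month, index=index)
    flags = pd.DataFrame(index=index)
    # A day opens the trading window if the previous row is in a different month.
    flags["month_start"] = month_id != month_id.shift(1)
    # A day closes the window if the next row is in a different month.
    flags["month_end"] = month_id != month_id.shift(-1)
    return flags

# Toy calendar (assumption): business days only, exchange holidays ignored.
days = pd.bdate_range("2017-01-02", "2017-03-31")
flags = month_flags(days)
starts = flags[flags["month_start"]].index.strftime("%Y-%m-%d").tolist()
ends = flags[flags["month_end"]].index.strftime("%Y-%m-%d").tolist()
print(starts)
print(ends)
```

The first and last rows of the sample are always flagged, which is an artifact of where the data begins and ends rather than a true month boundary.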

# Dependencies

Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:

- **Quandl**: source of financial data
- **NumPy**: mathematical tools & matrix processing
- **Pandas**: data frame support
- **Matplotlib**: plotting
- **SciPy**: statistical analysis
- **StatsModels**: statistical analysis

`import warnings`

# Preparatory Analysis

In this part, we analyze the economic and statistical features of the data. The two ETFs we're using are XOP and DRIP. Data is retrieved from Quandl for `2015-12-02` to `2018-12-31`. We'll use only data from `2015-12-02` to `2016-12-01` for this section, as we don't want to include future information during the backtest. Also, while it's always better to have longer historical data for analysis, the limited length of ETF data on Quandl (specifically for these two ETFs) unfortunately restricts us to this short timespan.

### Definition of XOP

The SPDR S&P Oil & Gas Exploration & Production ETF (XOP) seeks to provide investment results that, before fees and expenses, correspond generally to the total return performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for more detailed description.

### Definition of DRIP

The Direxion Daily S&P Oil & Gas Exp. & Prod. Bear 3X Shares (DRIP) seeks daily investment results, before fees and expenses, of **300% of the inverse** of the performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for more detailed description.

### Relationship of XOP and DRIP

By definition of the two ETFs, we expect DRIP to track -300% of the daily return of XOP. This means the spread we should track is not the plain return difference between the two, but the difference between the `M`-day return of DRIP and -300% of the `M`-day return of XOP. Also, we are supposed to hold positions (if any) in these two ETFs in a ratio of XOP:DRIP = -3, no matter whether we're long or short the spread.

A peek into the data (assuming `M` is 5):

- `DRIP`: price of DRIP
- `XOP`: price of XOP
- `rDRIP`: 5-day return of DRIP
- `rXOP`: 5-day return of XOP
- `rXOPn3`: -300% of 5-day return of XOP
- `spread`: spread of `rXOPn3` from `rDRIP` (`spread = rDRIP - rXOPn3`)
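The columns listed above could be derived from two price series along these lines. This is a sketch under the assumption that the `M`-day return is a simple percentage change over `M` rows; `spread_frame` is a hypothetical helper and the prices are made up.

```python
import pandas as pd

def spread_frame(drip: pd.Series, xop: pd.Series, M: int = 5, ratio: float = -3.0) -> pd.DataFrame:
    """Build the M-day returns and the spread for two aligned price series."""
    df = pd.DataFrame({"DRIP": drip, "XOP": xop})
    # M-day simple returns (assumption: price_t / price_{t-M} - 1).
    df["rDRIP"] = df["DRIP"].pct_change(periods=M)
    df["rXOP"] = df["XOP"].pct_change(periods=M)
    # Theoretical DRIP return implied by XOP: ratio (-3) times the XOP return.
    df["rXOPn3"] = ratio * df["rXOP"]
    # Spread as defined in the text: rDRIP - rXOPn3.
    df["spread"] = df["rDRIP"] - df["rXOPn3"]
    # The first M rows have no M-day return and are dropped.
    return df.dropna()

# Toy prices (assumption), seven days long.
xop = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 110.0])
drip = pd.Series([50.0, 49.0, 48.0, 47.0, 46.0, 40.0, 40.0])
df = spread_frame(drip, xop)
print(df)
```

With `M = 5`, only the last two of the seven toy rows survive, which mirrors why the real data set starts a few days after the first download date.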

The first few entries of our data read:

`# exploratory settings`

Date | DRIP | XOP | rDRIP | rXOP | rXOPn3 | spread |
---|---|---|---|---|---|---|
2015-12-09 | 93.228406 | 31.481858 | 0.300551 | -0.091874 | 0.275621 | 0.024930 |
2015-12-10 | 87.783600 | 32.033661 | 0.175401 | -0.063402 | 0.190207 | -0.014806 |
2015-12-11 | 100.660997 | 30.465377 | 0.247456 | -0.084908 | 0.254725 | -0.007269 |
2015-12-14 | 107.982471 | 29.719958 | 0.113006 | -0.041524 | 0.124571 | -0.011565 |
2015-12-15 | 101.685757 | 30.310485 | 0.065597 | -0.028545 | 0.085635 | -0.020037 |
2015-12-16 | 108.538063 | 29.632831 | 0.164217 | -0.058733 | 0.176199 | -0.011983 |
2015-12-17 | 118.711577 | 28.616350 | 0.352321 | -0.106679 | 0.320036 | 0.032284 |
2015-12-18 | 125.551661 | 28.154731 | 0.247272 | -0.075845 | 0.227535 | 0.019737 |
2015-12-21 | 130.267899 | 27.862871 | 0.206380 | -0.062486 | 0.187459 | 0.018921 |
2015-12-22 | 125.008291 | 28.203374 | 0.229359 | -0.069518 | 0.208553 | 0.020806 |

We may also plot the histogram of the spread against fitted normal and t distributions. The t distribution evidently matches our spread data better, which agrees with our expectation, as financial data commonly has fat tails. We may also notice that the spread is well centered around zero, which reassures us that we can assume **symmetrical** thresholds for trading.

`fig = plt.figure(figsize=(20, 7.5))`
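The comparison between the two fitted distributions can also be made numerically by comparing log-likelihoods. The sketch below uses synthetic fat-tailed data as a stand-in for `df.spread` (an assumption, since the real data is not reproduced here) and relies on `scipy.stats.norm.fit` and `scipy.stats.t.fit`.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the spread (assumption): scaled Student-t draws with fat tails.
rng = np.random.default_rng(0)
spread = 0.01 * rng.standard_t(df=3, size=2000)

# Maximum-likelihood fits of both candidate distributions.
mu, sigma = stats.norm.fit(spread)      # returns (loc, scale)
nu, loc, scale = stats.t.fit(spread)    # returns (df, loc, scale)

# Total log-likelihood of the sample under each fitted distribution.
ll_norm = stats.norm.logpdf(spread, mu, sigma).sum()
ll_t = stats.t.logpdf(spread, nu, loc, scale).sum()

print(f"normal: loc={mu:.5f}, scale={sigma:.5f}, logL={ll_norm:.1f}")
print(f"t:      df={nu:.2f}, loc={loc:.5f}, scale={scale:.5f}, logL={ll_t:.1f}")
```

A higher log-likelihood for the t fit is the numerical counterpart of the visual impression that the t curve hugs the histogram more closely.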

In the second subplot, we can see that the spread series looks quite "stationary" over time, but we'd better not rely on visual inspection alone. (It's also a bit heteroskedastic, but that's not the focus of this research.)

### Statistical Tests

Below are some statistical tests we need to run before actual pair trading. For detailed reasoning please refer to this post.

- Test for Unit Root: Here we use the Augmented Dickey-Fuller test. The p-value is 6.35e-14 \(\ll\) 0.05, so we may safely reject the null hypothesis of a unit root.

`result = adfuller(df.spread)`

```
ADF Statistic: -8.614239430241229
p-value: 6.353844261802846e-14
Critical Values:
1%: -3.458
5%: -2.874
10%: -2.573
```

- Test for Strong Stationarity: Here we opt for the Hurst exponent. By definition, a Hurst value H < 0.5 indicates that the time series is mean-reverting, and the H value for our spread is -0.0390, so the assumption is confirmed.

`def hurst(ts):`

`H: -0.0390`
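The body of `hurst` is not shown here, so the following is only a sketch of one common estimator, not necessarily the one used above. It assumes that the standard deviation of lagged differences scales like `lag**H` and reads `H` off the slope of a log-log regression; `hurst_sketch` and `max_lag` are hypothetical names.

```python
import numpy as np

def hurst_sketch(ts, max_lag: int = 50) -> float:
    """Estimate the Hurst exponent from the scaling of lagged differences."""
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)
    # Standard deviation of the differences ts[t + lag] - ts[t] for each lag.
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    # If std ~ lag**H, then log(std) is linear in log(lag) with slope H.
    slope, _ = np.polyfit(np.log(lags), np.log(tau), 1)
    return float(slope)

# Synthetic check (assumption): a random walk versus white noise.
rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.normal(size=5000))
white_noise = rng.normal(size=5000)
h_walk = hurst_sketch(random_walk)
h_noise = hurst_sketch(white_noise)
print(f"random walk H: {h_walk:.4f}")
print(f"white noise H: {h_noise:.4f}")
```

Under this estimator a random walk gives a value near 0.5, while a strongly mean-reverting series gives a value near zero. Because the value is a regression slope, it can come out slightly negative on a finite sample.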

Based on the previous two test results we conclude our spread is mean-reverting and the strategy is reasonable.

# Backtest Engine

In this part we design a simple backtest engine that takes the ETF symbols, the backtest timespan and the theoretical return ratio. It then provides an interface to run backtests against different parameters. I've encapsulated private methods/variables in the class `BacktestEngine`, and only three attributes are available:

- `BacktestEngine.symbols`: tuple of ETF symbols
- `BacktestEngine.run`: run backtest (returns Sortino ratio, Sharpe ratio, maximum drawdown and YoY return)
- `BacktestEngine.df`: stores the data from backtest (trade log)
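The four numbers returned by `run` could be computed from a series of daily returns roughly as below. The engine's exact formulas are not shown, so everything here is an assumption: 252 trading days per year, a zero risk-free rate and zero downside target, and cumulative return taken as the running sum of daily returns (both expressed relative to `K`). `performance` is a hypothetical helper.

```python
import numpy as np

def performance(daily_rtn, periods_per_year: int = 252) -> dict:
    """Sortino, Sharpe, maximum drawdown and annualized return from daily returns."""
    r = np.asarray(daily_rtn, dtype=float)
    ann = np.sqrt(periods_per_year)
    vol = r.std(ddof=1)
    # Downside deviation: root mean square of the negative returns only.
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    # Additive cumulative return, starting from zero.
    cum = np.cumsum(r)
    # Running peak of the cumulative return, including the starting level of zero.
    peak = np.maximum.accumulate(np.maximum(cum, 0.0))
    return {
        "sortino": float(ann * r.mean() / downside) if downside > 0 else float("nan"),
        "sharpe": float(ann * r.mean() / vol) if vol > 0 else float("nan"),
        "max_drawdown": float(np.max(peak - cum)),
        "yoy_return": float(r.mean() * periods_per_year),
    }

# Toy daily returns (assumption).
stats_out = performance([0.01, -0.02, 0.03, -0.01])
print(stats_out)
```

Because the Sortino ratio only penalizes negative returns, it is larger than the Sharpe ratio whenever the mean return is positive and part of the volatility comes from gains.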

The basic usage of this engine would be

`be = BacktestEngine('DRIP', 'XOP', '2016-12-02', '2018-12-31', ratio=-3)`

and if you want to check the trade log during the timespan, call `be.df`. Note that in this data frame, we denote the two ETFs by `X` and `Y`, and `rY` denotes `ratio` times the original `M`-day return of `Y` rather than the original return itself. The positions of `X` and `Y` are also reported in `be.df`, together with daily and cumulative returns (in percentages of `K`).

An example of this `be.df` would be

Date | X | Y | rX | rY | spread | N | pX | pY | daily_rtn | cum_rtn |
---|---|---|---|---|---|---|---|---|---|---|
2016-12-22 | 12.267480 | 41.323160 | -0.019732 | -0.017567 | -0.002165 | 1.549400e+07 | 0 | 0 | 0.0 | 0.0 |
2016-12-23 | 12.208217 | 41.450761 | -0.017488 | -0.015711 | -0.001778 | 1.210184e+07 | 0 | 0 | 0.0 | 0.0 |
2016-12-27 | 12.000796 | 41.666701 | -0.018578 | -0.016343 | -0.002235 | 1.064284e+07 | 0 | 0 | 0.0 | 0.0 |
2016-12-28 | 12.445270 | 41.166112 | 0.003185 | 0.004286 | -0.001101 | 9.867562e+06 | 0 | 0 | 0.0 | 0.0 |
2016-12-29 | 12.692200 | 40.901094 | 0.017419 | 0.017180 | 0.000239 | 9.607867e+06 | 0 | 0 | 0.0 | 0.0 |

`class BacktestEngine:`

Here for illustration, we make a test run with parameters `M`=5, `g`=0.010, `j`=0.005 and `s`=0.01. The timespan, as required throughout the analysis, is set from `2016-12-02` to `2018-12-31` inclusive. The special meta parameter `ratio` is set to -3.

`be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=-3)`

`Sortino Ratio=0.0574, Sharpe Ratio=0.0387, Maximum Drawdown=1.118e-11, YoY Return=0.09%`

With a Sortino ratio of only 0.0574, a Sharpe ratio of 0.0387 and a YoY return of 0.09%, it's definitely not a good strategy, not to mention the unsatisfactory return plots. The top right subplot together with the bottom left one suggests that our thresholds might be too wide. For a more detailed analysis, we can also look at `be.df`, specifically the trading days on which we hold non-zero positions. These turn out to be rather few, which supports our worry that the thresholds are too wide:

`be.df.loc[be.df.pX != 0]`

Date | X | Y | rX | rY | spread | N | pX | pY | daily_rtn | cum_rtn |
---|---|---|---|---|---|---|---|---|---|---|
2017-04-20 | 19.072870 | 34.413981 | 0.191975 | 0.178984 | 0.012991 | 1.533244e+07 | -25014 | -4643 | 0.000000 | 0.000000 |
2017-04-21 | 18.845694 | 34.532005 | 0.097182 | 0.100743 | -0.003561 | 1.496846e+07 | -25014 | -4643 | 0.000011 | 0.000011 |
2017-06-12 | 21.522415 | 32.279704 | -0.076695 | -0.055866 | -0.020829 | 9.166219e+06 | 13122 | 2984 | 0.000000 | 0.000056 |
2017-08-04 | 22.747188 | 30.827585 | 0.142928 | 0.143423 | -0.000495 | 1.754893e+07 | -21483 | -5827 | 0.000000 | 0.000029 |
2017-11-02 | 15.319535 | 34.463445 | -0.210285 | -0.227637 | 0.017352 | 1.302198e+07 | -26349 | -3734 | 0.000000 | 0.000259 |
2017-12-26 | 11.398287 | 37.260895 | -0.221848 | -0.249496 | 0.027649 | 1.189655e+07 | -29352 | -3264 | 0.000000 | 0.000180 |
2017-12-27 | 11.645217 | 36.963916 | -0.200136 | -0.218041 | 0.017905 | 1.351392e+07 | -29352 | -3264 | 0.000135 | 0.000315 |
2017-12-28 | 11.418041 | 37.241096 | -0.152493 | -0.163117 | 0.010624 | 1.189655e+07 | -29352 | -3264 | 0.000092 | 0.000406 |
2017-12-29 | 11.714357 | 36.805528 | -0.054226 | -0.042553 | -0.011673 | 1.246303e+07 | -29352 | -3264 | 0.000131 | 0.000537 |
2018-02-05 | 14.331815 | 34.023830 | 0.357343 | 0.307833 | 0.049510 | 1.823786e+07 | -40842 | -5061 | 0.000000 | 0.000619 |
2018-03-28 | 13.193795 | 33.978894 | 0.097942 | 0.103973 | -0.006031 | 2.054748e+07 | 52227 | 6579 | 0.000000 | 0.000305 |
2018-04-03 | 12.630042 | 34.326023 | 0.045008 | 0.062801 | -0.017792 | 2.304128e+07 | 51093 | 6703 | 0.000000 | 0.000390 |
2018-05-11 | 7.140870 | 40.931377 | -0.120585 | -0.125726 | 0.005141 | 3.427444e+07 | -150390 | -8464 | 0.000000 | 0.000188 |
2018-08-23 | 6.237512 | 41.241724 | -0.144022 | -0.155054 | 0.011033 | 2.413513e+07 | -118650 | -5898 | 0.000000 | 0.000023 |
2018-08-24 | 6.039495 | 41.778234 | -0.157459 | -0.177582 | 0.020123 | 2.205453e+07 | -118650 | -5898 | -0.000041 | -0.000018 |
2018-08-27 | 5.960289 | 41.877588 | -0.146099 | -0.152580 | 0.006481 | 2.155575e+07 | -118650 | -5898 | 0.000098 | 0.000081 |
2018-10-24 | 9.096368 | 35.500295 | 0.494290 | 0.398785 | 0.095505 | 3.827526e+07 | -146172 | -9913 | 0.000000 | 0.000239 |
2018-11-12 | 9.106299 | 34.953065 | 0.168153 | 0.166935 | 0.001217 | 4.281623e+07 | 156027 | 11825 | 0.000000 | -0.001016 |
2018-11-14 | 9.801436 | 34.107346 | 0.324832 | 0.280804 | 0.044028 | 4.281623e+07 | -141351 | -13473 | 0.000000 | -0.000185 |
2018-11-15 | 9.364493 | 34.614778 | 0.136145 | 0.133480 | 0.002665 | 4.050823e+07 | -141351 | -13473 | 0.000016 | -0.000169 |
2018-11-23 | 11.211572 | 32.336311 | 0.197243 | 0.197471 | -0.000228 | 4.050823e+07 | 120567 | 12081 | 0.000000 | 0.000222 |
2018-12-06 | 11.797473 | 31.520441 | 0.111319 | 0.125227 | -0.013908 | 3.503895e+07 | 97920 | 10763 | 0.000000 | 0.001057 |
2018-12-07 | 11.906709 | 31.381147 | 0.147368 | 0.155996 | -0.008628 | 3.503895e+07 | 97920 | 10763 | 0.000635 | 0.001692 |
2018-12-12 | 12.989137 | 30.475730 | 0.209991 | 0.191626 | 0.018365 | 4.187946e+07 | -89073 | -12988 | 0.000000 | 0.002391 |
2018-12-13 | 13.187748 | 30.286687 | 0.117845 | 0.117424 | 0.000421 | 3.936253e+07 | -89073 | -12988 | 0.000148 | 0.002539 |
2018-12-17 | 16.375449 | 28.077868 | 0.248297 | 0.226081 | 0.022215 | 4.187946e+07 | -78804 | -13600 | 0.000000 | 0.002583 |

# Parameter Tuning

As mentioned above, in this section we try to fit the best set of parameters on data from `2015-12-02` to `2016-12-01`, i.e. the training set. As efficient optimization is not the focus of this report, we opt for a simple grid search here. The parameter grids are defined as

- `M_grid`: 5, 10, 15, 20 (4 in total)
- `g_grid`: 0.001, 0.003, ..., 0.011 (6 in total)
- `j_grid`: -0.010, -0.008, ..., 0.010 (11 in total)
- `s_grid`: 1e-3, 5e-3, 1e-2, 5e-2, 1e-1 (5 in total)

So no more than 4 × 6 × 11 × 5 = 1320 simulations are run. Note that parameter combinations for which `-g < j < g` does not hold are skipped. Below is a selection of outstanding parameter sets found during simulation.

`from time import time`

```
(Record 0) M=15, g=0.007, j=-0.004, s=0.05000, st=1.4542 sr=0.3393, md=1.934e-10, rt= 8.99%
(Record 1) M=15, g=0.011, j=-0.010, s=0.05000, st=1.4591 sr=0.3430, md=1.673e-10, rt= 9.17%
(Record 2) M=20, g=0.007, j=-0.006, s=0.05000, st=1.5146 sr=0.3540, md=2.076e-10, rt= 9.47%
(Record 3) M=20, g=0.007, j= 0.006, s=0.05000, st=1.6001 sr=0.3534, md= 1.7e-10, rt= 9.33%
(Record 4) M=15, g=0.011, j=-0.004, s=0.05000, st=1.4680 sr=0.3452, md=1.673e-10, rt= 9.15%
```
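The grid described above, together with the `-g < j < g` filter, can be enumerated with `itertools.product`. This is a sketch of the enumeration only; the backtest call inside the loop is omitted, and the variable names mirror the grid names in the text.

```python
from itertools import product

# Grids as listed in the text; g and j are built from integer thousandths.
M_grid = [5, 10, 15, 20]
g_grid = [k / 1000 for k in range(1, 12, 2)]      # 0.001, 0.003, ..., 0.011
j_grid = [k / 1000 for k in range(-10, 11, 2)]    # -0.010, -0.008, ..., 0.010
s_grid = [1e-3, 5e-3, 1e-2, 5e-2, 1e-1]

# Full Cartesian product, then keep only combinations with -g < j < g.
all_combos = list(product(M_grid, g_grid, j_grid, s_grid))
valid = [(M, g, j, s) for M, g, j, s in all_combos if -g < j < g]

print(len(all_combos), len(valid))
```

The full product has 1320 combinations, of which 720 pass the filter, so the number of simulations actually run is well below the upper bound.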

From the two plots below, we can tell that Record 3, i.e. the parameter set `M=20, g=0.007, j=0.006, s=0.05000`, is a good choice, as it has both a large Sortino ratio/YoY return and a relatively small maximum drawdown. Record 2, i.e. `M=20, g=0.007, j=-0.006, s=0.05000`, also performs well among the outstanding parameter sets, with a slightly better Sharpe ratio and YoY return but a lower Sortino ratio and a larger maximum drawdown. We'll test both sets.

`rec = np.arange(len(record.best))`

Using the parameters from Record 3, we run the backtest against the test set, i.e. from `2016-12-02` to `2018-12-31`. The plots are as below.

`be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=-3)`

`Sortino Ratio=0.7407, Sharpe Ratio=0.2447, Maximum Drawdown=2.005e-11, YoY Return=1.76%`

Using the parameters from Record 2, the backtest result is as below.

`be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=-3)`

`Sortino Ratio=0.9394, Sharpe Ratio=0.2872, Maximum Drawdown=1.997e-11, YoY Return=2.24%`

# Conclusion

Both results are remarkably good (especially compared with our earlier result using arbitrary parameters before any tuning). Considering that we're not using any future data in the backtest, the performance is satisfactory, even though we're neglecting a lot of execution details in our analysis, such as transaction costs and market impact. There are also several comments on the processing of data:

- We are using the first `M` days to calculate the rolling median `N`, which causes a loss of data. Perhaps we should use `M` days of earlier historical data to make up for this loss.
- Thresholds change over time. To be more specific, the "ideal" thresholds change all the time, partly due to market regime shifts and partly for other unknown reasons. We may, therefore, never have the "best" parameters for trading. Should we go for a dynamic version of spread trading? I doubt it. But it's worth thinking about whether there's any remedy for this problem.