Here I'm trying to write something partly based on Cont's first model in the previous post. I plan to skip the Laplace transform and go for Monte Carlo simulation. Also, I'm trying to abandon the assumption of unified order sizes. To implement that, I need to shift from a Markov chain which is supported by discrete spaces, onto some other stochastic process that is estimatable. Moreover, although I actually considered supervised learning for this problem, I gave it up at last. This is because my model is inherently designed for high frequency trading and thus training for several minutes each time would be intolerable.

# Section 1: Package Import

I need smm for multivariate stochastic processes, and scipy.optimize for maximum likelihood estimation.

1 2018-01-29 00:00:06.951631+08:00 12688.00 12663.58 12391.48 12390.00 12389.96 12388.00 12384.22 12381.39 12380.00 ... 6.0 15.0 1.0 460.0 4.0 121.0 5.0 1.0 5.0 120.0
2 2018-01-29 00:00:07.792882+08:00 12676.93 12657.04 12391.48 12390.00 12389.96 12388.00 12384.22 12381.39 12380.00 ... 1.0 400.0 363.0 5.0 6.0 15.0 1.0 460.0 4.0 121.0
3 2018-01-29 00:00:08.702945+08:00 12643.27 12617.26 12361.27 12360.00 12359.38 12358.06 12356.22 12355.44 12354.17 ... 6.0 15.0 1.0 460.0 4.0 121.0 5.0 1.0 5.0 120.0
4 2018-01-29 00:00:10.998615+08:00 12666.00 12642.73 12380.00 12377.00 12374.99 12369.73 12366.43 12365.84 12361.45 ... 460.0 4.0 121.0 5.0 1.0 5.0 120.0 150.0 12.0 97.0
5 2018-01-29 00:00:11.742304+08:00 12674.00 12643.27 12384.22 12381.39 12380.00 12377.00 12374.99 12369.73 12366.43 ... 4.0 121.0 5.0 60.0 1.0 5.0 120.0 150.0 12.0 97.0

Larger index means smaller values for both bid and ask prices. It's uncommon and here I re-indexed the variables s.t. bid_1 and ask_1 corresponds with the best opponent prices.

# Section 2: Data Mining

I dropped the time variable simply because I don't know how to use it. Normally there're two ways to handle uneven time-grids: resampling and ignoring, and I chose the latter.

Now we make a plot of the order book within the past 10 steps, including 20 bid levels and 20 ask levels.

Not sure if it tells any critical information. Let's make another plot. This time $t=500$ and we only consider the best bid and ask orders.

A simple idea would be inputting the prices and volumes in the current orderbook, and predict the future mid prices. Furthermore, it's ideal to have a rough expectation of the minimum time that the mid price crosses a certain price, or the time needed in expectation before my order got executed successfully.

The calculation of change took over 10 minutes. I don't think it's gonna be useful in real work. However, it's not so bad an idea to save it somewhere locally in case I need it later.

# Section 3: Model

After some research, I decided to fit the data in change to student's t-distribution, Skellam distribution, and two-side Weibull distribution. I'll now elaborate reason why I chose, and how to estimate each distribution below.

First is the t-distribution. It is well-known for its leptokurtosis which suits well in many financial time series as a better alternative to Normal distribution. The PDF and CDF of the t-distribution involves the Gamma function and thus would be computationally troublesome when we want to calculate the MSE of the parameters. However, notice for any r.v. $X\sim t(\nu,\mu,\sigma)$, we have relationship

$\text{Var}(X) = \begin{cases} \frac{\nu}{\nu - 2} & \text{for }\nu > 2,\\ \infty & \text{for }1 < \nu \le 2,\\ \text{undefined} & \text{otherwise} \end{cases}$

and

$\text{Kur}_+(X) = \begin{cases} \frac{6}{\nu - 4} & \text{for }\nu > 4,\\ \infty & \text{for }2 < \nu \le 4,\\ \text{undefined} & \text{otherwise} \end{cases}$

where $\text{Kur}_+\equiv \text{Kur} - 3$ is the excess kurtosis, we can simply go for moment estimation for t-distribution using empirical variance or kurtosis.

Second, the Skellam distribution. This is mainly due to the original model used in Cont's paper, where he assumes Poisson order arrivals uniformly over the time. Here I slightly improve the model s.t. bid and ask orders are modelled in the same time and represented by r.v. $S\equiv P_a - P_b$ where $P_a\sim Pois(\lambda_a)$ and $P_b\sim Pois(\lambda_b)$. This is therefore a discrete distribution with two parameters. scipy.stats has its PMF implemented and all I need to do is numerically maximize the likelihood.

For the two-sided Weibull distribution, it is given by

$Y = \begin{cases} -\text{Weibull}(\lambda_1, k_1) & \text{if } Y < 0,\\ \text{Weibull}(\lambda_2, k_2) & \text{otherwise} \end{cases}$

where shape parameters $k_{1,2}0$ and scale parameters $\lambda_{1,2} > 0$.

Therefore, the pdf is

$f(y \mid \lambda_1, k_1, \lambda_2, k_2) = \begin{cases} \left(\frac{-y}{(\lambda_1)}\right)^{k_1 -1}\exp\left(-\left(\frac{-y}{(\lambda_1)}\right)^{k_1}\right) & \text{if y < 0},\\ \left(\frac{y}{(\lambda_2}\right)^{k_2 -1}\exp\left(-\left(\frac{y}{(\lambda_2)}\right)^{k_2}\right) & \text{otherwise} \end{cases}$

and to normalize the integration to $1$, we also have

$\frac{\lambda_1}{k_1} + \frac{\lambda_2}{k_2} = 1 \Rightarrow \lambda_2 = k_2 (1 - \lambda_1 / k_1)$

which means there're in fact only three parameters to estimate.

Now we rewrite the log-likelihood as

\begin{align\*} LL = \sum_{i=1}^n \log(f(y_i)) = \sum_{i=1}^n &\left((k_1-1)(\log^\*(-y_i) - \log^\*(\lambda_1)) - (-y_i / \lambda_1)^{k_1}\right)\mathbb{I}_{y_i < 0} + \\ &\left((k_2-1)(\log^\*(y_i) - \log^\*(\lambda_2)) - (y_i / \lambda_2)^{k_2}\right)\mathbb{I}_{y_i \ge 0}. \end{align\*}

where we have the special $\log^*(y)\equiv 0$ if $y\le0$.

As coded above, at last I didn't include two-sided Weibull distribution because the optimization did not converge. In conclusion, for changes of order sizes (denoted by $x$), we use modified t-distribution with

$\hat{\mu} = \bar{x},\quad \hat{\sigma} = 0.3 \cdot \sqrt{\widehat{\text{Var}}(x)} + 0.7 \cdot \sqrt{\left(2 - \frac{6}{6 + 2\ \widehat{\text{Kur}}_+(x)}\right)}$

and

$\hat{\nu} = \frac{6}{\widehat{\text{Kur}}_+(x)} + 4$

where

$\widehat{\text{Kur}}_+(x) = \widehat{\text{Kur}}(x) - 3$

while

$\widehat{\text{Kur}}(x) = \hat{m}_4(x) / \hat{m}_2^2(x)$

and

$\hat{m}_4 = \sum_{i=1}^n (x_i - \bar{x})^4 / n,\quad \hat{m}_2 = \sum_{i=1}^n (x_i - \bar{x})^2 / n.$

Now, when we assume independence across different buckets of order book, we can estimate the parameters of t-distributions as below.

array([[ 5.1589201 , -0.52536232, 11.05729   ],
[ 5.86412495, -0.61454545, 12.08484143],
[ 5.82376701,  4.61231884, 11.67543236],
[ 6.28819815, -0.7173913 , 10.85941723],
[ 6.89178374, -1.59927798, 11.25140225],
[ 6.14231284,  2.29856115, 12.46686452],
[ 6.4347771 ,  2.22302158, 13.73785226],
[ 6.17737187,  0.67753623, 12.19098061],
[ 5.9250571 , -1.68231047, 12.54472066],
[ 5.16886809, -0.69090909, 11.94199489],
...
[ 5.94772822,  3.18181818, 12.4415555 ],
[ 6.5157695 ,  4.62181818, 13.67098387],
[ 6.69385395,  0.66304348, 13.63770319],
[ 4.99329442, -1.11510791, 11.63780506],
[ 5.04144977,  1.91756272, 11.20026029],
[ 5.47054269, -4.34163701, 10.66971035],
[ 5.11684414, -2.35460993,  9.98656422],
[ 4.89130697,  1.07092199, 11.5511127 ],
[ 5.31202782, -0.58865248, 11.01769165],
[ 5.17908162,  2.16961131, 10.81368767]])

When we do not ignore the correlation across all buckets, a multivariate t-distribution must be considered. Similar to multivariate Normal distributions, here we need to estimate a covariance matrix, a vector of expectations and a vector of degrees of freedom. Notice the degrees of freedom do not vary significantly across the rows in params, to accelerate computation I set a unified degree of freedom for all buckets, namely $df = 7$. Using Expectation Maximization (EM) algorithm introduced by D. Peel and G. J. McLachlan (2000), I wrote the model below to estimate this distribution.

Now the distribution for order size movement is estimated. We can simulate the trajectory and rebuild the order book in future several steps. Specifically, notice the predicted movement may well change the shape of the order book while, according to practical observation, the order book retains its "V"-shape in most of the time. Therefore, I sort up separately both halves of the order book every time they're updated by a predicted order size movement (or "co-movement", since it should be a vector).

Below is a simple sketch of this order book trajectory where I assign stronger color to the traces that are closer to the best (bid/ask) prices.

It can be seen from the figure that stronger traces are located more to the bottom, which validates our intuition since trades around the current price are more active than those to the left or the right of the order book.

# Section 4: Prediction

With this prediction procedure implemented, we can estimate the probability of our order (placed at the price bucket order_idx with size order_size) being executed within n_steps.

0.861

So a limit buy order at bid_8 ($20 - 12 = 8$) with size 100 can be executed within 10 steps, at a probability of 86.1%. Moreover, we can even make a 3D surface plot to get a comprehensive idea of the whole distribution. # Reference

• Rama Cont, Sasha Stoikov and Rishi Talreja. "A stochastic model for order book dynamics". Operations Research, Volume 58, No. 3 (2010), 549-563.
• David Peel, and Geoffrey J. McLachlan. "Robust mixture modelling using the t-distribution." Statistics and computing 10, no. 4 (2000): 339-348.