The Perfect Market Maker: Is it Possible?
In this research report we simulate and explore different scenarios for a "perfect" market maker in the Bitcoin market. By "perfect" we mean the ability to capture the full spread on the right side of every trade, i.e. there is no spread loss at all. Although this setting is too idealized to be comparable with real trading, we believe the analysis with respect to the model parameters is still insightful.
- Self-explanatory backtest engine: the engine is well encapsulated, exposing only a few public functions, so that people with little coding knowledge can use it.
- High-speed simulation: each set of parameters takes only ~2 ms to simulate, without sacrificing any trading detail.
- Illustrated analysis: most of the post-tuning analysis is presented with multiple (yet necessary) figures and elaborate explanations. Certain figures are rendered as animations for better understandability.
Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:
- numpy: mathematical tools & matrix processing
- pandas: data frame support
- matplotlib: plotting
- ipython: interactive notebook support
- numba: accelerating pure numerical calculation
- ffmpeg: animation support (not required for the rest of the code)
%config InlineBackend.figure_format = 'retina'
In this section, several useful functions are introduced for later use during the backtest.
cumsum: a modified version of the original cumulative-summation function; the running sum is now clipped between two boundaries during calculation. Calculation is accelerated by JIT.
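The implementation is not shown here, but a boundary-clipped cumulative sum of this kind might look like the following sketch (the fallback when numba is unavailable is an addition of this sketch):

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the hot loop when numba is available
except ImportError:          # pure-Python fallback otherwise
    njit = lambda f: f

@njit
def cumsum(values, lower, upper):
    # Cumulative sum clipped to [lower, upper] at every step: once the
    # running total hits a boundary it is capped there before the next
    # addition -- exactly how a position limited by max long/short evolves.
    out = np.empty_like(values)
    total = 0.0
    for i in range(values.shape[0]):
        total = min(max(total + values[i], lower), upper)
        out[i] = total
    return out
```

For example, `cumsum(np.array([1.0, 1.0, 1.0, -5.0]), -2.0, 2.0)` saturates at the upper bound `2.0` before the large negative step pulls it down to the lower bound.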
sharpe_ratio: a handy function that calculates the annualized Sharpe ratio from high-frequency returns (returns are expressed in percentage).
sortino_ratio: similarly, this function gives the annualized Sortino ratio.
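A minimal sketch of the two ratio helpers, assuming a zero risk-free rate and a caller-supplied annualization factor (`periods_per_year` is an assumption of this sketch, not the notebook's actual signature):

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year):
    # Annualized Sharpe ratio from high-frequency percentage returns,
    # assuming a zero risk-free rate.
    mu, sigma = np.mean(returns), np.std(returns)
    if sigma == 0:
        return np.nan
    return mu / sigma * np.sqrt(periods_per_year)

def sortino_ratio(returns, periods_per_year):
    # Same as above, but penalizing downside deviation only: positive
    # returns are floored at zero before taking the standard deviation.
    downside = np.std(np.minimum(returns, 0.0))
    if downside == 0:
        return np.nan
    return np.mean(returns) / downside * np.sqrt(periods_per_year)
```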
A backtest engine is designed for this problem.
We have the following parameters for simulation, in addition to the two input datasets (the order book and the trade history):
- \(s\): target transaction size
- \(j\): max long position (in Bitcoin)
- \(k\): max short position (in Bitcoin)
We participate in a trade only if:
- The trade price is at the current best bid or offer price,
- The trade quantity \(q > 4s\),
- The new position \(x\) satisfies \(-k \le x \le j\).
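The three conditions above can be sketched as a single predicate. This is a hypothetical helper, not part of `BacktestEngine`'s public API; the sign convention (buying when a trade prints at the best bid) is an assumption:

```python
def participates(trade_price, trade_qty, best_bid, best_ask,
                 position, s, j, k):
    # Condition 1: the trade printed at the current best bid or offer.
    at_bbo = trade_price in (best_bid, best_ask)
    # Condition 2: the trade is large enough relative to our target size.
    big_enough = trade_qty > 4 * s
    # Condition 3: the resulting position stays within [-k, j].
    # Assumption: a print at the bid means our resting bid was hit, so we
    # buy s; a print at the ask means our offer was lifted, so we sell s.
    delta = s if trade_price == best_bid else -s
    within_bounds = -k <= position + delta <= j
    return at_bbo and big_enough and within_bounds
```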
be = BacktestEngine()
In particular, the class provides a neat feature that renders the whole backtest process as an animation. To activate it, run the command below:
be.run(s=..., j=..., k=..., animation=True)
BacktestEngine has been greatly accelerated thanks to vectorization and JIT compilation. The per-loop calculation performance is shown below (~2 ms per loop):
be = BacktestEngine()
%timeit be.run(s=0.001, j=0.010, k=0.010)
2.11 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In this part we load a small dataset and compare our results against the one in the reference. The order book is shown below.
be = BacktestEngine()
The trade data is shown below.
Now the backtest result is shown below. Here we use \(s=0.01\), \(j=0.055\) and \(k=0.035\), as in the reference. We conclude that our model is valid, as the result coincides with the one given.
T = be.run(0.01, 0.055, 0.035)
In this section, we opt for a simple grid search to find the best parameters of our strategy. There are several things to consider before we actually start searching.
Should we force \(j=k\)?
I believe the answer is yes. There is little reason to explore the inter-relationship between the upper and lower bounds of our position. Since we assume a short position yields direct cash out, we do not really distinguish between long and short trades. Of course the market may have its own trend, but theoretically we do not care about the result of searching over a full \((j,k)\) grid.
Which metrics should we consider?
Like in most backtest scenarios, we use Sharpe and Sortino ratios as metrics. Besides these two, we also consider the final P&L as a crucial statistic here.
Hence, we run simulations on a \(100\times 100\) grid of \((s, j=k)\) values and keep track of outstanding results. We then filter these results by the three metrics and keep only those ranked among the best \(10\) in all three.
be = BacktestEngine()
s_grid = np.arange(0.001, 0.101, 0.001)
Running 10000 simulations: best_pnl=366.5568, best_sr=14.1144, best_st=136.7319 | 100.00% finished, ETA=0 s
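The search loop itself is not shown above. A minimal sketch might look like the following, where `run_backtest` is a deterministic toy stand-in for the real engine and all names (`results`, `jk_grid`) are assumptions of this sketch:

```python
import numpy as np

def run_backtest(s, j, k):
    # Hypothetical stand-in for BacktestEngine.run: maps (s, j, k) to
    # toy (pnl, sharpe) metrics so the loop structure can be exercised.
    pnl = np.sin(100 * s) + j - 0.5
    sr = pnl / (s + j)
    return float(pnl), float(sr)

s_grid = np.arange(0.001, 0.101, 0.001)
jk_grid = np.round(np.linspace(0.01, 1.0, 100), 3)

results = []
for s in s_grid:
    for jk in jk_grid:
        pnl, sr = run_backtest(s, jk, jk)   # force j = k, as argued above
        results.append((s, jk, jk, pnl, sr))

# keep only parameter sets with positive metrics
good = [r for r in results if min(r[-2:]) > 0]
```

Using `np.linspace` for the \(j=k\) axis sidesteps the floating-point length surprises that `np.arange` with a fractional step can produce.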
# assuming `results` holds (s, j, k, pnl, sr, st) tuples from the grid search
best_params = list(filter(lambda x: not any(np.isnan(v) for v in x) and min(x[-3:]) > 0,
                          results))
(Record 0) s=0.006, j=0.780, k=0.780 | pnl=10.1482, sr= 0.0645, st=  0.0905
(Record 1) s=0.005, j=0.690, k=0.690 | pnl=15.4882, sr= 5.8961, st= 12.0982
(Record 2) s=0.006, j=0.830, k=0.830 | pnl=27.7178, sr= 5.4487, st= 11.4175
(Record 3) s=0.005, j=0.790, k=0.790 | pnl=51.7044, sr= 5.9154, st= 12.0906
(Record 4) s=0.002, j=0.520, k=0.520 | pnl=97.8652, sr=14.1144, st=136.7319
The best parameters, together with their corresponding performance metrics, are plotted below. The left plot shows the relative performance from record \(0\) to record \(4\) (we filtered away most records as they give negative returns); the progression is roughly monotonic, and record \(4\) is the clear winner. The right plot shows how our best \(5\) parameter sets differ from each other. Although the total P&L increases, the Sharpe ratio hardly changes -- this implies our search converged, or, more likely, ended up overfitted.
def lim_generator(values, extend=.05):
    # pad the axis limits by `extend` of the data range on each side
    lo, hi = np.min(values), np.max(values)
    pad = (hi - lo) * extend
    return lo - pad, hi + pad
rec = np.arange(len(best_params))
Before a thorough parameter analysis, we can also view the backtest performance as an animation (ffmpeg required on your computer). It can be seen that our P&L follows a rather similar trajectory to the Bitcoin price, except that its direction of movement is the opposite. This implies we are probably holding short positions most of the time.
In this section, we try to take an overall look on the whole parameter grid as well as the outputs. Here are several questions we intend to answer by the end of this part:
Is it true that performance is monotonic w.r.t. \(j\) (and \(k\))?
As we found in the previous section, larger \(j\) (and \(k\)) values give higher P&L and ratios. We investigate this issue here. The two plots below give some insight into the question. As the left figure shows, the best performance from larger \(j\) (and \(k\)) values is indeed greater than that from smaller values; however, so is the worst. This coincides with the intuition that a larger position range means larger risk exposure over time and, therefore, more uncertainty in performance.
st001 = 
Does smaller \(s\) yield better performance?
As above, this guess is suggested by our grid search. From the right figure above we can tell that smaller \(s\) yields more volatile performance -- by volatile, we mean we have more chance of attaining better results. By contrast, larger values give significantly more robust (yet centered around a negative Sortino ratio) performance, and thus we conclude that smaller \(s\) is preferable.
Potential problems in the backtest?
There could in fact be a lot of problems, e.g. we are never a "perfect" market maker. More severely, we may encounter problems that we could have avoided: are we significantly biased toward one side of the trade, or are we overfitting our model?
T = be.run(*record.records[:3])
The two figures above show the progress of our position over time. The position is, by and large, negative throughout the day. This can be inferred either from the left scatter plot or from the right histogram (which is extremely biased to the left). This persistent skew in our position is a ruthless indication that we have overfit the model, mostly due to the limitations of the data. On such a small dataset, overfitting is highly likely without cross-validation methods and the like. A potential cure would be a larger dataset, or k-folding the timespan for cross-validation.
In the meantime, let's take a step back and analyze why the grid search leaves us in a short position for most of the day. As far as I'm concerned, this is mainly because the profit obtainable from holding a short position most of the time overwhelms what we can achieve by dynamically adjusting our side of trade and maintaining a neutral position. In a market like this one, where the general tendency of the price is to decline, a simple grid search ends up this way, and we should have been aware of this before the whole analysis.
Numerically, the market-making profit in this particular example is the price difference between each matched bid/ask and the corresponding mid price, which we used to calculate position market values. Under this setting, every trade we make earns a certain piece of revenue at no cost. The buy-and-hold profit, on the other hand, comes from holding a short position (in our story) and waiting for the price to decline. The second profit is significantly larger than the first.
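The gap between the two profit sources can be illustrated with made-up numbers (all quantities below are hypothetical, chosen only to show the order-of-magnitude difference):

```python
# Spread capture: earn (quote - mid) per unit on each fill, at no cost here.
mid, ask = 100.00, 100.05          # hypothetical mid and best-ask quotes
spread_per_unit = ask - mid        # revenue per unit when our ask is lifted
n_fills, size = 1000, 0.001        # many small fills over the day
mm_profit = n_fills * size * spread_per_unit

# Directional: hold a short of 0.5 BTC while the price drops $300.
drop, short_pos = 300.0, 0.5
directional_profit = short_pos * drop
```

Even with a thousand fills, the pure market-making revenue is dwarfed by the directional profit, which is why the grid search gravitates toward the persistent short.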
Theoretically, to fix this problem at its root, we need to add one more parameter to our model: one that rewards neutral positions or punishes holding an outstanding one. Available candidates include the time-dollar product of a lasting position and moving averages of the position.
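A minimal sketch of the time-dollar candidate, assuming positions are sampled on a regular time grid (the functions and the `lam` weight are assumptions, not part of the existing engine):

```python
import numpy as np

def inventory_penalty(positions, dt, lam=1.0):
    # Time-dollar-style penalty: integrate |position| over time, so a
    # large position held for long is punished more than a brief one.
    # `lam` (hypothetical) trades the penalty off against raw P&L.
    return lam * float(np.sum(np.abs(positions)) * dt)

def penalized_pnl(pnl, positions, dt, lam=1.0):
    # Grid-search objective: raw P&L minus the inventory penalty, which
    # steers the search away from the always-short solution.
    return pnl - inventory_penalty(positions, dt, lam)
```

Raising `lam` pushes the optimizer toward parameter sets that stay near a neutral position, at the cost of some directional profit.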
In this research, we wrapped up a simple backtest engine with a very special "perfect" market-making setting. The setting proved unrealistic but still provided a number of insights after detailed analysis. In the meantime, the model may be improved in a variety of ways based on the last section, Parameter Analysis.