In this post, I'll introduce four stochastic processes commonly used to simulate stock prices. Formulation and Python implementation are presented one by one, with brief comments afterwards.
Before introducing the four methods, we first define a handy function to prepend a zero before a numpy array:
def prepend(arr, val=0):
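The listing above is truncated; the helper is essentially a one-liner with numpy (a sketch of what it presumably looks like):

```python
import numpy as np

def prepend(arr, val=0):
    # Insert `val` at index 0 so every simulated path starts from a fixed point.
    return np.insert(arr, 0, val)
```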
Brownian motion (BM) was first observed and modeled in the movement of particles suspended in gases or liquids. The random motion results from a massive number of tiny collisions with molecules of the surrounding medium. Brownian motion was named after the botanist Robert Brown, who described it in 1827, and it was introduced into the financial markets almost a century later. A standard Brownian motion is usually called a Wiener process, which has the following properties:
A general BM follows the SDE:
\[\d S_t = \mu \d t + \sigma \d W_t\]
which, by taking integrals on both sides, directly gives the solution:
\[S_t = S_0 + \mu t + \sigma W_t.\]
With all of these given, we can easily simulate the process as below.
class BM:
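Since the class body is truncated above, here is a minimal sketch of the same idea as a plain function (simulate_bm is my name, not the post's; it samples the exact solution on a time grid):

```python
import numpy as np

def simulate_bm(s0, mu, sigma, n_steps, dt=1.0, seed=None):
    # Exact sampling of S_t = S_0 + mu * t + sigma * W_t:
    # W_t is the cumulative sum of independent N(0, dt) increments.
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_steps + 1) * dt
    w = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))
    return np.insert(s0 + mu * t + sigma * w, 0, s0)  # prepend S_0
```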
Comments: One may notice that we don't have any constraint posed on \(W\): it can be arbitrarily positive or arbitrarily negative. Therefore, while BM might give a simple and easy-to-implement solution for a short run, we don't really want to use it, as it can potentially give us negative prices.
Geometric Brownian motion (GBM) became famous through its use in Fischer Black and Myron Scholes's 1973 paper, The Pricing of Options and Corporate Liabilities. The process is positive by definition and thus fixes the flaw we identified in BM. The corresponding SDE is
\[\d S_t = S_t(\mu \d t + \sigma \d W_t)\]
which has the solution
\[S_t = S_0\exp\left\{\left(\mu-\frac{\sigma^2}{2}\right)t + \sigma W_t\right\}.\]
class GBM:
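Again, as the listing is truncated, a minimal sketch sampling the closed-form solution directly (function name is mine, not the post's):

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, n_steps, dt=1.0, seed=None):
    # Exact sampling of S_t = S_0 * exp((mu - sigma^2 / 2) t + sigma W_t);
    # the path is positive by construction.
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_steps + 1) * dt
    w = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))
    return np.insert(s0 * np.exp((mu - sigma**2 / 2) * t + sigma * w), 0, s0)
```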
Comments: GBM is good enough for most simulations. However, it is also well-known that the Black-Scholes model cannot produce the fat tails observed empirically in stock markets.
Robert C. Merton, who shared the 1997 Nobel Prize with Scholes (Black had unfortunately passed away), was one of the first academics to address some of the limitations of GBM. In his 1976 paper, Option Pricing when Underlying Stock Returns are Discontinuous, he superimposed a "jump" component on the diffusion term so that the model can simulate sudden economic shocks, i.e. jumps in prices. The jump component \(J\) is a compound Poisson process, driven by a Poisson process \(N\) with normally distributed jump sizes. The SDE is as follows.
\[\begin{align*}Y_i&\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\gamma, \delta^2)&\text{(Jump Magnitude)}\\\d N_t & \sim \text{Pois}(\lambda \d t)&\text{(Poisson Process)}\\J_t &= \textstyle{\sum_{i=1}^{N_t}}Y_i&\text{(Jump)}\\\d S_t &= S_t (\mu \d t + \sigma \d W_t + \d J_t).\\\end{align*}\]
Merton's jump diffusion SDE has a closed-form solution:
\[S_t = S_0 \exp\left\{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t + J_t\right\}.\]
class MertonJump:
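A sketch of the simulation (my function, not the post's truncated class; per step, the jump count is Poisson, and I aggregate the step's jumps with the fact that a sum of \(n\) i.i.d. \(\mathcal{N}(\gamma,\delta^2)\) variables is \(\mathcal{N}(n\gamma, n\delta^2)\)):

```python
import numpy as np

def simulate_merton(s0, mu, sigma, lam, gamma, delta, n_steps, dt=1.0, seed=None):
    # S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t + J_t), where J_t is a
    # compound Poisson process with N(gamma, delta^2) jump sizes.
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_steps + 1) * dt
    w = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))
    n = rng.poisson(lam * dt, n_steps)                      # jumps per step
    j = np.cumsum(rng.normal(gamma * n, delta * np.sqrt(n)))  # step jump sums
    return np.insert(s0 * np.exp((mu - sigma**2 / 2) * t + sigma * w + j), 0, s0)
```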
Comments: Merton's jump process fixes the kurtosis mismatch in empirical financial data while only minimally changing GBM. However, with the discontinuous (and usually negative, corresponding to market crashes) jumps introduced into the model, we may witness frequent slumps and, in general, a decline in the total drift. On the other hand, the jump process still does not solve the constant-volatility issue.
In the early 1990s, Steven Heston introduced a model in which volatility, unlike in the original GBM, is no longer constant. In the Heston model, variance evolves according to a Cox-Ingersoll-Ross process with a mean-reverting nature. As there are now two stochastic processes, we need two (potentially correlated) Wiener processes. The SDE is now
\[\begin{align*}\d W_t^S\d W_t^V &= \rho\d t & \text{(Correlated Wiener)}\\\d V_t &= \kappa (\theta - V_t) \d t + \xi \sqrt{V_t} \d W_t^V &\text{(Cox-Ingersoll-Ross)}\\\d S_t &= \mu S_t \d t + \sqrt{V_t}S_t \d W_t^S & \text{(Heston)}\end{align*}\]
class Heston:
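The Heston SDE has no pathwise closed form like the previous three, so a discretization is needed. Here is a sketch using an Euler-Maruyama scheme with the common "full truncation" fix (flooring \(V\) at zero inside the square root, since the discretized CIR process can dip below zero); this is my choice of scheme, not necessarily the post's:

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, n_steps, dt=1/252, seed=None):
    # Euler steps for dV = kappa (theta - V) dt + xi sqrt(V) dW^V and
    # dS = mu S dt + sqrt(V) S dW^S, with corr(dW^S, dW^V) = rho.
    rng = np.random.default_rng(seed)
    s = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    s[0], v[0] = s0, v0
    for i in range(n_steps):
        z1, z2 = rng.normal(size=2)
        dws = np.sqrt(dt) * z1
        dwv = np.sqrt(dt) * (rho * z1 + np.sqrt(1 - rho**2) * z2)  # corr = rho
        vp = max(v[i], 0.0)  # full truncation: floor V at 0 inside sqrt
        v[i + 1] = v[i] + kappa * (theta - vp) * dt + xi * np.sqrt(vp) * dwv
        s[i + 1] = s[i] * (1 + mu * dt + np.sqrt(vp) * dws)
    return s, v
```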
Comments: The Heston model is one of the most popular stochastic volatility models in finance. In case one needs even more freedom, one may opt for time-varying parameters, e.g. \(\mu\to\mu_t\) and \(\xi\to\xi_t\).
Finally, let's take a look at all the simulated price processes. The Brownian motion is shifted s.t. \(S_0=1\), like the other models. Shared parameters like \(\mu\) and \(\sigma\) are set to the same values. Each figure shows \(1000\) paths.
It's always been a headache to me that I cannot make my blog's search engine show only the content I want. There's always something you don't want to show up in a search result, like password-protected posts (shown as encrypted code) and random pages for certain projects (some don't even have a title, and tipuesearch would still show them in the search results, with a blank title and a bunch of raw HTML). Even worse, it seems there's no official way to set up this sort of content filter. This feels bad. The terrible feeling tortured me for months, till I made up my mind and fixed it from the source code today.
The fix turned out, well, quite straightforward. First, we locate the node package folder hexo-generator-tipuesearch-json. The package structure shows
node_modules
└── hexo-generator-tipuesearch-json
    ├── index.js
    ├── LICENSE
    ├── package.json
    ├── README.md
    └── node_modules
        └── ...
The file we need to edit is index.js. Below I've attached the full code after modification:
var util = require('hexo-util');
Note the second line of the definition of postsContent and the lines we commented out. These modifications ensure that encrypted posts and standalone pages won't be indexed by the search.
There is a piece of nicely given advice I'd like to share with you: never post anything too large on your Hexo blog.
The suggestion given above was a joke to me until yesterday. I thought I could just wait a few more minutes and everything would be fine; pages will end up posted with probability one in the long run. Well, they didn't make it this time. The HTML files were so big that GitHub returned a file-oversize error and rejected my push from Hexo. Everything got messy, and no matter what I tried, deployment was always rejected.
If you also encounter this problem, well, lucky you, because I've managed to find a fix. The first step is locating the .git folder in our Hexo directory. Here I used the built-in command
find . -name ".git"
under the hexo folder. The corresponding location was hexo/.deploy_git. Then we enter this directory and move on to step 2, which is basically a git commit-history reversion. First go to the GitHub website, find an earlier commit, and copy its SHA. Then, in the terminal we opened, enter:
git reset --hard {{SHA}}
and then make a regular push.
In this tiny piece of a post I'm gonna show how you can make animations using the matplotlib module in Python. Things get much more intuitive when they move, don't they?
We're here trying to plot two (thick) sine curves which have different offsets in the x-axis direction, and those offsets increase with an indicator called frame_no.
import numpy as np
The plot is altered bit by bit as frame_no changes with the frames. You can also try different fps configurations by changing the interval argument. Also, repetition is on by default, but you may disable it by passing repeat as False to FuncAnimation. Finally, you would have a plot as follows:
Lovely, isn't it?
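For reference, a minimal self-contained version of the script described above might look like this (frame_no and the interval/repeat arguments are from the post; the exact offsets and figure styling are my guesses, since the original listing is truncated):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
line1, = ax.plot(x, np.sin(x), lw=4)
line2, = ax.plot(x, np.sin(x + np.pi / 3), lw=4)

def update(frame_no):
    # Shift both curves along the x-axis as frame_no grows.
    line1.set_ydata(np.sin(x + 0.1 * frame_no))
    line2.set_ydata(np.sin(x + np.pi / 3 + 0.1 * frame_no))
    return line1, line2

# interval is the delay between frames in ms; repeat=False stops the loop.
anim = FuncAnimation(fig, update, frames=60, interval=50, repeat=False)
# anim.save("sines.gif")  # or plt.show() in an interactive session
```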
Although it's not recommended, people sometimes need variable names. For example, you may want to automate the process of generating a dictionary with variable names as keys, or use variable names as column names in a pandas dataframe. How are we gonna implement this in Python?
There is a nasty workaround provided somewhere on Stackoverflow (sorry but I forgot the actual thread):
def varName(p):
The method relies on the fact that Python stores module-level variables in the globals() dictionary, which maps each variable name to the corresponding object. Enjoy coding 🙃
In this research report we try to simulate and explore different scenarios for a "perfect" market maker in the Bitcoin market. By "perfect" we refer to the capability to capture all spreads on the right side of any trade, i.e. there will be no spread loss at all. Although this setting is too perfect to be considered comparable with real trading, our analysis w.r.t. model parameters is believed to be insightful still.
Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:
%config InlineBackend.figure_format = 'retina'
In this section, several useful functions are introduced for later use during the backtest.
- cumsum: a modified version of the original cumulative summation; the running sum now handles two boundaries during calculation. Calculation is accelerated by JIT.
- sharpe_ratio: a handy function that calculates the annualized Sharpe ratio based on high-freq returns (returns are defined in percentage).
- sortino_ratio: similar to the above, gives the annualized Sortino ratio.
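Pure-Python sketches of the first two helpers (the notebook's versions are JIT-compiled with numba, and the annualization factor below is an assumption, not the notebook's value):

```python
import numpy as np

def cumsum_bounded(x, lower, upper):
    # Cumulative sum whose running total is clipped to [lower, upper]
    # at every step (e.g. to cap a market maker's inventory).
    out = np.empty(len(x))
    total = 0.0
    for i, v in enumerate(x):
        total = min(max(total + v, lower), upper)
        out[i] = total
    return out

def sharpe_ratio(returns, periods_per_year):
    # Annualized Sharpe ratio from per-period percentage returns.
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std() * np.sqrt(periods_per_year)
```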
A backtest engine is designed for this problem.
We have the following parameters (the first two are datasets) for simulation:
A trade is participated in only if:
be = BacktestEngine()
Notably, the class provides a great feature that renders the whole backtest process as an animation. To activate the feature, run the command below
be.run(s=..., j=..., k=..., animation=True)
class BacktestEngine:
The BacktestEngine has been greatly accelerated thanks to its vectorization and JIT features. The per-call performance is shown below (~2 ms per be.run).
be = BacktestEngine()
%timeit be.run(s=0.001, j=0.010, k=0.010)
2.11 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In this part we try to load a small dataset and compare our results against the one in the reference. The order book is as below.
be = BacktestEngine()
time  ask1price  ask1size  ask2price  ask2size  ask3price  ask3size  bid1price  bid1size  bid2price  bid2size  bid3price  bid3size 

20180408 17:08:00.246  7035.55  66.062339  7035.56  0.5  7035.57  1.50587  7035.54  5.582917  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:01.426  7035.55  66.062339  7035.56  0.5  7035.57  1.50587  7035.54  5.584317  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.293  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.584317  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.437  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.570860  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.485  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.591242  7035.53  0.00142  7035.5  0.011361 
The trade data is as below.
be.Z.head()
time  price  size 

20180408 17:08:08.293  7035.55  0.1034 
20180408 17:08:13.472  7035.54  0.3900 
20180408 17:08:19.105  7035.55  0.1502 
20180408 17:08:20.858  7035.54  0.0630 
20180408 17:08:23.087  7035.54  0.1030 
Now, the backtest result is as below. Here we use \(s=0.01\), \(j=0.055\) and \(k=0.035\) like in the reference. We conclude that our model is valid as the result coincides with the one given.
T = be.run(0.01, 0.055, 0.035)
time  trade  cash  position 

20180408 17:08:08.293  0.01  70.3555  0.01 
20180408 17:08:13.472  0.01  0.0001  0.00 
20180408 17:08:19.105  0.01  70.3556  0.01 
20180408 17:08:20.858  0.01  0.0002  0.00 
20180408 17:08:23.087  0.01  70.3552  0.01 
20180408 17:08:42.770  0.01  140.7106  0.02 
20180408 17:08:47.415  0.01  211.0660  0.03 
20180408 17:08:49.413  0.01  281.4214  0.04 
20180408 17:08:51.663  0.01  351.7768  0.05 
20180408 17:08:54.890  0.01  281.4213  0.04 
20180408 17:09:07.259  0.01  211.0658  0.03 
20180408 17:09:10.259  0.01  281.4212  0.04 
20180408 17:09:14.027  0.01  351.7766  0.05 
20180408 17:09:53.208  0.01  281.4866  0.04 
In this section, we opt for a simple grid search to find the best parameters of our strategy. There are several things to consider before we actually start searching.
I believe the answer is yes. There is little reason to explore the interrelationship between the upper and lower bounds of our positions. Since we assume a short position yields direct cash out, we don't really distinguish between a long and a short trade. Of course the market may have its trend, but theoretically we don't care about the result from searching on a full \((j,k)\) grid.
Like in most backtest scenarios, we use Sharpe and Sortino ratios as metrics. Besides these two, we also consider the final P&L as a crucial statistic here.
Hence, we run simulations on a \(100\times 100\) grid of \((s, j=k)\) values and keep track of outstanding results. Then we filter these results by the three metrics and keep only the best \(10\) under each of the three.
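The search loop itself can be sketched as follows (run_backtest below is a toy stand-in for the engine's run method so the sketch is self-contained, and the j grid is my assumption; the actual notebook drives BacktestEngine.run directly):

```python
import numpy as np

def run_backtest(s, j, k):
    # Toy stand-in for BacktestEngine.run: returns (pnl, sharpe, sortino).
    pnl = np.sin(1000 * s) + np.cos(10 * j)
    return pnl, pnl / 2, pnl / 3

s_grid = np.arange(0.001, 0.101, 0.001)   # 100 values of s
j_grid = np.arange(0.01, 1.01, 0.01)      # ~100 values of j (= k)
records = [(s, j, *run_backtest(s, j, j)) for s in s_grid for j in j_grid]

# keep only the best 10 under each of the three metrics
top_pnl = sorted(records, key=lambda r: r[2], reverse=True)[:10]
top_sharpe = sorted(records, key=lambda r: r[3], reverse=True)[:10]
top_sortino = sorted(records, key=lambda r: r[4], reverse=True)[:10]
```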
be = BacktestEngine()
s_grid = np.arange(0.001, 0.101, 0.001)
Running 10000 simulations: best_pnl=366.5568, best_sr=14.1144, best_st=136.7319  100.00% finished, ETA=0 s
best_params = list(filter(lambda x: (np.nan not in x) and (min(x[3:]) > 0),
(Record 0) s=0.006, j=0.780, k=0.780  pnl=10.1482, sr=0.0645, st=0.0905
(Record 1) s=0.005, j=0.690, k=0.690  pnl=15.4882, sr=5.8961, st=12.0982
(Record 2) s=0.006, j=0.830, k=0.830  pnl=27.7178, sr=5.4487, st=11.4175
(Record 3) s=0.005, j=0.790, k=0.790  pnl=51.7044, sr=5.9154, st=12.0906
(Record 4) s=0.002, j=0.520, k=0.520  pnl=97.8652, sr=14.1144, st=136.7319
The best parameters, together with their corresponding performance metrics, are plotted below. The left plot shows the relative performance from record \(0\) up to record \(4\) (we filtered away most records as they gave negative returns); it is roughly monotonic, and record \(4\) is the undoubted winner. The right plot shows how our best \(5\) sets of parameters differ from each other. Although the total P&L increases, the Sharpe ratio hardly changes, which implies our search converged, or more likely, ended up overfitted.
def lim_generator(values, extend=.05):
rec = np.arange(len(best_params))
Before a thorough parameter analysis, we can also view the backtest performance as an animation (ffmpeg required on your computer). It can be seen that our P&L has a trajectory rather similar to the Bitcoin price, except that it moves in the opposite direction. This implies we're probably holding short positions most of the time.
be.run(*best_params[4][:3], animation=True)
In this section, we take an overall look at the whole parameter grid as well as the outputs. Here are several questions we intend to answer by the end of this part:
As we found in the previous section, with larger \(j\) (and \(k\)) values we have higher P&L and ratios. We will investigate this issue here. The two plots below give some insight into this question. As the left figure shows, the best performance from larger \(j\) (and \(k\)) values is indeed greater than that from smaller values; however, so are the worst results. This coincides with the intuition that a larger position range means larger risk exposure over time, and therefore more uncertainty in performance.
st001 = []
As above, this guess is suggested by our grid search. First, from the right figure above we may tell that smaller \(s\) yields more volatile performance; by volatile, we mean we have more chance of attaining better results. In contrast, larger values give significantly more robust (yet hovering around a negative Sortino ratio) performance, and thus we conclude that smaller \(s\) is preferable.
There could be a lot of problems in fact, e.g. we are never a "perfect" market maker. But more severely, we may encounter problems that we could have avoided, e.g. are we significantly biased to one side of the trade, or are we overfitting our model?
T = be.run(*record.records[4][:3])
The two figures above show the progress of our position over time. The position is, by and large, negative throughout the day. This can be inferred either from the left scatter plot or from the right histogram (which is extremely biased to the left). The skewness in our position is a ruthless indication that we've overfit the model, mostly due to the limitation of data. On such a small dataset, overfitting is highly likely without cross-validation methods etc. A potential cure may be using a larger dataset, or k-folding the timespan for CV.
In the meantime, let's take a step back and analyze why the grid search gives us short positions most of the day. As far as I'm concerned, this is mainly because the profit obtainable from holding a short position most of the time overwhelms what we can achieve from dynamically adjusting our side of trade and maintaining a neutral position. In a particular market like this one, where the general tendency of the price is declining, a simple grid search ends up like this, and we should have been aware of it before the whole analysis.
Numerically, the market-making profit in this particular example is the price difference between each matched bid/ask and the corresponding mid price, which we used to calculate position market values. Under this setting, with every trade we make we obtain a certain piece of revenue at no cost. The buy-and-hold profit, on the other hand, comes from holding a short position (in our story) and waiting for the price to decline. We know the second profit is significantly larger than the first.
Theoretically, in order to fix this problem at its essence, we need to add one more parameter to our model: a parameter that rewards neutral positions or punishes holding an outstanding one. Available candidates include the time-dollar product of a lasting position and moving averages of positions.
In this research, we wrapped up a simple backtest engine with a very special "perfect" market-making setting. The setting proved to be unrealistic but still provided a number of insights after detailed analysis. In the meantime, we may improve the model in a variety of ways based on the last section, Parameter Analysis.
In this strategy we try to do spread trading based on the M-day (adjusted) returns of two highly related ETFs (exchange-traded funds). The intuition is to hedge the one-sided risk of buying and holding one specific ETF with (in expectation) increasing returns, by holding an opposite position in another ETF with decreasing returns. Once we know that the two ETFs' returns are highly correlated, we can trade and make profit by this sort of pair trading.
Apart from M, we define trading thresholds g and j, together with a stop-loss threshold s. The total capital limit K is assumed to be twice N, namely the 15-day rolling median volume (of the less liquid ETF). Specifically, we first calculate the array of daily minimums of the two (adjusted) volume series, and then take the 15-day rolling median of this series as N. Apart from the capital limit, we also define the daily position value (if any) based on N, which is N/100.
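The sizing rules just described translate to a few lines of pandas (a sketch with made-up volume series, not the actual Quandl data):

```python
import numpy as np
import pandas as pd

# Made-up adjusted daily volumes for the two ETFs.
rng = np.random.default_rng(0)
vol = pd.DataFrame({
    "XOP": rng.integers(1_000_000, 2_000_000, 60),
    "DRIP": rng.integers(500_000, 1_500_000, 60),
})

daily_min = vol.min(axis=1)            # daily minimum of the two volume series
N = daily_min.rolling(15).median()     # 15-day rolling median -> N
K = 2 * N                              # total capital limit
daily_position_value = N / 100         # value of each day's position
```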
Specifically, for each trading day, we have workflow as below.
Apart from this process, we also keep track of our risk exposure with a stop-loss threshold, and try to trade only within a month's time, i.e. we start trading only at the start of a new month, and kill any position at the end of a month.
Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:
import warnings
In this part, we will try to analyze the economic and statistical features of the data. The two ETFs we're using are XOP and DRIP. Data is retrieved from Quandl, from 2015-12-02 to 2018-12-31. We'll use only data from 2015-12-02 to 2016-12-01 for this section, as we don't want to include future information in the backtest. Also, while it's always better to have longer historical data for analysis, due to the limited length of ETF data on Quandl (specifically for these two ETFs) we're unfortunately restricted to this short timespan.
The SPDR S&P Oil & Gas Exploration & Production ETF (XOP) seeks to provide investment results that, before fees and expenses, correspond generally to the total return performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for more detailed description.
The Direxion Daily S&P Oil & Gas Exp. & Prod. Bull and Bear 3X Shares (DRIP) seek daily investment results, before fees and expenses, of 300% of the inverse, of the performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for more detailed description.
By the definitions of the two ETFs, we expect DRIP to track -300% of the daily return of XOP. This means the spread we should be tracking is, instead of the return difference between the two, the difference between the M-day return of DRIP and -300% of the M-day return of XOP. Also, we are supposed to hold, if any, positions in these two ETFs in a ratio of XOP:DRIP = 3, whether we're long or short the spread.
A peek into the data (assume M is 5):

- DRIP: price of DRIP
- XOP: price of XOP
- rDRIP: 5-day return of DRIP
- rXOP: 5-day return of XOP
- rXOPn3: -300% of the 5-day return of XOP
- spread: spread of rXOPn3 from rDRIP (spread = rDRIP - rXOPn3)

The first few entries of our data read:
# exploratory settings
Date  DRIP  XOP  rDRIP  rXOP  rXOPn3  spread 

20151209  93.228406  31.481858  0.300551  0.091874  0.275621  0.024930 
20151210  87.783600  32.033661  0.175401  0.063402  0.190207  0.014806 
20151211  100.660997  30.465377  0.247456  0.084908  0.254725  0.007269 
20151214  107.982471  29.719958  0.113006  0.041524  0.124571  0.011565 
20151215  101.685757  30.310485  0.065597  0.028545  0.085635  0.020037 
20151216  108.538063  29.632831  0.164217  0.058733  0.176199  0.011983 
20151217  118.711577  28.616350  0.352321  0.106679  0.320036  0.032284 
20151218  125.551661  28.154731  0.247272  0.075845  0.227535  0.019737 
20151221  130.267899  27.862871  0.206380  0.062486  0.187459  0.018921 
20151222  125.008291  28.203374  0.229359  0.069518  0.208553  0.020806 
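The derived columns above can be reproduced with pandas in a few lines (a sketch with toy prices, not the actual Quandl data; following DRIP's -3x mandate, rXOPn3 is taken as minus 300% of rXOP, and pct_change(M) gives the M-day return):

```python
import pandas as pd

df = pd.DataFrame({  # toy prices for illustration only
    "DRIP": [93.2, 87.8, 100.7, 108.0, 101.7, 108.5, 118.7, 125.6],
    "XOP": [31.5, 32.0, 30.5, 29.7, 30.3, 29.6, 28.6, 28.2],
})
M = 5
df["rDRIP"] = df["DRIP"].pct_change(M)       # M-day return of DRIP
df["rXOP"] = df["XOP"].pct_change(M)         # M-day return of XOP
df["rXOPn3"] = -3 * df["rXOP"]               # -300% of XOP's M-day return
df["spread"] = df["rDRIP"] - df["rXOPn3"]    # spread = rDRIP - rXOPn3
```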
Also we may plot the histogram of the spread. Here we plot it against the fitted normal and t distributions. Apparently the t distribution matches our spread data better, which coincides with our expectation as financial data is commonly seen with fat tails. Also, we may notice that the spread is well centered around zero, which reassures us that we can assume symmetrical thresholds for trading.
fig = plt.figure(figsize=(20, 7.5))
In the second subplot, we can see that the spread series is quite "stationary" over time, but we'd better not stop at eyeballing it. (It's also a bit heteroskedastic, but we're not focusing on that in this research.)
Below are some statistical tests we need to run before actual pair trading. For detailed reasoning please refer to this post.
result = adfuller(df.spread)
ADF Statistic: -8.614239430241229
p-value: 6.353844261802846e-14
Critical Values:
    1%: -3.458
    5%: -2.874
    10%: -2.573
def hurst(ts):
H: 0.0390
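Since the hurst listing above is truncated, here is one common estimator of the Hurst exponent (a sketch using the scaling of lagged differences; the notebook's implementation may differ). Values near 0 indicate mean reversion, near 0.5 a random walk, and near 1 a trending series:

```python
import numpy as np

def hurst(ts, max_lag=20):
    # std(ts[lag:] - ts[:-lag]) scales roughly like lag**H, so H is the
    # slope of log(std of lagged differences) against log(lag).
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, max_lag)
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]
```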
Based on the previous two test results we conclude our spread is meanreverting and the strategy is reasonable.
In this part we design a simple backtest engine that takes the ETF symbols, the backtest timespan and the theoretical return ratio. It then provides an interface to run backtests against different parameters. I've encapsulated private methods/variables in the class BacktestEngine, and there are only three attributes available:

- BacktestEngine.symbols: tuple of ETF symbols
- BacktestEngine.run: runs the backtest (returns Sortino ratio, Sharpe ratio, maximum drawdown and YoY return)
- BacktestEngine.df: stores the data from the backtest (trade log)

The basic usage of this engine would be
be = BacktestEngine('DRIP', 'XOP', '2016-12-02', '2018-12-31', ratio=3)
and if you want to check the trade log during the timespan, call be.df. Note that in this data frame we denote the two ETFs by X and Y, and instead of the original M-day return of Y, we denote by rY the ratio times the original M-day return. The positions of X and Y are also reported in be.df, together with daily and cumulative returns (in percentages of K).
An example of this be.df would be
Date  X  Y  rX  rY  spread  N  pX  pY  daily_rtn  cum_rtn 

20161222  12.267480  41.323160  0.019732  0.017567  0.002165  1.549400e+07  0  0  0.0  0.0 
20161223  12.208217  41.450761  0.017488  0.015711  0.001778  1.210184e+07  0  0  0.0  0.0 
20161227  12.000796  41.666701  0.018578  0.016343  0.002235  1.064284e+07  0  0  0.0  0.0 
20161228  12.445270  41.166112  0.003185  0.004286  0.001101  9.867562e+06  0  0  0.0  0.0 
20161229  12.692200  40.901094  0.017419  0.017180  0.000239  9.607867e+06  0  0  0.0  0.0 
class BacktestEngine:
Here for illustration, we make a test run with parameters M=5, g=0.010, j=0.005 and s=0.01. The timespan, as required throughout the analysis, is set from 2016-12-02 to 2018-12-31 inclusive. The special meta parameter ratio is set to 3.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.0574, Sharpe Ratio=0.0387, Maximum Drawdown=1.118e-11, YoY Return=0.09%
With only a Sortino ratio of 0.0574, a Sharpe ratio of 0.0387 and a YoY return of 0.09%, it's definitely not a good strategy, not to mention the unsatisfactory return plots. The top-right subplot, together with the bottom-left one, suggests that we might be using too-wide thresholds. For detailed analysis, we can also take a look at be.df, specifically the trading days when we have non-zero positions, which turn out to be rather few (supporting our worry about the wideness of the thresholds):
be.df.loc[be.df.pX != 0]
Date  X  Y  rX  rY  spread  N  pX  pY  daily_rtn  cum_rtn  

20170420  19.072870  34.413981  0.191975  0.178984  0.012991  1.533244e+07  25014  4643  0.000000  0.000000  
20170421  18.845694  34.532005  0.097182  0.100743  0.003561  1.496846e+07  25014  4643  0.000011  0.000011  
20170612  21.522415  32.279704  0.076695  0.055866  0.020829  9.166219e+06  13122  2984  0.000000  0.000056  
20170804  22.747188  30.827585  0.142928  0.143423  0.000495  1.754893e+07  21483  5827  0.000000  0.000029  
20171102  15.319535  34.463445  0.210285  0.227637  0.017352  1.302198e+07  26349  3734  0.000000  0.000259  
20171226  11.398287  37.260895  0.221848  0.249496  0.027649  1.189655e+07  29352  3264  0.000000  0.000180  
20171227  11.645217  36.963916  0.200136  0.218041  0.017905  1.351392e+07  29352  3264  0.000135  0.000315  
20171228  11.418041  37.241096  0.152493  0.163117  0.010624  1.189655e+07  29352  3264  0.000092  0.000406  
20171229  11.714357  36.805528  0.054226  0.042553  0.011673  1.246303e+07  29352  3264  0.000131  0.000537  
20180205  14.331815  34.023830  0.357343  0.307833  0.049510  1.823786e+07  40842  5061  0.000000  0.000619  
20180328  13.193795  33.978894  0.097942  0.103973  0.006031  2.054748e+07  52227  6579  0.000000  0.000305  
20180403  12.630042  34.326023  0.045008  0.062801  0.017792  2.304128e+07  51093  6703  0.000000  0.000390  
20180511  7.140870  40.931377  0.120585  0.125726  0.005141  3.427444e+07  150390  8464  0.000000  0.000188  
20180823  6.237512  41.241724  0.144022  0.155054  0.011033  2.413513e+07  118650  5898  0.000000  0.000023  
20180824  6.039495  41.778234  0.157459  0.177582  0.020123  2.205453e+07  118650  5898  0.000041  0.000018  
20180827  5.960289  41.877588  0.146099  0.152580  0.006481  2.155575e+07  118650  5898  0.000098  0.000081  
20181024  9.096368  35.500295  0.494290  0.398785  0.095505  3.827526e+07  146172  9913  0.000000  0.000239  
20181112  9.106299  34.953065  0.168153  0.166935  0.001217  4.281623e+07  156027  11825  0.000000  0.001016  
20181114  9.801436  34.107346  0.324832  0.280804  0.044028  4.281623e+07  141351  13473  0.000000  0.000185  
20181115  9.364493  34.614778  0.136145  0.133480  0.002665  4.050823e+07  141351  13473  0.000016  0.000169  
20181123  11.211572  32.336311  0.197243  0.197471  0.000228  4.050823e+07  120567  12081  0.000000  0.000222  
20181206  11.797473  31.520441  0.111319  0.125227  0.013908  3.503895e+07  97920  10763  0.000000  0.001057  
20181207  11.906709  31.381147  0.147368  0.155996  0.008628  3.503895e+07  97920  10763  0.000635  0.001692  
20181212  12.989137  30.475730  0.209991  0.191626  0.018365  4.187946e+07  89073  12988  0.000000  0.002391  
20181213  13.187748  30.286687  0.117845  0.117424  0.000421  3.936253e+07  89073  12988  0.000148  0.002539  
20181217  16.375449  28.077868  0.248297  0.226081  0.022215  4.187946e+07  78804  13600  0.000000  0.002583 
As mentioned above, in this section we try to fit the best set of parameters on the period from 2015-12-02 to 2016-12-01, i.e. the training set. As the focus of this report is not efficient optimization, we opt for a simple grid search here. The parameter grids are defined as

- M_grid: 5, 10, 15, 20 (4 in total)
- g_grid: 0.001, 0.003, ..., 0.011 (6 in total)
- j_grid: -0.010, -0.008, ..., 0.010 (11 in total)
- s_grid: 1e-3, 5e-3, 1e-2, 5e-2, 1e-1 (5 in total)

So no more than 1320 simulations are run. Note that parameter combinations where \(-g < j < g\) does not hold are neglected. Below is a selection of outstanding parameter sets from the simulation.
from time import time
(Record 0) M=15, g=0.007, j=0.004, s=0.05000  st=1.4542, sr=0.3393, md=1.934e-10, rt=8.99%
(Record 1) M=15, g=0.011, j=0.010, s=0.05000  st=1.4591, sr=0.3430, md=1.673e-10, rt=9.17%
(Record 2) M=20, g=0.007, j=0.006, s=0.05000  st=1.5146, sr=0.3540, md=2.076e-10, rt=9.47%
(Record 3) M=20, g=0.007, j=-0.006, s=0.05000  st=1.6001, sr=0.3534, md=1.7e-10, rt=9.33%
(Record 4) M=15, g=0.011, j=0.004, s=0.05000  st=1.4680, sr=0.3452, md=1.673e-10, rt=9.15%
From the two plots below, we can tell that Record 3, i.e. the parameter set M=20, g=0.007, j=-0.006, s=0.05000, is a good choice, as it has both a large Sortino ratio/YoY return and a relatively small maximum drawdown. Record 2, i.e. M=20, g=0.007, j=0.006, s=0.05000, also performs well among the outstanding parameter sets, with a slightly better Sortino ratio and YoY return but a larger maximum drawdown. We'll test both sets.
rec = np.arange(len(record.best))
Using the parameters from Record 3, we run the backtest against the test set, i.e. from 2016-12-02 to 2018-12-31. The plots are as below.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.7407, Sharpe Ratio=0.2447, Maximum Drawdown=2.005e-11, YoY Return=1.76%
Using the parameters from Record 2, the backtest result is as below.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.9394, Sharpe Ratio=0.2872, Maximum Drawdown=1.997e-11, YoY Return=2.24%
Both results are amazingly great (especially compared with our result using random parameters before any tuning). Considering that we're not utilizing any future data in the backtest, the performance is satisfactory, even though we're neglecting a lot of execution details in our analysis, like transaction costs and market impact. There are also several comments on the processing of data:

- M days are consumed to calculate the rolling median N, which causes a loss of data. Perhaps we should use M days of further previous historical data to make up this loss.

After messing up with my Python virtualenv, my computer finally started going nuts. Jupyter notebook threw me the following error every time I started it:
zsh: /usr/local/bin/jupyter: bad interpreter: /usr/local/opt/python/bin/python3.7: no such file or directory
It turns out that the cause was that, along with the reinstallation of Python, the Homebrew symlinks to Jupyter were broken. A simple solution would be
rm '/usr/local/bin/jupyter'
Then the notebook starts just fine.
I wrote a poker game.
This is a simple Texas Hold 'em game running on macOS. All scripts are written in pure Python. The main GUI is written using the Python module PySimpleGUI, and hand evaluation is done by referring to a hand-value table precalculated with Monte Carlo simulation. See here for a detailed explanation of hand evaluation. Together with the GUI version, I also include a primitive command-line version with ColorPrint support, which you may download and include from this repo. The two versions are supposed to work identically.
You don't need any Python or module dependencies installed on your Mac in order to just play the game. The app itself is standalone with everything packed inside it already. There're just two steps:
Or rather, if you'd like to pack it yourself: run PyInstaller in --onefile mode.
There're several things to work on in the plan:
PySimpleGUI (by MikeTheWatchGuy).
Here are some bugs I'm trying to fix:
Popup element in PySimpleGUI.
I appreciate suggestions and encouragement from anyone throughout the development (which may still continue for a long time, considering the considerable time I spent just on writing this primitive game). Special thanks to my friends who tried the game and found bugs, starting from the command-line version. Also, credit to MikeTheWatchGuy, who wrote the PySimpleGUI module and helped me fix several bugs. Also, credit to Freepik from www.flaticon.com, who made this fantastic icon. Finally, I wanna give credit to myself for the nights I stayed up after lectures. There is nothing more fulfilling than realizing an impulse right away.
In the previous post, we considered the probabilities of making one specific hand with the turn/river card. This can be rather useful in specific situations, but still cannot apply throughout a game. Poker is essentially an incomplete information game. Different from Go, where you can see all stones placed on the board and thereby "solve" an optimal move, you never know your opponents' pocket cards until showdown (and even then, people muck). Also, you have little clue about the unshown community cards. Therefore, in order to evaluate a hand during a poker game, we'd better opt for an online evaluation algorithm instead of treating this as a DP-like problem.
For the sake of convenience, the table of hand values is shown again below.
| Name | Description | Example |
| --- | --- | --- |
| High Card | Simple value of the card. Lowest: deuce; highest: ace. | As 4s 7h Td 2c |
| Pair | Two cards with the same value. | As 4s 7h Td Ac |
| Two Pairs | Two pairs where each pair of cards have the same value. | As 4s 4h Td Ac |
| Three of a Kind | Three cards with the same value. | As 4s 4h 4d 2c |
| Straight | Five cards in consecutive values (ace can precede deuce or follow up king). | 9s Ts Jh Qd Kc |
| Flush | Five cards of the same suit. | Ah 4h 7h Th 2h |
| Full House | Three of a kind with the rest two making a pair. | As 4s 4h 4d Ac |
| Four of a Kind | Four cards of the same value. | As 4s 4h 4d 4c |
| Straight Flush | Straight of the same suit. | 9h Th Jh Qh Kh |
| Royal Flush | Straight flush from ten to ace. | Th Jh Qh Kh Ah |
Our ultimate goal is to be able to evaluate the probability of a win (and tie), i.e. the relative strength of our hand. However, let's first take one step back and consider evaluating the absolute strength, which we here denote as the hand value. Ideally, there are \(\binom{52}{5}=2,598,960\) possible hands but far fewer valid values. An intuitive idea to match all these hands to their values, which is also what I did, is to first generate a sparse mapping from hands to values, and then condense it. First we need a function that identifies hand types. Then, for hands within the same type, we encode them like a carry system (e.g. decimal); for hands across different types, we manually add offsets so that higher hand types always yield higher values. The final results are stored in the dictionary hv and serialized locally. The highest value is 6144 and here is a glance at hv:
{
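To make the idea concrete, here is a minimal sketch of the carry-system encoding for a couple of hand types. This is not the actual code behind hv; the real offsets and within-type encodings differ, and the names here are mine.

```python
# A simplified sketch (NOT the actual encoding behind `hv`) of the
# "carry system" idea: within one hand type, encode the sorted ranks as
# digits of a base-13 number; across types, add an offset wide enough
# (13**5 here) that higher types always outrank lower ones.
RANKS = '23456789TJQKA'  # deuce lowest, ace highest

def encode_kickers(ranks):
    """Encode ranks, e.g. ['A', 'K', 'Q', 'J', '9'], as a base-13 integer."""
    value = 0
    for r in sorted(ranks, key=RANKS.index, reverse=True):
        value = value * 13 + RANKS.index(r)
    return value

# Hypothetical per-type offsets; the real table distinguishes all ten types.
TYPE_OFFSET = {'high_card': 0, 'pair': 13**5, 'two_pairs': 2 * 13**5}

def sparse_value(hand_type, ranks):
    return TYPE_OFFSET[hand_type] + encode_kickers(ranks)

# Any pair beats any high card, and better kickers break ties:
assert sparse_value('pair', ['A', 'A', 'T', '7', '2']) > \
       sparse_value('high_card', ['A', 'K', 'Q', 'J', '9'])
assert sparse_value('high_card', ['A', 'K', 'Q', 'J', '9']) > \
       sparse_value('high_card', ['A', 'K', 'Q', 'J', '8'])
```

The sparse values can then be condensed by ranking the distinct values that actually occur, which is how a maximum value as small as 6144 becomes possible.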
Now that we have the full mapping from hands to values, there are two things we can do to calculate the probabilities:
Things are gonna be much easier when we only consider the two-player case (a.k.a. heads-up games). In that case, we can literally count and evaluate all scenarios — when there're 2 pocket cards for me and 3 community cards shown, we only need to enumerate \(\binom{52-5}{2}=1{,}081\) opponent hands, which is lightning fast with modern programming languages like Python or C++. However, when we have more players, like 5, the first method gets nasty. The hands of each opponent are not independent, so we have to go through \(\binom{52-5}{2\times 4} > 3\times 10^8\) situations, and that, different from the heads-up scenario, would be unacceptably slow no matter which language we use. Therefore, we opt for the second method at some cost of precision. The code (partial) is shared below.
def handEval(my_pocket, community, n_players, hv,
Below are two example tests.
Pocket: Tc Jd
P(win) = 5.4688% P(tie) = 0.4883%
Pocket: Tc Jd
P(win) = 1.9531% P(tie) = 0.7812%
Note here I also implement an interesting parameter called ranges, which represents the opponents' prior ranges at preflop. When passing an empty value, all combinations of two cards are considered. When we specify a list of ranges (numbers from 0 to 1, say \(x\%\)), then the opponents are assumed to only play when their pocket cards are at least in the top \(x\%\) of all pairs. See this table for more reasoning.
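As a quick sanity check on the enumeration counts used above, we can reproduce them with math.comb (available since Python 3.8); this is just arithmetic, not part of the evaluator.

```python
import math

# Heads-up: 2 pocket + 3 community cards are visible, so 47 cards remain
# unseen and the opponent's holding is any 2 of them.
assert math.comb(52 - 5, 2) == 1081

# Five-way: the 4 opponents hold 8 of the 47 unseen cards in total,
# which is why exhaustive enumeration becomes infeasible.
assert math.comb(52 - 5, 2 * 4) > 3 * 10**8
```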
In this post we're gonna introduce one of the most widely used results in hold 'em: the odds chart.
Before showing the odds chart, we first give the mathematical definition of odds. Here we're not focusing on winning a hand; instead, our intended issue is whether we can make an expected hand with the forthcoming unshown card(s). We call the probability of doing so the improving probability, and define its corresponding odds as
\[\text{odds} = \frac{1}{\text{improve}\%} - 1\]
which means we can bet every \(\$1\) against any pot larger than or equal to this amount.
Now we try to calculate these probabilities and odds. Here we only consider one or two community cards yet to be unveiled, namely odds on the river or on the turn. For example, when we're expecting any of 8 outs at the turn to make a straight (with both turn and river to come), the improving probability in this case would be
\[\text{improve}\% = 1 - \frac{45 + 2 - 8}{45 + 2}\times \frac{45 + 1 - 8}{45 + 1} = \frac{340}{1081} \approx 31.45\%\]
which means we have a \(31.45\%\) chance to make it; the odds, therefore, are \(1/31.45\% - 1 \approx 2.2\), which means we can bet at most \(\$1\) against each \(\$2.2\) pot. More generally, let \(\#\text{n.s.}\) denote the number of community cards not shown yet; then we have
\[\text{improve}\% = 1 - \prod_{i=1}^{\#\text{n.s.}} \frac{45 + i - \text{outs}}{45 + i}.\]
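The formula above is straightforward to implement. Here is a small sketch (function names are mine) that reproduces the table entries for 8 outs:

```python
def improve_prob(outs, n_unseen):
    """P(hitting at least one out) with n_unseen community cards to come.

    Mirrors the formula in the text: with 2 pocket + 3 community cards
    visible, there are 45 + i unseen cards when the i-th remaining
    community card is dealt.
    """
    miss = 1.0
    for i in range(1, n_unseen + 1):
        miss *= (45 + i - outs) / (45 + i)
    return 1 - miss

def odds(outs, n_unseen):
    p = improve_prob(outs, n_unseen)
    return 1 / p - 1

# 8 outs, e.g. an open-ended straight draw:
assert abs(improve_prob(8, 1) - 8 / 46) < 1e-12        # river only: ~17.39%
assert abs(improve_prob(8, 2) - 340 / 1081) < 1e-12    # turn + river: ~31.45%
assert abs(odds(8, 2) - 2.18) < 0.01                   # roughly 2.2-to-1
```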
Below is the table of improving probabilities and corresponding odds w.r.t. different outs.
| Outs | Improve% (River) | Odds (River) | Improve% (Turn) | Odds (Turn) |
| --- | --- | --- | --- | --- |
| 1 | 2.17% | 45 | 4.26% | 22 |
| 2 | 4.35% | 22 | 8.42% | 11 |
| 3 | 6.52% | 14 | 12.49% | 7 |
| 4 | 8.70% | 11 | 16.47% | 5.1 |
| 5 | 10.87% | 8.2 | 20.35% | 3.9 |
| 6 | 13.04% | 6.7 | 24.14% | 3.1 |
| 7 | 15.22% | 5.6 | 27.84% | 2.6 |
| 8 | 17.39% | 4.8 | 31.45% | 2.2 |
| 9 | 19.57% | 4.1 | 34.97% | 1.9 |
| 10 | 21.74% | 3.6 | 38.39% | 1.6 |
| 11 | 23.91% | 3.2 | 41.72% | 1.4 |
| 12 | 26.09% | 2.8 | 44.96% | 1.2 |
| 13 | 28.26% | 2.5 | 48.10% | 1.1 |
| 14 | 30.43% | 2.3 | 51.16% | 0.95 |
| 15 | 32.61% | 2.1 | 54.12% | 0.85 |
| 16 | 34.78% | 1.9 | 56.98% | 0.75 |
| 17 | 36.96% | 1.7 | 59.76% | 0.67 |
Multiplying the number of outs by two or four gives a reasonable approximation to the improve% (river) or improve% (turn), respectively, in the above table. This is a famous (yet quite rough) approximation among hold 'em gamers. The rule is a direct corollary of the above formula, as when \(\#\text{n.s.}=1\),
\[\text{improve}\% = 1 - \frac{46 - \text{outs}}{46} = \frac{\text{outs}}{46} \approx (2\cdot\text{outs}) \%\]
and when \(\#\text{n.s.}=2\),
\[\begin{align*}\text{improve}\% &= 1 - \frac{46 - \text{outs}}{46}\times \frac{47-\text{outs}}{47} \\&= \frac{93}{2162}\text{outs} - \frac{1}{2162}\text{outs}^2 \approx (4\cdot\text{outs}) \%.\end{align*}\]
In this post, I'll walk through the whole process to download, clean and then browse one of the world's largest poker hand history datasets, the IRC Poker Database^{[1]}, which is a little bit aged but well-known for its huge size. The work we're doing here is meant to be a preparation for further analysis and model training.
Before the advent of real-money online poker servers, there was the Internet Relay Chat (IRC) poker server. The server was programmed by Todd Mummert, with support code by Greg Reynolds and other Usenet rec.gambling.poker enthusiasts. The participants in these games were mostly computer geeks with a passion for poker. Many were serious students of the game, armed with the analytical skills needed to understand the mathematics and all other aspects of advanced poker strategy.
Michael Maurer wrote a program called Observer that sat in on IRC poker channels and quietly logged the details of every game it witnessed. This resulted in the collection of the more than 10 million complete hands of poker (from 19952001) that constitute the IRC Poker Database.
Sadly, the IRC games are now gone (but might be resurrected one day).^{[2]}
We'll be using the same shorthand notations as we gave in the last post. For bet actions, we define:
- `-`: no action
- `B`: blind bet
- `f`: fold
- `k`: check
- `b`: bet
- `c`: call
- `r`: raise
- `A`: all-in
- `Q`: quit
- `K`: kicked out

As for rounds, we denote:
- `p`: preflop
- `f`: flop
- `t`: turn
- `r`: river
- `s`: showdown

I've written several scripts^{[3]} for all sorts of data preparation, and the code can be found in my GitHub repository. After entering the repo, run the following commands in order:
wget http://poker.cs.ualberta.ca/IRC/IRCdata.tgz # download the database (> IRCdata.tgz)
Eventually there're \(10{,}233{,}955\) hands in hands.json
and \(437{,}862\) in hands_valid.json
after cleaning.
You may run the following code to inspect hands in their original order. Any time you'd like to stop browsing, you can just use Ctrl+C
to interrupt the process.
python3 browse.py # print hands in a formatted way
The script lists the extracted hand history as below.
############################################################
time    : 199612
id      : 2093
board   : ['Qd', '6s', 'Td', 'Qc', 'Jh']
pots    : [(2, 60), (2, 60), (2, 60), (2, 60)]
players :
Tiger (#1)
{'action': 30, 'bankroll': 2922, 'bets': [{'actions': ['B', 'r'], 'stage': 'p'}, {'actions': ['k'], 'stage': 'f'}, {'actions': ['k'], 'stage': 't'}, {'actions': ['k'], 'stage': 'r'}], 'pocket_cards': ['9s', 'Ac'], 'winnings': 30}
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
jvegas2 (#2)
{'action': 30, 'bankroll': 139401, 'bets': [{'actions': ['B', 'c'], 'stage': 'p'}, {'actions': ['k'], 'stage': 'f'}, {'actions': ['k'], 'stage': 't'}, {'actions': ['k'], 'stage': 'r'}], 'pocket_cards': ['9c', 'As'], 'winnings': 30}
############################################################
So this record describes hand #2093, which happened in December of 1996. There were two players at the table, namely Tiger (the SB) and jvegas2 (the BB). By default the game started with Tiger paying \(\$5\), who got a 9♠ and an A♣ with a bankroll of \(\$2{,}922\), and jvegas2 paying \(\$10\), whose pocket cards were 9♣ and A♠ with a bankroll of \(\$139{,}401\). Then Tiger raised to \(\$30\) (3BB) and jvegas2 called, so the preflop pot was \(\$60\) with two players in. The flop was Q♦, 6♠ and 10♦. It was a dry hand by far, so both checked at the flop. The turn was Q♣, and then both checked again. At the river came J♥, nothing special, and again both checked. Both players stayed to the showdown and it was a tie, so the two shared the total pot of \(\$60\).
Starting from today, I'm gonna write a series of posts on Texas Hold 'em, one of the world's most famous forms of poker. The game is rather complicated, especially considering its origin dating back to the early 20th century. In this post, I will list the bare bones of hold 'em. These concepts may sound boring to you if you are a veteran poker player, but I just want to make sure we're talking in the same language — or building using the same bricks.
Texas hold 'em uses all cards but the two jokers from a single deck. So there're 13 cards in each of the four suits: diamonds (♦), hearts (♥), clubs (♣) and spades (♠). The exhaustive array of cards is:
| (Suit) | Ace | Deuce | Three | Four | Five | Six | Seven | Eight | Nine | Ten | Jack | Queen | King |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ♦ | A♦ | 2♦ | 3♦ | 4♦ | 5♦ | 6♦ | 7♦ | 8♦ | 9♦ | 10♦ | J♦ | Q♦ | K♦ |
| ♥ | A♥ | 2♥ | 3♥ | 4♥ | 5♥ | 6♥ | 7♥ | 8♥ | 9♥ | 10♥ | J♥ | Q♥ | K♥ |
| ♣ | A♣ | 2♣ | 3♣ | 4♣ | 5♣ | 6♣ | 7♣ | 8♣ | 9♣ | 10♣ | J♣ | Q♣ | K♣ |
| ♠ | A♠ | 2♠ | 3♠ | 4♠ | 5♠ | 6♠ | 7♠ | 8♠ | 9♠ | 10♠ | J♠ | Q♠ | K♠ |
Besides these full-length notations, I'll also be using abbreviations, where I denote d for diamonds, h for hearts, c for clubs and s for spades. Also, I use T to represent cards of 10. Thereby, we can easily use a two-character string, e.g. Ts, to refer to the card 10♠. This shorthand is gonna be especially useful when we try to program the game in the forthcoming posts.
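As a tiny illustration, the full 52-card deck in this shorthand can be generated in a couple of lines (a sketch; the variable names are mine):

```python
# Build the 52-card deck in the two-character shorthand: rank first
# (2-9, T, J, Q, K, A), then suit (d, h, c, s).
RANKS = '23456789TJQKA'
SUITS = 'dhcs'

deck = [rank + suit for rank in RANKS for suit in SUITS]

assert len(deck) == 52 and len(set(deck)) == 52
assert 'Ts' in deck  # the ten of spades from the example above
```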
There are in general three types of hold 'em games based on the number of players at the table: 2 (called "heads-up"), 3-6 (called "6-max") and 7-10 (called "full-ring"). The maximum number of players in a Texas hold 'em game is ten, which means there're ten players and one dealer who does not play. That makes a regular 10-player table full (well, with the dealer standing).
There are a variety of positions at the table (ignoring the dealer's), but the three most important are: button (BTN), small blind (SB) and big blind (BB). In most self-dealt games where no specific person serves as a dealer, the button also serves as the dealer, which is why you can see the "dealer" chip at the same position. The blinds are forced contributions and are paid before the pocket cards are dealt (the two cards for each player at the beginning of the game, see figure above): SB pays first and is asked for half the money of BB, then BB pays his forced bet, and finally the dealer gives away all the pocket cards in a clockwise manner starting from SB. Therefore, BTN is the last one to receive his pocket cards and also the last to act (bet or fold, see "betting" below).
Besides BTN, SB and BB, we also usually call the first player to the left of BB under-the-gun (UTG), which vividly points out that he's the first to act at this table, as SB and BB have already been forced to pay their bets.
There are four kinds of actions you can take each round:
The hand begins with a preflop betting round, beginning with UTG and continuing clockwise. A round of betting continues until every player has folded, put in all of their chips, or matched the amount put in by all other active players.
After the preflop betting round, assuming there remain at least two players taking part in the hand, the dealer deals a flop: three faceup community cards. The flop is followed by a second betting round. This and all subsequent betting rounds begin with the player to the dealer's left and continue clockwise.
After the flop betting round ends, a single community card (called the turn or fourth street) is dealt, followed by a third betting round. A final single community card (called the river or fifth street) is then dealt, followed by a fourth betting round and the showdown, if necessary. In the third and fourth betting rounds, the stakes double.
To sum up, players have four rounds in which to act: preflop (given a pocket of two cards), flop (three community cards revealed), turn (one more community card revealed) and river (the last community card revealed).
The following table shows the possible hand values in ascending order.
| Name | Description | Example |
| --- | --- | --- |
| High Card | Simple value of the card. Lowest: deuce; highest: ace. | As 4s 7h Td 2c |
| Pair | Two cards with the same value. | As 4s 7h Td Ac |
| Two Pairs | Two pairs where each pair of cards have the same value. | As 4s 4h Td Ac |
| Three of a Kind | Three cards with the same value. | As 4s 4h 4d 2c |
| Straight | Five cards in consecutive values (ace can precede deuce or follow up king). | 9s Ts Jh Qd Kc |
| Flush | Five cards of the same suit. | Ah 4h 7h Th 2h |
| Full House | Three of a kind with the rest two making a pair. | As 4s 4h 4d Ac |
| Four of a Kind | Four cards of the same value. | As 4s 4h 4d 4c |
| Straight Flush | Straight of the same suit. | 9h Th Jh Qh Kh |
| Royal Flush | Straight flush from ten to ace. | Th Jh Qh Kh Ah |
A player may use any five cards out of the seven available, namely the two pocket cards and five community cards, to reach the highest hand value he can attain. The player with the highest hand value wins the pot, unless all but one player fold before showdown (showing pocket cards after the last betting round).
A couple of months ago I was asked the following question during an interview (for proprietary concerns I'm not gonna disclose the industry or name of the company): \(\newcommand{R}{\mathbb{R}} \newcommand{E}{\text{E}} \newcommand{bs}{\boldsymbol} \newcommand{N}{\mathbb{N}}\)
Assume \(k\), \(n\in\N\) and \(k < n\). For a uniformly chosen subspace \(\R^k\subsetneq\R^n\) we define the orthogonal projection as \(P:\R^n\mapsto\R^n\). Find \(\E[P(\bs{v})]\) where \(\bs{v}\in\R^n\) is given.
It's an interesting question and also a totally novel one to me at that time. How do we define a "uniformly" chosen subspace and its corresponding projection? What are the possible intuitions in this simple piece of question? Despite the busy schoolwork and student projects, these thoughts persisted in my mind and drove me to dig into this question from time to time. Curiosity has been aroused, and an appetite is meant to be satisfied.
In order to solve this problem, we need to fully understand what's being asked. So now we've got two non-negative integers \(k<n\) and two spaces, namely \(\R^n\) and \(\R^k\). We know \(\R^k\) is somehow randomly selected as a subspace of \(\R^n\) and this randomness is uniform. For each such selection, we can make an orthogonal projection of the given point^{[1]} \(\bs{v}\) onto \(\R^k\). Note here the projection is defined from \(\R^n\) to \(\R^n\), which means we're not interested in the projected value in \(\R^k\) but in the projection itself. In other words, we're focusing on the projected vector's behavior in the same space as \(\bs{v}\).
The simplest example (well, it's in fact not THE simplest as we could always project \(\bs{v}\) onto \(\R^0\) and the resulting expectation would be a zero vector) would be \(n=2\) and \(k=1\). For any given \(\bs{v}\), we can always draw a graph as below.
The random subspace in this case is illustrated by the straight gray line, which determines the projection \(P(\bs{v})=\bs{h}\) as in the graph. We uniformly select this subspace by rotating this line around the origin accordingly. This means the angle between \(\bs{v}\) and \(\bs{h}\), denoted by \(\theta\), is a uniform random variable on \([0, 2\pi)\). Further, simple geometry tells us the angle between \(\bs{v}\) and \(\bs{h} - \bs{v}/2\) is merely \(2\theta\), which is therefore also uniformly distributed on \([0, 4\pi)\), i.e. uniform modulo \(2\pi\). Now that we know \(\bs{h}\) uniformly lies on the red circle centered at \(\bs{v}/2\), we conclude the expected projection, in this particular case, is \(\bs{v}/2\).
While it gets geometrically difficult to imagine, not to mention to draw, the case of larger \(n\) and \(k\), this example has given us a pretty nice guess:
\[\E[P(\bs{v})] = \frac{k}{n}\bs{v}.\]
Can we prove it in higher dimensions and general cases?
Proof. Now we try to prove that our previous statement is true. For any orthonormal basis^{[2]} \(\bs{e}=(\bs{e}_1,\bs{e}_2,\dots,\bs{e}_n)\in\R^{n\times n}\), we uniformly choose a subset \((\bs{e}_{n_1}, \bs{e}_{n_2},\dots,\bs{e}_{n_k})\) and define a subspace \(\R^k\) on it. The projected value on any basis vector \(\bs{e}_j\) is \(\bs{e}_j'\bs{v}\), and the corresponding vector component is \(\bs{e}_j\bs{e}_j'\bs{v}\). Therefore, the orthogonal projection of \(\bs{v}\) is given by
\[P(\bs{v}) = \sum_{j=1}^{k}\bs{e}_{n_j}\bs{e}_{n_j}'\bs{v} = \bs{eDe'v} \in \R^n\]
where we define the random matrix \(\bs{D}\) to be a diagonal matrix with \(k\) ones and \((n-k)\) zeros on its diagonal. The diagonal entries are not independent, but the expectation of each entry is the same, namely \(k/n\). The expectation of the projection, therefore, is
\[\E[P(\bs{v})] = \E[\bs{eDe'v}] = \E\{\E[\bs{eDe'v}\mid \bs{e}]\} = \E\{\bs{e}\E[\bs{D}]\bs{e'v}\} = \frac{k}{n}\E[\bs{ee'}]\bs{v}.\]
where we used the tower rule^{[3]}. Now, noticing that for any \(\bs{e}\) it always holds that \(\bs{e'e}=\bs{I}\), we have
\[(\bs{ee'})^2 = \bs{ee'ee'} = \bs{e(e'e)e'} = \bs{eIe'} = \bs{ee'} \Rightarrow \bs{ee'} = \bs{I}\]
and thus we may finally conclude
\[\E[P(\bs{v})] = \frac{k}{n}\E[\bs{ee'}]\bs{v} = \frac{k}{n}\bs{v}\]
which exactly coincides with our previous guess. Q.E.D.
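As a numerical sanity check (not part of the proof), we can sample uniformly random subspaces and average the projections. One standard way to draw a uniform \(k\)-subspace is to take the span of the first \(k\) columns of the Q factor of a random Gaussian matrix; this span is rotation-invariant, hence uniform on the Grassmannian.

```python
import numpy as np

# Monte Carlo check of E[P(v)] = (k/n) v for a small example.
rng = np.random.default_rng(0)
n, k = 5, 2
v = np.arange(1.0, n + 1)  # an arbitrary fixed vector

trials = 10000
total = np.zeros(n)
for _ in range(trials):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    E = Q[:, :k]              # orthonormal basis of a uniform random k-subspace
    total += E @ (E.T @ v)    # orthogonal projection of v onto that subspace

estimate = total / trials
assert np.allclose(estimate, (k / n) * v, atol=0.1)
```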
TBD. May concern dimensionality reduction, etc.
These are the lecture notes on foreign exchange market and theories. \(\newcommand{\E}{\text{E}} \newcommand{\P}{\text{P}} \newcommand{\Q}{\text{Q}} \newcommand{\F}{\mathcal{F}} \newcommand{\d}{\text{d}} \newcommand{\N}{\mathcal{N}} \newcommand{\eeq}{\ \!=\mathrel{\mkern3mu}=\ \!} \newcommand{\eeeq}{\ \!=\mathrel{\mkern3mu}=\mathrel{\mkern3mu}=\ \!} \newcommand{\MGF}{\text{MGF}}\)
The spot price of a foreign currency is (LHS as units of foreign currency, RHS as of domestic currency) \(1 = S_t\). Which is equivalent to \(1/S_t = 1\). We say \(S_t\) is a price in domestic terms.
Selling domestic currency to buy foreign currencies.
Value for the buyer is (in domestic currency) \(PV=(S_t - R)N\). This is because of the two cash flows:
Executing a spot contract at time \(T\) with given contract rate \(R\).
Value for the buyer is (in domestic currency) \(PV=(S_t\cdot P^f - R\cdot P^d)N\). This is because of the two cash flows at time \(T\):
which has present values at time \(t\)
We set \(PV=0\) for the forward contract and get \(F\equiv R=S_t\cdot P^f /\ P^d=S_t\exp[(r^d - r^f)\cdot(T-t)]\). Therefore, we also have \(F-S_t\approx S_t(r^d - r^f)\cdot(T-t)\).
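A quick numerical illustration of this CIP forward and the forward-point approximation; all rates and the horizon below are made-up numbers.

```python
import math

S = 1.1860           # spot, domestic currency per unit of foreign
r_d, r_f = 0.02, 0.01
T = 0.5              # years to delivery

F = S * math.exp((r_d - r_f) * T)   # CIP forward rate
approx = S * (r_d - r_f) * T        # the F - S_t approximation from the text

assert F > S                        # higher domestic rate: forward premium
assert abs((F - S) - approx) < 1e-4
```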
In order to replicate a forward contract, we can execute a spot contract, borrow domestic and lend foreign. Namely, we have cash flows at time \(t\):
and at time \(T\):
This yields \(S_t/P^d = F/P^f\), or \(F=S_t\cdot P^f / P^d\), which is what we call the CIP. This means higher interest rate currencies will be "weaker" on a forward basis.
From the CIP we have \(P^f = P^d \cdot F/S_t\), which gives \(r^f = r^d - \log(F/S_t) / (T-t)\).
Swapping a forward contract (\(T_1\), \(R_1\)) for another (\(T_2\), \(R_2\)).
Value for the buyer is (in domestic currency) \[\begin{align*}PV&=(S_t\cdot P^{f1} - R_1\cdot P^{d1} - S_t\cdot P^{f2} + R_2\cdot P^{d2})\cdot N\\&=\left\{S_t\left[\exp(-r^{f1}(T_1-t)) - \exp(-r^{f2}(T_2-t))\right] - R_1\exp(-r^{d1}(T_1-t)) + R_2\exp(-r^{d2}(T_2-t))\right\}\cdot N\end{align*}\] which is rather insensitive w.r.t. the spot rate: \[PV_S = \frac{\partial PV}{\partial S} = (P^{f1} - P^{f2})N = \left[\exp(-r^{f1}(T_1-t)) - \exp(-r^{f2}(T_2-t))\right]\cdot N\approx N r^f(T_2 - T_1)\] compared with that of a forward contract: \[PV_S = P^f\cdot N = \exp[-r^f(T-t)]\cdot N \approx N.\]
The right (but not obligation) to exchange \(N\) units of foreign currency for \(N\cdot K\) units of domestic currency at time \(T\). This is to say, we call the right to buy foreign currency a foreign call, which is at the same time a domestic put.
We have the put-call parity \(C-P=P^d(F-K)\) and the payoff of a foreign call option, \(\max(0, S_T-K)\). We assume \(\{S_t\}_{0\le t\le T}\) follows the GBM \(\d S = \mu S \d t + \sigma S \d W\), which, according to Itô's lemma, gives \[\d V = \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t\right) \d t + V_S\d S\] where \(V\) is any derivative w.r.t. \(S\) (remark: the subscripts on \(V\) here denote partial derivatives, e.g. \(V_t=\partial V/\partial t\), not time indexing). Now, noticing the hedged portfolio \(\Pi = \{+1 \text{ unit of }V;\ -V_S \text{ units of } S\cdot D^f\}\) has dynamics \[\begin{align*}\d\Pi &= \d V - V_S\d (S\cdot D^f) \\&= \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t\right) \d t + V_S\d S - V_S(D^f \d S + S\cdot r^f \d t) \\&= \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t - r^fV_S S\right) \d t\end{align*}\] where we used the fact that \(D^f(t)=1\). Now under the risk-neutral measure, we know \[\left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t - r^fV_S S\right)\d t = r^d(V - V_S S)\d t\] which gives the so-called Garman-Kohlhagen PDE: \[\frac{1}{2}\sigma^2S^2V_{SS} + (r^d - r^f)V_S S - r^d V + V_t = 0\] with boundary conditions \(V(S_T,T)=(S_T-K)^+\) and \(V(0,t)=0\).
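The solution to this PDE is the well-known Garman-Kohlhagen formula. Below is a minimal sketch (parameters are made up), checked against the put-call parity stated above:

```python
import math

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def gk_price(S, K, T, r_d, r_f, sigma, call=True):
    """Garman-Kohlhagen value of a foreign-currency option, in domestic terms."""
    d1 = (math.log(S / K) + (r_d - r_f + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if call:
        return S * math.exp(-r_f * T) * norm_cdf(d1) - K * math.exp(-r_d * T) * norm_cdf(d2)
    return K * math.exp(-r_d * T) * norm_cdf(-d2) - S * math.exp(-r_f * T) * norm_cdf(-d1)

# Parity check: C - P = P^d (F - K), with F = S exp[(r_d - r_f) T].
S, K, T, r_d, r_f, sigma = 1.19, 1.20, 0.5, 0.02, 0.01, 0.10
F = S * math.exp((r_d - r_f) * T)
lhs = gk_price(S, K, T, r_d, r_f, sigma, True) - gk_price(S, K, T, r_d, r_f, sigma, False)
rhs = math.exp(-r_d * T) * (F - K)
assert abs(lhs - rhs) < 1e-9
```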
Trade date is when the terms of the transaction are agreed. Currency trading is a global, 24hour market. The "trading day" ends at 5pm New York time. Value date is when cash flows occur, i.e., when currencies are delivered. Value date for spot transactions is "T+2" for most currency pairs. However, spot value date is "T+1" for USD versus CAD, RUB, TRY, PHP.
| Trade Date (T+0) | T+1 | Value Date (T+2) |
| --- | --- | --- |
| Trade terms are agreed |  | Two currency payments are delivered |
|  | Good day for CCY1 and CCY2 if non-USD | Good day for CCY1 and CCY2 |
|  | Can be a USD holiday | Cannot be a USD holiday |
We usually refer to the two legs of a currency pair (CCY1/CCY2, where the "/" is usually omitted) by any of the following names:
| CCY1 | CCY2 |
| --- | --- |
| Base Currency | Terms Currency |
| Fixed Currency | Variable Currency |
| Home Currency | Overseas Currency |
When we say EURUSD \(= 1.1860\), we mean \(1\) EUR \(=\) \(1.1860\) USD.
In the context of bid offer spreads, we denote the bid and offer prices as EURUSD \(=1.1859/1.1860\) (or \(1.1859/60\) as shorthand). These spreads may vary. The possible reasons may involve liquidity, volatility and cost of risk.
In terms of USD, the direct quotes are CCYUSD and the indirect quotes are USDCCY.
Spot rates calculated from an indirect market: e.g. when EURUSD \(=1.1882\) and USDJPY \(=109.14\), we have the cross rate EURJPY \(=129.68\), which does not necessarily coincide with the actual rate in the market.
It's neither of the interest rates of the two currencies. Instead, people use the deposit rate in this case, specifically in terms of USD, it's the Eurodollar deposit rate, or LIBOR.
Contracts with any delivery date (a.k.a. value date) other than spot are considered forward. Standard delivery dates may be in weeks or months, and are otherwise called "broken". Specifically, we call the contract "cash" if its delivery date is today, and "tom" if it's tomorrow. FX forwards are OTC (over-the-counter).
We define: \(\text{forward point} = \text{forward rate (outright)} - \text{spot rate}\). The number is usually scaled by \(10^4\).
We have \[\text{forward} =\text{spot} \times \frac{1 + R_{\text{variable CCY}}\times\text{days}/ 360}{1 + R_{\text{fixed CCY}}\times\text{days}/360}\] where we use \(R\) instead of \(P\) here as it's more commonly given.
Using the CIP above, we have \[R^f = \frac{(S/F)\times(1 +R_d\times\text{days}/360)  1}{\text{days}/360}\] where we assume the rates are not compounded.
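A small numerical round trip of this implied-rate formula; all inputs below are made up.

```python
# Back out the implied foreign rate from spot and forward, using simple
# (non-compounded) rates as in the formula above.
S, F = 1.1882, 1.1910
r_d, days = 0.02, 180

yf = days / 360
r_f = ((S / F) * (1 + r_d * yf) - 1) / yf

# Round trip: plugging r_f back into the forward formula recovers F.
F_check = S * (1 + r_d * yf) / (1 + r_f * yf)
assert abs(F_check - F) < 1e-10
```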
Contracts that alter the value date on an existing trade by simultaneously executing two forward transactions.
|  | # of legs | FX Risk | IR Spread Risk |
| --- | --- | --- | --- |
| Spot | 1 | Yes | No |
| Forward | 1 | Yes | Yes |
| Swap | 2 | No | Yes |
We define: \(\text{swap point} = \text{far rate} - \text{near rate}\).
To be continued.
This is a brief selection of my notes on the stochastic calculus course. Content may be updated at times. \(\newcommand{\E}{\text{E}} \newcommand{\P}{\text{P}} \newcommand{\Q}{\text{Q}} \newcommand{\F}{\mathcal{F}} \newcommand{\d}{\text{d}} \newcommand{\N}{\mathcal{N}} \newcommand{\sgn}{\text{sgn}} \newcommand{\tr}{\text{tr}} \newcommand{\bs}{\boldsymbol} \newcommand{\eeq}{\ \!=\mathrel{\mkern3mu}=\ \!} \newcommand{\eeeq}{\ \!=\mathrel{\mkern3mu}=\mathrel{\mkern3mu}=\ \!} \newcommand{\R}{\mathbb{R}} \newcommand{\MGF}{\text{MGF}}\)
For \(X\sim\N(\mu,\sigma^2)\), we have \(\MGF(\theta)=\exp(\theta\mu + \theta^2\sigma^2/2)\). We have \(\E(X^k) = \MGF^{\ (k)}(0)\).
Consider a two-sided truncation \((a,b)\) on \(\N(\mu,\sigma^2)\); then \[\E[X\mid a < X < b] = \mu - \sigma\frac{\phi(\alpha) - \phi(\beta)}{\Phi(\alpha) - \Phi(\beta)}\] where \(\alpha:=(a-\mu)/\sigma\) and \(\beta:=(b-\mu)/\sigma\).
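A quick Monte Carlo check of this truncation formula; the parameter choices below are arbitrary.

```python
import math
import random

mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))

alpha, beta = (a - mu) / sigma, (b - mu) / sigma
closed_form = mu - sigma * (phi(alpha) - phi(beta)) / (Phi(alpha) - Phi(beta))

# Sample X ~ N(mu, sigma^2) and keep only draws inside (a, b).
random.seed(0)
draws = (random.gauss(mu, sigma) for _ in range(400000))
kept = [x for x in draws if a < x < b]
mc_mean = sum(kept) / len(kept)

assert abs(mc_mean - closed_form) < 0.02
```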
Let \(X\) be a MG and \(T\) a stopping time, then \(\E X_{T\wedge n} = \E X_0\) for any \(n\).
Define \((Z\cdot X)_n:=\sum_{i=1}^n Z_i(X_i - X_{i-1})\) where \(X\) is MG with \(X_0=0\) and \(Z_n\) is predictable and bounded; then \((Z\cdot X)\) is MG. If \(X\) is subMG, then so is \((Z\cdot X)\). Furthermore, if \(Z\in[0,1]\), then \(\E(Z\cdot X)\le \E X\).
If \(X\) is MG and \(\phi(\cdot)\) is a convex function, then \(\phi(X)\) is subMG.
Given the \(\P\)-measure, we define the likelihood ratio \(Z:=\d\Q / \d\P\) for another measure \(\Q\). For example, changing numeraire from CASH (the \(\P\)-measure) to STOCK (the \(\Q\)-measure): \(Z(\omega) = (\d\Q/\d\P)(\omega) = S_N(\omega) / S_0\).
If \(B\) is a BM and \(T=\tau(\cdot)\) is a stopping time, then \(\{B_{t+T} - B_T\}_{t\ge 0}\) is a BM indep. of \(\{B_t\}_{t\le T}\).
If \(B\) is a standard \(k\)-dimensional BM and \(U\in\mathbb{R}^{k\times k}\) is orthogonal, then \(UB\) is also a standard \(k\)-dimensional BM.
For any subMG \(X\), we have the unique decomposition \(X=M+A\) where \(M_n:=X_0 + \sum_{i=1}^n [X_i - \E(X_i\mid \F_{i-1})]\) is a martingale and \(A_n:=\sum_{i=1}^n[\E(X_i\mid \F_{i-1}) - X_{i-1}]\) is a non-decreasing predictable sequence.
For BM \(B\) and stopping time \(T=\tau(a)\), define \(B^*\) s.t. \(B_t^*=B_t\) for all \(t\le T\) and \(B_t^* = 2a - B_t\) for all \(t>T\); then \(B^*\) is also a BM.
\(\P(\max_{s\le t}B_s > x\text{ and }B_t < y) = \Phi\!\left(\frac{y-2x}{\sqrt{t}}\right)\).
Let \(X\) and \(Y\) be indep. BM. Note that for all \(t\ge 0\), from the exponential MG we know \(\E[\exp(i\theta X_t)]=\exp(-\theta^2 t/2)\). Now define \(T=\tau(a)\) for \(Y\) and we have \(\E[\exp(i\theta X_T)] = \E[\exp(-\theta^2 T /2)]=\exp(-a|\theta|)\), which is the Fourier transform of the Cauchy density \(f_a(x)=\frac{1}{\pi}\frac{a}{a^2+x^2}\).
We define Itô integral \(I_t(X) := \int_0^t\! X_s\d W_s\) where \(W_t\) is a standard Brownian process and \(X_t\) is adapted.
This is the direct result from the second martingality property above. Let \(X_t\) be non-random and continuously differentiable; then \[\E\!\left[\!\left(\int_0^t X_s\d W_s\right)^{\!\!2}\right] = \E\!\left[\int_0^t X_s^2\d s\right].\]
Let \(W_t\) be a standard Brownian motion and let \(f:\R\mapsto\R\) be a twice-continuously differentiable function s.t. \(f\), \(f'\) and \(f''\) are all bounded; then for all \(t>0\) we have \[\d f(W_t) = f'(W_t)\d W_t + \frac{1}{2}f''(W_t) \d t.\]
Let \(W_t\) be a standard Brownian motion and let \(f:[0,\infty)\times\R\mapsto\R\) be a twice-continuously differentiable function s.t. its partial derivatives are all bounded; then for all \(t>0\) we have \[\d f(t, W_t) = f_x\d W_t + \left(f_t + \frac{1}{2}f_{xx}\right) \d t.\]
The Wiener integral is a special case of Itô integral where \(f(t)\) is here a nonrandom function of \(t\). Variance of a Wiener integral can be derived using Itô isometry.
We say \(X_t\) is an Itô process if it satisfies \[\d X_t = Y_t\d W_t + Z_t\d t\] where \(Y_t\) and \(Z_t\) are adapted and for all \(t\) \[\int_0^t\! \E Y_s^2\,\d s < \infty\quad\text{and}\quad\int_0^t\! \E|Z_s|\,\d s < \infty.\] The quadratic variation of \(X_t\) is \[[X,X]_t = \int_0^t\! Y_s^2\,\d s.\]
Assume \(X_t\) and \(Y_t\) are two Itô processes; then \[\frac{\d (XY)}{XY} = \frac{\d X}{X} + \frac{\d Y}{Y} + \frac{\d X\d Y}{XY}\] and \[\frac{\d (X/Y)}{X/Y} = \frac{\d X}{X} - \frac{\d Y}{Y} + \left(\frac{\d Y}{Y}\right)^{\!2} - \frac{\d X\d Y}{XY}.\]
A Brownian bridge is a continuous-time stochastic process \(X_t\) with both ends pinned: \(X_0=X_T=0\). The SDE is \[\d X_t = -\frac{X_t}{T-t}\d t + \d W_t\] which solves to \[X_t = W_t - \frac{t}{T}W_T.\]
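A minimal simulation sketch: sample a Brownian path on a grid and pin both ends via \(X_t = W_t - (t/T)W_T\).

```python
import random

random.seed(42)
T, n = 1.0, 1000
dt = T / n

# Sample a standard Brownian path W on the grid 0, dt, 2*dt, ..., T.
W = [0.0]
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, dt ** 0.5))

# Pin both ends: on the grid, t/T equals i/n exactly.
X = [W[i] - (i / n) * W[-1] for i in range(n + 1)]

assert X[0] == 0.0
assert abs(X[-1]) < 1e-12  # both ends pinned at zero
```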
Let \(X_t\) be an Itô process. Let \(u(t,x)\) be a twicecontinuously differentiable function with \(u\) and its partial derivatives bounded, then \[\d u(t, X_t) =\frac{\partial u}{\partial t}(t, X_t)\d t +\frac{\partial u}{\partial x}(t, X_t)\d X_t +\frac{1}{2}\frac{\partial^2 u}{\partial x^2}(t, X_t)\d [X,X]_t.\]
The OU process describes a stochastic process that has a tendency to return to an "equilibrium" position \(0\), with returning velocity proportional to its distance from the origin. It's given by the SDE \[\d X_t = -\alpha X_t \d t + \d W_t \Rightarrow\d [\exp(\alpha t)X_t] = \exp(\alpha t)\d W_t \] which solves to \[X_t = \exp(-\alpha t)\left[X_0 + \int_0^t\! \exp(\alpha s)\d W_s\right].\]
Remark: In finance, the OU process is often called the Vasicek model.
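Since the conditional law \(X_{t+h}\mid X_t\) is normal with mean \(X_t e^{-\alpha h}\) and variance \((1-e^{-2\alpha h})/(2\alpha)\), the OU process can be simulated without discretization bias. A sketch, assuming numpy (ou_path is an illustrative helper name):

```python
import numpy as np

def ou_path(x0=1.0, alpha=0.5, T=10.0, n=1000, rng=None):
    """Simulate dX_t = -alpha X_t dt + dW_t using the exact Gaussian
    transition X_{t+h} | X_t ~ N(X_t e^{-alpha h}, (1 - e^{-2 alpha h}) / (2 alpha))."""
    rng = np.random.default_rng() if rng is None else rng
    h = T / n
    decay = np.exp(-alpha * h)
    sd = np.sqrt((1.0 - decay**2) / (2.0 * alpha))
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        X[i + 1] = X[i] * decay + sd * rng.normal()
    return X
```

Over many paths the terminal value has mean \(x_0 e^{-\alpha T}\) and, for large \(T\), variance close to the stationary value \(1/(2\alpha)\).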
The SDE for a general diffusion process is \(\d X_t = \mu(X_t)\d t + \sigma(X_t)\d W_t\).
In order to find \(\P(X_T=B)\) where we define \(T=\inf\{t\ge 0: X_t=A\text{ or }B\}\), we consider a harmonic function \(f(x)\) s.t. \(f(X_t)\) is a MG. This gives ODE \[f'(x)\mu(x) + f''(x)\sigma^2(x)/2 = 0\quad\Rightarrow\quad f(x) = \int_A^x C_1\exp\left\{-\!\int_A^z\frac{2\mu(y)}{\sigma^2(y)}\d y\right\}\d z + C_2\] where \(C_{1,2}\) are constants. Then since \(f(X_{T\wedge t})\) is a bounded MG, by Doob's identity we have \[\P(X_T=B) = \frac{f(X_0) - f(A)}{f(B) - f(A)}.\]
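For a concrete check, take \(\mu(x)\equiv\mu\) and \(\sigma(x)\equiv\sigma\) constant, so \(f'(z)\propto e^{-2\mu z/\sigma^2}\) and the constants cancel in the ratio. A Monte Carlo sketch against the closed form, assuming numpy (function names are illustrative; the Euler scheme slightly overshoots the barriers, so a small bias is expected):

```python
import numpy as np

def hit_prob_exact(x0, A, B, mu, sigma):
    """P(hit B before A) for dX_t = mu dt + sigma dW_t started at x0,
    via the scale function s(x) = exp(-2 mu x / sigma^2)."""
    s = lambda x: np.exp(-2.0 * mu * x / sigma**2)
    return (s(x0) - s(A)) / (s(B) - s(A))

def hit_prob_mc(x0, A, B, mu, sigma, dt=1e-3, n_paths=4000, rng=None):
    """Euler-scheme Monte Carlo estimate of the same probability."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)   # paths still inside (A, B)
    hit_B = np.zeros(n_paths, dtype=bool)
    while alive.any():
        x[alive] += mu * dt + sigma * np.sqrt(dt) * rng.normal(size=alive.sum())
        hit_B |= alive & (x >= B)
        alive &= (x > A) & (x < B)
    return hit_B.mean()
```

With \(x_0=0.5\), \(A=0\), \(B=1\), \(\mu=0.2\), \(\sigma=1\) the closed form gives about \(0.55\), and the simulation should land within Monte Carlo noise of it.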
Let \(\bs{W_t}\) be a \(K\)-dimensional standard Brownian motion. Let \(u:\R^K\mapsto \R\) be a \(C^2\) function with bounded first and second partial derivatives. Then \[\d u(\bs{W}_t) = \nabla u(\bs{W}_t)\cdot \d \bs{W}_t + \frac{1}{2}\tr[\Delta u(\bs{W}_t)] \d t\] where the gradient operator \(\nabla\) gives the vector of all first-order partial derivatives, and \(\Delta\equiv\nabla^2\) here denotes the Hessian, the matrix of all second-order partial derivatives (so \(\tr[\Delta u]\) is the Laplacian).
If \(T\) is a stopping time for \(\bs{W_t}\), then for any fixed \(t\) we have \[\E[u(\bs{W}_{T\wedge t})] = u(\bs{0}) + \frac{1}{2}\E\!\left[\int_0^{T\wedge t}\!\!\Delta u(\bs{W}_s)\d s\right].\]
A \(C^2\) function \(u:\R^k\mapsto\R\) is said to be harmonic in a region \(\mathcal{U}\) if \(\Delta u(x) = 0\) for all \(x\in \mathcal{U}\). Examples are \(u(x,y)=2\log(r)\) and \(u(x,y,z)=1/r\) where \(r\) is defined as the norm.
Remark: \(f\) being a harmonic function is equivalent to \(f(X_t)\) being a MG, i.e. \(f'(x)\mu(x) + f''(x)\sigma^2(x)/2 = 0\) for a diffusion process \(X_t\).
Let \(u\) be harmonic in an open region \(\mathcal{U}\) with compact support, and assume that \(u\) and its partials extend continuously to the boundary \(\partial \mathcal{U}\). Define \(T\) to be the first exit time of Brownian motion from \(\mathcal{U}\). For any \(\bs{x}\in\mathcal{U}\), let \(\E^{\bs{x}}\) be the expectation under measure \(\P^{\bs{x}}\) s.t. \(\bs{W}_t - \bs{x}\) is a \(K\)-dimensional standard BM. Then \(u\) satisfies the mean-value property \(u(\bs{x}) = \E^{\bs{x}}[u(\bs{W}_T)]\).
A multivariate Itô process is a continuous-time stochastic process \(X_t\in\R\) of the form \[X_t = X_0 + \int_0^t\! M_s \d s + \int_0^t\! \bs{N}_s\cdot \d \bs{W}_s\] where \(\bs{N}_t\) is an adapted \(\R^K\)-valued process and \(\bs{W}_t\) is a \(K\)-dimensional standard BM.
Let \(\bs{W}_t\in\R^K\) be a standard \(K\)-dimensional BM, and let \(\bs{X}_t\in\R^m\) be a vector of \(m\) multivariate Itô processes satisfying \[\d X_t^i = M_t^i\d t + \bs{N}_t^i\cdot \d \bs{W}_t.\] Then for any \(C^2\) function \(u:\R^m\mapsto\R\) with bounded first and second partial derivatives \[\d u(\bs{X}_t) = \nabla u(\bs{X}_t)\cdot \d \bs{X}_t + \frac{1}{2}\tr[\Delta u(\bs{X}_t)\cdot \d [\bs{X},\bs{X}]_t].\]
Let \(\bs{W}_t\) be a standard \(K\)-dimensional BM, and let \(\bs{U}_t\) be an adapted \(K\)-dimensional process satisfying \[\|\bs{U}_t\| = 1\quad\forall t\ge 0.\] Then we know the following \(1\)-dimensional Itô process is a standard BM: \[X_t := \int_0^t\!\! \bs{U}_s\cdot \d \bs{W}_s.\]
Let \(\bs{W}_t\) be a standard \(K\)-dimensional BM, and let \(R_t=\|\bs{W}_t\|\) be the corresponding radial process; then \(R_t\) is a Bessel process with parameter \((K-1)/2\) given by \[\d R_t = \frac{K-1}{2R_t}\d t + \d W_t^{\sgn}\] where we define \(\d W_t^{\sgn} := \sgn(\bs{W}_t)\cdot \d \bs{W}_t\) with \(\sgn(\bs{x}):=\bs{x}/\|\bs{x}\|\).
A Bessel process with parameter \(a\) is a stochastic process \(X_t\) given by \[\d X_t = \frac{a}{X_t}\d t+ \d W_t.\] Since this is just a special case of diffusion processes, we know the corresponding harmonic function is \(f(x)=C_1x^{1-2a} + C_2\), and the hitting probability (taking \(A=0\) and \(X_0=x\)) is \[\P(X_T=B) = \frac{f(X_0) - f(A)}{f(B) - f(A)} =\begin{cases}1 & \text{if }a > 1/2,\\(x/B)^{1-2a} & \text{otherwise}.\end{cases}\]
Let \(W_t\) be a standard \(1\)-dimensional Brownian motion and let \(\F_t\) be the \(\sigma\)-algebra of all events determined by the path \(\{W_s\}_{s\le t}\). If \(Y\) is any r.v. with mean \(0\) and finite variance that is measurable with respect to \(\F_t\) for some \(t > 0\), then \[Y = \int_0^t\! A_s\d W_s\] for some adapted process \(A_t\) that satisfies \[\E(Y^2) = \int_0^t\! \E(A_s^2)\d s.\] This theorem is of importance in finance because it implies that in the Black-Scholes setting, every contingent claim can be hedged.
Special case: let \(Y_t=f(W_t)\) be any mean \(0\) r.v. with \(f\in C^2\). Let \(u(s,x):=\E[f(W_t)\mid W_s = x]\), then \[Y_t = f(W_t) = \int_0^t\! u_x(s,W_s)\d W_s.\]
Consider a market with two assets: cash, whose value \(M_t\) accrues at non-random rate of return \(r_t\), and stock, with share price \(S_t\) such that \(\d S_t = S_t(\mu_t \d t + \sigma \d W_t)\). Under a risk-neutral measure \(\P\), the discounted share price \(S_t / M_t\) is a martingale and thus \[\frac{S_t}{M_t} = \frac{S_0}{M_0}\exp\left\{\sigma W_t - \frac{\sigma^2t}{2}\right\}\] where we used the fact that \(\mu_t = r_t\) by the Fundamental Theorem.
A European contingent claim with expiration date \(T > 0\) and payoff function \(f:\R\mapsto\R\) is a tradeable asset that pays \(f(S_T)\) at time \(T\). By the Fundamental Theorem we know the discounted price of this claim at any \(t\le T\) is \(\E[f(S_T)/M_T\mid \F_t]\). In order to calculate this conditional expectation, let \(g(W_t):= f(S_t)/M_t\); then by the Markov property of BM we know \(\E[g(W_T)\mid \F_t] = \E[g(W_t + W_{T-t}^*)\mid \F_t]\) where \(W_t\) is adapted in \(\F_t\) and independent of \(W_t^*\).
The discounted time-\(t\) price of a European contingent claim with expiration date \(T\) and payoff function \(f\) is \[\E[f(S_T)/M_T\mid \F_t] = \frac{1}{M_T}\E\!\left[f\!\left(S_t\exp\!\left\{\sigma W_{T-t}^* - \frac{\sigma^2(T-t)}{2} + R_T - R_t\right\}\right)\middle|\F_t\right]\] where \(S_t\) is adapted in \(\F_t\) and independent of \(W_t^*\). The expectation is calculated using the normal distribution. Note here \(R_t = \int_0^t r_s\d s\) is the log compound interest rate.
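With constant \(r\) and a call payoff \(f(S_T)=(S_T-K)^+\), this expectation evaluates to the Black-Scholes formula, which can be cross-checked by simulating \(S_T\) directly under the risk-neutral measure. A sketch, assuming numpy (bs_call and bs_call_mc are illustrative names):

```python
import numpy as np
from math import erf, exp, log, sqrt

def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes call price with constant interest rate r."""
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    d1 = (log(S0 / K) + (r + sigma**2 / 2.0) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

def bs_call_mc(S0, K, r, sigma, T, n=200_000, rng=None):
    """Risk-neutral Monte Carlo: exp(-rT) * E[(S_T - K)^+] with
    S_T = S0 * exp((r - sigma^2 / 2) T + sigma * W_T)."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.normal(size=n)
    ST = S0 * np.exp((r - sigma**2 / 2.0) * T + sigma * np.sqrt(T) * Z)
    return exp(-r * T) * np.maximum(ST - K, 0.0).mean()
```

For \(S_0=K=100\), \(r=0.05\), \(\sigma=0.2\), \(T=1\), both should agree to within Monte Carlo noise.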
Under the risk-neutral probability measure, the discounted price of a claim is a martingale, i.e. it has no drift term. So we can differentiate \(M_t^{-1}u(t,S_t)\) by Itô and derive the following PDE \[u_t(t,S_t) + r_t S_tu_x(t,S_t) + \frac{\sigma^2S_t^2}{2}u_{xx}(t,S_t) = r_t u(t,S_t)\] with terminal condition \(u(T,S_T)=f(S_T)\). Note here everything is under the BS model.
A replicating portfolio for a contingent claim in stock and cash is given by \[V_t = \alpha_t M_t + \beta_t S_t\] where \(\alpha_t = [u(t,S_t) - S_t u_x(t,S_t)]/M_t\) and \(\beta_t = u_x(t,S_t)\).
A barrier option pays \(\$1\) at time \(T\) if \(\max_{t\le T} S_t \ge AS_0\) and \(\$0\) otherwise. This is a simple example of a path-dependent option. Other commonly used examples are knock-ins, knock-outs, lookbacks and Asian options.
The time-\(0\) price of such barrier options is calculated from \[\begin{align*}V_0 &= \exp(-rT)\P\!\left(\max_{t\le T} S_t \ge AS_0\right)= \exp(-rT)\P\!\left(\max_{t\le T} W_t + \mu t \ge a\right)\\&= \exp(-rT)\P_{\mu}\!\left(\max_{t\le T} W_t \ge a\right)\end{align*}\] where \(\mu=r\sigma^{-1} - \sigma/2\) and \(a = \sigma^{-1}\log A\). Now, by Cameron-Martin we know \[\begin{align*}\P_{\mu}\!\left(\max_{t\le T} W_t \ge a\right) &=\E_0[Z_T\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] =\E_0[\exp(\mu W_T - \mu^2 T / 2)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] \\ &=\exp(-\mu^2 T / 2)\cdot \E_0[\exp(\mu W_T)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}]\end{align*}\] and by the reflection principle we have \[\begin{align*}\E_0[\exp(\mu W_T)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] &=e^{\mu a}\int_0^{\infty} (e^{\mu y} + e^{-\mu y}) \P(W_T - a \in \d y) \\&=e^{\mu^2 T/2}\left[\Phi(\mu\sqrt{T} - a/\sqrt{T}) + e^{2\mu a}\Phi(-\mu\sqrt{T}-a/\sqrt{T})\right],\end{align*}\] so that \(\P_{\mu}\!\left(\max_{t\le T} W_t \ge a\right) = \Phi(\mu\sqrt{T} - a/\sqrt{T}) + e^{2\mu a}\Phi(-\mu\sqrt{T}-a/\sqrt{T})\).
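The hitting probability can be sanity-checked by simulating the drifted maximum directly. A sketch, assuming numpy (function names are illustrative; monitoring the path on a discrete grid slightly underestimates the continuous maximum, so only loose agreement should be expected):

```python
import numpy as np
from math import erf, exp, sqrt

def barrier_hit_formula(mu, a, T=1.0):
    """P_mu(max_{t<=T} W_t >= a) from the reflection-principle result."""
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return Phi(mu * sqrt(T) - a / sqrt(T)) + exp(2.0 * mu * a) * Phi(-mu * sqrt(T) - a / sqrt(T))

def barrier_hit_mc(mu, a, T=1.0, n_steps=1000, n_paths=5000, rng=None):
    """Monte Carlo estimate of the same probability on a discrete time grid."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    inc = mu * dt + sqrt(dt) * rng.normal(size=(n_paths, n_steps))
    running_max = inc.cumsum(axis=1).max(axis=1)  # max of the drifted walk
    return np.mean(running_max >= a)
```

As a consistency check, for \(\mu=0\) the formula collapses to \(2\Phi(-a/\sqrt{T})\), the classical reflection-principle result for a driftless BM.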
The exponential process \[Z_t = \exp\!\left\{\int_0^t\! Y_s\d W_s - \frac{1}{2}\int_0^t\! Y_s^2\d s\right\}\] is a positive MG given \[\E\!\left[\int_0^t\! Z_s^2Y_s^2\d s\right] < \infty.\] Specifically, the exponential martingale is given by the SDE \(\d X_t = \theta X_t \d W_t\).
Assume that under the probability measure \(\P\) the exponential process \(Z_t(Y)\) is a MG and \(W_t\) is a standard BM. Define the absolutely continuous probability measure \(Q\) on \(\F_t\) with likelihood ratio \(Z_t\), i.e. \((\d\Q/\d\P)_{\F_t} = Z_t\); then under \(Q\) the process \[W_t^* := W_t - \int_0^t\! Y_s\d s\] is a standard BM. Girsanov's Theorem shows that drift can be added or removed by change of measure.
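Girsanov's reweighting can be seen numerically in the simplest (Cameron-Martin) case \(Y_s\equiv\theta\): multiplying by \(Z_T\) under \(\P\) is the same as adding drift \(\theta\) to the path. A sketch comparing both sides of \(\E_\P[Z_T f(W_T)] = \E[f(B_T+\theta T)]\) for \(f(x)=x^+\), assuming numpy (girsanov_check is an illustrative name):

```python
import numpy as np

def girsanov_check(theta=0.7, T=2.0, n=400_000, rng=None):
    """Compare E_P[Z_T f(W_T)] with E[f(B_T + theta T)] for f(x) = max(x, 0):
    reweighting by Z_T = exp(theta W_T - theta^2 T / 2) adds drift theta."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.sqrt(T) * rng.normal(size=n)          # W_T under P
    Z = np.exp(theta * W - theta**2 * T / 2.0)   # likelihood ratio Z_T
    lhs = np.mean(Z * np.maximum(W, 0.0))
    B = np.sqrt(T) * rng.normal(size=n)          # fresh BM endpoint
    rhs = np.mean(np.maximum(B + theta * T, 0.0))
    return lhs, rhs
```

The two sample averages should agree up to Monte Carlo error; the left-hand side is exactly the importance-sampling identity used in the barrier-option calculation above.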
The exponential process \[Z_t = \exp\!\left\{\int_0^t\! Y_s \d W_s - \frac{1}{2}\!\int_0^t\! Y_s^2 \d s\right\}\] is a MG given \[\E\left[\exp\!\left\{\frac{1}{2}\!\int_0^t\! Y_s^2\d s\right\}\right] < \infty.\] This theorem gives another way to show whether an exponential process is a MG.
Assume \(W_t\) is a standard BM under \(\P\), and define the likelihood ratio \(Z_t = (\d\Q/\d\P)_{\F_t}\) as above with \(Y_t = -\alpha W_t\); then by Girsanov, \(W_t\) under \(\Q\) is an OU process.
If a system can be in one of a collection of states \(\{\omega_i\}_{i\in\mathcal{I}}\), the probability of finding it in a particular state \(\omega_i\) is proportional to \(\exp\{-H(\omega_i)/kT\}\) where \(k\) is Boltzmann's constant, \(T\) is temperature and \(H(\cdot)\) is energy.
If \(W_t\) is standard BM with \(W_0 = x \in (0, A)\), how does \(W_t\) behave conditional on the event that it hits \(A\) before \(0\)? Define \(T=\inf\{t\ge 0: W_t = 0\text{ or }A\}\), let \(\P^x\) be the law of the BM started at \(x\), and let \(\Q^x\) be \(\P^x\) conditioned on \(\{W_T=A\}\).
Then the likelihood ratios are \[\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_T} \!= \frac{\mathbf{1}_{\{W_T=A\}}}{\P^x\{W_T=A\}} \quad\Rightarrow\quad\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_{T\wedge t}} \!= \E\!\left[\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_T}\middle|\F_{T\wedge t}\right] = \frac{W_{T\wedge t}}{x}.\] Notice \[\begin{align*}\frac{W_{T\wedge t}}{x} &=\exp\left\{\log W_{T\wedge t}\right\} / x \overset{\text{Itô}}{=}\exp\left\{\log W_0 + \int_0^{T\wedge t}W_s^{-1}\d W_s - \frac{1}{2}\int_0^{T\wedge t} W_s^{-2}\d s\right\} / x \\&=\exp\left\{\int_0^{T\wedge t}W_s^{-1}\d W_s - \frac{1}{2}\int_0^{T\wedge t} W_s^{-2}\d s\right\}\end{align*}\] which is a Girsanov likelihood ratio, so we conclude \(W_t\) is a BM under \(\Q^x\) with drift \(W_t^{-1}\), or equivalently \[W_t^* = W_t - \int_0^{T\wedge t}W_s^{-1}\d s\] is a standard BM with initial point \(W_0^* = x\).
A one-dimensional Lévy process is a continuous-time random process \(\{X_t\}_{t\ge 0}\) with \(X_0=0\) and stationary, independent increments. Lévy processes are defined to be a.s. right continuous with left limits.
Remark: Brownian motion (with drift) is the only Lévy process with continuous paths.
Let \(B_t\) be a standard BM. Define the FPT process as \(\tau_x = \inf\{t\ge 0: B_t \ge x\}\). Then \(\{\tau_{x}\}_{x\ge 0}\) is a Lévy process called the one-sided stable-\(1/2\) process. In particular, the sample path \(x\mapsto \tau_x\) is non-decreasing. Such Lévy processes with non-decreasing paths are called subordinators.
A Poisson process with rate (or intensity) \(\lambda > 0\) is a Lévy process \(N_t\) such that for any \(t\ge 0\) the distribution of the random variable \(N_t\) is the Poisson distribution with mean \(\lambda t\). Thus, for any \(k=0,1,2,\cdots\) we have \(\P(N_t=k) = (\lambda t)^k\exp(-\lambda t)\ /\ k!\) for all \(t > 0\).
Remark 1: (Superposition Theorem) If \(N_t\) and \(M_t\) are independent Poisson processes of rates \(\lambda\) and \(\mu\) respectively, then the superposition \(N_t + M_t\) is a Poisson process of rate \(\lambda+\mu\).
Remark 2: (Exponential Intervals) Successive inter-arrival times are i.i.d. exponential r.v.s. with common mean \(1/\lambda\).
Remark 3: (Thinning Property) Keeping each point of a Poisson-\(\lambda\) process independently with Bernoulli-\(p\) probability yields again a Poisson process, with rate \(\lambda p\).
Remark 4: (Compounding) Every compound Poisson process is a Lévy process. We call \(\lambda F\) the Lévy measure, where \(F\) is the compounding distribution.
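Remark 2 gives the standard way to simulate a Poisson process: cumulate i.i.d. exponential inter-arrival times with mean \(1/\lambda\). A sketch, assuming numpy (poisson_arrivals is an illustrative name):

```python
import numpy as np

def poisson_arrivals(lam, T, rng=None):
    """Arrival times in [0, T] of a rate-lam Poisson process, built from
    i.i.d. exponential inter-arrival times with mean 1 / lam."""
    rng = np.random.default_rng() if rng is None else rng
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= T:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)
```

Keeping each arrival independently with probability \(p\) then gives a realization of a rate-\(\lambda p\) process, exactly the thinning property of Remark 3.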
For \(N\sim\text{Pois}(\lambda)\), we have \(\MGF(\theta)=\exp[\lambda (e^{\theta}-1)]\).
For \(X_t=\sum_{i=1}^{N_t}\!Y_i\) where \(N_t\sim\text{Pois}(\lambda t)\) and \(\MGF_Y(\theta) = \psi(\theta) < \infty\), we have \(\MGF_{X_t}(\theta)=\exp[\lambda t (\psi(\theta) - 1)]\).
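This formula is easy to verify by Monte Carlo for, say, standard normal jumps, where \(\psi(\theta)=e^{\theta^2/2}\). A sketch, assuming numpy (conditionally on \(N_t=n\), the sum of \(n\) i.i.d. \(N(0,1)\) jumps is \(N(0,n)\), which the code exploits; the function name is illustrative):

```python
import numpy as np

def compound_poisson_mgf_check(theta=0.5, lam=2.0, t=1.0, n=200_000, rng=None):
    """Monte Carlo estimate of E[exp(theta X_t)] for a compound Poisson
    process with N(0,1) jumps, versus exp(lam t (psi(theta) - 1))."""
    rng = np.random.default_rng() if rng is None else rng
    N = rng.poisson(lam * t, size=n)
    X = rng.normal(size=n) * np.sqrt(N)  # X_t | N_t = N  ~  N(0, N)
    mc = np.mean(np.exp(theta * X))
    exact = np.exp(lam * t * (np.exp(theta**2 / 2.0) - 1.0))
    return mc, exact
```

The sample average and the closed-form MGF should coincide up to Monte Carlo error.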
The Binomial\((n,p_n)\) distribution, where \(n\to\infty\) and \(p_n\to 0\) s.t. \(np_n\to\lambda > 0\), converges to the Poisson\((\lambda)\) distribution.
If \(N_t\) is a Poisson process with rate \(\lambda\), then \(Z_t=\exp[\theta N_t - (e^{\theta} - 1) \lambda t]\) is a martingale for any \(\theta\in\R\).
Remark: Similar to Cameron-Martin, let \(N_t\) be a Poisson process with rate \(\lambda\) under \(\P\), and let \(\Q\) be the measure s.t. the likelihood ratio \((\d\Q/\d\P)_{\F_t}=Z_t\) is defined as above; then \(N_t\) under \(\Q\) is a Poisson process with rate \(\lambda e^{\theta}\).
Let \(X_t\) be a compound Poisson process with Lévy measure \(\lambda F\), and let \(\psi(\theta)\) be the MGF of the compounding distribution \(F\); then \(Z_t=\exp[\theta X_t - (\psi(\theta) - 1)\lambda t]\) is a martingale for any \(\theta\in\R\).
A \(K\)-dimensional Lévy process is a continuous-time random process \(\{\bs{X}_t\}_{t\ge 0}\) with \(\bs{X}_0=\bs{0}\) and stationary, independent increments. Like the one-dimensional version, vector Lévy processes are defined to be a.s. right continuous with left limits.
Remark: Given a non-random linear transform \(F:\R^K\mapsto \R^M\) and a \(K\)-dimensional Lévy process \(\{\bs{X}_t\}_{t\ge 0}\), the process \(\{F(\bs{X}_t)\}_{t\ge 0}\) is a Lévy process on \(\R^M\).
I have recently been playing a billiard game in which you can play a series of exciting tournaments. In each tournament, you pay an entrance fee of, for example, \(\$500\), to potentially win a prize of, say, \(\$2500\). There are various kinds of tournaments, with entrance fees ranging from \(\$100\) up to over \(\$10000\). After hundreds of games, my winning rate stabilized around \(58\%\), which is actually pretty good as it significantly beats random draws. A natural question therefore came to my mind: is there an optimal strategy?
Well, I think so. I'll list two strategies below and try to explore any potential optimality. We can reasonably model these tournaments as repetitive betting with a fixed physical probability \(p\) of winning and odds^{[1]} of \((d-1)\):\(1\) against ourselves. Given that the tournament entrance fees are sufficiently dense, we may treat the stake as a continuous variable: in each round we bet some fraction \(x\in[0,1]\) of our balance, aiming to maximize our long-run profitability. Without loss of generality, let's assume an initial balance of \(M_0=1\) and that money in this world is infinitely divisible. The problem then becomes the determination of the optimal \(x\in[0,1]\) s.t. the expected return is maximized. Nonetheless, depending on the interpretation of this problem we have several solutions. Some are intriguing while others may be frustrating.
Let's first take a look at potential values of \(x\) and the corresponding balance trajectories \(M_t\). For any \(0 \le x \le 1\), we have probability \(p\) of getting an \(x\)-fraction of our whole balance multiplied \(d\)-fold and probability \(1-p\) of losing it, that is \[\text{E}(M_{t+1}\mid\mathcal{F}_t) = (1-x)M_t + p\cdot xdM_t + (1-p)\cdot 0 =[1 + (pd-1)x] M_t\] which indicates \(M_t\) is a submartingale^{[2]} as in this particular case \(p=0.58\), \(d=5\) and thus \(pd=2.9 > 1\). So the fraction maximizing expected return is \(x^* = 1\), which is rather aggressive and yields a ruin probability of \(1-p^n\) within the first \(n\) bets. Simulation supports our worries: not once did we survive \(10\) bets in this tournament, and the maximum balance we ever attained was less than a million.
If we consider \(\log M_t\) instead, then \[\begin{align*}\text{E}(\log M_{t+1}\mid \mathcal{F}_t) &=p\cdot \log[(1-x)M_t + xdM_t] +(1-p)\cdot \log[(1-x)M_t + 0]\\ &=p\cdot \log[(1+(d-1)x)M_t] +(1-p)\cdot \log[(1-x)M_t].\end{align*}\] The first order condition is \[\frac{\partial}{\partial x}\text{E}(\log M_{t+1}\mid \mathcal{F}_t) =\frac{p(d-1)}{1+(d-1)x}-\frac{1-p}{1-x} = 0 \quad\Rightarrow\quad x^* = \frac{pd-1}{d-1}=0.475\] which is the Kelly fraction: more conservative, and therefore it should survive longer than the previous strategy. Simulation agrees: even the worst sim beat the best we got when \(x=1\).
According to Doob's martingale inequality^{[3]}, the probability of our balance ever attaining a value no less than \(C = 1\times10^{60}\) within \(T=500\) steps satisfies \[\text{P}\left(\sup_{t \le T}M_t\ge C\right) \le \frac{\text{E}(M_T)}{C} = \frac{M_0}{C} \prod_{t=0}^{T-1}\frac{\text{E}(M_{t+1}\mid\mathcal{F}_t)}{M_t} =\frac{[1+(pd-1)x]^T}{C} \approx 4.6\times10^{79} \gg 1.\] Since the bound far exceeds one, Doob's inequality places no effective restriction here: it leaves open the possibility that the probability of exceeding \(1\times10^{60}\) within \(500\) steps is close to one, far above the roughly \(0.31\) that simulation gave us. To put it differently, we actually might be able to find a certain strategy that is even significantly better than the one given by the Kelly criterion.
What is it, then? Or does it actually exist? I don't have an answer yet, but perhaps exploratory algorithms like machine learning will give us some hints; and perhaps the strategy is not static but rather dynamic.
I recently sold my Nvidia GTX 1080 eGPU^{[1]} after two months' waiting in vain for a compatible Nvidia video driver for macOS 10.14 (Mojave). Whether it was Apple's or Nvidia's fault, I don't care any more. Right away, I ordered an AMD Radeon RX Vega 64 on Newegg. The card arrived two days later and looked sexy at first sight. It was plug-and-play as expected and performed just as well as its predecessor, whether for serious gaming, video editing or anything else. I would have given it a 9.5/10 had I not found another issue a couple of days later: wow, there is no CUDA on this card!
Of course there isn't, because CUDA was developed by Nvidia, which has been putting great effort into building a more user-friendly deep-learning environment. By comparison, AMD (yes!) used to intentionally avoid head-to-head competition against the world's largest GPU maker and instead kept making gaming cards with better cost-to-performance ratios. ROCm, an open-source HPC/hyperscale-class platform for GPU computing that supports cards other than Nvidia's, does make this gap much narrower than before. However, ROCm still does not officially support macOS, so you have to dual-boot Linux to utilize the computational benefits of your AMD card, even though you can already game smoothly on your Mac. Sad it is, AMD 😰.
There are, however, several solutions if you're someone just like me who really has to run code on a Mac and would like to accelerate those Renaissance-length training times with a GPU. The method I adopted uses a framework called PlaidML, and I'd like to walk you through how I installed it and configured my GPU with it. First, install the packages:
pip3 install plaidml-keras plaidbench
After installation, we can set up the intended device for computing by running:
plaidml-setup
PlaidML Setup (0.3.5)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3

Default Config Devices:
   No devices.

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   opencl_intel_intel(r)_iris(tm)_plus_graphics_655.0 : Intel Inc. Intel(R) Iris(TM) Plus Graphics 655 (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   opencl_amd_amd_radeon_rx_vega_64_compute_engine.0 : AMD AMD Radeon RX Vega 64 Compute Engine (OpenCL)
   metal_intel(r)_iris(tm)_plus_graphics_655.0 : Intel(R) Iris(TM) Plus Graphics 655 (Metal)
   metal_amd_radeon_rx_vega_64.0 : AMD Radeon RX Vega 64 (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:
Of course we enter y. Before choosing device 4 (OpenCL with AMD) or 6 (Metal with AMD), I'd like to benchmark the default device, CPU (LLVM). The test script (on MobileNet as an example) is
plaidbench keras mobilenet
and the result shows^{[2]}
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "llvm_cpu.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
17227776/17225924 [==============================] - 2s 0us/step
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 3.0688607692718506 (compile), 61.17863607406616 (execution), 0.059744761791080236 (execution per example)
Correctness: PASS, max_error: 1.7511049009044655e-05, max_abs_error: 6.556510925292969e-07, fail_ratio: 0.0
Now we run the setup again and choose 4 (OpenCL with AMD). The result is
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "opencl_amd_amd_radeon_rx_vega_64_compute_engine.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 2.6935510635375977 (compile), 13.741217851638794 (execution), 0.01341915805824101 (execution per example)
Correctness: PASS, max_error: 1.7511049009044655e-05, max_abs_error: 1.1995434761047363e-06, fail_ratio: 0.0
Finally we run the test against the expected most powerful device, i.e. device 6 (Metal with AMD).
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "metal_amd_radeon_rx_vega_64.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 2.243159055709839 (compile), 7.515545129776001 (execution), 0.007339399540796876 (execution per example)
Correctness: PASS, max_error: 1.7974503862205893e-05, max_abs_error: 1.0952353477478027e-06, fail_ratio: 0.0
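To actually use the chosen device in your own training scripts rather than just in plaidbench, point Keras at the PlaidML backend before importing it. A minimal sketch, following PlaidML's documented environment-variable approach:

```python
import os

# Select the PlaidML backend; this must be set before `import keras`.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

# After this, `import keras` picks up PlaidML and runs models on the
# device chosen during plaidml-setup, e.g.:
# import keras
# model = keras.applications.MobileNet()
```

The imports of keras itself are left commented out here since they require the packages installed; everything else is plain standard library.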
In conclusion, by utilizing Metal on my Mac together with the external AMD GPU, the training runtime came down by roughly 87.7%, and I'm personally quite satisfied with that.
It's been more than two years since my last trip to the Arctic Circle, back when I was still studying in the Netherlands. Our adventurous hike in Abisko, amid the endless mountains of northern Europe, is still a frequent dream of mine. This time we went to Fairbanks, Alaska, for the aurora and also for another Arctic experience.
We spent five days and six nights in Fairbanks. Apart from the two simple dinners we had on the hike and one of beef noodles at the Arctic Circle camp, the Pump House ended up as our choice for every proper feast. It is a fine-dining restaurant and probably the best in town, as you get a thumbs-up for this little house from nearly every local you meet. Eating there was definitely one of the most enjoyable parts of our whole stay in Fairbanks: warming, relaxing, and exciting to the taste buds.
At first we had actually planned to try a place a bit farther away called Turtle Club, since it too is highly rated on Yelp. Our guide had also recommended a local (Chinese-run) buffet called AK Buffet, and as it happened one group lunch took us there anyway. We left hugely disappointed, and secretly relieved we hadn't chosen to waste one of our own meals on it earlier. The end result was as described above: we ate at the Pump House for three days straight and worked through nearly every recommended dish on the menu. After three days, the verdict: the oysters and steak fell short of expectations, but the seafood lived up to its name. The top recommendation is the Seafood Chowder, either à la carte or served in a whole bread bowl. The broth is rich, and unlike most seafood chowders, beyond the creaminess it carries a savory depth reminiscent of chicken soup. A full spoonful in the mouth, that savoriness striking the palate together with the textures of the various seafood, melts a whole day's fatigue into warmth, comfort and happiness. That, I suppose, is the entire point of a bowl of seafood chowder. The Seafood Risotto gave a similar feeling: springy shrimp, firm scallops, plump crab. More importantly, the rice was cooked to exactly the right doneness, neither overly creamy nor undercooked, rivaling or even surpassing the best risotto I ever had in Europe. Beyond these two, their much-touted King Salmon was merely decent; the pan-seared Alaskan Halibut was arguably fresher and more delicious (and certainly better than the Alaskan Cod on the same page). Finally, one cannot fail to mention the Steamed King Crab: freshly caught Alaskan king crab^{[1]}, prepared by steaming, the method that best preserves the flavor of seafood, with no seasoning at all, and brought to the table before the residual heat had even faded. Cracked open with the dedicated pliers, the crab meat is warm, soft and springy, giving off the aroma unique to steamed seafood. A whole crab leg in hand, more than a full mouthful at a time, is a satisfaction that no place outside Alaska can offer.
By the entrance stands a taxidermied brown bear^{[2]}, taller than a person and adorably clumsy; sadly, we forgot to take a photo on all three visits. Above the table we sat at twice hangs a huge mounted moose, rather imposing at first sight, though on a second look it actually adds to the atmosphere. The Pump House sits on the bank of the Chena River, the largest river in Fairbanks; the daytime view is said to be beautiful, and the outdoor seating open in summer is said to be another experience altogether, but neither was something we got to see. To close, here is a photo of the entrance found online (since we only ever went at night, we took none ourselves), for your enjoyment.