This exploratory study implements a simple yet robust sentiment-analysis mechanism for the cryptocurrency market. We call it a hybrid sentiment-momentum strategy, as it applies context analysis (more specifically, word analysis) to traditional momentum strategies and adjusts positions automatically over time using both factors. The hybrid strategy is backtested from Dec. 17th, 2017 to May 2nd, 2019. It yields a cumulative return of 229% and a remarkable Sharpe ratio of 2.68 with an acceptable maximum drawdown in the out-of-sample backtest, a huge improvement over the classical momentum strategies in our analysis.
Cryptocurrency has been a hot topic ever since the creation of Bitcoin in 2009. Unlike stocks, which are backed by real companies, cryptocurrencies have no intrinsic value, so their prices rely heavily on market information. The decentralization of exchanges, the inconsistency of prices, and the lack of any tangible entity or assets all present serious analytical challenges that are difficult to solve with traditional tools. This motivates us to introduce news data into our strategy.
There are many reasons to believe that a momentum strategy will perform well in this market. Compared to other markets, the cryptocurrency market has fewer participants; thus, information is both diffused and acted on slowly. In addition, the smaller number of participants leads to a lack of liquidity, forcing large investors to spread out transactions over time. Last but not least, the limited information sources make investors' decisions largely dependent on recent performance, which further drives the herd mentality. However, there are just as many reasons to think that a momentum strategy will fail. We need something else to tell us when to follow momentum and when to take the opposite position. So we go one step further and develop a new strategy that we call the hybrid sentiment-momentum strategy.
The cryptocurrency market is largely driven by investors' overall fanaticism, which suggests that a momentum strategy would perform extremely well when the market is optimistic and bullish. However, when fanaticism is dampened by negative news, investors fall back to a more rational and conservative style, allowing a momentum-reversal strategy to gain.
In the following sections, we showcase our hybrid sentiment-momentum strategy step by step. First, we introduce how we use web crawling to obtain news data, coin prices and market capitalizations, and how we process them. After that, we describe our model construction and its underlying mathematical support. We then evaluate the strategy's performance by comparing it with benchmark strategies including buy-and-hold, momentum and momentum reversal. Before wrapping up with our conclusions, we analyze the model's sensitivity to multiple parameters.
In this section, we discuss our data sources and how we prepare the data for the sentiment model.
import os
import pandas as pd
News Data
We crawl all posts from CoinDesk. The crawling scripts are included in Appendix A. We have news data from 2013-04-15 to 2019-05-02 (the latest available at the time of drafting). Entries are title, abstract and content.
df = pd.read_csv('data/posts.csv', index_col=0)
date  title  abstract  content

2013-04-15  ‘Zerocoin’ widget promises Bitcoin privacy  Researchers at Johns Hopkins University are pr...  Researchers at Johns Hopkins University are pr...
2013-04-16  OpenCoin aims to build a better digital currency  Digital currency promises to solve some proble...  Digital currency promises to solve some proble...
2013-04-16  Meet the money behind bitcoin competitor OpenCoin  At first glance, especially to a digital curre...  At first glance, especially to a digital curre...
2013-04-17  $500k in funding paves way for bitcoin trading  New York City-based bitcoin startup Coinsetter...  New York City-based bitcoin startup Coinsetter...
2013-04-22  Argentina trades $50k of bitcoins  A record $50,000 of bitcoins were traded in Ar...  A record $50,000 of bitcoins were traded in Ar...
...  ...  ...  ...
2019-05-01  CoinMarketCap Forms Alliance to Tackle Concern...  Crypto data provider CoinMarketCap is working ...  Cryptocurrency data provider CoinMarketCap is ...
2019-05-01  Techstars-Backed Alkemi Enters DeFi Race With ...  This new startup wants to plug exchanges, fund...  A new type of decentralized finance (DeFi) app...
2019-05-02  When Tether Warnings Are Marketing Tools  Stablecoin issuers are taking advantage of Tet...  “We’re in a scary place, because if Tether’s p...
2019-05-02  3 Price Hurdles That Could Complicate a Bitcoi...  Bitcoin is on the offensive, having defended k...  Bitcoin (BTC) is on the offensive, having defe...
2019-05-02  Diamond Standard Launches Blockchain-Powered T...  A new startup is looking to make diamonds as a...  As a store of value, diamonds have their advan...
EOD Price & Market Cap Data
Also, we crawl CoinMarketCap for historical EOD data on over 10 major coins. Only daily close prices and market caps are kept; see Appendix B for the code. The cryptocurrency market trades 24/7, so for the purposes of this research we define the daily close price as the price at 24:00:00 UTC.
Step 0: Group post (news) data by date.
df_eod = df.copy()
date  title  abstract  content

2013-04-15  ‘Zerocoin’ widget promises Bitcoin privacy  Researchers at Johns Hopkins University are pr...  Researchers at Johns Hopkins University are pr...
2013-04-16  OpenCoin aims to build a better digital curren...  Digital currency promises to solve some proble...  Digital currency promises to solve some proble...
2013-04-17  $500k in funding paves way for bitcoin trading  New York City-based bitcoin startup Coinsetter...  New York City-based bitcoin startup Coinsetter...
2013-04-22  Argentina trades $50k of bitcoins Bitcoin pric...  A record $50,000 of bitcoins were traded in Ar...  A record $50,000 of bitcoins were traded in Ar...
2013-04-23  Mt Gox CEO on Bitcoin’s future Altcurrency fi...  Mark Karpeles gives a great explanation of how...  Mark Karpeles gives a great explanation of how...
...  ...  ...  ...
2019-04-26  How Crypto Markets Are Reacting to the Tether...  The crypto markets endured a loss of as much a...  The cryptocurrency markets endured a loss of a...
2019-04-29  US Stock Broker E*Trade to Launch Bitcoin and ...  Online stock brokerage E*Trade is preparing to...  Online stock brokerage E*Trade Financial is sa...
2019-04-30  Want to Understand Bitfinex? Understand Mt. Go...  People interested in understanding Bitfinex ar...  Daniel Cawrey is chief executive officer of Pa...
2019-05-01  Where Crypto Exchanges Are Beating the Bear Ma...  Turkish exchanges OKEx and BtcTurk are onboard...  With the Turkish lira dropping to a six-month ...
2019-05-02  When Tether Warnings Are Marketing Tools 3 Pri...  Stablecoin issuers are taking advantage of Tet...  “We’re in a scary place, because if Tether’s p...
print(df_eod.iloc[0].content)
Researchers at Johns Hopkins University are proposing a cryptographic extension to bitcoin that could enable fully anonymous transactions on the network. The extension, called Zerocoin, works – as NewScientist explains it – by “allowing bitcoin users to leave their coins floating on the network for someone else to redeem, on the condition that they can redeem the same amount of bitcoin, similarly left floating on the network, at an arbitrary time in the future.”
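The grouping itself is not shown in the cell above; below is a minimal sketch of what Step 0 does, assuming a toy posts DataFrame with the same columns (the join-with-spaces convention is our assumption):

```python
import pandas as pd

# Toy stand-in for the crawled posts: two posts share a date (hypothetical data).
df = pd.DataFrame(
    {"title": ["a b", "c d", "e f"],
     "abstract": ["x", "y", "z"],
     "content": ["long x", "long y", "long z"]},
    index=pd.Index(["2013-04-15", "2013-04-16", "2013-04-16"], name="date"),
)

# Concatenate all posts published on the same day, column by column.
df_eod = df.groupby(level="date").agg(" ".join)

print(df_eod.loc["2013-04-16", "title"])  # -> c d e f
```

This mirrors the concatenated titles visible in the table above, where several posts are merged under one date.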
Step 1: Convert everything to lowercase (where possible).
df_eod.title = df_eod.title.str.lower()
date  title  abstract  content

2013-04-15  ‘zerocoin’ widget promises bitcoin privacy  researchers at johns hopkins university are pr...  researchers at johns hopkins university are pr...
2013-04-16  opencoin aims to build a better digital curren...  digital currency promises to solve some proble...  digital currency promises to solve some proble...
2013-04-17  $500k in funding paves way for bitcoin trading  new york city-based bitcoin startup coinsetter...  new york city-based bitcoin startup coinsetter...
2013-04-22  argentina trades $50k of bitcoins bitcoin pric...  a record $50,000 of bitcoins were traded in ar...  a record $50,000 of bitcoins were traded in ar...
2013-04-23  mt gox ceo on bitcoin’s future altcurrency fi...  mark karpeles gives a great explanation of how...  mark karpeles gives a great explanation of how...
...  ...  ...  ...
2019-04-26  how crypto markets are reacting to the tether...  the crypto markets endured a loss of as much a...  the cryptocurrency markets endured a loss of a...
2019-04-29  us stock broker e*trade to launch bitcoin and ...  online stock brokerage e*trade is preparing to...  online stock brokerage e*trade financial is sa...
2019-04-30  want to understand bitfinex? understand mt. go...  people interested in understanding bitfinex ar...  daniel cawrey is chief executive officer of pa...
2019-05-01  where crypto exchanges are beating the bear ma...  turkish exchanges okex and btcturk are onboard...  with the turkish lira dropping to a six-month ...
2019-05-02  when tether warnings are marketing tools 3 pri...  stablecoin issuers are taking advantage of tet...  “we’re in a scary place, because if tether’s p...
print(df_eod.iloc[0].content)
researchers at johns hopkins university are proposing a cryptographic extension to bitcoin that could enable fully anonymous transactions on the network. the extension, called zerocoin, works – as newscientist explains it – by “allowing bitcoin users to leave their coins floating on the network for someone else to redeem, on the condition that they can redeem the same amount of bitcoin, similarly left floating on the network, at an arbitrary time in the future.”
Step 2: Remove trivial abbreviations starting with ’, e.g. ’s, ’ll. Join all \n (paragraphs).
remove_abbr = lambda x: ' '.join([_.split('’')[0] for _ in x.split()])
date  title  abstract  content

2013-04-15  ‘zerocoin widget promises bitcoin privacy  researchers at johns hopkins university are pr...  researchers at johns hopkins university are pr...
2013-04-16  opencoin aims to build a better digital curren...  digital currency promises to solve some proble...  digital currency promises to solve some proble...
2013-04-17  $500k in funding paves way for bitcoin trading  new york city-based bitcoin startup coinsetter...  new york city-based bitcoin startup coinsetter...
2013-04-22  argentina trades $50k of bitcoins bitcoin pric...  a record $50,000 of bitcoins were traded in ar...  a record $50,000 of bitcoins were traded in ar...
2013-04-23  mt gox ceo on bitcoin future altcurrency firm...  mark karpeles gives a great explanation of how...  mark karpeles gives a great explanation of how...
...  ...  ...  ...
2019-04-26  how crypto markets are reacting to the tether...  the crypto markets endured a loss of as much a...  the cryptocurrency markets endured a loss of a...
2019-04-29  us stock broker e*trade to launch bitcoin and ...  online stock brokerage e*trade is preparing to...  online stock brokerage e*trade financial is sa...
2019-04-30  want to understand bitfinex? understand mt. go...  people interested in understanding bitfinex ar...  daniel cawrey is chief executive officer of pa...
2019-05-01  where crypto exchanges are beating the bear ma...  turkish exchanges okex and btcturk are onboard...  with the turkish lira dropping to a six-month ...
2019-05-02  when tether warnings are marketing tools 3 pri...  stablecoin issuers are taking advantage of tet...  “we in a scary place, because if tether pretty...
print(df_eod.iloc[0].content)
researchers at johns hopkins university are proposing a cryptographic extension to bitcoin that could enable fully anonymous transactions on the network. the extension, called zerocoin, works – as newscientist explains it – by “allowing bitcoin users to leave their coins floating on the network for someone else to redeem, on the condition that they can redeem the same amount of bitcoin, similarly left floating on the network, at an arbitrary time in the future.”
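To make the behaviour of remove_abbr concrete, a quick check (note that it keys on the typographic apostrophe ’, not the ASCII one):

```python
# Drop everything after a right single quote in each token (Step 2 above),
# turning e.g. bitcoin’s -> bitcoin and we’ll -> we.
remove_abbr = lambda x: ' '.join([w.split('’')[0] for w in x.split()])

print(remove_abbr("bitcoin’s future we’ll see"))  # -> bitcoin future we see
```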
Step 3: Keep only alphabetic characters and spaces.
alph_space = lambda x: (ord('a') <= ord(x) <= ord('z')) or x == ' '
date  title  abstract  content

2013-04-15  zerocoin widget promises bitcoin privacy  researchers at johns hopkins university are pr...  researchers at johns hopkins university are pr...
2013-04-16  opencoin aims to build a better digital curren...  digital currency promises to solve some proble...  digital currency promises to solve some proble...
2013-04-17  k in funding paves way for bitcoin trading  new york citybased bitcoin startup coinsetter ...  new york citybased bitcoin startup coinsetter ...
2013-04-22  argentina trades k of bitcoins bitcoin prices ...  a record of bitcoins were traded in argentina...  a record of bitcoins were traded in argentina...
2013-04-23  mt gox ceo on bitcoin future altcurrency firm ...  mark karpeles gives a great explanation of how...  mark karpeles gives a great explanation of how...
...  ...  ...  ...
2019-04-26  how crypto markets are reacting to the tetherb...  the crypto markets endured a loss of as much a...  the cryptocurrency markets endured a loss of a...
2019-04-29  us stock broker etrade to launch bitcoin and e...  online stock brokerage etrade is preparing to ...  online stock brokerage etrade financial is sai...
2019-04-30  want to understand bitfinex understand mt gox ...  people interested in understanding bitfinex ar...  daniel cawrey is chief executive officer of pa...
2019-05-01  where crypto exchanges are beating the bear ma...  turkish exchanges okex and btcturk are onboard...  with the turkish lira dropping to a sixmonth l...
2019-05-02  when tether warnings are marketing tools  pric...  stablecoin issuers are taking advantage of tet...  we in a scary place because if tether pretty s...
print(df_eod.iloc[0].content)
researchers at johns hopkins university are proposing a cryptographic extension to bitcoin that could enable fully anonymous transactions on the network the extension called zerocoin works as newscientist explains it by allowing bitcoin users to leave their coins floating on the network for someone else to redeem on the condition that they can redeem the same amount of bitcoin similarly left floating on the network at an arbitrary time in the future
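The cell above only defines a per-character predicate; how it is applied to each string is not shown. One plausible application (our assumption) is:

```python
# Keep only lowercase letters and spaces (Step 3), then collapse extra whitespace.
alph_space = lambda x: (ord('a') <= ord(x) <= ord('z')) or x == ' '
clean = lambda text: ' '.join(''.join(filter(alph_space, text)).split())

print(clean("$500k in funding, 2019!"))  # -> k in funding
```

Note how "$500k" collapses to "k", matching the processed titles in the table above.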
Step 4: Remove all prepositions and other auxiliary words (aka "stop words" in NLP).
replace_stop_words = lambda x: ' '.join(_ for _ in x.split() if _ not in STOPWORDS)
date  title  abstract  content

2013-04-15  zerocoin widget promises bitcoin privacy  researchers johns hopkins university proposing...  researchers johns hopkins university proposing...
2013-04-16  opencoin aims build better digital currency me...  digital currency promises solve problems real ...  digital currency promises solve problems real ...
2013-04-17  funding paves way bitcoin trading  new york citybased bitcoin startup coinsetter ...  new york citybased bitcoin startup coinsetter ...
2013-04-22  argentina trades bitcoins bitcoin prices yoyo ...  record bitcoins traded argentina second week a...  record bitcoins traded argentina second week a...
2013-04-23  mt gox ceo bitcoin future altcurrency firm biz...  mark karpeles gives great explanation foot bit...  mark karpeles gives great explanation foot bit...
...  ...  ...  ...
2019-04-26  crypto markets reacting tetherbitfinex allegat...  crypto markets endured loss much billion aroun...  cryptocurrency markets endured loss much billi...
2019-04-29  us stock broker etrade launch bitcoin ether tr...  online stock brokerage etrade preparing launch...  online stock brokerage etrade financial said p...
2019-04-30  want understand bitfinex understand mt gox bit...  people interested understanding bitfinex wells...  daniel cawrey chief executive officer pactum c...
2019-05-01  crypto exchanges beating bear market digging d...  turkish exchanges okex btcturk onboarding thou...  turkish lira dropping sixmonth low dollar last...
2019-05-02  tether warnings marketing tools price hurdles ...  stablecoin issuers taking advantage tethers tr...  scary place tether pretty systematically embed...
print(df_eod.iloc[0].content)
researchers johns hopkins university proposing cryptographic extension bitcoin enable fully anonymous transactions network extension called zerocoin works newscientist explains allowing bitcoin users leave coins floating network someone redeem condition redeem amount bitcoin similarly left floating network arbitrary time future
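The STOPWORDS set is not defined in the cells shown; below is a sketch with a small illustrative subset (in practice one would use a full English stop-word list):

```python
# Illustrative stop-word subset; a real run would use a standard English list.
STOPWORDS = {"a", "are", "at", "in", "of", "the", "to"}

replace_stop_words = lambda x: ' '.join(w for w in x.split() if w not in STOPWORDS)

print(replace_stop_words("researchers at johns hopkins are proposing"))
# -> researchers johns hopkins proposing
```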
Step 5: Convert dataframe entries into lists (or "bags") of words.
df_eod = df_eod.applymap(str.split).applymap(sorted)
date  title  abstract  content

2013-04-15  [bitcoin, privacy, promises, widget, zerocoin]  [anonymous, bitcoin, cryptographic, enable, ex...  [allowing, amount, anonymous, arbitrary, bitco...
2013-04-16  [aims, behind, better, bitcoin, build, competi...  [among, beyond, bitcoin, currency, currency, d...  [abound, accepts, according, actually, advanta...
2013-04-17  [bitcoin, funding, paves, trading, way]  [bitcoin, citybased, closing, coinsetter, firs...  [barry, ben, bitcoin, bitcoin, bitcoin, bitcoi...
2013-04-22  [argentina, bitcoin, bitcoins, bites, bitfloor...  [account, alternative, ancient, announcing, ap...  [able, abroad, according, account, account, ac...
2013-04-23  [altcurrency, bitcoin, bizx, ceo, firm, future...  [alternative, bitcoin, bitcoin, bizx, buzz, co...  [accelerator, access, account, account, accoun...
...  ...  ...  ...
2019-04-26  [allegations, bitfinex, cfo, crypto, funds, ma...  [allegations, around, attorney, billion, bitfi...  [access, according, according, according, acco...
2019-04-29  [aims, bid, bitcoin, bitcoin, bounce, broker, ...  [according, bitcoins, bleak, bloomberg, broker...  [abort, access, according, according, accordin...
2019-04-30  [adding, bitcoin, bitfinex, bitfinex, closes, ...  [according, adding, aspiring, bitcoin, bitcoin...  [able, able, acceptance, acceptance, access, a...
2019-05-01  [alkemi, alliance, april, bear, beating, bigge...  [ago, aimed, april, argues, bitcoin, bitwise, ...  [access, access, accessible, according, accord...
2019-05-02  [backed, bitcoin, blockchainpowered, complicat...  [advantage, alternates, attractive, bitcoin, c...  [able, aborted, access, accompanied, according...
print(df_eod.iloc[0].content)
['allowing', 'amount', 'anonymous', 'arbitrary', 'bitcoin', 'bitcoin', 'bitcoin', 'called', 'coins', 'condition', 'cryptographic', 'enable', 'explains', 'extension', 'extension', 'floating', 'floating', 'fully', 'future', 'hopkins', 'johns', 'leave', 'left', 'network', 'network', 'network', 'newscientist', 'proposing', 'redeem', 'redeem', 'researchers', 'similarly', 'someone', 'time', 'transactions', 'university', 'users', 'works', 'zerocoin']
In this part we introduce and implement the so-called hybrid model. We mainly follow the algorithm introduced in Ke, Kelly and Xiu (2019). For consistency, we use the same notation as the paper. In their paper, the underlying assets are stocks, and news data for stocks, unlike those for cryptocurrencies, are abundant. To tailor the model without compromising its virtues, we keep the algorithm but modify the definitions slightly. In the paper, \(p\) is defined as the sentiment score of an article; in our research, we define \(p\) to be the sentiment score of all the articles in a day. \(\bs{O}^+\) and \(\bs{O}^-\) are the distributions of the positive and negative sentiment topics, respectively, in the paper. We instead use them to represent the positive and negative sentiment topics in a day.
First let's take a look at the price data.
price = pd.read_csv('data/price.csv', index_col=0)
date  BTC  DOGE  FTC  IXC  LTC  MEC  NMC  NVC  NXT  OMNI  PPC  XPM  XRP

2013-12-24  665.58  0.000686  0.272888  0.122467  17.64  0.496110  4.130000  12.050000  0.038054  107.16  3.320000  2.220000  0.022575
2013-12-25  682.21  0.000587  0.286528  0.115989  21.15  0.534457  4.630000  12.930000  0.064458  132.42  3.460000  2.340000  0.022042
2013-12-26  761.98  0.000602  0.365750  0.129925  24.76  0.838330  5.330000  14.360000  0.084504  186.70  3.980000  2.750000  0.024385
2013-12-27  735.07  0.000522  0.338132  0.126579  23.27  0.852615  4.840000  13.640000  0.071800  181.66  3.810000  2.570000  0.027076
2013-12-28  727.83  0.000459  0.334802  0.120077  22.56  0.816654  4.810000  13.430000  0.060045  170.10  4.250000  2.580000  0.027303
...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
2019-05-16  7884.91  0.003168  0.023343  0.022766  95.59  0.008262  0.384813  0.808549  0.036080  2.77  0.392510  0.246049  0.419707
2019-05-17  7343.90  0.002962  0.019941  0.022766  89.00  0.008841  0.384493  0.755183  0.032642  2.62  0.383351  0.231594  0.386193
2019-05-18  7271.21  0.003005  0.020308  0.022766  86.70  0.007275  0.547807  0.745842  0.032121  2.35  0.390182  0.230064  0.372736
2019-05-19  8197.69  0.003178  0.022818  0.022766  95.32  0.008361  0.659869  0.840955  0.034374  2.47  0.442125  0.265329  0.417700
2019-05-20  7978.31  0.003041  0.021786  0.015150  91.49  0.007867  0.583019  0.823103  0.032977  2.32  0.424123  0.237132  0.398003
And below is the market cap data.
mkcap = pd.read_csv('data/mkcap.csv', index_col=0)
date  BTC  DOGE  FTC  IXC  LTC  MEC  NMC  NVC  NXT  OMNI  PPC  XPM  XRP

2013-12-24  8.1e+09  9.076e+06  7.476e+06  2.115e+06  4.275e+08  1.063e+07  3.133e+07  6.412e+06  3.805e+07  6.638e+07  6.962e+07  8.238e+06  1.765e+08
2013-12-25  8.305e+09  8.194e+06  7.885e+06  2.005e+06  5.131e+08  1.146e+07  3.515e+07  6.898e+06  6.446e+07  8.203e+07  7.252e+07  8.73e+06  1.723e+08
2013-12-26  9.28e+09  8.837e+06  1.01e+07  2.249e+06  6.014e+08  1.798e+07  4.046e+07  7.681e+06  8.45e+07  1.157e+08  8.341e+07  1.03e+07  1.906e+08
2013-12-27  8.955e+09  8.017e+06  9.38e+06  2.193e+06  5.661e+08  1.83e+07  3.684e+07  7.309e+06  7.18e+07  1.125e+08  7.986e+07  9.642e+06  2.117e+08
2013-12-28  8.87e+09  7.374e+06  9.328e+06  2.082e+06  5.496e+08  1.754e+07  3.662e+07  7.216e+06  6.005e+07  1.054e+08  8.917e+07  9.746e+06  2.135e+08
...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
2019-05-16  1.396e+11  3.787e+08  5.543e+06  4.796e+05  5.908e+09  3.142e+05  5.671e+06  1.889e+06  3.604e+07  1.555e+06  9.953e+06  6.795e+06  1.768e+10
2019-05-17  1.3e+11  3.541e+08  4.737e+06  4.796e+05  5.502e+09  3.363e+05  5.666e+06  1.764e+06  3.261e+07  1.47e+06  9.722e+06  6.399e+06  1.627e+10
2019-05-18  1.288e+11  3.593e+08  4.827e+06  4.796e+05  5.361e+09  2.768e+05  8.073e+06  1.742e+06  3.209e+07  1.321e+06  9.895e+06  6.359e+06  1.57e+10
2019-05-19  1.452e+11  3.801e+08  5.426e+06  4.796e+05  5.895e+09  3.181e+05  9.724e+06  1.964e+06  3.434e+07  1.387e+06  1.121e+07  7.336e+06  1.76e+10
2019-05-20  1.413e+11  3.637e+08  5.183e+06  3.192e+05  5.66e+09  2.993e+05  8.592e+06  1.923e+06  3.294e+07  1.304e+06  1.076e+07  6.56e+06  1.676e+10
Now we construct the momentum strategy with \(n=3\) coins on each of the long and short sides, and a lookback window of \(d=60\) days to determine the best- and worst-performing coins. This type of classical strategy was popularized by Richard Donchian in the mid-20th century and became especially prevalent around 2000 (Antonacci, 2014). Due to the diminishing market cap of some coins, we limit our momentum strategy to the top 10 coins by market cap. For a more detailed analysis of the efficiency of momentum strategies in the cryptocurrency market, see Rohrbach, Suremann & Osterrieder (2017).
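The selection rule can be sketched as follows: rank coins by their trailing d-day return, then go long the top n and short the bottom n (toy prices and smaller n, d for brevity; the actual strategy uses n = 3 and d = 60):

```python
import numpy as np
import pandas as pd

n, d = 1, 3  # toy values for this sketch; the strategy uses n = 3, d = 60

rng = np.random.default_rng(0)
prices = pd.DataFrame(rng.uniform(1.0, 2.0, size=(10, 4)),
                      columns=["BTC", "LTC", "XRP", "DOGE"])  # made-up prices

# Trailing d-day return of each coin as of the latest date.
lookback_ret = prices.iloc[-1] / prices.iloc[-1 - d] - 1

ranked = lookback_ret.sort_values(ascending=False)
long_coins = list(ranked.index[:n])    # best performers -> long side
short_coins = list(ranked.index[-n:])  # worst performers -> short side
print(long_coins, short_coins)
```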
The selected coins in the long and short positions are listed below (training part only), together with a histogram illustrating the empirical return distribution of our momentum strategy. The table below the figure contains three entries: words (the list of words on that day), r (the momentum return) and p (the rank score of r); it is the result of our preliminary data preparation (training part only).
n, d = 3, 60
date  coin1  coin2  coin3  coin4  coin5  coin6

2014-02-22  DOGE  NXT  OMNI  LTC  NVC  XRP
2014-02-23  DOGE  NXT  OMNI  LTC  NVC  XRP
2014-02-24  DOGE  NXT  OMNI  NVC  LTC  XRP
2014-02-25  DOGE  NXT  PPC  MEC  XRP  NVC
2014-02-26  DOGE  NXT  PPC  MEC  XRP  LTC
...  ...  ...  ...  ...  ...  ...
2019-05-16  IXC  BTC  FTC  NMC  PPC  XPM
2019-05-17  IXC  FTC  BTC  NMC  PPC  XPM
2019-05-18  IXC  BTC  FTC  NMC  PPC  XPM
2019-05-19  IXC  BTC  FTC  PPC  XPM  NMC
2019-05-20  IXC  BTC  FTC  PPC  XPM  NVC
Benchmark strategy has mean return 0.30% with std 0.12
date  words  r  p

2014-02-22  [address, aggregators, along, amount, antinucl...  0.051496  0.961217
2014-02-23  [according, additional, akin, allocated, along...  0.018420  0.236502
2014-02-24  [able, able, abolished, absence, accept, accep...  0.005015  0.650951
2014-02-25  [abiding, abiding, able, able, abruptly, accel...  0.012776  0.842586
2014-02-26  [abandoned, able, able, able, able, abrupt, ac...  0.028932  0.525475
...  ...  ...  ...
2017-12-12  [able, access, access, access, accessibility, ...  -0.303756  0.006084
2017-12-13  [abort, accept, accept, accepts, according, ac...  0.008222  0.453232
2017-12-14  [according, according, accounts, acknowledge, ...  0.097781  0.657795
2017-12-15  [action, ahead, already, altcoins, alternative...  0.058729  0.178707
2017-12-17  [accepted, accepting, access, accounts, accumu...  0.045690  0.218251
Daily returns are positive about half of the time and negative the rest. They are approximately normally distributed with mean near 0.
The Relative Strength Index (RSI) is a momentum indicator that measures the magnitude of recent price changes. Typically, RSI values above 70 or below 30 are considered overbought or oversold, respectively. The plot below shows the RSIs of the coins; the values lie between 30 and 70 most of the time. In our view, this does not imply strong momentum, which agrees with the return results above. These unsatisfying returns motivate us to use news data to help determine our positions in the portfolio formed by the momentum strategy.
upcloses = lambda x: x[x > 0].mean()
date  BTC  DOGE  FTC  IXC  LTC  MEC  NMC  NVC  NXT  OMNI  PPC  XPM  XRP

2014-02-21  49.058925  65.156681  53.862094  47.657728  52.325117  54.570707  51.521608  45.433964  62.574567  56.648806  58.893839  51.256646  54.260447
2014-02-22  49.522637  64.596253  53.460666  48.032722  52.782995  56.676395  51.591070  45.009183  62.061175  57.091294  58.428030  51.131053  54.255182
2014-02-23  48.945374  65.151465  52.941467  48.025571  49.752735  55.938327  50.257133  43.981559  58.732269  56.156984  58.260900  50.581732  54.383604
2014-02-24  45.481897  65.836615  49.930642  46.876769  46.763598  48.205328  48.231589  42.018331  57.449330  53.390006  56.612380  47.974323  50.894997
2014-02-25  45.934169  66.236415  49.859535  45.478231  46.928958  49.006769  49.343941  42.250401  57.224469  53.054030  56.703915  48.377624  48.364130
...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
2019-05-16  61.091263  61.702769  60.626529  60.119838  68.908869  57.451714  39.882253  61.219468  52.235643  52.792051  45.629136  64.855576  65.275916
2019-05-17  57.887408  61.331565  59.917963  59.276013  67.447653  58.057233  40.322335  59.049640  50.480915  53.292698  45.266453  64.395536  62.578782
2019-05-18  58.808677  61.532147  58.773376  59.728521  68.151764  56.264933  46.899272  60.010127  50.113854  53.579574  45.828418  65.141129  62.913106
2019-05-19  61.600190  61.397682  59.773348  59.821076  69.692913  57.514977  48.673904  62.446071  50.531950  52.962195  49.985647  65.180359  65.565234
2019-05-20  60.822707  60.249918  59.637880  58.087599  69.291624  57.013209  47.243223  62.012823  50.487780  53.777199  49.512879  64.345972  64.806905
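A standard RSI computation in the spirit of the upcloses helper above can be sketched as follows; the simple-moving-average variant and the 14-day window are both assumptions on our part:

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Simple (SMA-based) Relative Strength Index, scaled 0-100."""
    change = close.diff()
    gain = change.clip(lower=0).rolling(window).mean()     # average up-move
    loss = (-change.clip(upper=0)).rolling(window).mean()  # average down-move
    return 100 - 100 / (1 + gain / loss)

close = pd.Series(np.linspace(100, 110, 30))  # steadily rising toy prices
print(rsi(close).iloc[-1])  # saturates at 100: there are no down moves
```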
Now we may finally start to implement the hybrid model. The algorithm is illustrated below. Instead of title or abstract, here we test on content directly: contents carry more information than titles or abstracts alone, thus enlarging our sample set.
Step 0: Calculate the percentage of positive returns for each word, \(f_j\). Filter out words that occur too infrequently. Keep only the top \(\alpha_+\) and bottom \(\alpha_-\) fractions as the sentiment-charged words. Assume we have \(k\) such words in total.
words_all = sorted(list(set(w for words in df_train.words for w in words)))
word  alphapoint  korean  masters  argentina  wilson  ...  mccaleb  atlas  coinjar  mintpal  steem 

r  0.117647  0.162500  0.178082  0.178571  0.187500  ...  0.754717  0.796610  0.816901  0.826923  0.96 
k  51.000000  80.000000  73.000000  56.000000  64.000000  ...  53.000000  59.000000  71.000000  52.000000  50.00 
p  0.000000  0.000516  0.001032  0.001548  0.002064  ...  0.997936  0.998452  0.998968  0.999484  1.00 
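Step 0's screening can be sketched as follows (toy data and thresholds; the actual run uses alpha = .1 and a stricter occurrence filter, as the k row above suggests):

```python
import pandas as pd

alpha, min_count = 0.5, 2  # toy thresholds for this sketch

# Hypothetical training days: word lists and daily momentum returns.
days = pd.DataFrame({
    "words": [["moon", "profit"], ["moon", "hack"], ["profit"], ["hack", "moon"]],
    "r":     [0.05, -0.02, 0.03, -0.04],
})

counts = {}
for _, row in days.iterrows():
    for w in set(row.words):
        cnt, pos = counts.get(w, (0, 0))
        counts[w] = (cnt + 1, pos + (row.r > 0))

# f_j: share of positive-return days among days mentioning word j,
# keeping only words that occur on at least min_count days.
f = {w: pos / cnt for w, (cnt, pos) in counts.items() if cnt >= min_count}
ranked = sorted(f, key=f.get)                    # ascending in f_j
k = max(1, int(alpha * len(ranked)))
neg_words, pos_words = ranked[:k], ranked[-k:]   # sentiment-charged words
print(pos_words, neg_words)  # -> ['profit'] ['hack']
```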
Step 1 (Training): For each date, let \(\newcommand{\bs}{\mathbf}\bs{d}_i = (d_{i,1}, d_{i,2},\ldots,d_{i,k})'\), where \(d_{i,j}\) is the count of sentiment-charged word \(w_j\) on that day, and assume the model
\[\bs{d}_i \sim \text{Multinomial}\left[k,\ \textstyle{\sum_{j=1}^{k}} d_{i,j},\ p_i \bs{O}^+ + (1 - p_i)\, \bs{O}^-\right].\]
Here \(\bs{O}^+\) is a "positive sentiment day": it describes the expected counts of words on a maximally positive sentiment day (one for which \(p_i = 1\)). Likewise, \(\bs{O}^-\) is a "negative sentiment day" that describes the distribution of word probabilities on maximally negative days (those for which \(p_i = 0\)).
Calculate the rank scores of all days as \(p_i\), then estimate \(\bs{O}^+\) and \(\bs{O}^-\).
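The rank score can be computed by ranking the daily returns and rescaling to [0, 1]; the exact normalization below is our assumption, chosen to match the p column shown earlier:

```python
import pandas as pd

r = pd.Series([0.05, -0.30, 0.01, -0.06, 0.03])  # toy daily returns

# Rank score p_i: rank of each day's return among all days, rescaled to [0, 1].
p = (r.rank() - 1) / (len(r) - 1)
print(p.tolist())  # -> [1.0, 0.0, 0.5, 0.25, 0.75]
```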
alpha = .1
There are 194 positive words, 194 negative words
font_path = 'misc/Palatino.ttf'
The generated positive and negative word sets are shown above. The bigger the word appears, the more positive or negative it is. Let's look at some examples:
The left (green) diagram shows all the positive words with \(p_i > 1 - \alpha\). The biggest word in the middle is "profit"; most articles containing it express an optimistic view. Examples include "generate a $150,000 profit in bitcoins" and "taking profit from legitimate miners". Other positive words include "boost" and "financing". "Venture", which usually indicates risk, occupies the most space in the right (red) diagram.
We cannot deny that the relationship between words and returns may be spurious. Problems also arise when words such as "not" negate the meaning of a sentence. Overall, though, the classification of words works well and has sufficient rationale behind it.
We can then compute the count vectors \(\bs{d}\).
keywords = pos_words + neg_words
df_train['d'] = df_train.words.apply(count_words).values
date  words  r  p  d

2014-02-22  [address, aggregators, along, amount, antinucl...  0.051496  0.961217  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2014-02-23  [according, additional, akin, allocated, along...  0.018420  0.236502  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...
2014-02-24  [able, able, abolished, absence, accept, accep...  0.005015  0.650951  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
2014-02-25  [abiding, abiding, able, able, abruptly, accel...  0.012776  0.842586  [0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, ...
2014-02-26  [abandoned, able, able, able, able, abrupt, ac...  0.028932  0.525475  [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 3, ...
...  ...  ...  ...  ...
2017-12-12  [able, access, access, access, accessibility, ...  -0.303756  0.006084  [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2017-12-13  [abort, accept, accept, accepts, according, ac...  0.008222  0.453232  [0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, ...
2017-12-14  [according, according, accounts, acknowledge, ...  0.097781  0.657795  [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, ...
2017-12-15  [action, ahead, already, altcoins, alternative...  0.058729  0.178707  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2017-12-17  [accepted, accepting, access, accounts, accumu...  0.045690  0.218251  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
Now we estimate \(\newcommand{\d}{\text{d}}\bs{O}^{+/-}\) using \(\bs{d}\) and \(p\).
Estimation by MLE:
The likelihood is given by
\[L(\bs{d};\bs{O^{+/-}}) \propto \textstyle{\prod_i\prod_{j=1}^k} [p_i O_j^+ + (1 - p_i) O_j^-]^{d_{i,j}}\]
and thus the loglikelihood is
\[LL(\bs{d};\bs{O^{+/-}}) = C + \textstyle{\sum_i \sum_{j=1}^k} d_{i,j} \log[p_i O_j^+ + (1 - p_i) O_j^-]\]
where \(C\) is not a function of \(\bs{O}^{+/-}\). Therefore, for each keyword \(w_h\),
\[\frac{\partial LL}{\partial O_h^+} =\textstyle{\sum_i \sum_{j=1}^k} \frac{d_{i,j}p_i\mathbf{1}\{j=h\}}{p_i O_j^+ + (1 - p_i) O_j^-} =\textstyle{\sum_i} \frac{d_{i,h}p_i}{p_i O_h^+ + (1 - p_i) O_h^-} > 0\]
and
\[\frac{\partial LL}{\partial O_h^-} =\textstyle{\sum_i \sum_{j=1}^k} \frac{d_{i,j}(1-p_i)\mathbf{1}\{j=h\}}{p_i O_j^+ + (1 - p_i) O_j^-} =\textstyle{\sum_i} \frac{d_{i,h}(1-p_i)}{p_i O_h^+ + (1 - p_i) O_h^-} > 0.\]
The optimization problem, therefore, is
\[\begin{align*}\max_{\bs{O}^{+/-}}\ & LL(\bs{d};\bs{O}^{+/-})\\\text{s.t. } & \textstyle{\sum_{j=1}^k} O_j^+ = \textstyle{\sum_{j=1}^k} O_j^- = 1\\ & O_j^+ \ge 0 ,\quad O_j^- \ge 0,\quad \forall 1 \le j \le k.\end{align*}\]
Instead of solving this problem using the Karush-Kuhn-Tucker conditions, we may as well apply the following transform: let
\[O_j^{+/-}:=\exp(Q_j^{+/-})\]
so that the unknowns are now \(Q_j^{+/-}\in\mathbb{R}\) and we no longer need any inequality constraints. The Lagrangian is
\[f(\bs{Q}^{+/-}) = LL\!\left[\bs{d};\exp(\bs{Q}^{+/-})\right] + \lambda^+\!\left[1 - \textstyle{\sum_{j=1}^k} \exp(Q_j^+)\right] + \lambda^-\!\left[1 - \textstyle{\sum_{j=1}^k} \exp(Q_j^-)\right]\]
and the FOCs are \(2k\) equations
\[0 = \frac{\partial f}{\partial Q_j^{+/-}} =\frac{\partial LL}{\partial O_j^{+/-}}\frac{\d O_j^{+/-}}{\d Q_j^{+/-}} - \lambda^{+/-} \frac{\d O_j^{+/-}}{\d Q_j^{+/-}}. \quad (1 \le j \le k)\]
The FOCs can be simplified into
\[\frac{\partial LL}{\partial O_1^{+/-}} =\frac{\partial LL}{\partial O_2^{+/-}} =\cdots =\frac{\partial LL}{\partial O_k^{+/-}} = \lambda^{+/-}\]
which is still extremely hard to solve analytically.
Estimation by Regression:
This method is proposed in Ke, Kelly and Xiu (2019).
Assume there are in total \(\newcommand{\T}{\mathsf{T}}n\) days. Define
\[\bs{W} = \begin{bmatrix}p_1 & p_2 & \cdots & p_n\\1 - p_1 & 1 - p_2 & \cdots & 1 - p_n\\\end{bmatrix}^\T\]
and \(\bs{D} = \left(\hat{\bs{d}}_1, \hat{\bs{d}}_2, \cdots, \hat{\bs{d}}_n\right)^{\!\T}\) where \(\hat{\bs{d}}_i = \bs{d}_i / \textstyle{\sum_j} d_{i,j}\). Then \(\bs{O}^{+/-}\) would be estimated by regressing \(\bs{D}\) on \(\bs{W}\), namely
\[\hat{\bs{O}}^{+/-} = (\bs{W}^\T \bs{W})^{-1} \bs{W}^\T \bs{D}.\]
See proof of accuracy in the appendix of the paper.
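As a sanity check on this estimator, here is a small synthetic sketch (all data below are made up for illustration): if each day's normalized word distribution is a \(p\)-weighted mixture of \(\bs{O}^+\) and \(\bs{O}^-\), ordinary least squares should recover both rows.

```python
import numpy as np

# Synthetic check of O = (W'W)^{-1} W'D on made-up data.
rng = np.random.default_rng(0)
n, k = 200, 5                                  # days, keywords
p = rng.uniform(0.05, 0.95, n)                 # true sentiment scores
W = np.column_stack([p, 1 - p])                # (n, 2) design matrix
O_true = rng.dirichlet(np.ones(k), size=2)     # rows: O+ and O-, each sums to 1
D = W @ O_true + rng.normal(0, 1e-3, (n, k))   # noisy normalized word counts
O_hat = np.linalg.pinv(W.T @ W) @ (W.T @ D)    # the regression estimator
```

With noise this small, `O_hat` matches `O_true` to a couple of decimal places.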
1  def print_array(array): 
1  p = df_train.p.values 
W has shape (921, 2): [0.96121673 0.03878327] [0.23650190 0.76349810] [0.65095057 0.34904943] ... [0.65779468 0.34220532] [0.17870722 0.82129278] [0.21825095 0.78174905]
1  D = np.vstack(df_train.d.values) 
D has shape (921, 388): [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.06666667 0.00000000] [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.00000000 0.00000000] [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.00000000 0.00000000] ... [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.00000000 0.00000000] [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.00000000 0.00000000] [0.00000000 0.00000000 0.00000000 ... 0.00000000 0.00000000 0.01219512]
1  O_pos, O_neg = np.maximum(np.linalg.pinv(W.T @ W) @ (W.T @ D), 0) 
O± has shape (388, 2): [0.00188745 0.00293239] [0.00111679 0.00332143] [0.00349693 0.00450815] ... [0.00396731 0.00174460] [0.00407812 0.00318298] [0.00753035 0.00733650]
Now that we have estimated \(\bs{O}^{+/-}\), we can start predicting the out-of-sample \(p\), which indicates the expected rank of performance of a certain day. We solve for \(p\) from the following optimization problem
\[\hat{p} = \arg\!\max_{p\in[0,1]} \left\{\sum_{j=1}^k \hat{d}_j \log\!\left[p O_j^+ + (1 - p) O_j^-\right] + \lambda \log\left[p(1-p)\right] \right\}\]
where \(\hat{\bs{d}}=\bs{d} / \textstyle{\sum_j}d_j\) as defined above, and \(\lambda>0\) is a hyperparameter for tuning. Let's call this objective the penalized log-likelihood pen_ll.
1  def pen_ll(p, d, lamda): 
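Since the cell above is truncated, here is a minimal sketch of the penalized estimator; the grid search stands in for whatever 1-d optimizer the notebook actually uses, and the argument names are illustrative:

```python
import numpy as np

def pen_ll(p, d_hat, O_pos, O_neg, lam):
    # penalized log-likelihood of sentiment score p for one day's
    # normalized keyword counts d_hat
    mix = p * O_pos + (1 - p) * O_neg
    return d_hat @ np.log(mix) + lam * np.log(p * (1 - p))

def predict_p(d_hat, O_pos, O_neg, lam):
    # maximize over a simple grid; a 1-d numerical optimizer works equally well
    grid = np.linspace(0.01, 0.99, 99)
    vals = [pen_ll(p, d_hat, O_pos, O_neg, lam) for p in grid]
    return grid[int(np.argmax(vals))]
```

The \(\lambda \log[p(1-p)]\) term pulls the estimate away from the boundary values 0 and 1, which stabilizes days with few sentiment-charged words.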
Tune \(\lambda\) using training set:
1  def mse_p(lamda): 
1  best_lamda = minimize(mse_p, x0=0, method='Nelder-Mead', bounds=[(0, None)]).x[0] 
3.6398125000000046
Step 2 (Testing): For each day with \(\bs{d}_i\) observed and \(\bs{O}^{+/-}\) fixed, estimate \(p_i\) as our final indicator. Specifically, we use the best \(\lambda\) estimated above to predict on the testing set.
1  df_test['d'] = df_test.words.apply(count_words).values 
date  words  r  p  p_hat  d 

20171218  [according, according, according, according, a...  0.127551  0.234221  0.313379  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ... 
20171219  [ability, able, able, able, abrupt, absolutely...  0.005238  0.561217  0.560449  [1, 0, 0, 0, 0, 0, 1, 2, 0, 1, 1, 0, 0, 0, 1, ... 
20171220  [able, able, abruptly, abruptly, accelerate, a...  0.064455  0.574144  0.343652  [0, 0, 0, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, ... 
20171221  [able, according, according, according, accoun...  0.151503  0.171863  0.313184  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ... 
20171222  [able, abound, acceptance, accepting, acceptin...  0.083399  0.768061  0.340234  [4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, ... 
...  ...  ...  ...  ...  ... 
20190426  [access, according, according, according, acco...  0.058712  0.466920  0.470312  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 
20190429  [abort, access, according, according, accordin...  0.041062  0.963498  0.416309  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ... 
20190430  [able, able, acceptance, acceptance, access, a...  0.011006  0.533080  0.628125  [0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, ... 
20190501  [access, access, accessible, according, accord...  0.058620  0.979468  0.484668  [1, 1, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, ... 
20190502  [able, aborted, access, accompanied, according...  0.006681  0.374144  0.445898  [0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, ... 
Finally, given the predicted p_hat values, we can trade based on certain sizing functions and calculate the cumulative return as well as all other trading-related statistics of our strategy. We also calculate these for the benchmark strategies.
A short description of the benchmark strategies and our hybrid sentiment-momentum strategy:
Buy-and-hold: This strategy simply buys Bitcoin and holds it for the whole trading period. Bitcoin is the most commonly traded coin in the cryptocurrency market and is representative of the market, so we choose it as the asset to buy and hold.
Momentum: As explained above, the momentum strategy takes long and short positions separately in the top n and bottom n coins. Here, by "top" and "bottom", we mean the best- and worst-performing coins over a lookback window of d trading days. Specifically, we take n=3 and d=60.
Momentum Reversal: The momentum reversal strategy does exactly the opposite of the momentum strategy. It takes short positions in the top n coins and long positions in the bottom n coins. The definitions and values of n and d remain the same.
Hybrid Sentiment-Momentum: Our hybrid sentiment-momentum strategy combines sentiment analysis with the momentum trading strategy. Using the core idea of momentum, the strategy ranks all coins by their previous price changes and forms a portfolio that enters long positions in coins that headed up and short positions in those that went down. This step is the same as in a usual momentum strategy. Then, taking advantage of the sentiment analysis results on the present day's news, the strategy decides how many units of the portfolio to buy or sell.
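The ranking step shared by the momentum-style strategies can be sketched as follows (a hypothetical helper; the function and column names are ours, assuming a DataFrame of daily close prices per coin):

```python
import pandas as pd

def momentum_positions(prices, n=3, d=60):
    # rank coins by their return over the past d days;
    # long the top n, short the bottom n
    lookback_return = prices.iloc[-1] / prices.iloc[-1 - d] - 1
    ranked = lookback_return.sort_values()
    return list(ranked.index[-n:]), list(ranked.index[:n])  # (longs, shorts)
```

The hybrid strategy would then scale the resulting portfolio by the sentiment-based sizing function rather than always trading one full unit.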
1  size = lambda x: x * 2 - 1 # actual size of position based on predicted p values 
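For comparison, the three sizing functions explored in the hyperparameter study can be sketched as below; only the continuous map appears in the source, so the binary and trinary cutoffs here are our assumptions:

```python
# Map a predicted sentiment score p in [0, 1] to a position size in [-1, 1].
continuous = lambda p: 2 * p - 1             # the size function shown above
binary = lambda p: 1.0 if p > 0.5 else -1.0  # full long or full short
# flat zone around neutral sentiment; the 0.4/0.6 cutoffs are assumed
trinary = lambda p: 0.0 if 0.4 <= p <= 0.6 else (1.0 if p > 0.6 else -1.0)
```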
Statistics  Buy & Hold  Momentum  Mom. Rev.  Strategy 

Win%  0.524051  0.437975  0.562025  0.536709 
Sharpe  -0.975205  -0.605331  0.605331  2.677082 
MDD  0.826419  0.970956  0.947963  0.206906 
As can be seen in the figure above, the momentum strategy performed almost as poorly as a simple buy & hold throughout the bearish year of 2018. The momentum reversal strategy, on the other hand, performed extremely well in the first half of the year and then collapsed with the market. This partially reflects the existence of price overreactions as suggested by Caporale and Plastun (2018). Our hybrid strategy outperformed all of them by over 100% in cumulative return by the end of the backtest, and gave a Sharpe ratio of 2.68 while the best of the rest, given by momentum reversal, is merely around 0.60. The volatility plot also shows that by adding sentiment analysis to determine the sides and sizes of the positions, the hybrid strategy significantly reduces the risks present in the momentum strategy. Considering maximum drawdowns, all three benchmarks suffered gigantic drawdowns in this unfortunate year, while our strategy's was kept below 20% most of the time.
The following parameters are included in our model. We want to test how the model reacts to changes in them.
- lookback_window is the number of days we feed our model for training,
- minimum_occurrence is the minimum number of occurrences for a word to be considered sentiment-charged,
- minimum_length is the minimum acceptable length of a word to be considered sentiment-charged,
- alpha is the top (and bottom) percentile for sentiment-charged word selection,
- size is a univariate function mapping \([0,1]\subset\mathbb{R}\) to \([-1,1]\subset\mathbb{R}\).

1  Image('misc/lookback_window.png') 
A longer lookback window provides more information from the past that can help decision-making when forming the portfolio. However, the longer the window, the more stale information is included. The empirical results tell us that using the past 60 days' information to determine the best- and worst-performing coins leads to the highest win rate and Sharpe ratio together with the lowest maximum drawdown.
1  Image('misc/minimum_occurrence.png') 
A higher minimum occurrence implies a stricter constraint on selecting sentiment-charged words. While this diminishes noise from words that do not show up frequently, setting the minimum occurrence too high leaves out useful information. Moreover, a very small pool of sentiment-charged words makes training and predicting extremely hard, even impossible.
From the results, we can conclude that setting the minimum occurrence to 50 generates a better Sharpe ratio and maximum drawdown than setting it to 10 or 100.
1  Image('misc/minimum_length.png') 
Many short words, those of length 2 or 3, are not really informative and convey nearly no sentiment. For instance, words such as "a", "is", and "the" show up multiple times in almost every piece of news. Since they don't offer much help in understanding the sentiment of the news, they can be eliminated without hurting the results. Of course, this simple criterion is arbitrary and thus needs to be tested. Based on the results we get, different minimum length values don't impact the strategy statistics significantly.
1  Image('misc/alpha.png') 
Alpha defines the percentage of sentiment-charged words classified as positive or negative. Similar to minimum occurrence, a strategy with a larger alpha includes more words in the analysis and vice versa. An alpha of 0.05 gives the best results, as shown above.
For sizing function:
The trinary sizing function performs much worse than the binary and continuous ones, with lower win rates and Sharpe ratios. While continuous and binary are similar in win rate and Sharpe ratio, continuous consistently has smaller maximum drawdowns. It is therefore a more stable choice and best for risk-averse investors.
For a more comprehensive hyperparameter exploration, we packaged the scripts above into a local module backtest for convenient backtesting. The general usage is summarized below:
1  from misc.backtest import BacktestEngine 
See Appendix C for the complete code and the detailed parameter testing outputs.
The momentum trading strategy is essentially buying gainers and selling losers. Due to the low liquidity and slow information dissemination of cryptocurrency trading, trading on momentum is, theoretically speaking, profitable. However, there is no guarantee that what has been gaining will continue being profitable and what has been losing will keep losing money. Therefore, a risk management technique that prevents the strategy from taking huge unexpected losses is without doubt a must-have.
One of the most common approaches to controlling risk is to include a stop-loss feature. The usual way to apply a stop-loss is to calculate the return, determine whether it falls below a threshold, and exit the position if it does. Here is where our strategy differs: it does not need an additional stop-loss because one is already embedded in the strategy. The underlying momentum strategy invests in a dynamic portfolio, meaning that all positions entered on the previous day are closed today regardless of the previous day's performance. In other words, since the momentum strategy does not pick the same set of coins from day to day, previous losses do not accumulate on following days.
In this research, we tested momentum and momentum reversal strategies in the cryptocurrency market and improved the traditional momentum strategy by introducing a brand-new sentiment-based model. The model significantly improved overall performance. It turned out that our hybrid model can handle massive market collapses (e.g., mid-2018) while taking valid nonzero positions, and can grasp profits at potential opportunities.
There are, however, still several limitations in this research. First, due to the limited history of cryptocurrency, our dataset is rather short compared with stock data, with only 2014-2019 making up both the training and testing sets. This limitation does not only concern time; it also troubles us during dynamic coin selection. Since we need a long period of historical data, our choice of coins became considerably restricted (fewer than 20), which further causes us to hold fewer coins (to be precise, 6 in this case) each day. Idiosyncratic risks may not be well diversified, and there is a non-negligible likelihood that our strategy fails because of a failure in one particular coin, though luckily, we didn't encounter this issue.
Last but not least, more rigorous backtests could be implemented to stress-test our strategies. We may use intraday data, if possible, for better accuracy. Furthermore, we could take the order book into account to cope with microstructure-related concerns. By and large, our hybrid strategy satisfyingly outperformed all benchmark strategies in the same market, and we deem this a strong result.
Antonacci, Gary (2014). Dual Momentum Investing: An Innovative Approach for Higher Returns with Lower Risk. New York: McGraw-Hill Education, 13–18.
Rohrbach, J., Suremann, S., & Osterrieder, J. (2017). Momentum and Trend Following Trading Strategies for Currencies Revisited: Combining Academia and Industry.
Ke, Z. T., Kelly, B. T., & Xiu, D. (2019). Predicting Returns With Text Data. Available at SSRN.
Caporale, G. M., & Plastun, O. (2018). Price Overreactions in the Cryptocurrency Market.
We include all miscellaneous scripts (mostly for data crawling) as below.
1  import time 
1  import os 
Now we start generating new features from these posts. First, let's take a look at the data.
In this section, we give the full code of our backtest engine as well as the full list of hyperparameter testing results. The script for the engine is as below.
1  import numpy as np 
The hyperparameter testing, using the backtest engine above, is listed as follows.
1  if not os.path.isfile('misc/results.csv'): 
#  lookback_window  minimum_occurrence  minimum_length  alpha  size  strategy  win_rate  sharpe  mdd 

0  20  10  1  0.05  binary  Buy & Hold  0.527094  0.441738  0.831663 
1  20  10  1  0.05  binary  Momentum  0.423645  -1.686962  0.998461 
2  20  10  1  0.05  binary  Mom. Rev.  0.576355  1.686962  0.781847 
3  20  10  1  0.05  binary  Strategy  0.482759  0.521645  0.988099 
4  20  10  1  0.05  continuous  Buy & Hold  0.527094  0.441738  0.831663 
...  ...  ...  ...  ...  ...  ...  ...  ...  ... 
967  120  100  5  0.20  continuous  Strategy  0.494709  0.831074  0.582280 
968  120  100  5  0.20  trinary  Buy & Hold  0.526455  0.971476  0.789994 
969  120  100  5  0.20  trinary  Momentum  0.433862  -0.060313  0.925483 
970  120  100  5  0.20  trinary  Mom. Rev.  0.566138  0.060313  0.987388 
971  120  100  5  0.20  trinary  Strategy  0.047619  0.988654  0.665146 
1  df_results_strategy = df_results[df_results.strategy=='Strategy'].drop('strategy', axis=1).melt( 
#  lookback_window  minimum_occurrence  minimum_length  alpha  size  statistic  value 

0  20  10  1  0.05  binary  win_rate  0.482759 
1  20  10  1  0.05  continuous  win_rate  0.470443 
2  20  10  1  0.05  trinary  win_rate  0.172414 
3  20  10  1  0.10  binary  win_rate  0.514778 
4  20  10  1  0.10  continuous  win_rate  0.512315 
...  ...  ...  ...  ...  ...  ...  ... 
724  120  100  5  0.10  continuous  mdd  0.376395 
725  120  100  5  0.10  trinary  mdd  0.407608 
726  120  100  5  0.20  binary  mdd  0.996307 
727  120  100  5  0.20  continuous  mdd  0.582280 
728  120  100  5  0.20  trinary  mdd  0.665146 
1  sns.set({ 
In this post, I'll introduce four stochastic processes commonly used to simulate stock prices. Formulation and Python implementation are presented one by one, with brief comments afterwards.
Before introducing the four methods, we first define a handy function to prepend a zero before a numpy array:
1  def prepend(arr, val=0): 
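The cell above is truncated in this export; presumably the helper amounts to no more than a concatenation, along these lines:

```python
import numpy as np

def prepend(arr, val=0):
    # prepend a scalar value before a 1-d numpy array
    return np.concatenate(([val], arr))
```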
Brownian motion (BM) was initially exhibited and modeled in gas and liquid particle movements: random motion resulting from a massive number of tiny collisions with particles of the medium. It was named after the botanist Robert Brown, who observed it in 1827, and after almost a century it was introduced into the financial markets. People usually call a standard Brownian motion a Wiener process, which has the following properties:
A general BM follows the SDE:
\[\d S_t = \mu \d t + \sigma \d W_t\]
which directly gives the solution by taking integrals on both sides:
\[S_t = S_0 + \mu t + \sigma W_t.\]
With all of these given, we can easily simulate the process as below.
1  class BM: 
Comments: One may notice that we don't have any constraint posed on \(W\): it can be arbitrarily positive and arbitrarily negative. Therefore, while BM might give a simple and easy-to-implement solution for a short run, we don't really want to use it, as it can potentially give us negative prices.
Geometric Brownian motion (GBM) became famous through its use in Fischer Black and Myron Scholes's 1973 paper, The Pricing of Options and Corporate Liabilities. The process is by definition positive and thus fixes the concern we raised about BM. The corresponding SDE is
\[\d S_t = S_t(\mu \d t + \sigma \d W_t)\]
which gives solution
\[S_t = S_0\exp\left\{\left(\mu-\frac{\sigma^2}{2}\right)t + \sigma W_t\right\}.\]
1  class GBM: 
Comments: The GBM is good enough for most simulations. However, it is also well-known that the Black-Scholes model cannot produce the fat tails observed empirically in stock markets.
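A minimal path simulator based on the closed-form solution above (the function name and defaults are ours):

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, T=1.0, n=252, seed=0):
    # simulate one GBM path on n steps using the exact solution
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, T, n + 1)
    dW = rng.normal(0.0, np.sqrt(T / n), n)      # Wiener increments
    W = np.concatenate(([0.0], np.cumsum(dW)))   # W_0 = 0
    return s0 * np.exp((mu - sigma**2 / 2) * t + sigma * W)
```

Because the exact solution is used, the scheme has no discretization bias at the grid points.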
Robert C. Merton, who shared the 1997 Nobel Prize with Scholes (Black had unfortunately passed away), was one of the first academics to address some of the limitations of the GBM. In his 1976 paper, Option Pricing when Underlying Stock Returns are Discontinuous, he superimposed a "jump" component on the diffusion term so that the model can simulate sudden economic shocks, i.e., jumps in prices. The jump component \(J\) is a compound Poisson process (driven by \(N\)) with normal jump sizes, entering the price through its exponential. The SDE is as follows.
\[\begin{align*}Y_i&\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\gamma, \delta^2)&\text{(Jump Magnitude)}\\\d N_t & \sim \text{Pois}(\lambda \d t)&\text{(Poisson Process)}\\J_t &= \textstyle{\sum_{i=1}^{N_t}}Y_i&\text{(Jump)}\\\d S_t &= S_t (\mu \d t + \sigma \d W_t + \d J_t).\\\end{align*}\]
Merton's jump diffusion SDE has a closedform solution:
\[S_t = S_0 \exp\left\{\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t + J_t\right\}.\]
1  class MertonJump: 
Comments: Merton's jump process fixed the kurtosis mismatch in empirical financial data while minimally changing the GBM. However, with the discontinuous (and usually negative, corresponding to market crashes) jumps introduced in the model, we may witness frequent slumps and, in general, a decline in the total drift. On the other hand, the jump process still does not solve the constant volatility issue.
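The closed form simulates just like a GBM, with a compound Poisson term added in the exponent; a sketch (names and defaults are ours):

```python
import numpy as np

def simulate_merton(s0, mu, sigma, gamma, delta, lam, T=1.0, n=252, seed=0):
    # GBM exponent plus a compound Poisson jump term J with N(gamma, delta^2) sizes
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))
    n_jumps = rng.poisson(lam * dt, n)           # number of jumps per interval
    dJ = np.array([rng.normal(gamma, delta, k).sum() for k in n_jumps])
    J = np.concatenate(([0.0], np.cumsum(dJ)))
    return s0 * np.exp((mu - sigma**2 / 2) * t + sigma * W + J)
```

A negative `gamma` reproduces the crash-like downward jumps discussed above.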
In the early 1990s, Steven Heston introduced this model, in which volatilities, unlike in the original GBM, are no longer constant. In the Heston model, volatility evolves according to the Cox-Ingersoll-Ross process, with a mean-reverting essence. As there are now two stochastic processes, we need two (potentially correlated) Wiener processes. The SDE is now
\[\begin{align*}\d W_t^S\d W_t^V &= \rho\d t & \text{(Correlated Wiener)}\\\d V_t &= \kappa (\theta - V_t) \d t + \xi \sqrt{V_t} \d W_t^V &\text{(Cox-Ingersoll-Ross)}\\\d S_t &= \mu S_t \d t + \sqrt{V_t}S_t \d W_t^S & \text{(Heston)}\end{align*}\]
1  class Heston: 
Comments: The Heston model is one of the most popular stochastic volatility models in finance. In case one needs even more freedom, one may opt for time-varying parameters, e.g. \(\mu\to\mu_t\) and \(\xi\to\xi_t\).
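Since the Heston SDE has no closed-form path solution, a common approach is an Euler scheme with full truncation on the variance; a sketch (names and defaults are ours, not the notebook's):

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, theta, xi, rho, T=1.0, n=252, seed=0):
    # Euler discretization; the two Wiener increments have correlation rho
    rng = np.random.default_rng(seed)
    dt = T / n
    S, V = np.empty(n + 1), np.empty(n + 1)
    S[0], V[0] = s0, v0
    for i in range(n):
        z1, z2 = rng.normal(size=2)
        dWs = np.sqrt(dt) * z1
        dWv = np.sqrt(dt) * (rho * z1 + np.sqrt(1 - rho**2) * z2)
        v = max(V[i], 0.0)  # full truncation keeps the CIR variance nonnegative
        V[i + 1] = V[i] + kappa * (theta - v) * dt + xi * np.sqrt(v) * dWv
        S[i + 1] = S[i] * np.exp((mu - v / 2) * dt + np.sqrt(v) * dWs)
    return S, V
```

A negative `rho` gives the leverage effect: volatility tends to rise when the price falls.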
Finally, let's take a look at all the simulated price processes. The Brownian motion is shifted so that \(S_0=1\), like the other models. Shared parameters such as \(\mu\) and \(\sigma\) are set to the same values. Each figure shows \(1000\) paths.
It's always been a headache to me that I cannot get my blog's search engine to show only the content I want. There are always things you don't want to show up in a search result, like password-protected posts (shown as encrypted code) and random pages for certain projects (some don't even have a title, and tipuesearch would still show them in the results, with a blank title and a bunch of raw HTML). Even worse, it seems there's no official way to set up this sort of content filter. This terrible feeling tortured me for months, until I made up my mind and fixed it in the source code today.
The fix turned out, well, quite straightforward. First, we locate the node package folder hexo-generator-tipuesearch-json. The package structure shows
node_modules
└───hexo-generator-tipuesearch-json
    ├───index.js
    ├───LICENSE
    ├───package.json
    ├───README.md
    └───node_modules
        └───...
The file we need to edit is index.js. Below I've attached the full code after modification:
1  var util = require('hexo-util'); 
Note the second line of the definition of postsContent and the lines we comment out. These modifications ensure that encrypted posts and standalone pages won't be indexed for search.
There is a piece of advice I'd like to share with you: never post anything too large on your Hexo blog.
The suggestion above was a joke to me until yesterday. I thought I could just wait a few more minutes and everything would be fine; pages will end up posted with probability one in the long run. Well, they didn't make it this time. The HTML files were so big that GitHub returned a file-oversize error and rejected my push from Hexo. Everything got messy, and no matter what I tried, deployment was always rejected.
If you also encounter this problem, well, lucky you, 'cause I've managed to fix it. The first step is locating the .git folder in our hexo directory. Here I used the built-in command
1  find . -name ".git" 
under the hexo folder. The corresponding location was hexo/.deploy_git. We then enter this directory and move on to step 2, which is basically git commit history reversion. First go to the GitHub website, find an earlier commit, and copy its SHA. Then, in the terminal we opened, enter:
1  git reset --hard {{SHA}} 
and then make a regular push.
In this tiny piece of a post I'm gonna show how you can make animations using the matplotlib module in Python. Things get much more intuitive when they move, don't they?
We're trying to plot two (thick) sine curves with different offsets in the x-axis direction, where the offsets increase with an indicator called frame_no.
1  import numpy as np 
The plot is altered bit by bit as frame_no changes with the frames. You can also try different fps configurations by changing the interval argument. Also, repetition is on by default, but you may forbid it by specifying repeat as false in FuncAnimation. Finally, you would have a plot as follows:
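Putting the pieces together, here is a self-contained sketch of the setup described above (the Agg backend is used so it runs headlessly; the offsets and frame count are arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend; swap for an interactive one locally
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
offsets = (0.0, np.pi / 2)           # the two curves' base x-offsets
lines = [ax.plot(x, np.sin(x + off), lw=3)[0] for off in offsets]

def update(frame_no):
    # shift both curves a bit further with every frame
    for off, line in zip(offsets, lines):
        line.set_ydata(np.sin(x + off + 0.1 * frame_no))
    return lines

anim = FuncAnimation(fig, update, frames=100, interval=50, repeat=True)
```

Calling `anim.save('sine.gif')` or `plt.show()` (with an interactive backend) would render the animation.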
Lovely, isn't it?
Although it's not recommended, people sometimes need variable names. For example, you may want to automate generating a dictionary with variable names as keys, or use variable names as column names in a pandas dataframe. How are we gonna implement this in Python?
There is a nasty workaround provided somewhere on Stackoverflow (sorry but I forgot the actual thread):
1  def varName(p): 
foofarvarName
The method utilizes the fact that Python stores all global variables in the globals() dictionary, so a variable's name can be recovered by matching object identity. Enjoy coding 🙃
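A cleaner variant of that workaround takes the namespace explicitly, so it works with globals() or any dict (the function name here is ours):

```python
def var_names(obj, namespace):
    # return every name in the namespace bound to this exact object (identity match)
    return [name for name, val in namespace.items() if val is obj]
```

For example, after `foo = [1, 2, 3]`, calling `var_names(foo, globals())` would include `'foo'`; note that equal-but-distinct objects are not matched, since the comparison uses `is`.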
In this research report we try to simulate and explore different scenarios for a "perfect" market maker in the Bitcoin market. By "perfect" we mean the capability to capture the spread on the right side of every trade, i.e., there is no spread loss at all. Although this setting is too perfect to be comparable with real trading, our analysis w.r.t. the model parameters is believed to be insightful still.
Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:
1  %config InlineBackend.figure_format = 'retina' 
In this section, several useful functions are introduced for later use during the backtest.
- cumsum: a modified version of the original cumulative summation function; the summation now handles two boundaries during calculation. Calculation is accelerated by JIT.
- sharpe_ratio: a handy function that calculates the annualized Sharpe ratio based on high-frequency returns (returns are defined in percentage).
- sortino_ratio: similar to the above; gives the annualized Sortino ratio.

1  
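The boundary-aware cumulative sum can be sketched as below, in plain Python (the notebook's version is presumably JIT-compiled, e.g. with numba; the name and signature here are ours):

```python
import numpy as np

def bounded_cumsum(x, lower, upper):
    # running sum clipped into [lower, upper] after every step,
    # like a position that may never leave its limits
    out = np.empty(len(x))
    total = 0.0
    for i, v in enumerate(x):
        total = min(max(total + v, lower), upper)
        out[i] = total
    return out
```

Note this differs from clipping an ordinary cumsum afterwards: the clamp applies at every step, so excess above the bound is discarded rather than carried forward.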
A backtest engine is designed for this problem.
We have the following parameters (the first two are datasets) for simulation:
Trades are participated in only if:
1  be = BacktestEngine() 
Additionally, the class provides a neat feature that renders the whole backtest process as an animation. To activate it, run the command below:
1  be.run(s=..., j=..., k=..., animation=True) 
1  class BacktestEngine: 
The BacktestEngine has been accelerated largely thanks to vectorization and JIT. The per-loop calculation performance is shown below (~2 ms per be.run).
1  be = BacktestEngine() 
1  %timeit be.run(s=0.001, j=0.010, k=0.010) 
2.11 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In this part we try to load a small dataset and compare our results against the one in the reference. The order book is as below.
1  be = BacktestEngine() 
time  ask1price  ask1size  ask2price  ask2size  ask3price  ask3size  bid1price  bid1size  bid2price  bid2size  bid3price  bid3size 

20180408 17:08:00.246  7035.55  66.062339  7035.56  0.5  7035.57  1.50587  7035.54  5.582917  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:01.426  7035.55  66.062339  7035.56  0.5  7035.57  1.50587  7035.54  5.584317  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.293  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.584317  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.437  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.570860  7035.53  0.00142  7035.5  0.011361 
20180408 17:08:08.485  7035.55  65.958939  7035.56  0.5  7035.57  1.50587  7035.54  5.591242  7035.53  0.00142  7035.5  0.011361 
The trade data is as below.
1  be.Z.head() 
time  price  size 

20180408 17:08:08.293  7035.55  0.1034 
20180408 17:08:13.472  7035.54  0.3900 
20180408 17:08:19.105  7035.55  0.1502 
20180408 17:08:20.858  7035.54  0.0630 
20180408 17:08:23.087  7035.54  0.1030 
Now, the backtest result is as below. Here we use \(s=0.01\), \(j=0.055\), and \(k=0.035\) as in the reference. We conclude that our model is valid, as the result coincides with the one given.
1  T = be.run(0.01, 0.055, 0.035) 
time  trade  cash  position 

20180408 17:08:08.293  -0.01  70.3555  -0.01 
20180408 17:08:13.472  0.01  0.0001  0.00 
20180408 17:08:19.105  -0.01  70.3556  -0.01 
20180408 17:08:20.858  0.01  0.0002  0.00 
20180408 17:08:23.087  -0.01  70.3552  -0.01 
20180408 17:08:42.770  -0.01  140.7106  -0.02 
20180408 17:08:47.415  -0.01  211.0660  -0.03 
20180408 17:08:49.413  -0.01  281.4214  -0.04 
20180408 17:08:51.663  -0.01  351.7768  -0.05 
20180408 17:08:54.890  0.01  281.4213  -0.04 
20180408 17:09:07.259  0.01  211.0658  -0.03 
20180408 17:09:10.259  -0.01  281.4212  -0.04 
20180408 17:09:14.027  -0.01  351.7766  -0.05 
20180408 17:09:53.208  0.01  281.4866  -0.04 
In this section, we opt for a simple grid search to find the best parameters of our strategy. There are several things to consider before we actually start searching.
I believe the answer is yes. There is little reason to explore the interrelationship between the upper and lower bounds of our position. Since we assume a short position yields direct cash, we don't really distinguish between a long and a short trade. Of course, the market may have its trend, but theoretically we don't care about the result of searching on a full \((j,k)\) grid.
Like in most backtest scenarios, we use Sharpe and Sortino ratios as metrics. Besides these two, we also consider the final P&L as a crucial statistic here.
Hence, we run simulations on a \(100\times 100\) grid of \((s, j=k)\) values and keep track of outstanding results. We then filter these results by the three metrics and keep only the best \(10\) under all three.
1  be = BacktestEngine() 
1  s_grid = np.arange(0.001, 0.101, 0.001) 
Running 10000 simulations: best_pnl=366.5568, best_sr=14.1144, best_st=136.7319  100.00% finished, ETA=0 s
1  best_params = list(filter(lambda x: (np.nan not in x) and (min(x[3:]) > 0), 
(Record 0) s=0.006, j=0.780, k=0.780  pnl=10.1482, sr=0.0645, st=0.0905
(Record 1) s=0.005, j=0.690, k=0.690  pnl=15.4882, sr=5.8961, st=12.0982
(Record 2) s=0.006, j=0.830, k=0.830  pnl=27.7178, sr=5.4487, st=11.4175
(Record 3) s=0.005, j=0.790, k=0.790  pnl=51.7044, sr=5.9154, st=12.0906
(Record 4) s=0.002, j=0.520, k=0.520  pnl=97.8652, sr=14.1144, st=136.7319
The best parameters, together with their corresponding performance metrics, are plotted below. The left plot shows the relative performance from record \(0\) up to record \(4\) (we filtered away most records as they give negative returns), which looks monotonic, and record \(4\) is the undoubted winner. The right plot shows how our best \(5\) sets of parameters differ from each other. Although the total P&L is increasing, the Sharpe ratio hardly changes; this implies our search converged, or more likely, ended up overfitted.
1  def lim_generator(values, extend=.05): 
1  rec = np.arange(len(best_params)) 
Before a thorough parameter analysis, we can also view the backtest performance as an animation (ffmpeg required on your computer). It can be seen that our P&L has a trajectory rather similar to the Bitcoin price, except that its direction of movement is the opposite. This implies we're probably holding short positions most of the time.
be.run(*best_params[4][:3], animation=True)
In this section, we take an overall look at the whole parameter grid as well as the outputs. Here are several questions we intend to answer by the end of this part:
As we found in the previous section, larger \(j\) (and \(k\)) values yield higher P&L and ratios. We investigate this issue here; the two plots below give some insight. As we can tell from the left figure, the best performance from larger \(j\) (and \(k\)) values is indeed greater than that from smaller values; however, so is the worst. This coincides with the intuition that a larger position range means larger risk exposure over time and, therefore, more uncertainty in performance.
st001 = []
Similar to the above, this guess is suggested by our grid search. First, from the right figure above we can tell that smaller \(s\) yields a more volatile performance; by volatile, we mean we have more chance of attaining better results. In contrast, larger values give significantly more robust (yet hovering around a negative Sortino ratio) performance, so we conclude that smaller \(s\) is preferable.
There could be a lot of problems in fact, e.g. we are never a "perfect" market maker. More severely, we may be running into problems we could have avoided: are we significantly biased toward one side of the trade, or are we overfitting our model?
T = be.run(*record.records[4][:3])
The two figures above show the progress of our position over time. The position is, by and large, negative throughout the day. This can be inferred either from the left scatter plot or from the right histogram (which is extremely biased to the left). The strong skew in our position is a telltale sign that we've overfit the model, mostly due to the limitation of data. On such a small dataset, overfitting is both likely and risky without cross-validation methods. A potential cure may be using a larger dataset, or k-folding the timespan for cross-validation.
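One way to k-fold a timespan without leaking future information is an expanding-window split. The helper below is a hypothetical sketch, not part of the engine:

```python
import numpy as np

def expanding_splits(n, k):
    """Split n chronological samples into k folds: each fold is tested
    on a later window and trained on everything strictly before it."""
    fold = n // (k + 1)
    for i in range(1, k + 1):
        test_end = (i + 1) * fold if i < k else n
        yield np.arange(i * fold), np.arange(i * fold, test_end)
```

Each (train, test) pair keeps the training indices strictly earlier than the test indices, so no look-ahead bias enters the parameter fitting.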
In the meantime, let's take a step back and analyze why the grid search gives us a short position for most of the day. As far as I'm concerned, this is mainly because the profit obtainable from holding a short position most of the time overwhelms what we can achieve from dynamically adjusting our side of trade while maintaining a neutral position. In a market like this one, where the general tendency of the price is to decline, a simple grid search ends up this way, and we should have been aware of it before the whole analysis.
Numerically, the market-making profit in this particular example is the price difference between each matched bid/ask and the corresponding mid price, which we used to calculate position market values. Under this setting, every trade we make earns a certain piece of revenue at no cost. The buy-and-hold profit, on the other hand, comes from holding a short position (in our story) and waiting for the price to decline. We know the second profit is significantly larger than the first.
Theoretically, in order to fix this problem at its root, we need to add one more parameter into our model: one that rewards neutral positions or punishes holding an outstanding one. Available candidates include the time-dollar product of a lasting position and moving averages of positions.
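As a toy illustration of the penalty idea (the `lam` coefficient and the functional form are made up for illustration, not fitted):

```python
import numpy as np

def penalized_pnl(pnl, positions, lam=0.1):
    """Objective that punishes lasting outstanding positions by
    subtracting lam times the time-dollar product, i.e. the sum of
    absolute position sizes held over time."""
    return pnl - lam * np.sum(np.abs(positions))
```

Maximizing this instead of raw P&L in the grid search would push the optimum back toward neutral, market-making behavior rather than a persistent directional bet.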
In this research, we tried to wrap up a simple backtest engine with a very special "perfect" market-making setting. The setting proved unrealistic but still provided a number of insights after detailed analysis. In the meantime, we may improve the model in a variety of ways based on the last section, Parameter Analysis.
In this strategy we try to do spread trading based on the M-day (adjusted) returns of two highly related ETFs (exchange-traded funds). The intuition is to hedge the one-sided risk of buying and holding one specific ETF with (in expectation) increasing returns, by holding an opposite position in another ETF with decreasing returns. Once we establish that the two ETFs' returns are highly correlated, we can trade and make profit by this sort of pair trading.
Apart from M, we define trading thresholds g and j, together with a stop-loss threshold s. The total capital limit K is assumed to be twice N, the 15-day rolling median volume (of the less liquid ETF). Specifically, we first calculate the array of daily minimums of the two (adjusted) volume series, and then take the 15-day rolling median of this series as N. Apart from the capital limit, we also define the daily position value (if any) based on N, namely N/100.
Specifically, for each trading day, the workflow is as below.
Apart from this process, we also keep track of our risk exposure with a stop-loss threshold, and only trade within a single month's window, i.e. we start trading only at the start of a new month, and kill any position at the end of a month.
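A minimal sketch of one day's decision logic; the sign conventions and exact threshold semantics here are assumptions for illustration, not necessarily the engine's exact rules:

```python
import numpy as np

def next_position(spread, position, entry_spread, g, j, s):
    """Return (new_position, entry_spread) for one trading day.
    Enter against the spread when |spread| > g, close when it reverts
    inside j, and stop out when the unrealized move against the entry
    exceeds s (all assumed conventions)."""
    if position == 0:
        if abs(spread) > g:
            # short the rich side / long the cheap side of the spread
            return -int(np.sign(spread)), spread
        return 0, None
    if abs(spread) < j:
        return 0, None                      # mean reversion achieved: close
    if position * (spread - entry_spread) < -s:
        return 0, None                      # stop-loss triggered
    return position, entry_spread
```

The month-start/month-end gating described above would simply wrap this function: skip entries outside the window and force the position to zero on the last trading day of the month.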
Import necessary modules and set up corresponding configurations. In this research notebook, we are using the following packages:
import warnings
In this part, we will analyze the economic and statistical features of the data. The two ETFs we're using are XOP and DRIP. Data is retrieved from Quandl, from 2015-12-02 to 2018-12-31. We'll use only data from 2015-12-02 to 2016-12-01 for this section, as we don't want to include future information in the backtest. Also, while it's always better to have longer historical data for analysis, due to the limited length of ETF data on Quandl (specifically for these two ETFs) we're unfortunately restrained to this short timespan.
The SPDR S&P Oil & Gas Exploration & Production ETF (XOP) seeks to provide investment results that, before fees and expenses, correspond generally to the total return performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for more detailed description.
The Direxion Daily S&P Oil & Gas Exp. & Prod. Bull and Bear 3X Shares (DRIP) seeks daily investment results, before fees and expenses, of 300% of the inverse of the performance of the S&P Oil & Gas Exploration & Production Select Industry Index. See here for a more detailed description.
By definition of the two ETFs, we expect DRIP to track 300% of the inverse daily return of XOP. This means the spread we should be tracking is, instead of the return difference between the two, the difference between 300% of the M-day returns of XOP and the M-day returns of DRIP. Also, we are supposed to hold, if any, positions of these two ETFs in a ratio of XOP:DRIP = 3, no matter whether we're long or short the spread.
A peek into the data (assume M is 5):

- DRIP: price of DRIP
- XOP: price of XOP
- rDRIP: 5-day return of DRIP
- rXOP: 5-day return of XOP
- rXOPn3: 300% of the 5-day return of XOP
- spread: spread of rXOPn3 from rDRIP (spread = rDRIP - rXOPn3)

The first few entries of our data read:
# exploratory settings
| Date | DRIP | XOP | rDRIP | rXOP | rXOPn3 | spread |
| --- | --- | --- | --- | --- | --- | --- |
| 2015-12-09 | 93.228406 | 31.481858 | 0.300551 | 0.091874 | 0.275621 | 0.024930 |
| 2015-12-10 | 87.783600 | 32.033661 | 0.175401 | 0.063402 | 0.190207 | 0.014806 |
| 2015-12-11 | 100.660997 | 30.465377 | 0.247456 | 0.084908 | 0.254725 | 0.007269 |
| 2015-12-14 | 107.982471 | 29.719958 | 0.113006 | 0.041524 | 0.124571 | 0.011565 |
| 2015-12-15 | 101.685757 | 30.310485 | 0.065597 | 0.028545 | 0.085635 | 0.020037 |
| 2015-12-16 | 108.538063 | 29.632831 | 0.164217 | 0.058733 | 0.176199 | 0.011983 |
| 2015-12-17 | 118.711577 | 28.616350 | 0.352321 | 0.106679 | 0.320036 | 0.032284 |
| 2015-12-18 | 125.551661 | 28.154731 | 0.247272 | 0.075845 | 0.227535 | 0.019737 |
| 2015-12-21 | 130.267899 | 27.862871 | 0.206380 | 0.062486 | 0.187459 | 0.018921 |
| 2015-12-22 | 125.008291 | 28.203374 | 0.229359 | 0.069518 | 0.208553 | 0.020806 |
We may also plot the histogram of the spread. Here we plot it against fitted normal and t distributions. Apparently the t distribution matches our spread data better, which coincides with our expectation, as financial data commonly exhibits fat tails. Also, we may notice that the spread is well centered around zero, which reassures us that we can assume symmetric thresholds for trading.
fig = plt.figure(figsize=(20, 7.5))
In the second subplot, we can see that the spread series is quite "stationary" over time, but we'd better not stop at eyeballing. (It's also a bit heteroskedastic, but we're not focusing on that in this research.)
Below are some statistical tests we need to run through before actual pair trading. For detailed reasoning please refer to this post.
result = adfuller(df.spread)
ADF Statistic: -8.614239430241229
p-value: 6.353844261802846e-14
Critical Values: 1%: -3.458, 5%: -2.874, 10%: -2.573
def hurst(ts):
H: 0.0390
Based on the previous two test results, we conclude that our spread is mean-reverting and the strategy is reasonable.
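The hurst function whose listing is truncated above is commonly implemented with the quick lagged-difference scaling estimator; a self-contained sketch (not necessarily the exact elided code):

```python
import numpy as np

def hurst(ts):
    """Estimate the Hurst exponent: H < 0.5 suggests mean reversion,
    H near 0.5 a random walk, H > 0.5 a trending series."""
    ts = np.asarray(ts, dtype=float)
    lags = np.arange(2, 100)
    # std of lagged differences scales roughly like lag**H
    tau = [np.sqrt(np.std(ts[lag:] - ts[:-lag])) for lag in lags]
    slope = np.polyfit(np.log(lags), np.log(tau), 1)[0]
    return 2.0 * slope

rng = np.random.default_rng(42)
h_walk = hurst(np.cumsum(rng.standard_normal(5000)))   # close to 0.5
h_noise = hurst(rng.standard_normal(5000))             # near 0, like our spread
```

A value like the H = 0.0390 reported above sits firmly in the mean-reverting regime, consistent with the ADF result.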
In this part we design a simple backtest engine that takes ETF symbols, a backtest timespan and the theoretical return ratio. It then provides an interface to run backtests against different parameters. I've encapsulated private methods/variables in the class BacktestEngine, and there are only three attributes available:

- BacktestEngine.symbols: tuple of ETF symbols
- BacktestEngine.run: run a backtest (returns the Sortino ratio, Sharpe ratio, maximum drawdown and YoY return)
- BacktestEngine.df: stores the data from the backtest (trade log)

The basic usage of this engine would be
be = BacktestEngine('DRIP', 'XOP', '2016-12-02', '2018-12-31', ratio=3)
and if you want to check the trade log during the timespan, call be.df. Note that in this data frame we denote the two ETFs by X and Y, and instead of the original M-day return of Y, we denote by rY the ratio times the original M-day returns. The positions of X and Y are also reported in be.df, together with daily and cumulative returns (in percentages of K).
An example of this be.df would be
| Date | X | Y | rX | rY | spread | N | pX | pY | daily_rtn | cum_rtn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2016-12-22 | 12.267480 | 41.323160 | 0.019732 | 0.017567 | 0.002165 | 1.549400e+07 | 0 | 0 | 0.0 | 0.0 |
| 2016-12-23 | 12.208217 | 41.450761 | 0.017488 | 0.015711 | 0.001778 | 1.210184e+07 | 0 | 0 | 0.0 | 0.0 |
| 2016-12-27 | 12.000796 | 41.666701 | 0.018578 | 0.016343 | 0.002235 | 1.064284e+07 | 0 | 0 | 0.0 | 0.0 |
| 2016-12-28 | 12.445270 | 41.166112 | 0.003185 | 0.004286 | 0.001101 | 9.867562e+06 | 0 | 0 | 0.0 | 0.0 |
| 2016-12-29 | 12.692200 | 40.901094 | 0.017419 | 0.017180 | 0.000239 | 9.607867e+06 | 0 | 0 | 0.0 | 0.0 |
class BacktestEngine:
Here for illustration, we make a test run with parameters M=5, g=0.010, j=0.005 and s=0.01. The timespan, as required throughout the analysis, is set from 2016-12-02 to 2018-12-31 inclusive. The special meta parameter ratio is set to 3.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.0574, Sharpe Ratio=0.0387, Maximum Drawdown=1.118e-11, YoY Return=0.09%
With only a Sortino ratio of 0.0574, a Sharpe ratio of 0.0387 and a YoY return of 0.09%, it's definitely not a good strategy, not to mention the unsatisfactory return plots. The top-right subplot, together with the bottom-left one, suggests that we might be using too-wide thresholds. For more detailed analysis, we can also take a look at be.df, and specifically the trading days when we have nonzero positions, which turn out to be rather few (supporting our worry about the wideness of the thresholds):
be.df.loc[be.df.pX != 0]
| Date | X | Y | rX | rY | spread | N | pX | pY | daily_rtn | cum_rtn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2017-04-20 | 19.072870 | 34.413981 | 0.191975 | 0.178984 | 0.012991 | 1.533244e+07 | 25014 | 4643 | 0.000000 | 0.000000 |
| 2017-04-21 | 18.845694 | 34.532005 | 0.097182 | 0.100743 | 0.003561 | 1.496846e+07 | 25014 | 4643 | 0.000011 | 0.000011 |
| 2017-06-12 | 21.522415 | 32.279704 | 0.076695 | 0.055866 | 0.020829 | 9.166219e+06 | 13122 | 2984 | 0.000000 | 0.000056 |
| 2017-08-04 | 22.747188 | 30.827585 | 0.142928 | 0.143423 | 0.000495 | 1.754893e+07 | 21483 | 5827 | 0.000000 | 0.000029 |
| 2017-11-02 | 15.319535 | 34.463445 | 0.210285 | 0.227637 | 0.017352 | 1.302198e+07 | 26349 | 3734 | 0.000000 | 0.000259 |
| 2017-12-26 | 11.398287 | 37.260895 | 0.221848 | 0.249496 | 0.027649 | 1.189655e+07 | 29352 | 3264 | 0.000000 | 0.000180 |
| 2017-12-27 | 11.645217 | 36.963916 | 0.200136 | 0.218041 | 0.017905 | 1.351392e+07 | 29352 | 3264 | 0.000135 | 0.000315 |
| 2017-12-28 | 11.418041 | 37.241096 | 0.152493 | 0.163117 | 0.010624 | 1.189655e+07 | 29352 | 3264 | 0.000092 | 0.000406 |
| 2017-12-29 | 11.714357 | 36.805528 | 0.054226 | 0.042553 | 0.011673 | 1.246303e+07 | 29352 | 3264 | 0.000131 | 0.000537 |
| 2018-02-05 | 14.331815 | 34.023830 | 0.357343 | 0.307833 | 0.049510 | 1.823786e+07 | 40842 | 5061 | 0.000000 | 0.000619 |
| 2018-03-28 | 13.193795 | 33.978894 | 0.097942 | 0.103973 | 0.006031 | 2.054748e+07 | 52227 | 6579 | 0.000000 | 0.000305 |
| 2018-04-03 | 12.630042 | 34.326023 | 0.045008 | 0.062801 | 0.017792 | 2.304128e+07 | 51093 | 6703 | 0.000000 | 0.000390 |
| 2018-05-11 | 7.140870 | 40.931377 | 0.120585 | 0.125726 | 0.005141 | 3.427444e+07 | 150390 | 8464 | 0.000000 | 0.000188 |
| 2018-08-23 | 6.237512 | 41.241724 | 0.144022 | 0.155054 | 0.011033 | 2.413513e+07 | 118650 | 5898 | 0.000000 | 0.000023 |
| 2018-08-24 | 6.039495 | 41.778234 | 0.157459 | 0.177582 | 0.020123 | 2.205453e+07 | 118650 | 5898 | 0.000041 | 0.000018 |
| 2018-08-27 | 5.960289 | 41.877588 | 0.146099 | 0.152580 | 0.006481 | 2.155575e+07 | 118650 | 5898 | 0.000098 | 0.000081 |
| 2018-10-24 | 9.096368 | 35.500295 | 0.494290 | 0.398785 | 0.095505 | 3.827526e+07 | 146172 | 9913 | 0.000000 | 0.000239 |
| 2018-11-12 | 9.106299 | 34.953065 | 0.168153 | 0.166935 | 0.001217 | 4.281623e+07 | 156027 | 11825 | 0.000000 | 0.001016 |
| 2018-11-14 | 9.801436 | 34.107346 | 0.324832 | 0.280804 | 0.044028 | 4.281623e+07 | 141351 | 13473 | 0.000000 | 0.000185 |
| 2018-11-15 | 9.364493 | 34.614778 | 0.136145 | 0.133480 | 0.002665 | 4.050823e+07 | 141351 | 13473 | 0.000016 | 0.000169 |
| 2018-11-23 | 11.211572 | 32.336311 | 0.197243 | 0.197471 | 0.000228 | 4.050823e+07 | 120567 | 12081 | 0.000000 | 0.000222 |
| 2018-12-06 | 11.797473 | 31.520441 | 0.111319 | 0.125227 | 0.013908 | 3.503895e+07 | 97920 | 10763 | 0.000000 | 0.001057 |
| 2018-12-07 | 11.906709 | 31.381147 | 0.147368 | 0.155996 | 0.008628 | 3.503895e+07 | 97920 | 10763 | 0.000635 | 0.001692 |
| 2018-12-12 | 12.989137 | 30.475730 | 0.209991 | 0.191626 | 0.018365 | 4.187946e+07 | 89073 | 12988 | 0.000000 | 0.002391 |
| 2018-12-13 | 13.187748 | 30.286687 | 0.117845 | 0.117424 | 0.000421 | 3.936253e+07 | 89073 | 12988 | 0.000148 | 0.002539 |
| 2018-12-17 | 16.375449 | 28.077868 | 0.248297 | 0.226081 | 0.022215 | 4.187946e+07 | 78804 | 13600 | 0.000000 | 0.002583 |
As mentioned above, in this section we try to fit the best set of parameters on the data from 2015-12-02 to 2016-12-01, i.e. the training set. As the focus of this report is not efficient optimization, we opt for a simple grid search. The parameter grids are defined as
- M_grid: 5, 10, 15, 20 (4 in total)
- g_grid: 0.001, 0.003, ..., 0.011 (6 in total)
- j_grid: -0.010, -0.008, ..., 0.010 (11 in total)
- s_grid: 1e-3, 5e-3, 1e-2, 5e-2, 1e-1 (5 in total)

So no more than 1320 simulations are run. Note that parameter combinations where -g < j < g does not hold are neglected. Below is a selection of outstanding parameter sets from the simulation.
from time import time
(Record 0) M=15, g=0.007, j=0.004, s=0.05000 | st=1.4542, sr=0.3393, md=1.934e-10, rt=8.99%
(Record 1) M=15, g=0.011, j=0.010, s=0.05000 | st=1.4591, sr=0.3430, md=1.673e-10, rt=9.17%
(Record 2) M=20, g=0.007, j=0.006, s=0.05000 | st=1.5146, sr=0.3540, md=2.076e-10, rt=9.47%
(Record 3) M=20, g=0.007, j=-0.006, s=0.05000 | st=1.6001, sr=0.3534, md=1.7e-10, rt=9.33%
(Record 4) M=15, g=0.011, j=0.004, s=0.05000 | st=1.4680, sr=0.3452, md=1.673e-10, rt=9.15%
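The size of the search and the validity constraint can be sanity-checked quickly (grids written out explicitly; the constraint -g < j < g is my reading of the filter described above):

```python
from itertools import product

M_grid = [5, 10, 15, 20]
g_grid = [0.001, 0.003, 0.005, 0.007, 0.009, 0.011]
j_grid = [round(-0.010 + 0.002 * i, 3) for i in range(11)]
s_grid = [1e-3, 5e-3, 1e-2, 5e-2, 1e-1]

# keep only combinations where the exit threshold lies inside the entry band
valid = [(M, g, j, s)
         for M, g, j, s in product(M_grid, g_grid, j_grid, s_grid)
         if -g < j < g]
```

Of the 4 x 6 x 11 x 5 = 1320 raw combinations, only those with the exit threshold strictly inside the entry band survive, so the actual number of simulations is smaller.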
From the two plots below, we can tell that Record 3, i.e. the parameter set M=20, g=0.007, j=-0.006, s=0.05, is a good choice, as it has both a large Sortino ratio/YoY return and a relatively small maximum drawdown. Record 2, i.e. M=20, g=0.007, j=0.006, s=0.05, also performs well among all outstanding parameter sets, with a slightly better Sharpe ratio and YoY return but a larger maximum drawdown. We'll test both sets.
rec = np.arange(len(record.best))
Using the parameters from Record 3, we run the backtest against the test set, i.e. from 2016-12-02 to 2018-12-31. The plots are as below.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.7407, Sharpe Ratio=0.2447, Maximum Drawdown=2.005e-11, YoY Return=1.76%
Using the parameters from Record 2, the backtest result is as below.
be = BacktestEngine('DRIP', 'XOP', start_date='2016-12-02', end_date='2018-12-31', ratio=3)
Sortino Ratio=0.9394, Sharpe Ratio=0.2872, Maximum Drawdown=1.997e-11, YoY Return=2.24%
Both results are remarkably good (especially compared with our result using random parameters before any tuning). Considering that we're not utilizing any future data in the backtest, the performance is satisfactory, even though we're neglecting a lot of execution details in our analysis, like transaction costs and market impact. There are also several comments on the processing of data:
We need the first M days to calculate the rolling median of N, which causes a loss in data. Perhaps we should use M days of further previous historical data to make up this loss.

After messing around with my Python virtualenv, my computer finally started going nuts. Jupyter notebook threw me the following error every time I started it:
zsh: /usr/local/bin/jupyter: bad interpreter: /usr/local/opt/python/bin/python3.7: no such file or directory
It turns out that along with the reinstallation of Python, the Homebrew symlinks to Jupyter were broken. A simple solution would be
rm '/usr/local/bin/jupyter'
Then the notebook starts just fine.
I wrote a poker game.
This is a simple Texas Hold 'em game running on macOS. All scripts are written in pure Python. The main GUI is written using the Python module PySimpleGUI, and hand evaluation is done by referring to a hand value table precalculated together with Monte Carlo simulation. See here for a detailed explanation of hand evaluation. Together with the GUI version, I also include a primitive command-line version with ColorPrint support, which you may download and include from this repo. The two versions are supposed to work identically.
You don't need any Python or module dependencies installed on your Mac in order to just play the game. The app itself is standalone with everything packed inside it already. There're just two steps:
Or rather, if you'd like to pack it yourself: run PyInstaller in onefile mode.

There're several things to work on in plan (PySimpleGUI, by MikeTheWatchGuy). Here are some bugs I'm trying to fix:
- One involves the Popup element in PySimpleGUI.

I appreciate suggestions and encouragement from anyone throughout the development (which may still continue for a long time, considering the considerable time I spent just writing this primitive game). Special thanks to my friends who tried the game and found bugs starting from the command-line version. Also, credit to MikeTheWatchGuy, who wrote the PySimpleGUI module and helped me fix several bugs. Also, credit to Freepik from www.flaticon.com, who made this fantastic icon. Finally, I wanna give credit to myself for the nights I stayed up after lectures. There is nothing more fulfilling than realizing an impulse right away.
In the previous post, we considered the probabilities of making one specific hand with the turn/river card. This can be rather useful in specific situations, but still cannot apply throughout a game. Poker is essentially an incomplete-information game. Different from Go, where you can see all stones placed on the board and thereby "solve" an optimal move, you never know your opponents' pocket cards until showdown (and even then, people muck). Also, you have little clue about the unshown community cards. Therefore, in order to evaluate a hand during a poker game, we'd better opt for an online evaluation algorithm instead of treating this as a DP-like problem.
For the sake of convenience, the table of hand values is shown again below.
| Name | Description | Example |
| --- | --- | --- |
| High Card | Simple value of the card. Lowest: deuce; highest: ace. | As 4s 7h Td 2c |
| Pair | Two cards with the same value. | As 4s 7h Td Ac |
| Two Pairs | Two pairs where each pair of cards have the same value. | As 4s 4h Td Ac |
| Three of a Kind | Three cards with the same value. | As 4s 4h 4d 2c |
| Straight | Five cards in consecutive values (ace can precede deuce or follow up king). | 9s Ts Jh Qd Kc |
| Flush | Five cards of the same suit. | Ah 4h 7h Th 2h |
| Full House | Three of a kind with the rest two making a pair. | As 4s 4h 4d Ac |
| Four of a Kind | Four cards of the same value. | As 4s 4h 4d 4c |
| Straight Flush | Straight of the same suit. | 9h Th Jh Qh Kh |
| Royal Flush | Straight flush from ten to ace. | Th Jh Qh Kh Ah |
Our ultimate goal is to be able to evaluate the probability of a win (and a tie), i.e. the relative strength of our hand. However, let's first take one step back and consider evaluating absolute strengths, which we denote as hand values. There are \(\binom{52}{5}=2,598,960\) possible hands but far fewer distinct values. An intuitive idea to match all these hands to their values, which is also what I did, is to first generate a sparse mapping from hands to values, and then condense it. First we need a function that identifies hand types. Then, for hands within the same type, we encode them like a carry system (e.g. decimal); for hands across different types, we manually add offsets so that higher hand types always yield higher values. The final results are stored in the dictionary hv and serialized locally. The highest value is 6144 and here is a glance of hv:
{
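To make the carry-system idea concrete, here is a toy version covering only high cards and one pair (the offset and scope are made up for illustration; the real hv table covers all ten types and is then condensed):

```python
def toy_value(ranks):
    """Toy hand value for five card ranks (2..14, ace high), covering
    only the high-card and one-pair types."""
    counts = {r: ranks.count(r) for r in set(ranks)}
    pairs = [r for r, c in counts.items() if c == 2]
    if pairs:
        # pair rank is the most significant "digit", then kickers descending
        ordered = [pairs[0]] + sorted((r for r in ranks if r != pairs[0]),
                                      reverse=True)
        offset = 13 ** 5          # made-up offset: any pair beats any high card
    else:
        ordered = sorted(ranks, reverse=True)
        offset = 0
    value = offset
    for r in ordered:             # encode ranks like digits in base 13
        value = value * 13 + (r - 2)
    return value
```

Within a type, comparing values is the same as comparing ranks digit by digit; across types, the offset guarantees the ordering.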
Now that we have the full mapping from hands to values, there are two things we can do to calculate the probabilities:
Things are gonna be much easier when we only consider the two-player case (a.k.a. heads-up games). In that case, we can literally count and evaluate all scenarios: when there are 2 pocket cards for me and 3 community cards shown, we only need to enumerate \(\binom{52-5}{2}=1,081\) opponent hands, which is lightning fast with modern programming languages like Python or C++. However, when we have more players, like 5, the first method gets nasty. The hands of the opponents are not independent, so we have to go through \(\binom{52-5}{2\times 4} > 3\times 10^8\) situations, and that, unlike the heads-up scenario, would be unacceptably slow no matter which language we use. Therefore, we opt for the second method at some cost of precision. The code (partial) is shared below.
def handEval(my_pocket, community, n_players, hv,
Below are two example tests.
Pocket: Tc Jd
P(win) = 5.4688% P(tie) = 0.4883%
Pocket: Tc Jd
P(win) = 1.9531% P(tie) = 0.7812%
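The enumeration counts quoted above are easy to verify (a quick check, not part of the evaluator):

```python
from math import comb

# heads-up: the opponent holds 2 of the 52 - 5 = 47 unseen cards
heads_up = comb(47, 2)
# five players: 4 opponents hold 2 x 4 = 8 of the 47 unseen cards
five_way = comb(47, 8)
print(heads_up, five_way)  # 1081 314457495
```

The five-way count exceeds \(3\times 10^8\), which is why exact enumeration is abandoned in favor of Monte Carlo there.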
Note that I also implement an interesting parameter called ranges, which represents the opponents' prior ranges at preflop. When passing an empty value, all combinations of two cards are considered. When we specify a list of ranges (numbers from 0 to 1, say \(x\%\)), the opponents are assumed to only play when their pocket cards are at least in the top \(x\%\) of all starting pairs. See this table for more reasoning.
In this post we're gonna introduce one of the most widely-used results in hold 'em: the odds chart.
Before showing the odds chart, we first give the mathematical definition of odds. Here we're not focusing on winning a hand; instead, our concern is whether we can make an expected hand with the forthcoming unshown card(s). We call the probability of doing so the improving probability, and define its corresponding odds as

\[\text{odds} = \frac{1}{\text{improve}\%} - 1\]

which means we can bet every \(\$1\) against any pot larger than or equal to this amount.
Now we try to calculate these probabilities and odds. We only consider one card to expect and one/two community cards to unveil, namely odds on the river or turn. For example, when we're expecting any of 8 cards on the turn to make a straight, the improving probability would be

\[\text{improve}\% = 1 - \frac{45 + 2 - 8}{45 + 2}\times \frac{45 + 1 - 8}{45 + 1} = \frac{340}{1081} \approx 31.45\%\]
which means we have a \(31.45\%\) chance to make it; the odds, therefore, are \(1/31.45\% - 1 = 2.2\), meaning we can bet at most \(\$1\) against each \(\$2.2\) pot. More generally, let \(\#\text{n.s.}\) denote the number of community cards not shown yet; then we have

\[\text{improve}\% = 1 - \prod_{i=1}^{\#\text{n.s.}} \frac{45 + i - \text{outs}}{45 + i}.\]
Below is the table of improving probabilities and corresponding odds w.r.t. different outs.
| Outs | Improve% (River) | Odds (River) | Improve% (Turn) | Odds (Turn) |
| --- | --- | --- | --- | --- |
| 1 | 2.17% | 45 | 4.26% | 22 |
| 2 | 4.35% | 22 | 8.42% | 11 |
| 3 | 6.52% | 14 | 12.49% | 7 |
| 4 | 8.70% | 11 | 16.47% | 5.1 |
| 5 | 10.87% | 8.2 | 20.35% | 3.9 |
| 6 | 13.04% | 6.7 | 24.14% | 3.1 |
| 7 | 15.22% | 5.6 | 27.84% | 2.6 |
| 8 | 17.39% | 4.8 | 31.45% | 2.2 |
| 9 | 19.57% | 4.1 | 34.97% | 1.9 |
| 10 | 21.74% | 3.6 | 38.39% | 1.6 |
| 11 | 23.91% | 3.2 | 41.72% | 1.4 |
| 12 | 26.09% | 2.8 | 44.96% | 1.2 |
| 13 | 28.26% | 2.5 | 48.10% | 1.1 |
| 14 | 30.43% | 2.3 | 51.16% | 0.95 |
| 15 | 32.61% | 2.1 | 54.12% | 0.85 |
| 16 | 34.78% | 1.9 | 56.98% | 0.75 |
| 17 | 36.96% | 1.7 | 59.76% | 0.67 |
Multiplying the number of outs by two or four gives a reasonable approximation to the improve% (river) or improve% (turn), respectively, in the above table. This is a famous (yet quite rough) approximation among hold 'em players. The rule is a direct corollary of the formula above: when \(\#\text{n.s.}=1\),
\[\text{improve}\% = 1 - \frac{46 - \text{outs}}{46} = \frac{\text{outs}}{46} \approx (2\,\text{outs}) \%\]
and when \(\#\text{n.s.}=2\),
\[\begin{align*}\text{improve}\% &= 1 - \frac{46 - \text{outs}}{46}\times \frac{47 - \text{outs}}{47} \\&= \frac{93}{2162}\text{outs} - \frac{1}{2162}\text{outs}^2 \approx (4\,\text{outs}) \%.\end{align*}\]
In this post, I'll walk through the whole process of downloading, cleaning and then browsing one of the world's largest poker hand history datasets, the IRC Poker Database^{[1]}, which is a little aged but well-known for its huge size. The work we're doing here is meant to be a preparation for further analysis and model training.
Before the advent of real-money online poker servers, there was the Internet Relay Chat (IRC) poker server. The server was programmed by Todd Mummert, with support code by Greg Reynolds, and other Usenet rec.gambling.poker enthusiasts. The participants in these games were mostly computer geeks with a passion for poker. Many were serious students of the game, armed with the analytical skills needed to understand the mathematics and all other aspects of advanced poker strategy.
Michael Maurer wrote a program called Observer that sat in on IRC poker channels and quietly logged the details of every game it witnessed. This resulted in the collection of more than 10 million complete hands of poker (from 1995-2001) that constitutes the IRC Poker Database.
Sadly, the IRC games are now gone (but might be resurrected one day).^{[2]}
We'll be using the same shorthand notations as in the last post. For bet actions, we define:

- `-`: no action
- `B`: blind bet
- `f`: fold
- `k`: check
- `b`: bet
- `c`: call
- `r`: raise
- `A`: all-in
- `Q`: quit
- `K`: kicked out

As for rounds, we denote:

- `p`: preflop
- `f`: flop
- `t`: turn
- `r`: river
- `s`: showdown

I've written several scripts^{[3]} for all sorts of data preparation; the code can be found in my GitHub repository. After entering the repo, run the following in order:
wget http://poker.cs.ualberta.ca/IRC/IRCdata.tgz  # download the database (-> IRCdata.tgz)
Eventually there're \(10{,}233{,}955\) hands in hands.json and \(437{,}862\) in hands_valid.json after cleaning.
You may run the following code to inspect hands in their original order. Any time you'd like to stop browsing, just use Ctrl+C to interrupt the process.
python3 browse.py  # print hands in a formatted way
The script lists extracted hands history as below.
############################################################
time   : 199612
id     : 2093
board  : ['Qd', '6s', 'Td', 'Qc', 'Jh']
pots   : [(2, 60), (2, 60), (2, 60), (2, 60)]
players:
Tiger (#1)
{'action': 30, 'bankroll': 2922, 'bets': [{'actions': ['B', 'r'], 'stage': 'p'}, {'actions': ['k'], 'stage': 'f'}, {'actions': ['k'], 'stage': 't'}, {'actions': ['k'], 'stage': 'r'}], 'pocket_cards': ['9s', 'Ac'], 'winnings': 30}
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
jvegas2 (#2)
{'action': 30, 'bankroll': 139401, 'bets': [{'actions': ['B', 'c'], 'stage': 'p'}, {'actions': ['k'], 'stage': 'f'}, {'actions': ['k'], 'stage': 't'}, {'actions': ['k'], 'stage': 'r'}], 'pocket_cards': ['9c', 'As'], 'winnings': 30}
############################################################
So this extract describes hand #2093, which happened in December of 1996. There were two players at the table, Tiger (the SB) and jvegas2 (the BB). The game started with Tiger paying \(\$5\), who got a 9♠ and an A♣ with a bankroll of \(\$2{,}922\), and jvegas2 paying \(\$10\), whose pocket cards were 9♣ and A♠ with a bankroll of \(\$139{,}401\). Then Tiger raised to \(\$30\) (3BB) and jvegas2 called, so the preflop pot was \(\$60\) with two players in. The flop was Q♦, 6♠ and 10♦; a dry board so far, so both checked. The turn was Q♣, and both checked again. The river came J♥, nothing special, and both checked once more. Both players went to showdown and it was a tie, so the two shared the total pot of \(\$60\).
Starting from today, I'm gonna write a series of posts on Texas Hold 'em, one of the world's most famous forms of poker. The game is rather complicated, especially considering its origin dating back to the early 20th century. In this post, I will list the bare bones of hold 'em. These concepts may sound boring if you are a veteran poker player, but I just want to make sure we're talking in the same language, or building with the same bricks.
Texas hold 'em uses all cards but the two jokers from one deck. So there are 13 cards in each of the four suits: diamonds (♦), hearts (♥), clubs (♣) and spades (♠). The exhaustive array of cards is:
| (Suit) | Ace | Deuce | Three | Four | Five | Six | Seven | Eight | Nine | Ten | Jack | Queen | King |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ♦ | A♦ | 2♦ | 3♦ | 4♦ | 5♦ | 6♦ | 7♦ | 8♦ | 9♦ | 10♦ | J♦ | Q♦ | K♦ |
| ♥ | A♥ | 2♥ | 3♥ | 4♥ | 5♥ | 6♥ | 7♥ | 8♥ | 9♥ | 10♥ | J♥ | Q♥ | K♥ |
| ♣ | A♣ | 2♣ | 3♣ | 4♣ | 5♣ | 6♣ | 7♣ | 8♣ | 9♣ | 10♣ | J♣ | Q♣ | K♣ |
| ♠ | A♠ | 2♠ | 3♠ | 4♠ | 5♠ | 6♠ | 7♠ | 8♠ | 9♠ | 10♠ | J♠ | Q♠ | K♠ |
Besides these full-length notations, I'll also be using abbreviations where I denote `d` for diamonds, `h` for hearts, `c` for clubs and `s` for spades. Also, I use `T` to represent cards of 10. Thereby, we can easily use a two-character string, e.g. `Ts`, to refer to the card 10♠. This shorthand is gonna be especially useful when we try to program the game in the forthcoming posts.
There are in general three types of hold 'em games based on the number of players at the table: 2 (called "heads-up"), 3-6 (called "6-max") and 7-10 (called "full-ring"). The maximum number of players in a Texas hold 'em game is ten, which means there are ten players plus one dealer who does not play. That makes a regular 10-player table full (well, with the dealer standing).
There are a variety of positions at the table (ignoring the dealer's), but the most important are three: button (BTN), small blind (SB) and big blind (BB). In most self-dealt games, where there is no specific person serving as a dealer, the button also serves as the dealer, which is why you can see the "dealer" chip at the same position. The blinds are forced contributions and are paid before the pocket cards (the two cards for each player at the beginning of the game, see figure above): SB pays first and is asked half the money of BB, then BB pays his forced bet, and finally the dealer gives away all the pocket cards in a clockwise manner starting from SB. Therefore, BTN is the last to receive his pocket cards and also the last to act (bet or fold, see "betting" below).
Besides BTN, SB and BB, we also usually call the first player to the left of BB the under-the-gun (UTG), which vividly points out that he's the first to act at this table, as SB and BB are forced to pay their bets.
There are four kinds of actions you can take in each betting round: fold, check, call and bet (or raise).
The hand begins with a preflop betting round, beginning with UTG and continuing clockwise. A round of betting continues until every player has folded, put in all of their chips, or matched the amount put in by all other active players.
After the preflop betting round, assuming there remain at least two players taking part in the hand, the dealer deals a flop: three faceup community cards. The flop is followed by a second betting round. This and all subsequent betting rounds begin with the player to the dealer's left and continue clockwise.
After the flop betting round ends, a single community card (called the turn or fourth street) is dealt, followed by a third betting round. A final single community card (called the river or fifth street) is then dealt, followed by a fourth betting round and the showdown, if necessary. In the third and fourth betting rounds, the stakes double.
To sum up, players have four rounds in which to act: preflop (given a pocket of two cards), flop (three community cards revealed), turn (one more community card revealed) and river (the last community card revealed).
The following table shows the possible hand values in ascending order.
| Name | Description | Example |
|---|---|---|
| High Card | Simple value of the card. Lowest: deuce; highest: ace. | As 4s 7h Td 2c |
| Pair | Two cards with the same value. | As 4s 7h Td Ac |
| Two Pairs | Two pairs, where the cards in each pair share the same value. | As 4s 4h Td Ac |
| Three of a Kind | Three cards with the same value. | As 4s 4h 4d 2c |
| Straight | Five cards in consecutive values (ace can precede deuce or follow king). | 9s Ts Jh Qd Kc |
| Flush | Five cards of the same suit. | Ah 4h 7h Th 2h |
| Full House | Three of a kind with the remaining two making a pair. | As 4s 4h 4d Ac |
| Four of a Kind | Four cards of the same value. | As 4s 4h 4d 4c |
| Straight Flush | Straight of the same suit. | 9h Th Jh Qh Kh |
| Royal Flush | Straight flush from ten to ace. | Th Jh Qh Kh Ah |
A player may use any five of the seven available cards (the two pocket cards and the five community cards) to make the highest hand value he can attain. The player with the highest hand value wins the pot, unless all but one player fold before the showdown (showing pocket cards after the last betting round).
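To make the table concrete, here is a sketch of a classifier for a single five-card hand (the function name `classify_hand` is mine; a real evaluator would also need tie-breaking within each category):

```python
from collections import Counter

# Map 'As' -> 14, '2c' -> 2, etc.; T stands for ten as described above.
RANK_ORDER = {r: i for i, r in enumerate("23456789TJQKA", start=2)}

def classify_hand(cards):
    """Classify a 5-card hand given as two-character strings."""
    ranks = sorted(RANK_ORDER[c[0]] for c in cards)
    suits = [c[1] for c in cards]
    counts = sorted(Counter(ranks).values(), reverse=True)
    flush = len(set(suits)) == 1
    # ace can precede deuce: A,2,3,4,5 (the "wheel") is also a straight
    straight = (ranks == list(range(ranks[0], ranks[0] + 5))
                or ranks == [2, 3, 4, 5, 14])
    if straight and flush:
        return "Royal Flush" if ranks[0] == 10 else "Straight Flush"
    if counts == [4, 1]:
        return "Four of a Kind"
    if counts == [3, 2]:
        return "Full House"
    if flush:
        return "Flush"
    if straight:
        return "Straight"
    if counts == [3, 1, 1]:
        return "Three of a Kind"
    if counts == [2, 2, 1]:
        return "Two Pairs"
    if counts == [2, 1, 1, 1]:
        return "Pair"
    return "High Card"

print(classify_hand(["As", "4s", "4h", "Td", "Ac"]))  # Two Pairs
```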
A couple of months ago I was asked the following question during an interview (for proprietary concerns I'm not going to disclose the industry or name of the company): \(\newcommand{R}{\mathbb{R}} \newcommand{E}{\text{E}} \newcommand{bs}{\boldsymbol} \newcommand{N}{\mathbb{N}}\)
Assume \(k\), \(n\in\N\) and \(k < n\). For a uniformly chosen subspace \(\R^k\subsetneq\R^n\) we define the orthogonal projection as \(P:\R^n\mapsto\R^n\). Find \(\E[P(\bs{v})]\) where \(\bs{v}\in\R^n\) is given.
It's an interesting question and also a totally novel one to me at the time. How do we define a "uniformly" chosen subspace and its corresponding projection? What are the possible intuitions behind this simple-looking question? Despite busy schoolwork and student projects, these thoughts persisted in my mind and drove me to dig into this question from time to time. Curiosity has been aroused, and an appetite is meant to be satisfied.
In order to solve this problem, we need to fully understand what's being asked. So now we've got two nonnegative integers \(k<n\) and two spaces, namely \(\R^n\) and \(\R^k\). We know \(\R^k\) is somehow randomly selected as a subspace of \(\R^n\) and this randomness is uniform. For each such selection, we can make an orthogonal projection of the given point^{[1]} \(\bs{v}\) onto \(\R^k\). Note here the projection is defined from \(\R^n\) to \(\R^n\), which means we're not interested in the projected value on \(\R^k\) but in the projection itself. In other words, we're focusing on the projected vector's behavior in the same space as \(\bs{v}\).
The simplest example (well, it's in fact not THE simplest as we could always project \(\bs{v}\) onto \(\R^0\) and the resulting expectation would be a zero vector) would be \(n=2\) and \(k=1\). For any given \(\bs{v}\), we can always draw a graph as below.
The random subspace in this case is illustrated by the straight gray line, which determines the projection \(P(\bs{v})=\bs{h}\) as in the graph. We are uniformly selecting this subspace when we rotate this line around the origin accordingly. This means the angle between \(\bs{v}\) and \(\bs{h}\), denoted by \(\theta\), is a uniform random variable on \([0, 2\pi)\). Further, simple geometry tells us the angle between \(\bs{v}\) and \(\bs{h} - \bs{v}/2\) is merely \(2\theta\), which is therefore also uniformly distributed on \([0, 4\pi)\). Now that we know \(\bs{h}\) uniformly lies on the red circle, we conclude that the expected projection, in this particular case, is \(\bs{v}/2\).
While it gets geometrically difficult to imagine, not to mention to draw, the case of larger \(n\) and \(k\), this example has given us a pretty nice guess:
\[\E[P(\bs{v})] = \frac{k}{n}\bs{v}.\]
Can we prove it in higher dimensions and general cases?
Proof. Now we try to prove that our previous statement is true. For any orthonormal basis \(\bs{e}=(\bs{e}_1,\bs{e}_2,\dots,\bs{e}_n)\in\R^{n\times n}\), we uniformly choose a subset \((\bs{e}_{n_1}, \bs{e}_{n_2},\dots,\bs{e}_{n_k})\) and define the subspace \(\R^k\) as its span. The projected value on any basis vector \(\bs{e}_j\) is \(\bs{e}_j'\bs{v}\) and the corresponding vector component is \(\bs{e}_j\bs{e}_j'\bs{v}\). Therefore, the orthogonal projection of \(\bs{v}\) is given by
\[P(\bs{v}) = \sum_{j=1}^{k}\bs{e}_{n_j}\bs{e}_{n_j}'\bs{v} = \bs{eDe'v} \in \R^n\]
where we define the random matrix \(\bs{D}\) to be a diagonal matrix with \(k\) ones and \((n-k)\) zeros on its diagonal. The diagonal entries are not independent, but the expectation of each entry is the same, namely \(k/n\). The expectation of the projection, therefore, is
\[\E[P(\bs{v})] = \E[\bs{eDe'v}] = \E\{\E[\bs{eDe'v}\mid \bs{e}]\} = \E\{\bs{e}\E[\bs{D}]\bs{e'v}\} = \frac{k}{n}\E[\bs{ee'}]\bs{v}.\]
where we used the tower rule^{[3]}. Now, noticing that \(\bs{e'e}=\bs{I}\) always holds for any \(\bs{e}\), we have
\[(\bs{ee'})^2 = \bs{ee'ee'} = \bs{e(e'e)e'} = \bs{eIe'} = \bs{ee'} \Rightarrow \bs{ee'} = \bs{I}\] (the last step holds since \(\bs{e}\) has full rank, so \(\bs{ee'}\) is invertible)
and thus we may finally conclude
\[\E[P(\bs{v})] = \frac{k}{n}\E[\bs{ee'}]\bs{v} = \frac{k}{n}\bs{v}\]
which exactly coincides with our previous guess. Q.E.D.
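As a sanity check, here is a small Monte Carlo sketch (my own, not part of the original argument) that verifies \(\E[P(\bs{v})] = \frac{k}{n}\bs{v}\) numerically. A uniformly chosen \(k\)-dimensional subspace is obtained as the span of the first \(k\) columns of the Q factor from a QR decomposition of a Gaussian matrix:

```python
import numpy as np

# Monte Carlo check of E[P(v)] = (k/n) v. The span of k i.i.d. Gaussian
# vectors is a uniformly distributed k-dimensional subspace; QR gives us
# an orthonormal basis of that span.
rng = np.random.default_rng(0)
n, k, trials = 5, 2, 10000
v = rng.standard_normal(n)

total = np.zeros(n)
for _ in range(trials):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal basis
    e = q[:, :k]                                      # basis of the random subspace
    total += e @ (e.T @ v)                            # orthogonal projection of v

print(np.max(np.abs(total / trials - k / n * v)))  # close to 0
```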
TBD. May concern dimensional reduction, etc.
These are the lecture notes on foreign exchange market and theories. \(\newcommand{\E}{\text{E}} \newcommand{\P}{\text{P}} \newcommand{\Q}{\text{Q}} \newcommand{\F}{\mathcal{F}} \newcommand{\d}{\text{d}} \newcommand{\N}{\mathcal{N}} \newcommand{\eeq}{\ \!=\mathrel{\mkern3mu}=\ \!} \newcommand{\eeeq}{\ \!=\mathrel{\mkern3mu}=\mathrel{\mkern3mu}=\ \!} \newcommand{\MGF}{\text{MGF}}\)
The spot price of a foreign currency is quoted as \(1 = S_t\) (LHS in units of foreign currency, RHS in units of domestic currency), which is equivalent to \(1/S_t = 1\) in the opposite direction. We say \(S_t\) is a price in domestic terms.
Selling domestic currency to buy foreign currencies.
Value for the buyer is (in domestic currency) \(PV=(S_t - R)N\). This is because of the two cash flows:
Executing a spot contract at time \(T\) with given contract rate \(R\).
Value for the buyer is (in domestic currency) \(PV=(S_t\cdot P^f  R\cdot P^d)N\). This is because of the two cash flows at time \(T\):
which has present values at time \(t\)
We set \(PV=0\) for the forward contract and get \(F\equiv R=S_t\cdot P^f /\ P^d=S_t\exp[(r^d - r^f)\cdot(T-t)]\). Therefore, we also have \(F-S_t\approx S_t(r^d - r^f)\cdot(T-t)\).
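A one-line numerical illustration of the CIP formula and its linear approximation (the numbers are made up; `cip_forward` is a hypothetical helper):

```python
from math import exp

# Covered interest parity under continuous compounding, as in the notes:
# F = S_t * exp[(r_d - r_f) * (T - t)]
def cip_forward(spot, r_d, r_f, tau):
    return spot * exp((r_d - r_f) * tau)

F = cip_forward(spot=1.1860, r_d=0.05, r_f=0.03, tau=0.5)
print(F)  # slightly above spot: the higher-rate (domestic) currency is weaker forward
print(F - 1.1860, 1.1860 * (0.05 - 0.03) * 0.5)  # F - S ≈ S(r_d - r_f)(T - t)
```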
In order to replicate a forward contract, we can execute a spot contract, borrow domestic and lend foreign. Namely, we have cash flows at time \(t\):
and at time \(T\):
This yields \(S_t/P^d = F\cdot 1 / P^f\), or \(F=S_t\cdot P^f / P^d\), which is what we call the CIP (covered interest parity). This means higher interest rate currencies will be "weaker" on a forward basis.
From the CIP we have \(P^f = P^d \cdot F/S_t\), which gives \(r^f = r^d - \log(F/S_t) / (T-t)\).
Swapping a forward contract (\(T_1\), \(R_1\)) for another (\(T_2\), \(R_2\)).
Value for the buyer is (in domestic currency) \[\begin{align*}PV&=(S_t\cdot P^{f1} - R_1\cdot P^{d1} - S_t\cdot P^{f2} + R_2\cdot P^{d2})\cdot N\\&=\left\{S_t\left[\exp(-r^{f1}(T_1-t)) - \exp(-r^{f2}(T_2-t))\right] - R_1\exp(-r^{d1}(T_1-t)) + R_2\exp(-r^{d2}(T_2-t))\right\}\cdot N\end{align*}\] which is rather insensitive w.r.t. the spot rate: \[PV_S = \frac{\partial PV}{\partial S} = (P^{f1} - P^{f2})N = \left[\exp(-r^{f1}(T_1-t)) - \exp(-r^{f2}(T_2-t))\right]\cdot N\approx N r^f(T_2 - T_1)\] compared with that of a forward contract: \[PV_S = P^f\cdot N = \exp[-r^f(T-t)]\cdot N \approx N.\]
The right (but not the obligation) to exchange \(N\cdot K\) units of domestic currency for \(N\) units of foreign currency at time \(T\). That is to say, the right to buy foreign currency is a foreign call but, at the same time, also a domestic put.
We have the put-call parity \(C-P=P^d(F-K)\) and the payoff of a foreign call option, \(\max(0, S_T-K)\). We assume \(\{S_t\}_{0\le t\le T}\) follows the GBM \(\d S = \mu S \d t + \sigma S \d W\) which, according to Itô's lemma, gives \[\d V = \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t\right) \d t + V_S\d S\] where \(V\) is any derivative security on \(S\) (remark: subscripts on \(V\) here denote partial derivatives, e.g. \(V_t=\partial V/\partial t\)). Now, noticing the hedged portfolio \(\Pi = \{+1 \text{ unit of }V;\ -V_S \text{ units of } S\cdot D^f\}\) has dynamics \[\begin{align*}\d\Pi &= \d V - V_S\d (S\cdot D^f) \\&= \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t\right) \d t + V_S\d S - V_S(D^f \d S + S\cdot r^f \d t) \\&= \left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t - r^fV_S S\right) \d t\end{align*}\] where we used the fact that \(D^f(t)=1\). Now under the risk-neutral measure, we know \[\left(\frac{1}{2}\sigma^2S^2V_{SS} + V_t - r^fV_S S\right)\d t = r^d(V - V_S S)\d t\] which gives the so-called Garman-Kohlhagen PDE: \[\frac{1}{2}\sigma^2S^2V_{SS} + (r^d - r^f)V_S S - r^d V + V_t = 0\] with boundary conditions \(V(S_T,T)=(S_T-K)^+\) and \(V(0,t)=0\).
Trade date is when the terms of the transaction are agreed. Currency trading is a global, 24hour market. The "trading day" ends at 5pm New York time. Value date is when cash flows occur, i.e., when currencies are delivered. Value date for spot transactions is "T+2" for most currency pairs. However, spot value date is "T+1" for USD versus CAD, RUB, TRY, PHP.
| Trade Date (T+0) | T+1 | Value Date (T+2) |
|---|---|---|
| Trade terms are agreed | | Two currency payments are delivered |
| | Good day for CCY1 and CCY2 if non-USD | Good day for CCY1 and CCY2 |
| | Can be a USD holiday | Cannot be a USD holiday |
We usually call the two currencies in a pair (CCY1/CCY2, where the "/" is usually omitted) by any of the following names:

| CCY1 | CCY2 |
|---|---|
| Base Currency | Terms Currency |
| Fixed Currency | Variable Currency |
| Home Currency | Overseas Currency |
When we say EURUSD \(= 1.1860\), we mean \(1\) EUR \(=\) \(1.1860\) USD.
In the context of bid-offer spreads, we denote the bid and offer prices as EURUSD \(=1.1859/1.1860\) (or \(1.1859/60\) as shorthand). These spreads may vary; the possible reasons involve liquidity, volatility and the cost of risk.
In terms of USD, the direct quotes are CCYUSD and the indirect quotes are USDCCY.
Spot rates can be calculated from an indirect market: e.g. when EURUSD \(=1.1882\) and USDJPY \(=109.14\), we have the cross rate EURJPY \(=129.68\), which does not necessarily coincide with the actual rate in the market.
Neither of the two currencies' official interest rates is used here. Instead, people use the deposit rate; specifically, in terms of USD, it's the Eurodollar deposit rate, or LIBOR.
Contracts with any delivery date (a.k.a. value date) other than spot are considered forward. Standard delivery dates may be in weeks or months; otherwise the date is called "broken". Specifically, we call the contract "cash" if its delivery date is today, and "tom" if it's tomorrow. FX forwards are OTC (over-the-counter) contracts.
We define: \(\text{forward point} = \text{forward rate (outright)} - \text{spot rate}\). The number is usually scaled by \(10^4\).
We have \[\text{forward} =\text{spot} \times \frac{1 + R_{\text{variable CCY}}\times\text{days}/ 360}{1 + R_{\text{fixed CCY}}\times\text{days}/360}\] where we use \(R\) instead of \(P\) here as it's more commonly given.
Using the CIP above, we have \[R^f = \frac{(S/F)\times(1 + R^d\times\text{days}/360) - 1}{\text{days}/360}\] where we assume the rates are not compounded.
Contracts that alter the value date on an existing trade by simultaneously executing two forward transactions.
| | # of legs | FX Risk | IR Spread Risk |
|---|---|---|---|
| Spot | 1 | Yes | No |
| Forward | 1 | Yes | Yes |
| Swap | 2 | No | Yes |
We define: \(\text{swap point} = \text{far rate} - \text{near rate}\).
To be continued.
This is a brief selection of my notes on the stochastic calculus course. Content may be updated at times. \(\newcommand{\E}{\text{E}} \newcommand{\P}{\text{P}} \newcommand{\Q}{\text{Q}} \newcommand{\F}{\mathcal{F}} \newcommand{\d}{\text{d}} \newcommand{\N}{\mathcal{N}} \newcommand{\sgn}{\text{sgn}} \newcommand{\tr}{\text{tr}} \newcommand{\bs}{\boldsymbol} \newcommand{\eeq}{\ \!=\mathrel{\mkern3mu}=\ \!} \newcommand{\eeeq}{\ \!=\mathrel{\mkern3mu}=\mathrel{\mkern3mu}=\ \!} \newcommand{\R}{\mathbb{R}} \newcommand{\MGF}{\text{MGF}}\)
For \(X\sim\N(\mu,\sigma^2)\), we have \(\MGF(\theta)=\exp(\theta\mu + \theta^2\sigma^2/2)\). We have \(\E(X^k) = \MGF^{\ (k)}(0)\).
Consider a two-sided truncation \((a,b)\) on \(\N(\mu,\sigma^2)\); then \[\E[X\mid a < X < b] = \mu - \sigma\frac{\phi(\alpha) - \phi(\beta)}{\Phi(\alpha) - \Phi(\beta)}\] where \(\alpha:=(a-\mu)/\sigma\) and \(\beta:=(b-\mu)/\sigma\).
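A quick Monte Carlo check of this formula (the parameters are illustrative):

```python
import math, random

# Numerical check of the truncated-normal mean formula above.
mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0
phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal pdf
Phi = lambda z: (1 + math.erf(z / math.sqrt(2))) / 2            # standard normal cdf

alpha, beta = (a - mu) / sigma, (b - mu) / sigma
formula = mu - sigma * (phi(alpha) - phi(beta)) / (Phi(alpha) - Phi(beta))

random.seed(0)
draws = [x for x in (random.gauss(mu, sigma) for _ in range(200000)) if a < x < b]
print(formula, sum(draws) / len(draws))  # the two agree to ~1e-2
```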
Let \(X\) be a MG and \(T\) a stopping time, then \(\E X_{T\wedge n} = \E X_0\) for any \(n\).
Define \((Z\cdot X)_n:=\sum_{i=1}^n Z_i(X_i - X_{i-1})\) where \(X\) is a MG with \(X_0=0\) and \(Z_n\) is predictable and bounded; then \((Z\cdot X)\) is a MG. If \(X\) is a sub-MG, then so is \((Z\cdot X)\). Furthermore, if \(Z\in[0,1]\), then \(\E(Z\cdot X)_n\le \E X_n\).
If \(X\) is a MG and \(\phi(\cdot)\) is a convex function, then \(\phi(X)\) is a sub-MG.
Given the \(\P\)-measure, we define the likelihood ratio \(Z:=\d\Q / \d\P\) for another measure \(\Q\). Then we have, e.g. for the change from the cash measure \(\P\) to the stock measure \(\Q\): \(Z(\omega) = (\d\Q/\d\P)(\omega) = S_N(\omega) / S_0\).

If \(B\) is a BM and \(T=\tau(\cdot)\) is a stopping time, then \(\{B_{t+T} - B_T\}_{t\ge 0}\) is a BM independent of \(\{B_t\}_{t\le T}\).
If \(B\) is a standard \(k\)BM and \(U\in\mathbb{R}^{k\times k}\) is orthogonal, then \(UB\) is also a standard \(k\)BM.
For any subMG \(X\), we have unique decomposition \(X=M+A\) where \(M_n:=X_0 + \sum_{i=1}^n [X_i  \E(X_i\mid \F_{i1})]\) is a martingale and \(A_n:=\sum_{i=1}^n[\E(X_i\mid \F_{i1})  X_{i1}]\) is a nondecreasing predictable sequence.
For BM \(B\) and stopping time \(T=\tau(a)\), define \(B^*\) s.t. \(B_t^*=B_t\) for all \(t\le T\) and \(B_t^* = 2a - B_t\) for all \(t>T\); then \(B^*\) is also a BM.
\(\P(\max_{s\le t}B_s > x\text{ and }B_t < y) = \Phi\!\left(\frac{y-2x}{\sqrt{t}}\right)\) for \(y \le x\) and \(x > 0\).
Let \(X\) and \(Y\) be independent BMs. Note that for all \(t\ge 0\), from the exponential MG we know \(\E[\exp(i\theta X_t)]=\exp(-\theta^2 t/2)\). Now define \(T=\tau(a)\) for \(Y\) and we have \(\E[\exp(i\theta X_T)] = \E[\exp(-\theta^2 T /2)]=\exp(-|\theta| a)\), which is the Fourier transform of the Cauchy density \(f_a(x)=\frac{1}{\pi}\frac{a}{a^2+x^2}\).
We define Itô integral \(I_t(X) := \int_0^t\! X_s\d W_s\) where \(W_t\) is a standard Brownian process and \(X_t\) is adapted.
This is a direct result of the second martingale property above. Let \(X_t\) be nonrandom and continuously differentiable; then \[\E\!\left[\!\left(\int_0^t X_s\d W_s\right)^{\!\!2}\right] = \E\!\left[\int_0^t X_s^2\d s\right].\]
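A quick Monte Carlo sketch of the isometry with the nonrandom integrand \(X_t = t\), where the right-hand side is \(\int_0^1 t^2\,\d t = 1/3\):

```python
import random, math

# Monte Carlo check of the Ito isometry for the nonrandom integrand X_t = t:
# the stochastic integral of t dW_t over [0,1] is N(0, 1/3).
random.seed(1)
n, paths = 200, 8000
dt = 1.0 / n

second_moment = 0.0
for _ in range(paths):
    # left-endpoint Riemann sum approximating the stochastic integral
    integral = sum(i * dt * random.gauss(0.0, math.sqrt(dt)) for i in range(n))
    second_moment += integral ** 2
print(second_moment / paths)  # close to 1/3
```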
Let \(W_t\) be a standard Brownian motion and let \(f:\R\mapsto\R\) be a twice continuously differentiable function s.t. \(f\), \(f'\) and \(f''\) are all bounded; then for all \(t>0\) we have \[\d f(W_t) = f'(W_t)\d W_t + \frac{1}{2}f''(W_t) \d t.\]
Let \(W_t\) be a standard Brownian motion and let \(f:[0,\infty)\times\R\mapsto\R\) be a twice continuously differentiable function s.t. its partial derivatives are all bounded; then for all \(t>0\) we have \[\d f(t, W_t) = f_x\d W_t + \left(f_t + \frac{1}{2}f_{xx}\right) \d t.\]
The Wiener integral is a special case of Itô integral where \(f(t)\) is here a nonrandom function of \(t\). Variance of a Wiener integral can be derived using Itô isometry.
We say \(X_t\) is an Itô process if it satisfies \[\d X_t = Y_t\d W_t + Z_t\d t\] where \(Y_t\) and \(Z_t\) are adapted and, for all \(t\), \[\int_0^t\! \E Y_s^2\d s < \infty\quad\text{and}\quad\int_0^t\! \E|Z_s|\d s < \infty.\] The quadratic variation of \(X_t\) is \[[X,X]_t = \int_0^t\! Y_s^2\d s.\]
Assume \(X_t\) and \(Y_t\) are two Itô processes; then \[\frac{\d (XY)}{XY} = \frac{\d X}{X} + \frac{\d Y}{Y} + \frac{\d X\d Y}{XY}\] and \[\frac{\d (X/Y)}{X/Y} = \frac{\d X}{X} - \frac{\d Y}{Y} + \left(\frac{\d Y}{Y}\right)^{\!2} - \frac{\d X\d Y}{XY}.\]
A Brownian bridge is a continuous-time stochastic process \(X_t\) with both ends pinned: \(X_0=X_T=0\). The SDE is \[\d X_t = -\frac{X_t}{T-t}\d t + \d W_t\] which solves to \[X_t = W_t - \frac{t}{T}W_T.\]
Let \(X_t\) be an Itô process. Let \(u(t,x)\) be a twicecontinuously differentiable function with \(u\) and its partial derivatives bounded, then \[\d u(t, X_t) =\frac{\partial u}{\partial t}(t, X_t)\d t +\frac{\partial u}{\partial x}(t, X_t)\d X_t +\frac{1}{2}\frac{\partial^2 u}{\partial x^2}(t, X_t)\d [X,X]_t.\]
The OU process describes a stochastic process that has a tendency to return to an "equilibrium" position \(0\), with returning velocity proportional to its distance from the origin. It's given by the SDE \[\d X_t = -\alpha X_t \d t + \d W_t \Rightarrow\d [\exp(\alpha t)X_t] = \exp(\alpha t)\d W_t \] which solves to \[X_t = \exp(-\alpha t)\left[X_0 + \int_0^t\! \exp(\alpha s)\d W_s\right].\]
Remark: In finance, the OU process is often called the Vasicek model.
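A minimal Euler-Maruyama sketch of the OU SDE (parameters are illustrative), checking the mean implied by the exact solution, \(\E[X_t] = X_0e^{-\alpha t}\):

```python
import random, math

# Euler-Maruyama simulation of the OU process dX = -alpha*X dt + dW,
# checked against the exact mean E[X_t] = x0 * exp(-alpha * t).
random.seed(2)
alpha, x0, T, n, paths = 1.5, 2.0, 1.0, 500, 4000
dt = T / n

endpoints = []
for _ in range(paths):
    x = x0
    for _ in range(n):
        x += -alpha * x * dt + random.gauss(0.0, math.sqrt(dt))
    endpoints.append(x)

print(sum(endpoints) / paths, x0 * math.exp(-alpha * T))  # sample mean vs theory
```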
The SDE for general diffusion process is \(\d X_t = \mu(X_t)\d t + \sigma(X_t)\d W_t\).
In order to find \(\P(X_T=B)\) where we define \(T=\inf\{t\ge 0: X_t=A\text{ or }B\}\), we consider a harmonic function \(f(x)\) s.t. \(f(X_t)\) is a MG. This gives the ODE \[f'(x)\mu(x) + f''(x)\sigma^2(x)/2 = 0\quad\Rightarrow\quad f(x) = \int_A^x C_1\exp\left\{\!-\!\int_A^z\frac{2\mu(y)}{\sigma^2(y)}\d y\right\}\d z + C_2\] where \(C_{1,2}\) are constants. Then since \(f(X_{T\wedge t})\) is a bounded MG, by Doob's identity we have \[\P(X_T=B) = \frac{f(X_0) - f(A)}{f(B) - f(A)}.\]
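As a numerical sanity check of the hitting-probability formula, here is a sketch (my own) for the simplest diffusion with constant \(\mu\) and \(\sigma\), where the integral above gives \(f(x)\propto e^{-2\mu x/\sigma^2}\) up to an affine transform (which cancels in the ratio):

```python
import random, math

# Hitting probability of Brownian motion with constant drift mu and
# volatility sigma, using scale function f(x) = exp(-2*mu*x/sigma^2),
# checked by a crude Euler simulation (discretization adds a small bias).
mu, sigma, A, B, x0 = 0.5, 1.0, 0.0, 2.0, 1.0
f = lambda x: math.exp(-2 * mu * x / sigma ** 2)
theory = (f(x0) - f(A)) / (f(B) - f(A))

random.seed(3)
dt, hits, paths = 2e-3, 0, 2000
for _ in range(paths):
    x = x0
    while A < x < B:
        x += mu * dt + sigma * random.gauss(0.0, math.sqrt(dt))
    hits += x >= B
print(hits / paths, theory)  # both ≈ 0.73
```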
Let \(\bs{W}_t\) be a \(K\)-dimensional standard Brownian motion. Let \(u:\R^K\mapsto \R\) be a \(C^2\) function with bounded first and second partial derivatives. Then \[\d u(\bs{W}_t) = \nabla u(\bs{W}_t)\cdot \d \bs{W}_t + \frac{1}{2}\tr[\Delta u(\bs{W}_t)] \d t\] where the gradient operator \(\nabla\) gives the vector of all first-order partial derivatives, and \(\Delta u\) here denotes the Hessian matrix of all second-order partial derivatives (so \(\tr[\Delta u]\) is the Laplacian).
If \(T\) is a stopping time for \(\bs{W_t}\), then for any fixed \(t\) we have \[\E[u(\bs{W}_{T\wedge t})] = u(\bs{0}) + \frac{1}{2}\E\!\left[\int_0^{T\wedge t}\!\!\Delta u(\bs{W}_s)\d s\right].\]
A \(C^2\) function \(u:\R^k\mapsto\R\) is said to be harmonic in a region \(\mathcal{U}\) if \(\Delta u(x) = 0\) for all \(x\in \mathcal{U}\). Examples are \(u(x,y)=2\log(r)\) and \(u(x,y,z)=1/r\) where \(r\) is defined as the norm.
Remark: \(f\) being a harmonic function is equivalent to \(f(X_t)\) being a MG, i.e. \(f'(x)\mu(x) + f''(x)\sigma^2(x)/2 = 0\) for a diffusion process \(X_t\).
Let \(u\) be harmonic in an open region \(\mathcal{U}\) with compact support, and assume that \(u\) and its partials extend continuously to the boundary \(\partial \mathcal{U}\). Define \(T\) to be the first exit time of Brownian motion from \(\mathcal{U}\). For any \(\bs{x}\in\mathcal{U}\), let \(\E^{\bs{x}}\) be the expectation under the measure \(\P^{\bs{x}}\) s.t. \(\bs{W}_t - \bs{x}\) is a \(K\)-dimensional standard BM. Then
A multivariate Itô process is a continuous-time stochastic process \(X_t\in\R\) of the form \[X_t = X_0 + \int_0^t\! M_s \d s + \int_0^t\! \bs{N}_s\cdot \d \bs{W}_s\] where \(\bs{N}_t\) is an adapted \(\R^K\)-valued process and \(\bs{W}_t\) is a \(K\)-dimensional standard BM.
Let \(\bs{W}_t\in\R^K\) be a standard \(K\)-dimensional BM, and let \(\bs{X}_t\in\R^m\) be a vector of \(m\) multivariate Itô processes satisfying \[\d X_t^i = M_t^i\d t + \bs{N}_t^i\cdot \d \bs{W}_t.\] Then for any \(C^2\) function \(u:\R^m\mapsto\R\) with bounded first and second partial derivatives, \[\d u(\bs{X}_t) = \nabla u(\bs{X}_t)\cdot \d \bs{X}_t + \frac{1}{2}\tr[\Delta u(\bs{X}_t)\cdot \d [\bs{X},\bs{X}]_t].\]
Let \(\bs{W}_t\) be a standard \(K\)-dimensional BM, and let \(\bs{U}_t\) be an adapted \(K\)-dimensional process satisfying \[\|\bs{U}_t\| = 1\quad\forall t\ge 0.\] Then the following \(1\)-dimensional Itô process is a standard BM: \[X_t := \int_0^t\!\! \bs{U}_s\cdot \d \bs{W}_s.\]
Let \(\bs{W}_t\) be a standard \(K\)-dimensional BM, and let \(R_t=\|\bs{W}_t\|\) be the corresponding radial process; then \(R_t\) is a Bessel process with parameter \((K-1)/2\) given by \[\d R_t = \frac{K-1}{2R_t}\d t + \d W_t^{\sgn}\] where we define \(\d W_t^{\sgn} := \sgn(\bs{W}_t)\cdot \d \bs{W}_t\).
A Bessel process with parameter \(a\) is a stochastic process \(X_t\) given by \[\d X_t = \frac{a}{X_t}\d t+ \d W_t.\] Since this is just a special case of a diffusion process, we know the corresponding harmonic function is \(f(x)=C_1x^{-2a+1} + C_2\), and the hitting probability (letting \(A\to 0\), with \(x=X_0\)) is \[\P(X_T=B) = \frac{f(X_0) - f(A)}{f(B) - f(A)} =\begin{cases}1 & \text{if }a > 1/2,\\(x/B)^{1-2a} & \text{otherwise}.\end{cases}\]
Let \(W_t\) be a standard \(1\)-dimensional Brownian motion and let \(\F_t\) be the \(\sigma\)-algebra of all events determined by the path \(\{W_s\}_{s\le t}\). If \(Y\) is any r.v. with mean \(0\) and finite variance that is measurable with respect to \(\F_t\) for some \(t > 0\), then \[Y = \int_0^t\! A_s\d W_s\] for some adapted process \(A_t\) that satisfies \[\E(Y^2) = \int_0^t\! \E(A_s^2)\d s.\] This theorem is of importance in finance because it implies that in the Black-Scholes setting, every contingent claim can be hedged.
Special case: let \(Y_t=f(W_t)\) be any mean \(0\) r.v. with \(f\in C^2\). Let \(u(s,x):=\E[f(W_t)\mid W_s = x]\), then \[Y_t = f(W_t) = \int_0^t\! u_x(s,W_s)\d W_s.\]
Consider a market with two assets: cash (the money market account \(M_t\)) with nonrandom rate of return \(r_t\), and stock with share price \(S_t\) such that \(\d S_t = S_t(\mu_t \d t + \sigma \d W_t)\). Under a risk-neutral measure \(\P\), the discounted share price \(S_t / M_t\) is a martingale and thus \[\frac{S_t}{M_t} = \frac{S_0}{M_0}\exp\left\{\sigma W_t - \frac{\sigma^2t}{2}\right\}\] where we used the fact that \(\mu_t = r_t\) by the Fundamental Theorem.
A European contingent claim with expiration date \(T > 0\) and payoff function \(f:\R\mapsto\R\) is a tradeable asset that pays \(f(S_T)\) at time \(T\). By the Fundamental Theorem we know the discounted price of this claim at any \(t\le T\) is \(\E[f(S_T)/M_T\mid \F_t]\). In order to calculate this conditional expectation, let \(g(W_t):= f(S_t)/M_t\); then by the Markov property of BM we know \(\E[g(W_T)\mid \F_t] = \E[g(W_t + W_{T-t}^*)\mid \F_t]\) where \(W_t\) is adapted to \(\F_t\) and independent of \(W_t^*\).
The discounted time-\(t\) price of a European contingent claim with expiration date \(T\) and payoff function \(f\) is \[\E[f(S_T)/M_T\mid \F_t] = \frac{1}{M_T}\E\!\left[f\!\left(S_t\exp\!\left\{\sigma W_{T-t}^* - \frac{\sigma^2(T-t)}{2} + R_T - R_t\right\}\right)\middle|\ \F_t\right]\] where \(S_t\) is adapted to \(\F_t\) and independent of \(W_t^*\). The expectation is calculated using the normal distribution. Note here \(R_t = \int_0^t r_s\d s\) is the log-compound interest rate.
Under the risk-neutral probability measure, the discounted price of the claim is a martingale, i.e. it has no drift term. So we can differentiate \(M_t^{-1}u(t,S_t)\) by Itô and derive the following PDE \[u_t(t,S_t) + r_t S_tu_x(t,S_t) + \frac{\sigma^2S_t^2}{2}u_{xx}(t,S_t) = r_t u(t,S_t)\] with terminal condition \(u(T,S_T)=f(S_T)\). Note here everything is under the BS model.
A replicating portfolio for a contingent claim in stock and cash is given by \[V_t = \alpha_t M_t + \beta_t S_t\] where \(\alpha_t = [u(t,S_t) - S_t u_x(t,S_t)]/M_t\) and \(\beta_t = u_x(t,S_t)\).
A barrier option pays \(\$1\) at time \(T\) if \(\max_{t\le T} S_t \ge AS_0\) and \(\$0\) otherwise. This is a simple example of a path-dependent option. Other commonly used examples are knock-ins, knock-outs, lookbacks and Asian options.
The time-\(0\) price of such a barrier option is calculated from \[\begin{align*}V_0 &= \exp(-rT)\P\!\left(\max_{t\le T} S_t \ge AS_0\right)= \exp(-rT)\P\!\left(\max_{t\le T} W_t + \mu t \ge a\right)\\&= \exp(-rT)\P_{\mu}\!\left(\max_{t\le T} W_t \ge a\right)\end{align*}\] where \(\mu=r\sigma^{-1} - \sigma/2\) and \(a = \sigma^{-1}\log A\). Now, by Cameron-Martin we know \[\begin{align*}\P_{\mu}\!\left(\max_{t\le T} W_t \ge a\right) &=\E_0[Z_T\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] =\E_0[\exp(\mu W_T - \mu^2 T / 2)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] \\ &=\exp(-\mu^2 T / 2)\cdot \E_0[\exp(\mu W_T)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}]\end{align*}\] and by the reflection principle we have \[\begin{align*}\E_0[\exp(\mu W_T)\cdot \mathbf{1}_{\{\max_{t\le T} W_t\ge a\}}] &=e^{\mu a}\int_0^{\infty} (e^{\mu y} + e^{-\mu y}) \P(W_T - a \in \d y) \\&=e^{\mu^2 T/2}\left[\Phi(\mu\sqrt{T} - a/\sqrt{T}) + e^{2\mu a}\Phi(-\mu\sqrt{T}-a/\sqrt{T})\right].\end{align*}\]
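Combining the pieces, \(\P_{\mu}(\max_{t\le T} W_t \ge a) = \Phi(\mu\sqrt{T} - a/\sqrt{T}) + e^{2\mu a}\Phi(-\mu\sqrt{T}-a/\sqrt{T})\), which we can evaluate with illustrative numbers (the helper name `barrier_price` is mine):

```python
import math

# Closed-form evaluation of the barrier price derived above.
Phi = lambda z: (1 + math.erf(z / math.sqrt(2))) / 2  # standard normal cdf

def barrier_price(r, sigma, T, A):
    mu = r / sigma - sigma / 2
    a = math.log(A) / sigma
    p_hit = (Phi(mu * math.sqrt(T) - a / math.sqrt(T))
             + math.exp(2 * mu * a) * Phi(-mu * math.sqrt(T) - a / math.sqrt(T)))
    return math.exp(-r * T) * p_hit

p12 = barrier_price(0.02, 0.3, 1.0, 1.2)
p15 = barrier_price(0.02, 0.3, 1.0, 1.5)
print(p12, p15)  # a higher barrier is worth less
```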
The exponential process \[Z_t = \exp\!\left\{\int_0^t\! Y_s\d W_s - \frac{1}{2}\int_0^t\! Y_s^2\d s\right\}\] is a positive MG given \[\E\!\left[\int_0^t\! Z_s^2Y_s^2\d s\right] < \infty.\] Specifically, the exponential martingale is given by the SDE \(\d Z_t = \theta Z_t \d W_t\).
Assume that under the probability measure \(\P\) the exponential process \(Z_t(Y)\) is a MG and \(W_t\) is a standard BM. Define the absolutely continuous probability measure \(\Q\) on \(\F_t\) with likelihood ratio \(Z_t\), i.e. \((\d\Q/\d\P)_{\F_t} = Z_t\); then under \(\Q\) the process \[W_t^* := W_t - \int_0^t\! Y_s\d s\] is a standard BM. Girsanov's Theorem shows that drift can be added or removed by a change of measure.
The exponential process \[Z_t = \exp\!\left\{\int_0^t\! Y_s \d W_s - \frac{1}{2}\!\int_0^t\! Y_s^2 \d s\right\}\] is a MG given \[\E\left[\exp\!\left\{\frac{1}{2}\!\int_0^t\! Y_s^2\d s\right\}\right] < \infty.\] This theorem (Novikov's condition) gives another way to show whether an exponential process is a MG.
Assume \(W_t\) is a standard BM under \(\P\); define the likelihood ratio \(Z_t = (\d\Q/\d\P)_{\F_t}\) as above where \(Y_t = -\alpha W_t\); then by Girsanov \(W_t\) under \(\Q\) is an OU process.
If a system can be in one of a collection of states \(\{\omega_i\}_{i\in\mathcal{I}}\), the probability of finding it in a particular state \(\omega_i\) is proportional to \(\exp\{-H(\omega_i)/kT\}\) where \(k\) is Boltzmann's constant, \(T\) is temperature and \(H(\cdot)\) is energy.
If \(W_t\) is standard BM with \(W_0 = x \in (0, A)\), how does \(W_t\) behave conditional on the event that it hits \(A\) before \(0\)? Define
Then the likelihood ratios are \[\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_T} \!= \frac{\mathbf{1}_{\{W_T=A\}}}{\P^x\{W_T=A\}} \Rightarrow\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_{T\wedge t}} \!= \E\!\left[\left(\frac{\d\Q^x}{\d\P^x}\right)_{\!\F_T}\middle|\ \F_{T\wedge t}\right] = \frac{W_{T\wedge t}}{x}.\] Notice \[\begin{align*}\frac{W_{T\wedge t}}{x} &=\exp\left\{\log W_{T\wedge t}\right\} / x \overset{\text{Itô}}{\eeq}\exp\left\{\log W_0 + \int_0^{T\wedge t}W_s^{-1}\d W_s - \frac{1}{2}\int_0^{T\wedge t} W_s^{-2}\d s\right\} / x \\&=\exp\left\{\int_0^{T\wedge t}W_s^{-1}\d W_s - \frac{1}{2}\int_0^{T\wedge t} W_s^{-2}\d s\right\}\end{align*}\] which is a Girsanov likelihood ratio, so we conclude \(W_t\) is a BM under \(\Q^x\) with drift \(W_t^{-1}\), or equivalently \[W_t^* = W_t - \int_0^{T\wedge t}W_s^{-1}\d s\] is a standard BM with initial point \(W_0^* = x\).
A one-dimensional Lévy process is a continuous-time random process \(\{X_t\}_{t\ge 0}\) with \(X_0=0\) and independent, stationary increments. Lévy processes are defined to be a.s. right continuous with left limits.
Remark: Brownian motion (with drift) is the only Lévy process with continuous paths.
Let \(B_t\) be a standard BM. Define the FPT process as \(\tau_x = \inf\{t\ge 0: B_t \ge x\}\). Then \(\{\tau_{x}\}_{x\ge 0}\) is a Lévy process called the one-sided stable-\(1/2\) process. Specifically, the sample path \(x\mapsto \tau_x\) is nondecreasing in \(x\). Such Lévy processes with nondecreasing paths are called subordinators.
A Poisson process with rate (or intensity) \(\lambda > 0\) is a Lévy process \(N_t\) such that for any \(t\ge 0\) the distribution of the random variable \(N_t\) is the Poisson distribution with mean \(\lambda t\). Thus, for any \(k=0,1,2,\cdots\) we have \(\P(N_t=k) = (\lambda t)^k\exp(-\lambda t)\ /\ k!\) for all \(t > 0\).
Remark 1: (Superposition Theorem) If \(N_t\) and \(M_t\) are independent Poisson processes of rates \(\lambda\) and \(\mu\) respectively, then the superposition \(N_t + M_t\) is a Poisson process of rate \(\lambda+\mu\).
Remark 2: (Exponential Interval) Successive intervals are i.i.d. exponential r.v.s. with common mean \(1/\lambda\).
Remark 3: (Thinning Property) Keeping each arrival of a Poisson-\(\lambda\) process independently with Bernoulli-\(p\) probability again gives a Poisson process, with rate \(\lambda p\).
Remark 4: (Compounding) Every compound Poisson process is a Lévy process. We call \(\lambda F\) the Lévy measure, where \(F\) is the compounding distribution.
For \(N\sim\text{Pois}(\lambda)\), we have \(\MGF(\theta)=\exp[\lambda (e^{\theta}-1)]\).
For \(X_t=\sum_{i=1}^{N_t}\!Y_i\) where \(N_t\sim\text{Pois}(\lambda t)\) and \(\MGF_Y(\theta) = \psi(\theta) < \infty\), we have \(\MGF_{X_t}(\theta)=\exp[\lambda t (\psi(\theta) - 1)]\).
The Binomial\((n,p_n)\) distribution, where \(n\to\infty\) and \(p_n\to 0\) s.t. \(np_n\to\lambda > 0\), converges to the Poisson-\(\lambda\) distribution.
If \(N_t\) is a Poisson process with rate \(\lambda\), then \(Z_t=\exp[\theta N_t - (e^{\theta} - 1) \lambda t]\) is a martingale for any \(\theta\in\R\).
Remark: Similar to CameronMartin, let \(N_t\) be a Poisson process with rate \(\lambda\) under \(\P\), let \(\Q\) be the measure s.t. the likelihood ratio \((\d\Q/\d\P)_{\F_t}=Z_t\) is defined as above, then \(N_t\) under \(\Q\) is a Poisson process with rate \(\lambda e^{\theta}\).
Let \(X_t\) be a compound Poisson process with Lévy measure \(\lambda F\), and let the MGF of the compounding distribution \(F\) be \(\psi(\theta)\); then \(Z_t=\exp[\theta X_t - (\psi(\theta) - 1)\lambda t]\) is a martingale for any \(\theta\in\R\).
A \(K\)-dimensional Lévy process is a continuous-time random process \(\{\bs{X}_t\}_{t\ge 0}\) with \(\bs{X}_0=\bs{0}\) and independent, stationary increments. Like the one-dimensional version, vector Lévy processes are defined to be a.s. right continuous with left limits.
Remark: Given a nonrandom linear transform \(F:\R^K\mapsto \R^M\) and a \(K\)-dimensional Lévy process \(\{\bs{X}_t\}_{t\ge 0}\), the process \(\{F(\bs{X}_t)\}_{t\ge 0}\) is a Lévy process on \(\R^M\).
I have recently been playing a billiard game where you can play a series of exciting tournaments. In each tournament, you pay an entrance fee of, for example, \(\$500\), to potentially win a prize of, say, \(\$2500\). There are various kinds of tournaments with entrance fees ranging from \(\$100\) up to over \(\$10000\). After hundreds of games, my winning rate stabilized around \(58\%\), which is actually pretty good as it significantly beats random draws. A natural question therefore came into my mind: Is there an optimal strategy?
Well, I think so. I'll list two strategies below and try to explore any potential optimality. We can reasonably model these tournaments as repetitive betting with a certain fixed physical probability \(p\) of winning and odds^{[1]} of \((d-1)\):\(1\) against ourselves. Given that the available entrance fees are sufficiently densely spaced, we may model the stake as a real-valued fraction of our balance. Without loss of generality, let's assume an initial balance of \(M_0=1\) and that money in this world is infinitely divisible. The problem then becomes the determination of the optimal fraction \(x\in[0,1]\) s.t. the expected return is maximized. Nonetheless, depending on the interpretation of this problem we have several solutions. Some are intriguing while others may be frustrating.
Let's first take a look at potential values of \(x\) and the corresponding balance trajectories \(M_t\). For any \(0 \le x \le 1\), we have probability \(p\) of getting the staked \(x\)-fraction of our balance multiplied by \(d\), and probability \(1-p\) of losing it, that is \[\text{E}(M_{t+1}\mid\mathcal{F}_t) = (1-x)M_t + p\cdot xdM_t + (1-p)\cdot 0 = [1 + (pd-1)x] M_t,\] which indicates \(M_t\) is a submartingale^{[2]}: in this particular case, \(p=0.58\), \(d=5\), and thus \(pd=2.9 > 1\). Maximizing the one-step expected return then gives the optimal fraction \(x^* = 1\), which is rather aggressive and yields a ruin probability of \(1-p^n\) within the first \(n\) bets. Simulation supports our worries: not once did we survive \(10\) bets in this tournament, and the maximum balance we ever attained was less than a million.
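The all-in experiment is easy to reproduce. Here is a minimal simulation sketch; the function name and defaults are mine, with \(p=0.58\), \(d=5\) from the text:

```python
import random

def simulate_balance(x, p=0.58, d=5, n_bets=500, M0=1.0, rng=None):
    """Play n_bets rounds, betting a fraction x of the balance each time.
    A win multiplies the staked fraction by d; a loss forfeits it."""
    rng = rng or random.Random(42)
    M = M0
    for _ in range(n_bets):
        stake = x * M
        M -= stake
        if rng.random() < p:
            M += d * stake
        if M == 0.0:  # ruined: no stake can ever revive a zero balance
            break
    return M

# With x = 1, a single loss wipes the balance out, so ruin within the
# first n bets happens with probability 1 - p**n (~0.996 for n = 10).
```

Running this repeatedly with `x=1.0` ends in ruin essentially every time, matching the \(1-p^n\) calculation above.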
If we consider \(\log M_t\) instead, then \[\begin{align*}\text{E}(\log M_{t+1}\mid \mathcal{F}_t) &= p\cdot \log[(1-x)M_t + xdM_t] + (1-p)\cdot \log[(1-x)M_t]\\ &= p\cdot \log[(1+(d-1)x)M_t] + (1-p)\cdot \log[(1-x)M_t].\end{align*}\] The first-order condition is \[\frac{\partial}{\partial x}\text{E}(\log M_{t+1}\mid \mathcal{F}_t) = \frac{p(d-1)}{1+(d-1)x} - \frac{1-p}{1-x} = 0 \quad\Rightarrow\quad x^* = \frac{pd-1}{d-1} = 0.475,\] which is more conservative and should therefore survive longer than the previous strategy. Simulation gives the following trajectories: even the worst run beat the best we got with \(x=1\).
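The closed form above can be cross-checked numerically; the helper names below are my own:

```python
import math

def kelly_fraction(p, d):
    """Log-optimal fraction x* = (p*d - 1) / (d - 1) from the first-order condition."""
    return (p * d - 1.0) / (d - 1.0)

def expected_log_growth(x, p, d):
    """One-step expected log growth E[log(M_{t+1} / M_t)] when betting fraction x."""
    return p * math.log(1.0 + (d - 1.0) * x) + (1.0 - p) * math.log(1.0 - x)

x_star = kelly_fraction(0.58, 5)  # 0.475 for p = 0.58, d = 5
```

Evaluating `expected_log_growth` on a grid around `x_star` confirms that \(x^*=0.475\) maximizes the expected log growth, while `x = 1` sends it to \(-\infty\).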
According to Doob's martingale inequality^{[3]}, the probability of our balance ever attaining a value no less than \(C = 1\times10^{60}\) within \(T=500\) steps satisfies \[\text{P}\left(\sup_{t \le T}M_t\ge C\right) \le \frac{\text{E}(M_T)}{C} = \frac{M_0}{C} \prod_{t=0}^{T-1}\frac{\text{E}(M_{t+1}\mid\mathcal{F}_t)}{M_t} = \frac{M_0[1+(pd-1)x]^T}{C} \approx 4.6\times10^{79} \gg 1\] for \(M_0=1\) and the Kelly fraction \(x = x^*\). Since the bound exceeds one, Doob's inequality is vacuous here: it places no real cap on the probability of reaching \(1\times10^{60}\) within \(500\) steps (simulation put that probability around \(0.31\)). To put it differently, nothing in the martingale bound rules out a strategy that does significantly better than the one given by the Kelly criterion.
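The bound itself is elementary arithmetic; a sketch (the function name is mine):

```python
import math

def doob_upper_bound(p, d, x, T, C, M0=1.0):
    """Doob maximal-inequality bound on P(sup_{t<=T} M_t >= C),
    namely E(M_T) / C with E(M_T) = M0 * (1 + (p*d - 1)*x)**T."""
    growth = 1.0 + (p * d - 1.0) * x
    return M0 * growth ** T / C

bound = doob_upper_bound(0.58, 5, 0.475, 500, 1e60)
# The bound vastly exceeds 1, hence it is vacuous as a probability bound.
```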
What is it, then? Or does it actually exist? I don't have an answer yet, but perhaps exploratory approaches like machine learning will give us some hints, and perhaps the optimal strategy is not static but dynamic.
I recently sold my Nvidia GTX 1080 eGPU^{[1]} after two months' waiting in vain for a compatible Nvidia video driver for macOS 10.14 (Mojave). Whether it's Apple's or Nvidia's fault, I don't care any more. Right away, I ordered an AMD Radeon RX Vega 64 on Newegg. The card arrived two days later and looked sexy at first sight. It was plug-and-play as expected and performed just as well as its predecessor, whether for serious gaming, video editing or anything else. I would have given it a 9.5/10 had I not found another issue a couple of days later: wow, there is no CUDA on this card!
Of course there isn't. CUDA was developed by Nvidia, which has been investing great effort in building a more user-friendly deep-learning environment. By contrast, AMD (yes!) used to intentionally avoid head-to-head competition against the world's largest GPU maker and instead kept making gaming cards with better cost-to-performance ratios. ROCm, an open-source HPC/hyperscale-class platform for GPU computing that supports cards other than Nvidia's, does make this gap much narrower than before. However, ROCm still does not officially support macOS, so you have to dual-boot Linux to get the computational benefits of your AMD card, even though you can already game smoothly on your Mac. Sad it is, AMD 😰.
There are, however, several solutions if, like me, you really have to run your code on a Mac and would like to accelerate those Renaissance training times with a GPU. The method I adopted uses a framework called PlaidML, and I'd like to walk you through how I installed it and configured my GPU with it.
pip3 install plaidml-keras plaidbench
After installation, we can set up the intended device for computing by running:
plaidml-setup
PlaidML Setup (0.3.5)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3

Default Config Devices:
   No devices.

Experimental Config Devices:
   llvm_cpu.0 : CPU (LLVM)
   opencl_intel_intel(r)_iris(tm)_plus_graphics_655.0 : Intel Inc. Intel(R) Iris(TM) Plus Graphics 655 (OpenCL)
   opencl_cpu.0 : Intel CPU (OpenCL)
   opencl_amd_amd_radeon_rx_vega_64_compute_engine.0 : AMD AMD Radeon RX Vega 64 Compute Engine (OpenCL)
   metal_intel(r)_iris(tm)_plus_graphics_655.0 : Intel(R) Iris(TM) Plus Graphics 655 (Metal)
   metal_amd_radeon_rx_vega_64.0 : AMD Radeon RX Vega 64 (Metal)

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:
Of course we enter y. Before choosing device 4 (OpenCL with AMD) or 6 (Metal with AMD), I'd like to benchmark the default device, CPU (LLVM). The test script (using MobileNet as an example) is
plaidbench keras mobilenet
and the result shows^{[2]}
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "llvm_cpu.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
17227776/17225924 [==============================] - 2s 0us/step
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 3.0688607692718506 (compile), 61.17863607406616 (execution), 0.059744761791080236 (execution per example)
Correctness: PASS, max_error: 1.7511049009044655e-05, max_abs_error: 6.556510925292969e-07, fail_ratio: 0.0
Now we run the setup again and choose 4 (OpenCL with AMD). The result is
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "opencl_amd_amd_radeon_rx_vega_64_compute_engine.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 2.6935510635375977 (compile), 13.741217851638794 (execution), 0.01341915805824101 (execution per example)
Correctness: PASS, max_error: 1.7511049009044655e-05, max_abs_error: 1.1995434761047363e-06, fail_ratio: 0.0
Finally we run the test against what should be the most powerful option, device 6 (Metal with AMD).
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "metal_amd_radeon_rx_vega_64.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 2.243159055709839 (compile), 7.515545129776001 (execution), 0.007339399540796876 (execution per example)
Correctness: PASS, max_error: 1.7974503862205893e-05, max_abs_error: 1.0952353477478027e-06, fail_ratio: 0.0
In conclusion, by utilizing Metal on my Mac together with the external AMD GPU, the benchmark runtime came down by roughly 87.7% (from about 61.2 s to 7.5 s for 1024 examples), and I'm personally quite satisfied with that.
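To route an actual training script through the card we just benchmarked, PlaidML provides a Keras backend that must be installed before Keras is imported. A minimal sketch of that setup (the tiny model is only a placeholder to show where training would go):

```python
# Select the PlaidML backend before importing Keras; install_backend()
# is PlaidML's documented switch. Requires `pip3 install plaidml-keras`.
import plaidml.keras
plaidml.keras.install_backend()

import keras
from keras.layers import Dense

# Placeholder model: any Keras model defined after install_backend()
# runs on the device chosen during plaidml-setup.
model = keras.models.Sequential([
    Dense(32, activation="relu", input_shape=(16,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(...) now executes on the selected GPU.
```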
It's been more than two years since my last trip to the Arctic Circle, back when I was still studying in the Netherlands. Our adventurous hike in Abisko, amid the endless northern European mountains, still visits my dreams. This time we went to Fairbanks, Alaska, for the aurora and also for another Arctic experience.