On many Kaggle public leaderboards, you will see an algorithm called "XGBoost", short for "eXtreme Gradient Boosting". It is an optimized implementation under the gradient boosting framework. The key difference between ordinary boosting models and XGBoost, as far as I can see, is that XGBoost allows for parallel processing while maintaining accuracy. Since boosting is essentially an ensemble of many weak learners, this makes large-scale machine learning possible on personal computers (even on my MBP 2015), and further empowers people like me to apply it to more practical scenarios, e.g. the stock price prediction we are going to talk about today. Highest tribute to Tianqi Chen and his great paper!

## Performance

- Backtest Interval: 20130101 - 20170101
- Initial Capital: 1,000,000
- Annualized Return (Strategy): TBD
- Annualized Return (Benchmark): TBD
- Sharpe Ratio: TBD
- Maximum Drawdown: TBD
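Once the backtest fills in the numbers above, the three TBD statistics can all be derived from the daily equity curve. Here is a minimal sketch in pure Python; the function name and the 252-trading-day convention are my assumptions, not part of the original strategy code:

```python
import math

def performance_stats(equity, periods_per_year=252, rf=0.0):
    """Annualized return, Sharpe ratio, and max drawdown from a
    daily equity curve (list of portfolio values)."""
    rets = [equity[i] / equity[i - 1] - 1 for i in range(1, len(equity))]
    years = len(rets) / periods_per_year
    ann_return = (equity[-1] / equity[0]) ** (1 / years) - 1
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    # Annualize the per-period Sharpe by sqrt(periods per year)
    sharpe = (mean - rf / periods_per_year) / math.sqrt(var) * math.sqrt(periods_per_year)
    peak, max_dd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)                    # running high-water mark
        max_dd = max(max_dd, (peak - v) / peak)  # worst peak-to-trough loss
    return ann_return, sharpe, max_dd
```

Feeding it the backtest's equity series (starting at the 1,000,000 initial capital) yields the three numbers directly.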

(Performance screenshot not available at present. Will be uploaded later.)

## Intuition: Long-term vs. Short-term Prediction

I got the inspiration from this paper. The author raised an interesting and convincing point about stock price prediction: the long-term trend is always easier to predict than the short-term one. Again, let's take AAPL as an example. Apple Inc. closed at \$139.14 per share today, a slight rise of 0.33 percent. What is the expected price change, in percentage terms, for, say, tomorrow? Let's also assume that you have access to all the historical data you want, regardless of the size of the dataset or other computational issues. What if a chief engineer suddenly decides to resign from the company and the news leaks instantly through social networks? What if Samsung or another mobile phone company launches its newest prototype right after your prediction? What if, just for fun, that new Samsung prototype should explode during the presentation, on the spot? The randomness is so overwhelming that our prediction could turn out to be the exact opposite of the true story. For long-term prediction, however, things get easier: the overall relationship between Apple and Samsung is unlikely to change in the near future, and the general performance of AAPL depends in principle on how Apple Inc. behaves over the following months. So GIVEN THAT COOK IS NOT INSANE, BUT A RATHER CAUTIOUS CEO, we can expect a higher accuracy when predicting the average price change of AAPL over the next three months (60 trading days or so).

Notice again that we are not talking about predicting the close of the day three months from now. We are talking about the AVERAGE price change over the following three months. Now, let's turn to what predictors we can use to implement this idea.
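To make the distinction concrete, a labeling scheme along these lines could compare today's close against the average close of the next 60 trading days. This is only a sketch of one plausible rule; the exact horizon and comparison in the original strategy may differ:

```python
def make_labels(closes, horizon=60):
    """Label day t as 1 if the AVERAGE close over the next `horizon`
    trading days exceeds today's close, else 0. Days at the end of
    the series without a full forward window get no label."""
    labels = []
    for t in range(len(closes) - horizon):
        fwd = closes[t + 1 : t + 1 + horizon]
        labels.append(1 if sum(fwd) / horizon > closes[t] else 0)
    return labels
```

Averaging over the window is what smooths away the day-to-day randomness described above: a single exploding prototype moves tomorrow's close, but barely moves a 60-day mean.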

## Tool: Technical Indicators & XGBoost Classification

We are using the classification algorithm in XGBoost. The regression algorithm is more intuitive but slower; binary classification is simply faster in this case, which allows for efficient backtesting later on. The prediction is based on a certain window of historical data including open, high, low, close and volume, along with 77 other technical indicators that are extracted right before the prediction. This is of course far from "all historical data" in the market, not to mention off-market rumors, social and political concerns, etc., but it is enough for illustration here. A good improvement to this strategy would therefore be to append these left-out indicators. For NLP and similar applications in stock price prediction, you can check my post about turning Mr. Trump's tweets into prediction indicators. I didn't actually write that project myself, because I found a really mature application already on GitHub, but I will definitely rewrite it during the summer vacation. If you find the post unchanged even after that, please feel free to mail me and "harshly" urge me about the overdue task.

## Code

The code mainly consists of three parts:

- Definition of technical indicators.
- Two Python classes: classifier and stock.
- The main program, including configuration parameters for the backtesting framework.
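As a taste of the first part, one of the simplest technical indicators, a simple moving average, can be defined in a few lines. This is purely illustrative; the actual strategy computes 77 indicators, most of them more elaborate:

```python
def sma(closes, n):
    """Simple moving average: the mean of the last n closes,
    emitted from the first day with a full n-day window."""
    return [sum(closes[i - n + 1 : i + 1]) / n
            for i in range(n - 1, len(closes))]
```

Each indicator like this contributes one column to the feature matrix that the classifier trains on.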

## References

- Chen, Tianqi, and Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
- Dey, Shubharthi, et al. "Forecasting to Classification: Predicting the Direction of Stock Market Price Using Xtreme Gradient Boosting."