# Ensemble Modeling in a Binary Classification Problem in Chinese A-Share Market

In this research paper we use as much information on a stock as is available on Ricequant to train a robust binary classifier for expected returns on a rolling basis. In addition, we create a new accuracy metric based on behavioral economics for model training, which improved the fit of the models (as measured by classical metrics, e.g. accuracy or precision scores) by 3 to 5 times. The advantages of this new metric are covered in the corresponding section.

## Section 1: Environment Preparation

First we import the necessary packages we're going to use later.

```python
%config InlineBackend.figure_format = 'retina'
```

Global configurations.

```python
pool = index_components('000050.XSHG')
```

## Section 2: Data Preparation

Load the raw data and encapsulate it in a pandas Panel.

```python
today = datetime.today()
```

Some further data investigation.

Unshifted data for training:

```python
# Columns 17-77 are categorical flags; cast them accordingly
for f in range(17, 78):
    X_.iloc[:, f] = X_.iloc[:, f].astype('category')
```

Shift the data so that it corresponds to

\[ y_i = \mathrm{clf}(X_i). \]

```python
X = X_.iloc[:-1, :]
```

```
(60, 78)
(60,)
1 32
0 28
dtype: int64
```
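The alignment can be sketched in pandas on a toy frame (hypothetical data; `ret_up` stands in for the up/down label): features observed at time \(i\) are paired with the label realized one period later, and the last row is dropped because its label is not yet known.

```python
import pandas as pd

# Toy frame: a feature observed at t, and an up/down label realized at t (hypothetical data)
df = pd.DataFrame({'feat': [1.0, 2.0, 3.0, 4.0],
                   'ret_up': [0, 1, 1, 0]})

X = df[['feat']].iloc[:-1]                          # drop the last row: its label is unknown
y = df['ret_up'].shift(-1).iloc[:-1].astype(int)    # next-period label aligned with today's features
```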

```python
y.describe()
```

```
count 60.000000
mean 0.533333
std 0.503098
min 0.000000
25% 0.000000
50% 1.000000
75% 1.000000
max 1.000000
dtype: float64
```

```python
X.describe(include=['number'])
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 60.000000 | 6.000000e+01 | 60.000000 | 60.000000 |
| mean | 0.976212 | 25.990400 | 25.656650 | 24.940308 | 22.879636 | 21.834747 | 20.977319 | 39.063571 | 35.434052 | 0.852667 | 67.581394 | 26.807436 | 25.990400 | 25.173364 | 1.039211e+09 | 0.723869 | 24.547501 |
| std | 0.015776 | 2.827868 | 2.770969 | 2.614787 | 1.668471 | 1.437906 | 1.220115 | 7.954479 | 7.286649 | 0.354867 | 7.811732 | 3.090838 | 2.827868 | 2.616494 | 1.263014e+08 | 0.168018 | 4.961348 |
| min | 0.948008 | 22.640000 | 22.085000 | 21.392000 | 20.650167 | 19.762876 | 19.285863 | 25.863478 | 26.096620 | 0.413193 | 54.834433 | 22.887871 | 22.640000 | 22.083778 | 8.750972e+08 | 0.527858 | 16.936824 |
| 25% | 0.965606 | 23.293000 | 23.043000 | 22.981875 | 21.499667 | 20.572173 | 19.949996 | 32.360567 | 27.708563 | 0.528046 | 61.687466 | 23.800034 | 23.293000 | 22.799033 | 9.242619e+08 | 0.578824 | 20.770453 |
| 50% | 0.974243 | 25.244000 | 24.591000 | 24.007250 | 22.387833 | 21.624111 | 20.712315 | 39.207117 | 34.467175 | 0.749777 | 66.749465 | 26.030462 | 25.244000 | 24.397322 | 1.038170e+09 | 0.656475 | 23.760415 |
| 75% | 0.987142 | 29.210500 | 28.633250 | 27.278500 | 24.174250 | 23.017472 | 21.933203 | 46.661330 | 41.646060 | 1.179799 | 73.758827 | 30.086332 | 29.210500 | 27.848188 | 1.159543e+09 | 0.852259 | 27.627880 |
| max | 1.007653 | 30.286000 | 29.820000 | 29.691500 | 26.243333 | 24.528111 | 23.411667 | 49.526365 | 46.601986 | 1.427553 | 82.767827 | 31.203224 | 30.286000 | 29.442553 | 1.301344e+09 | 1.065128 | 35.042981 |

```python
X.describe(include=['category'])
```

| | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 |
| unique | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 | 1.0 | 3.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | 3.0 | 4.0 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 2.0 | 3.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| top | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| freq | 60.0 | 60.0 | 60.0 | 59.0 | 57.0 | 60.0 | 60.0 | 60.0 | 59.0 | 47.0 | 60.0 | 55.0 | 60.0 | 60.0 | 59.0 | 51.0 | 60.0 | 60.0 | 53.0 | 60.0 | 60.0 | 60.0 | 59.0 | 59.0 | 59.0 | 55.0 | 58.0 | 49.0 | 51.0 | 60.0 | 59.0 | 60.0 | 60.0 | 59.0 | 60.0 | 60.0 | 60.0 | 51.0 | 48.0 | 57.0 | 57.0 | 60.0 | 60.0 | 60.0 | 60.0 | 60.0 | 55.0 | 60.0 | 59.0 | 59.0 | 51.0 | 41.0 | 60.0 | 60.0 | 60.0 | 60.0 | 59.0 | 60.0 | 60.0 | 60.0 | 60.0 |

```python
unbalance = sum(y==1)/len(y)
```

`0.53333333333333333`

```python
data = X
```

```
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
'13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',
'25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36',
'37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48',
'49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60',
'61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72',
'73', '74', '75', '76', '77', 'y'],
dtype='object')
```

Quick look at the distribution of y.

```python
if verbose:
```

First, we can see that the target variable is distributed fairly evenly, so we won't take any special measures for an imbalanced dataset. We then present the continuous variables using boxplots (shown in the figure below).

Boxplot of y against continuous variables.

```python
if verbose:
```

Pairplot of all continuous variables.

```python
if verbose:
```

Dummy encoding for categorical variables.

```python
for i in range(17, 78):
```
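The dummy encoding can be sketched with pandas `get_dummies` on a hypothetical pair of categorical columns; using `prefix_sep='#'` reproduces column names of the `19#0.0` style that appear below.

```python
import pandas as pd

# Hypothetical categorical columns named like the notebook's feature indices
df = pd.DataFrame({'19': pd.Categorical([0.0, 100.0, 0.0]),
                   '21': pd.Categorical([-100.0, 0.0, 0.0])})

# One indicator column per category level, named '<column>#<level>'
dummies = pd.get_dummies(df, prefix_sep='#')
print(list(dummies.columns))
```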

## Section 3: Feature Selection

Drop columns that contain only one value.

```python
mask = data.std() == 0
```

```python
X = data_valid.drop('y', axis=1)
```

`['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '19#0.0', '19#100.0', '21#-100.0', '21#0.0', '24#-100.0', '24#0.0', '26#-100.0', '26#0.0', '26#100.0', '28#0.0', '28#100.0', '29#-100.0', '29#0.0', '30#-100.0', '30#0.0', '30#100.0', '31#-200.0', '31#-100.0', '31#0.0', '31#100.0', '32#0.0', '32#100.0', '36#-100.0', '36#0.0', '36#100.0', '37#0.0', '37#100.0', '40#0.0', '40#100.0', '41#0.0', '41#100.0', '42#-100.0', '42#0.0', '42#100.0', '45#-100.0', '45#0.0', '49#0', '49#1', '51#0', '51#1', '53#0', '53#1', '54#0', '54#1', '55#0', '55#1', '56#0', '56#1', '57#0', '57#1', '58#0', '58#1', '59#0', '59#1', '60#0', '60#1', '61#0', '61#1', '62#0', '62#1', '64#0', '64#1', '65#0', '65#1', '66#0', '66#1', '68#0', '68#1', '69#0', '69#1', '70#0', '70#1', '72#0', '72#1', '75#0', '75#1', '76#0', '76#1']`

`Variance Ranking`

```python
vt = VarianceThreshold().fit(X)
```

`['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '16', '31#0.0', '36#0.0', '42#0.0', '65#0', '70#0', '70#1']`
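For reference, a minimal self-contained example of scikit-learn's `VarianceThreshold` on synthetic data (the notebook additionally ranks the surviving features by their variances):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 2.0, 0.1],
              [0, 1.0, 0.3],
              [0, 3.0, 0.2]])    # first column is constant

vt = VarianceThreshold()         # default threshold=0.0 drops zero-variance features
X_reduced = vt.fit_transform(X)  # the constant column is removed
```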

`Random Forest`

```python
model = RandomForestClassifier()
```

`['16', '2', '13', '14', '9', '8', '0', '1', '15', '10', '7', '6', '68#0', '70#1', '4', '64#0', '11', '30#0.0', '3', '5']`
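Ranking by random-forest feature importances can be sketched as follows, with `make_classification` as a synthetic stand-in for the notebook's data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the notebook's X, y
X, y = make_classification(n_samples=60, n_features=10, random_state=7)

model = RandomForestClassifier(n_estimators=100, random_state=7).fit(X, y)
top = np.argsort(model.feature_importances_)[::-1][:5]  # indices of the top-5 features
```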

`Chi2 Test`

```python
X_minmax = MinMaxScaler([0,1]).fit_transform(X)
```

`['42#-100.0', '64#1', '36#-100.0', '70#0', '68#1', '40#100.0', '31#-100.0', '60#1', '76#1', '51#0', '30#100.0', '32#100.0', '24#-100.0', '21#-100.0', '53#0', '30#-100.0', '57#1', '58#1', '59#1', '62#1']`
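A sketch of the chi-squared screening on synthetic data; the `chi2` statistic requires non-negative features, which is why the data is MinMax-scaled first:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.RandomState(7)
X = rng.rand(60, 30)
y = rng.randint(0, 2, 60)

# chi2 needs non-negative inputs, hence the MinMax scaling to [0, 1]
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
skb = SelectKBest(chi2, k=20).fit(X_minmax, y)
selected = np.where(skb.get_support())[0]   # indices of the 20 highest-scoring features
```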

`Recursive Feature Elimination (RFE)`

with a logistic regression model.

```python
rfe = RFE(LogisticRegression(), 20)
```

`['1', '2', '3', '4', '5', '6', '7', '8', '10', '11', '12', '13', '14', '16', '36#0.0', '40#0.0', '42#0.0', '68#0', '70#0', '70#1']`
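A self-contained RFE sketch with a logistic-regression base model on synthetic data; `n_features_to_select=20` mirrors the 20-feature lists above:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=60, n_features=40, random_state=7)

# Recursively drop the features with the weakest coefficients until 20 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20).fit(X, y)
kept = [i for i, keep in enumerate(rfe.support_) if keep]
```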

`Final selection of features`

is the union of all previous sets.

```python
features = np.hstack([feat_var_threshold, feat_imp_20, feat_scored_20, feat_rfe_20])
```

`Final features (46 in total): 0, 1, 10, 11, 12, 13, 14, 15, 16, 2, 21#-100.0, 24#-100.0, 3, 30#-100.0, 30#0.0, 30#100.0, 31#-100.0, 31#0.0, 32#100.0, 36#-100.0, 36#0.0, 4, 40#0.0, 40#100.0, 42#-100.0, 42#0.0, 5, 51#0, 53#0, 57#1, 58#1, 59#1, 6, 60#1, 62#1, 64#0, 64#1, 65#0, 68#0, 68#1, 7, 70#0, 70#1, 76#1, 8, 9`
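The union can be taken with numpy on the stacked name lists; the short lists here are hypothetical excerpts of the selections above:

```python
import numpy as np

# Hypothetical excerpts of the four feature selections
feat_var_threshold = ['1', '2', '16']
feat_imp_20 = ['16', '2', '13']
feat_scored_20 = ['42#-100.0', '64#1']
feat_rfe_20 = ['1', '68#0']

# np.unique deduplicates the stacked lists, i.e. takes their union
features = np.unique(np.hstack([feat_var_threshold, feat_imp_20,
                                feat_scored_20, feat_rfe_20]))
```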

## Section 4: Model Training

**1. Split the training and testing data (ratio: 3:1).**

```python
data_clean = data_valid.loc[:, features.tolist() + ['y']]
```

```
Clean dataset shape: (60, 47)
Train features shape: (45, 46)
Test features shape: (15, 46)
Train label shape: (45,)
Test label shape: (15,)
```
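The 3:1 split can be sketched with scikit-learn on synthetic arrays of the same shapes; `test_size=0.25` gives the 45/15 partition:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(7)
X = rng.rand(60, 46)
y = rng.randint(0, 2, 60)

# test_size=0.25 -> a 3:1 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=7)
```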

**2. PCA visualization of training data**

PCA plot

```python
if verbose:
```

Lmplot

```python
if verbose:
```

**3. A New Accuracy Metric Based on Utility and Risk-Aversion**

Instead of using existing accuracy or error metrics, e.g. accuracy scores or log loss, we come up with our own metric that suits this scenario better. According to classical utility theory, the utility of the expected net return of a transaction should satisfy at least the following properties:

- Higher expected return means higher utility;
- Zero expected return means zero utility;
- Marginal utility decreases as expected return increases;
- The utility is robust to scaling.

Mathematically, therefore, we know a well-behaved utility function \(U(x)\) has:

- A non-negative first-order derivative;
- A non-positive second-order derivative;
- A zero at \(x=0\).

However, since the late 20th century this formulation of utility has been widely criticized, mainly by behavioral economists, who conducted a large number of empirical experiments showing how poorly such utility models perform once variation in risk-aversion is taken into account. Risk-aversion was originally introduced to capture a person's aversion to uncertainty, and classical economics offers a range of measures of it. One of the most famous is the Arrow-Pratt measure of absolute risk-aversion (ARA), defined in terms of the utility function:

\[ ARA(x) = -\frac{U^{\prime\prime}(x)}{U^{\prime}(x)}. \]

The Arrow-Pratt measure of absolute risk-aversion has been successful not only because it captures the concavity of the utility, but also because it specializes to many classical utility functions, such as exponential or hyperbolic absolute utility. However, it is not in line with how people actually behave, as Daniel Kahneman and Amos Tversky pointed out in their prospect theory in 1979. The theory has been developed further since then (notably the cumulative version of 1992) and is now accepted as a psychologically more realistic model of how people perceive uncertainty.
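For instance, exponential (CARA) utility makes this measure constant, which is a quick way to see what the definition captures:

\[ U(x) = -e^{-ax} \quad\Rightarrow\quad U^{\prime}(x) = a e^{-ax},\quad U^{\prime\prime}(x) = -a^{2} e^{-ax},\quad -\frac{U^{\prime\prime}(x)}{U^{\prime}(x)} = a. \]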

Unlike classical expected utility theory, prospect theory characterizes utility through the following four effects:

- Certainty effect: most people are risk-averse about gaining;
- Reflection effect: most people are risk-loving about losing;
- Loss aversion: most people are more sensitive to losses than to gains;
- Reference dependence: most people's perception of uncertainty is based on the reference point.

Since the notion of "most people" here largely reflects the fact that most investors are more or less risk-averse, we simplify the model with the following assumptions:

- The first-order derivative of the utility is non-negative for all outcomes;
- For risk-averse investors, the second-order derivative of the utility is non-negative for losses and non-positive for gains;
- For risk-loving investors, the second-order derivative of the utility is non-positive for losses and non-negative for gains;
- For risk-neutral investors, the second-order derivative is zero everywhere.

We do not model the loss-aversion implication, as its influence turned out to be minuscule compared with the loss of model simplicity.

Therefore, with the previous four assumptions, we can easily come up with a nice utility function w.r.t. prediction accuracy:

\[ U(x) = \operatorname{sgn}\left(x-\tfrac{1}{2}\right)\,|2x-1|^{2^{\operatorname{logit}\left(\frac{r+1}{2}\right)}} \]

where \(\operatorname{sgn}(\cdot)\) is the sign function, \(r\in(-1,1)\) is the investor's risk-aversion level, and \(\operatorname{logit}(\cdot)\) is the logit function, the inverse of the sigmoidal "logistic" function:

\[ logit(x)=\ln\left(\frac{x}{1-x}\right). \]

It is easy to verify that this utility satisfies the assumptions above and, by its monotonicity and continuity on \([0,1]\), is a well-behaved accuracy metric for the learning algorithms that follow.

`Utility Curve`

```python
f = lambda x, r: (2*(x > 0.5) - 1) * abs(2*x - 1)**(2**logit((r + 1)/2))
```

```python
if verbose:
```

```python
def custom_score(y_true, y_pred):
```
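A self-contained sketch of such a utility-based scorer, assuming it applies \(U\) to the fold accuracy; the risk-aversion level `R = 0.5` is a hypothetical choice, and `make_scorer` wraps the function for use in cross-validation and grid search:

```python
import numpy as np
from scipy.special import logit
from sklearn.metrics import make_scorer

R = 0.5  # hypothetical risk-aversion level; the notebook's exact value may differ

def utility(x, r):
    # U(x) = sgn(x - 1/2) * |2x - 1| ** (2 ** logit((r + 1) / 2))
    return (2 * (x > 0.5) - 1) * abs(2 * x - 1) ** (2 ** logit((r + 1) / 2))

def custom_score(y_true, y_pred):
    # Utility of the plain prediction accuracy
    acc = np.mean(np.asarray(y_true) == np.asarray(y_pred))
    return utility(acc, R)

utility_scorer = make_scorer(custom_score, greater_is_better=True)
```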

```python
seed = 7
```

First let's have a quick spot-check.

```python
# Some basic models
```

```
LR: (0.117) +/- (0.436)
LDA: (0.518) +/- (0.163)
KNN: (0.255) +/- (0.263)
DT: (0.255) +/- (0.263)
GNB: (-0.106) +/- (0.393)
SVC: (0.117) +/- (0.436)
```

Next, let's look at the ensemble results.

## Section 5: Ensemble Modeling and Validation

**1. Bagging (Bootstrap Aggregation)**

The prediction of a bagging model is the average over all sub-models' predictions.

`Bagged Decision Trees`

Bagged decision trees perform best when the variance in the dataset is large.

```python
cart = DecisionTreeClassifier()
```

`(0.158) +/- (0.312)`
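A runnable bagging sketch on synthetic data; `BaggingClassifier` defaults to decision-tree base learners, so no explicit base estimator is needed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the notebook's training data
X, y = make_classification(n_samples=60, n_features=20, random_state=7)

# 100 trees, each fit on a bootstrap sample; predictions are aggregated
model = BaggingClassifier(n_estimators=100, random_state=7)
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=7))
print('({:.3f}) +/- ({:.3f})'.format(scores.mean(), scores.std()))
```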

`Random Forest`

Random forest is a famous extension of bagged decision trees. It is usually more precise but slower, especially with a large number of leaves.

```python
num_trees = 100
```

`(-0.082) +/- (0.295)`

`Extra Trees`

Extra trees introduce additional randomness into how splits are chosen, in search of further precision.

```python
num_trees = 100
```

`(0.015) +/- (0.484)`

**2. Boosting**

Boosting ensembles a sequence of weak learners for better performance.

`AdaBoost`

AdaBoost takes a weighted average of the outputs of a series of weak learners, updating the sample-weight vector at each iteration.

```python
model = AdaBoostClassifier(n_estimators=100, random_state=seed)
```

`(0.291) +/- (0.387)`
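A runnable AdaBoost sketch in the same spirit, again with synthetic data as a stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=60, n_features=20, random_state=7)

# Each new weak learner focuses on the samples the previous ones got wrong
model = AdaBoostClassifier(n_estimators=100, random_state=7)
scores = cross_val_score(model, X, y, cv=5)
```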

`Stochastic Gradient Boosting`

Gradient Tree Boosting, or Gradient Boosted Regression Trees (GBRT), is a generalization of boosting to arbitrary differentiable loss functions, which makes it more flexible and often more accurate.

```python
model = GradientBoostingClassifier(n_estimators=100, random_state=seed)
```

`(0.291) +/- (0.387)`

`Extreme Gradient Boosting`

A (usually) more efficient gradient boosting algorithm by Tianqi Chen.

```python
model = XGBClassifier(n_estimators=100, seed=seed)
```

`(0.189) +/- (0.459)`

**3. Hyperparameter tuning**

Hyperparameter tuning can matter a lot here, so to actually determine which models are best, we run a grid search with cross-validation on the training dataset and record the best scores and the corresponding model configurations.

```python
estimator_list = []
```

`Logistic Regression`

```python
lr_grid = GridSearchCV(estimator = LogisticRegression(random_state=seed),
```

```
0.3726887283268559
{'penalty': 'l1', 'C': 1}
```
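Such a grid search can be sketched as follows for logistic regression on synthetic data; note that in recent scikit-learn versions the `l1` penalty requires a solver such as `liblinear`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=60, n_features=20, random_state=7)

# Candidate regularization settings; liblinear supports both l1 and l2
param_grid = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2']}
grid = GridSearchCV(LogisticRegression(solver='liblinear', random_state=7),
                    param_grid, cv=5)
grid.fit(X, y)
print(grid.best_score_, grid.best_params_)
```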

`Linear Discriminant Analysis`

```python
lda_grid = GridSearchCV(estimator = LinearDiscriminantAnalysis(),
```

```
0.5622889460982887
{'n_components': None, 'solver': 'svd'}
```

`Decision Tree`

```python
dt_grid = GridSearchCV(estimator = DecisionTreeClassifier(random_state=seed),
```

```
0.34852862485121184
{'criterion': 'gini', 'max_depth': None, 'max_features': None}
```

`K-Nearest Neighbors`

```python
knn_grid = GridSearchCV(estimator = KNeighborsClassifier(),
```

```
0.30539557863515217
{'leaf_size': 2, 'algorithm': 'ball_tree', 'n_neighbors': 5, 'p': 1}
```

`Random Forest`

```python
rf_grid = GridSearchCV(estimator = RandomForestClassifier(warm_start=True, random_state=seed),
```

```
0.30113313283861066
{'bootstrap': True, 'max_depth': 5, 'n_estimators': 100, 'max_features': None, 'criterion': 'entropy'}
```

`Extra Trees`

```python
ext_grid = GridSearchCV(estimator = ExtraTreesClassifier(warm_start=True, random_state=seed),
```

```
0.2601389681995039
{'bootstrap': True, 'max_depth': 10, 'n_estimators': 100, 'max_features': 20, 'criterion': 'entropy'}
```

`AdaBoost`

```python
ada_grid = GridSearchCV(estimator = AdaBoostClassifier(random_state=seed),
```

```
0.30113313283861066
{'n_estimators': 200, 'algorithm': 'SAMME', 'learning_rate': 0.1}
```

`Gradient Boosting`

```python
gbm_grid = GridSearchCV(estimator = GradientBoostingClassifier(warm_start=True, random_state=seed),
```

```
0.4376019643046027
{'n_estimators': 100, 'max_depth': 3, 'learning_rate': 0.01, 'max_features': None}
```

`Extreme Gradient Boosting`

```python
xgb_grid = GridSearchCV(estimator = XGBClassifier(nthread=1, seed=seed),
```

```
0.556562176665963
{'gamma': 0, 'min_child_weight': 1, 'max_depth': 5, 'learning_rate': 0.01, 'n_estimators': 200}
```

`Support Vector Classification`

```python
svc_grid = GridSearchCV(estimator = SVC(probability=True, class_weight='balanced'),
```

```
0.3422932503617775
{'gamma': 0.01, 'C': 0.1}
```

**4. Voting ensemble**

```python
best_score_list_rounded = [round(s,3) for s in best_score_list]
```

| model | LR | LDA | DT | KNN | RF | EXT | ADA | GBM | XGB | SVC |
|---|---|---|---|---|---|---|---|---|---|---|
| score | 0.373 | 0.562 | 0.349 | 0.305 | 0.301 | 0.26 | 0.301 | 0.438 | 0.557 | 0.342 |
| rank | 3 | 0 | 4 | 6 | 7 | 9 | 7 | 2 | 1 | 5 |

```python
# Create sub models
```

`(0.516) +/- (0.200)`
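The voting ensemble can be sketched with scikit-learn's `VotingClassifier`; the three sub-models here are illustrative stand-ins for the tuned models above, and soft voting averages their predicted probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=60, n_features=20, random_state=7)

# Soft voting averages the sub-models' predicted class probabilities
ensemble = VotingClassifier(
    estimators=[('lda', LinearDiscriminantAnalysis()),
                ('gbm', GradientBoostingClassifier(random_state=7)),
                ('lr', LogisticRegression(max_iter=1000, random_state=7))],
    voting='soft')
scores = cross_val_score(ensemble, X, y, cv=5)
```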

It is clear that the ensemble further improves on the performance of the separate models. Now we make actual predictions and see whether the results are robust.

**5. Make predictions**

```python
model = ensemble
```

```python
print('Unbalance of the data: {:.3f}'.format(unbalance))
```

`Unbalance of the data: 0.533`

Now apart from the utility, we can check our prediction based on some other metrics, e.g.:

`Accuracy`

which is defined by

\[ \begin{align*} Accuracy &=\frac{|True\ Positive|+|True\ Negative|}{|Total\ Population|}\\ &=\frac{|True\ Positive|+|True\ Negative|}{|True\ Positive|+|False\ Positive|+|True\ Negative|+|False\ Negative|}\\ &=\frac{1}{n}\sum_{i=1}^n\mathbb{1}(\hat{y}_i=y_i) \end{align*} \]

and is bounded within \([0,1]\), where \(1\) indicates perfect prediction. This is also called total accuracy, and it measures the percentage of correct guesses.

```python
ac = accuracy_score(y_test, y_pred)
```

`Accuracy: 0.667`

`Precision`

which is defined by

\[ Precision = \frac{|True\ Positive|}{|True\ Positive|+|False\ Positive|}= \frac{\sum_{i=1}^n\mathbb{1}(\hat{y}_i=1,\ y_i=1)}{\sum_{i=1}^n[\mathbb{1}(\hat{y}_i=1,\ y_i=1)+\mathbb{1}(\hat{y}_i=1,\ y_i=0)]}. \]

Similar to accuracy, this is also bounded and indicates perfect prediction when the value is 1. However, precision measures the percentage of correct calls among all positive calls, so in this case the probability that an executed transaction is in the right direction.

```python
pc = precision_score(y_test, y_pred)
```

`Precision: 0.818`

`Recall`

which is defined by

\[ Recall = \frac{|True\ Positive|}{|True\ Positive|+|False\ Negative|}= \frac{\sum_{i=1}^n\mathbb{1}(\hat{y}_i=1,\ y_i=1)}{\sum_{i=1}^n[\mathbb{1}(\hat{y}_i=1,\ y_i=1)+\mathbb{1}(\hat{y}_i=0,\ y_i=1)]}. \]

Recall is also bounded and indicates perfect prediction when it equals 1, but unlike precision, it measures the percentage of actual signals that are caught, i.e. in this case the probability of catching an actual appreciation.

```python
re = recall_score(y_test, y_pred)
```

`Recall: 0.750`
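The three metrics can be checked by hand on a toy prediction (hypothetical labels):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predictions
y_test = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

print('Accuracy: {:.3f}'.format(accuracy_score(y_test, y_pred)))    # 6 of 8 correct
print('Precision: {:.3f}'.format(precision_score(y_test, y_pred)))  # TP=4, FP=1
print('Recall: {:.3f}'.format(recall_score(y_test, y_pred)))        # TP=4, FN=1
```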

Although not included in this notebook, it is important and encouraging to note what these scores mean compared with runs in which other scoring functions are used for grid searching. The average accuracy given by the ensemble model when tuned directly on accuracy or log loss is much lower than the figures above. In general, the model's prediction accuracy improved from 15% - 25% to 60% - 80%, i.e. by 2 to 5 times. The effect of introducing this utility-like scoring function for hyperparameter tuning is substantial, though it of course still needs theoretical justification.

Lastly, let's check this utility value for the testing dataset.

```python
cs = custom_score(y_test, y_pred)
```

`Utility: 0.287`

which is thus robust (even higher, in fact) out of sample.

## Conclusion and Prospective Improvements

The ensemble model shown above is quite naive, I would say, and far from "good". The metric is still somewhat arbitrary, and the algorithm is rather slow (I had to cut the window length to 3 from the original 60, which was intended to train the model on 5 years of data), so on a non-professional platform like Ricequant we cannot run through the whole market and search for the best portfolio, that is, the most predictable stocks. In the backtest strategy I implemented based on this research paper, I chose only 5 stocks arbitrarily from the index components of 000050.XSHG, which can have an unpredictable downside effect on model performance. A more desirable approach would be to set up a local backtest environment and implement this process with the help of GPUs and faster languages like C++. Moreover, overfitting is very possible, so whether this makes a good strategy needs much more validation work.

## References

- Arrow, K. J. (1965). "Aspects of the Theory of Risk Bearing". The Theory of Risk Aversion. Helsinki: Yrjo Jahnssonin Saatio. Reprinted in: Essays in the Theory of Risk Bearing, Markham Publ. Co., Chicago, 1971, 90–109.
- Pratt, J. W. (1964). "Risk Aversion in the Small and in the Large". Econometrica. 32 (1–2): 122–136.
- Kahneman, Daniel; Tversky, Amos (1979). "Prospect theory: An analysis of decision under risk". Econometrica: Journal of the econometric society. 47 (2): 263–291.
- Chen, T., & Guestrin, C. (2016). "XGBoost: A scalable tree boosting system". In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.