Fix English, generalize writing, improve clarity

robcaulk 2022-07-17 20:58:26 +02:00
parent b65e4b5049
commit 56ad107769


@@ -408,17 +408,9 @@ It is common to want constant retraining, in whichcase, user should set `live_re
### Controlling the model learning process
Depending on which AI model is used, these parameter names may differ. For example, the accepted parameters for the `Catboost`
models are `n_estimators`, `task_type` and others. For a model such as an SVM regression model, the accepted parameters are different.
Model training parameters are unique to the library employed by the user. FreqAI allows users to set any parameter for any library using the `model_training_parameters` dictionary in the user configuration file. The example configuration files show some of the example parameters associated with `Catboost` and `LightGBM`, but users can add any parameters available in those libraries.
Here we explain the parameters of `model_training_parameters` for `Catboost`:
The user can define model settings for the data split `data_split_parameters` and learning parameters
`model_training_parameters`. Users are encouraged to visit the Catboost documentation
for more information on how to select these values. `n_estimators` increases the
computational effort and the fit to the training data. If a user has a GPU
installed in their system, they may benefit from changing `task_type` to `GPU`.
The `weight_factor` allows the user to weight more recent data more strongly
Data split parameters are defined in `data_split_parameters`, which can be any parameters associated with `Sklearn`'s `train_test_split()` function. Meanwhile, FreqAI includes some additional parameters such as `weight_factor`, which allows the user to weight more recent data more strongly
than past data via an exponential function:
$$ W_i = \exp(\frac{-i}{\alpha*n}) $$
@@ -427,11 +419,11 @@ where $W_i$ is the weight of data point $i$ in a total set of $n$ data points._
![weight-factor](assets/weights_factor.png)
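As a rough illustration of the weighting formula above, the following sketch computes $W_i$ for a small data set. It assumes that $i = 0$ refers to the most recent data point (so the newest point gets weight 1) and that `weight_factor` plays the role of $\alpha$. The function name `recency_weights` is hypothetical and is not part of FreqAI.

```python
import math


def recency_weights(n, weight_factor):
    """Return n weights ordered oldest-to-newest.

    Implements W_i = exp(-i / (weight_factor * n)), under the assumption
    that i = 0 is the most recent data point. This is an illustrative
    helper, not FreqAI's own implementation.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if weight_factor <= 0:
        raise ValueError("weight_factor must be positive")
    # decay[i] is the weight of the point that lies i steps in the past
    decay = [math.exp(-i / (weight_factor * n)) for i in range(n)]
    # reverse so that the list lines up with chronologically ordered data
    return decay[::-1]


weights = recency_weights(n=5, weight_factor=0.5)
for age, w in zip(range(4, -1, -1), weights):
    print(f"{age} candles old: weight {w:.3f}")
```

A smaller `weight_factor` makes the weights fall off faster with age, while a large value approaches uniform weighting.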
Finally, `period` defines the offset used for the `labels`. In the present example,
`train_test_split()` has a parameter called `shuffle`, which users also have access to in FreqAI, that allows them to keep the data unshuffled. This is particularly useful to avoid biasing training with temporally autocorrelated data.
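To make the effect of an unshuffled split concrete, here is a minimal hand-rolled sketch of a chronological train/test split. It is not `Sklearn`'s implementation; the helper name `chronological_split` and the toy data are assumptions made only for illustration.

```python
import math


def chronological_split(rows, test_size=0.25):
    """Split time-ordered rows so the test set is the most recent slice.

    Hypothetical helper that mimics the idea of an unshuffled split:
    the order of the rows is preserved and nothing is permuted.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    n_test = math.ceil(len(rows) * test_size)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


candles = list(range(8))  # stand-in for 8 time-ordered candles
train, test = chronological_split(candles, test_size=0.25)
print("train:", train)
print("test: ", test)
```

Because the rows are never permuted, every test row is newer than every training row, so the model is not evaluated on candles that sit between its training candles.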
Finally, `label_period_candles` defines the offset used for the `labels`. In the present example,
the user is asking for `labels` that are 24 candles in the future.
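The offset described here can be pictured with a short sketch. It assumes the label is simply the closing price `label_period_candles` steps ahead; actual labels are defined by the user's strategy and may be computed differently. The function `future_labels` is hypothetical.

```python
def future_labels(closes, label_period_candles):
    """Pair each candle with the close that lies label_period_candles ahead.

    Candles too close to the end of the series have no future value yet,
    so their label is None. Illustrative only, not FreqAI code.
    """
    if label_period_candles < 0:
        raise ValueError("label_period_candles must be non-negative")
    labels = []
    for i in range(len(closes)):
        j = i + label_period_candles
        labels.append(closes[j] if j < len(closes) else None)
    return labels


closes = [10, 11, 12, 13, 14]
labels = future_labels(closes, label_period_candles=2)
print(labels)
```

With an offset of 24, the last 24 candles of any training window would have no label under this scheme and would have to be dropped before training.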
Note: typically in time-series forecasting, the validation/test data should lie in the "future" relative to the training data. Thus, it is recommended to disable the `shuffle` parameter during the cross-validation or validation steps. For a more detailed explanation, visit [here](https://medium.com/@soumyachess1496/cross-validation-in-time-series-566ae4981ce4).
### Removing outliers with the Dissimilarity Index
The Dissimilarity Index (DI) aims to quantify the uncertainty associated with each