slightly update doc wording

Matthias 2022-08-14 17:08:29 +02:00
parent c9c128f781
commit a5e96881f4


@ -8,7 +8,7 @@ Among the the features included:
* **Self-adaptive retraining**: retrain models during live deployments to self-adapt to the market in an unsupervised manner. * **Self-adaptive retraining**: retrain models during live deployments to self-adapt to the market in an unsupervised manner.
* **Rapid feature engineering**: create large rich feature sets (10k+ features) based on simple user created strategies. * **Rapid feature engineering**: create large rich feature sets (10k+ features) based on simple user created strategies.
* **High performance**: adaptive retraining occurs on separate thread (or on GPU if available) from inferencing and bot trade operations. Keep newest models and data in memory for rapid inferencing. * **High performance**: adaptive retraining occurs on separate thread (or on GPU if available) from inferencing and bot trade operations. Keep newest models and data in memory for rapid inferencing.
* **Realistic backtesting**: emulate self-adaptive retraining with backtesting module that automates past retraining. * **Realistic backtesting**: emulate self-adaptive retraining with backtesting module that automates past retraining.
* **Modifiable**: use the generalized and robust architecture for incorporating any machine learning library/method available in Python. Seven examples available. * **Modifiable**: use the generalized and robust architecture for incorporating any machine learning library/method available in Python. Seven examples available.
* **Smart outlier removal**: remove outliers from training and prediction sets using a variety of outlier detection techniques. * **Smart outlier removal**: remove outliers from training and prediction sets using a variety of outlier detection techniques.
@ -37,7 +37,7 @@ The example strategy, example prediction model, and example config can all be fo
The user provides FreqAI with a set of custom *base* indicators (created inside the strategy the same way The user provides FreqAI with a set of custom *base* indicators (created inside the strategy the same way
a typical Freqtrade strategy is created) as well as target values which look into the future. a typical Freqtrade strategy is created) as well as target values which look into the future.
FreqAI trains a model to predict the target value based on the input of custom indicators for each pair in the whitelist. These models are consistently retrained to adapt to market conditions. FreqAI offers the ability to both backtest strategies (emulating reality with periodic retraining) and deploy dry/live. In dry/live conditions, FreqAI can be set to constant retraining in a background thread in an effort to keep models as young as possible. FreqAI trains a model to predict the target value based on the input of custom indicators for each pair in the whitelist. These models are consistently retrained to adapt to market conditions. FreqAI offers the ability to both backtest strategies (emulating reality with periodic retraining) and deploy dry/live. In dry/live conditions, FreqAI can be set to constant retraining in a background thread in an effort to keep models as young as possible.
An overview of the algorithm is shown here to help users understand the data processing pipeline and the model usage. An overview of the algorithm is shown here to help users understand the data processing pipeline and the model usage.
@ -66,7 +66,7 @@ directly influence nodal weights within the model.
## Install prerequisites ## Install prerequisites
The normal Freqtrade install process will ask the user if they wish to install `FreqAI` dependencies. The user should reply "yes" to this question if they wish to use FreqAI. If the user did not reply yes, they can manually install these dependencies after the install with: The normal Freqtrade install process will ask the user if they wish to install FreqAI dependencies. The user should reply "yes" to this question if they wish to use FreqAI. If the user did not reply yes, they can manually install these dependencies after the install with:
``` bash ``` bash
pip install -r requirements-freqai.txt pip install -r requirements-freqai.txt
@ -75,21 +75,21 @@ pip install -r requirements-freqai.txt
!!! Note !!! Note
Catboost will not be installed on arm devices (raspberry, Mac M1, ARM based VPS, ...), since Catboost does not provide wheels for this platform. Catboost will not be installed on arm devices (raspberry, Mac M1, ARM based VPS, ...), since Catboost does not provide wheels for this platform.
## Configuring the bot ## Configuring FreqAI
### Parameter table ### Parameter table
The table below will list all configuration parameters available for `FreqAI`. The table below will list all configuration parameters available for FreqAI.
Mandatory parameters are marked as **Required**, which means that they are required to be set in one of the possible ways. Mandatory parameters are marked as **Required**, which means that they are required to be set in one of the possible ways.
| Parameter | Description | | Parameter | Description |
|------------|-------------| |------------|-------------|
| `freqai` | **Required.** The dictionary containing all the parameters for controlling FreqAI. <br> **Datatype:** dictionary. | `freqai` | **Required.** The parent dictionary containing all the parameters below for controlling FreqAI. <br> **Datatype:** dictionary.
| `identifier` | **Required.** A unique name for the current model. This can be reused to reload pre-trained models/data. <br> **Datatype:** string. | `identifier` | **Required.** A unique name for the current model. This can be reused to reload pre-trained models/data. <br> **Datatype:** string.
| `train_period_days` | **Required.** Number of days to use for the training data (width of the sliding window). <br> **Datatype:** positive integer. | `train_period_days` | **Required.** Number of days to use for the training data (width of the sliding window). <br> **Datatype:** positive integer.
| `backtest_period_days` | **Required.** Number of days to inference into the trained model before sliding the window and retraining. This can be fractional days, but beware that the user provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest. <br> **Datatype:** Float. | `backtest_period_days` | **Required.** Number of days to inference into the trained model before sliding the window and retraining. This can be fractional days, but beware that the user provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest. <br> **Datatype:** Float.
| `live_retrain_hours` | Frequency of retraining during dry/live runs. Default set to 0, which means it will retrain as often as possible. **Datatype:** Float > 0. | `live_retrain_hours` | Frequency of retraining during dry/live runs. Default set to 0, which means it will retrain as often as possible. <br> **Datatype:** Float > 0.
| `follow_mode` | If true, this instance of FreqAI will look for models associated with `identifier` and load those for inferencing. A `follower` will **not** train new models. `False` by default. <br> **Datatype:** boolean. | `follow_mode` | If true, this instance of FreqAI will look for models associated with `identifier` and load those for inferencing. A `follower` will **not** train new models. `False` by default. <br> **Datatype:** boolean.
| `startup_candles` | Number of candles needed for *backtesting only* to ensure all indicators are non NaNs at the start of the first train period. <br> **Datatype:** positive integer. | `startup_candles` | Number of candles needed for *backtesting only* to ensure all indicators are non NaNs at the start of the first train period. <br> **Datatype:** positive integer.
| `fit_live_predictions_candles` | Computes target (label) statistics from prediction data, instead of from the training data set. Number of candles is the number of historical candles it uses to generate the statistics. <br> **Datatype:** positive integer. | `fit_live_predictions_candles` | Computes target (label) statistics from prediction data, instead of from the training data set. Number of candles is the number of historical candles it uses to generate the statistics. <br> **Datatype:** positive integer.
@ -101,11 +101,11 @@ Mandatory parameters are marked as **Required**, which means that they are requi
| `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for and added as features to the base asset feature set. <br> **Datatype:** list of timeframes (strings). | `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for and added as features to the base asset feature set. <br> **Datatype:** list of timeframes (strings).
| `label_period_candles` | Number of candles into the future that the labels are created for. This is used in `populate_any_indicators`, refer to `templates/FreqaiExampleStrategy.py` for detailed usage. The user can create custom labels, making use of this parameter not. <br> **Datatype:** positive integer. | `label_period_candles` | Number of candles into the future that the labels are created for. This is used in `populate_any_indicators`, refer to `templates/FreqaiExampleStrategy.py` for detailed usage. The user can create custom labels, making use of this parameter not. <br> **Datatype:** positive integer.
| `include_shifted_candles` | Parameter used to add a sense of temporal recency to flattened regression type input data. `include_shifted_candles` takes all features, duplicates and shifts them by the number indicated by user. <br> **Datatype:** positive integer. | `include_shifted_candles` | Parameter used to add a sense of temporal recency to flattened regression type input data. `include_shifted_candles` takes all features, duplicates and shifts them by the number indicated by user. <br> **Datatype:** positive integer.
| `DI_threshold` | Activates the Dissimilarity Index for outlier detection when above 0, explained more [here](#removing-outliers-with-the-dissimilarity-index). <br> **Datatype:** positive float (typically below 1). | `DI_threshold` | Activates the Dissimilarity Index for outlier detection when above 0, explained in detail [here](#removing-outliers-with-the-dissimilarity-index). <br> **Datatype:** positive float (typically below 1).
| `weight_factor` | Used to set weights for training data points according to their recency, see details and a figure of how it works [here](#controlling-the-model-learning-process). <br> **Datatype:** positive float (typically below 1). | `weight_factor` | Used to set weights for training data points according to their recency, see details and a figure of how it works [here](#controlling-the-model-learning-process). <br> **Datatype:** positive float (typically below 1).
| `principal_component_analysis` | Ask FreqAI to automatically reduce the dimensionality of the data set using PCA. <br> **Datatype:** boolean. | `principal_component_analysis` | Ask FreqAI to automatically reduce the dimensionality of the data set using PCA. <br> **Datatype:** boolean.
| `use_SVM_to_remove_outliers` | Ask FreqAI to train a support vector machine to detect and remove outliers from the training data set as well as from incoming data points. <br> **Datatype:** boolean. | `use_SVM_to_remove_outliers` | Ask FreqAI to train a support vector machine to detect and remove outliers from the training data set as well as from incoming data points. <br> **Datatype:** boolean.
| `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. E.g. `nu` *Very* broadly, is the percentage of data points that should be considered outliers. `shuffle` is by default false to maintain reprodicibility. But these and all others can be added/changed in this dictionary. <br> **Datatype:** dictionary. | `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. E.g. `nu` *Very* broadly, is the percentage of data points that should be considered outliers. `shuffle` is by default false to maintain reproducibility. But these and all others can be added/changed in this dictionary. <br> **Datatype:** dictionary.
| `stratify_training_data` | This value is used to indicate the stratification of the data. e.g. 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. <br> **Datatype:** positive integer. | `stratify_training_data` | This value is used to indicate the stratification of the data. e.g. 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. <br> **Datatype:** positive integer.
| `indicator_max_period_candles` | The maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not have a NaN <br> **Datatype:** positive integer. | `indicator_max_period_candles` | The maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not have a NaN <br> **Datatype:** positive integer.
| `indicator_periods_candles` | A list of integers used to duplicate all indicators according to a set of periods and add them to the feature set. <br> **Datatype:** list of positive integers. | `indicator_periods_candles` | A list of integers used to duplicate all indicators according to a set of periods and add them to the feature set. <br> **Datatype:** list of positive integers.
@ -113,7 +113,7 @@ Mandatory parameters are marked as **Required**, which means that they are requi
| | **Data split parameters** | | **Data split parameters**
| `data_split_parameters` | Include any additional parameters available from Scikit-learn `test_train_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) <br> **Datatype:** dictionary. | `data_split_parameters` | Include any additional parameters available from Scikit-learn `test_train_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) <br> **Datatype:** dictionary.
| `test_size` | Fraction of data that should be used for testing instead of training. <br> **Datatype:** positive float below 1. | `test_size` | Fraction of data that should be used for testing instead of training. <br> **Datatype:** positive float below 1.
| `shuffle` | Shuffle the training data points during training. Typically for time-series forecasting, this is set to False. **Datatype:** boolean. | `shuffle` | Shuffle the training data points during training. Typically for time-series forecasting, this is set to False. <br> **Datatype:** boolean.
| | **Model training parameters** | | **Model training parameters**
| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected library. For example, if the user uses `LightGBMRegressor`, then this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html). If the user selects a different model, then this dictionary can contain any parameter from that different model. <br> **Datatype:** dictionary. | `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected library. For example, if the user uses `LightGBMRegressor`, then this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html). If the user selects a different model, then this dictionary can contain any parameter from that different model. <br> **Datatype:** dictionary.
| `n_estimators` | A common parameter among regressors which sets the number of boosted trees to fit <br> **Datatype:** integer. | `n_estimators` | A common parameter among regressors which sets the number of boosted trees to fit <br> **Datatype:** integer.
@ -123,8 +123,8 @@ Mandatory parameters are marked as **Required**, which means that they are requi
| `keras` | If your model makes use of keras (typical of Tensorflow based prediction models), activate this flag so that the model save/loading follows keras standards. Default value `false` <br> **Datatype:** boolean. | `keras` | If your model makes use of keras (typical of Tensorflow based prediction models), activate this flag so that the model save/loading follows keras standards. Default value `false` <br> **Datatype:** boolean.
| `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for `shift` by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Default value, 2 <br> **Datatype:** integer. | `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for `shift` by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Default value, 2 <br> **Datatype:** integer.
### Important FreqAI dataframe key patterns ### Important FreqAI dataframe key patterns
Here are the values the user can expect to include/use inside the typical strategy dataframe (`df[]`): Here are the values the user can expect to include/use inside the typical strategy dataframe (`df[]`):
| DataFrame Key | Description | | DataFrame Key | Description |
@ -137,51 +137,50 @@ Here are the values the user can expect to include/use inside the typical strate
### Example config file ### Example config file
The user interface is isolated to the typical config file. A typical FreqAI The user interface is isolated to the typical config file. A typical FreqAI config setup could include:
config setup could include:
```json ```json
"freqai": { "freqai": {
"startup_candles": 10000, "startup_candles": 10000,
"purge_old_models": true, "purge_old_models": true,
"train_period_days" : 30, "train_period_days": 30,
"backtest_period_days" : 7, "backtest_period_days": 7,
"identifier" : "unique-id", "identifier" : "unique-id",
"feature_parameters" : { "feature_parameters" : {
"include_timeframes" : ["5m","15m","4h"], "include_timeframes": ["5m","15m","4h"],
"include_corr_pairlist": [ "include_corr_pairlist": [
"ETH/USD", "ETH/USD",
"LINK/USD", "LINK/USD",
"BNB/USD" "BNB/USD"
], ],
"label_period_candles": 24, "label_period_candles": 24,
"include_shifted_candles": 2, "include_shifted_candles": 2,
"weight_factor": 0, "weight_factor": 0,
"indicator_max_period_candles": 20, "indicator_max_period_candles": 20,
"indicator_periods_candles": [10, 20] "indicator_periods_candles": [10, 20]
}, },
"data_split_parameters" : { "data_split_parameters" : {
"test_size": 0.25, "test_size": 0.25,
"random_state": 42 "random_state": 42
}, },
"model_training_parameters" : { "model_training_parameters" : {
"n_estimators": 100, "n_estimators": 100,
"random_state": 42, "random_state": 42,
"learning_rate": 0.02, "learning_rate": 0.02,
"task_type": "CPU", "task_type": "CPU",
}, },
} }
``` ```
### Feature engineering ### Feature engineering
Features are added by the user inside the `populate_any_indicators()` method of the strategy Features are added by the user inside the `populate_any_indicators()` method of the strategy
by prepending indicators with `%` and labels are added by prepending `&`. by prepending indicators with `%` and labels are added by prepending `&`.
There are some important components/structures that the user *must* include when building their feature set. There are some important components/structures that the user *must* include when building their feature set.
As shown below, `with self.freqai.lock:` must be used to ensure thread safety - especially when using third As shown below, `with self.freqai.lock:` must be used to ensure thread safety - especially when using third
party libraries for indicator construction such as TA-lib. party libraries for indicator construction such as TA-lib.
Another structure to consider is the location of the labels at the bottom of the example function (below `if set_generalized_indicators:`). Another structure to consider is the location of the labels at the bottom of the example function (below `if set_generalized_indicators:`).
This is where the user will add single features and labels to their feature set to avoid duplication from This is where the user will add single features and labels to their feature set to avoid duplication from
various configuration parameters which multiply the feature set such as `include_timeframes`. various configuration parameters which multiply the feature set such as `include_timeframes`.
```python ```python
@ -275,42 +274,9 @@ The user of the present example does not wish to pass the `bb_lowerband` as a fe
and has therefore not prepended it with `%`. The user does, however, wish to pass `bb_width` to the and has therefore not prepended it with `%`. The user does, however, wish to pass `bb_width` to the
model for training/prediction and has therefore prepended it with `%`. model for training/prediction and has therefore prepended it with `%`.
Note: features **must** be defined in `populate_any_indicators()`. Making features in `populate_indicators()` The `include_timeframes` from the example config above are the timeframes (`tf`) of each call to `populate_any_indicators()`
will fail in live/dry mode. If the user wishes to add generalized features that are not associated with included metric for inclusion in the feature set. In the present case, the user is asking for the
a specific pair or timeframe, they should use the following structure inside `populate_any_indicators()` `5m`, `15m`, and `4h` timeframes of the `rsi`, `mfi`, `roc`, and `bb_width` to be included in the feature set.
(as exemplified in `freqtrade/templates/FreqaiExampleStrategy.py`:
```python
def populate_any_indicators(self, metadata, pair, df, tf, informative=None, coin="", set_generalized_indicators=False):
...
# Add generalized indicators here (because in live, it will call only this function to populate
# indicators for retraining). Notice how we ensure not to add them multiple times by associating
# these generalized indicators to the basepair/timeframe
if set_generalized_indicators:
df['%-day_of_week'] = (df["date"].dt.dayofweek + 1) / 7
df['%-hour_of_day'] = (df['date'].dt.hour + 1) / 25
# user adds targets here by prepending them with &- (see convention below)
# If user wishes to use multiple targets, a multioutput prediction model
# needs to be used such as templates/CatboostPredictionMultiModel.py
df["&-s_close"] = (
df["close"]
.shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
.rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
.mean()
/ df["close"]
- 1
)
```
(Please see the example script located in `freqtrade/templates/FreqaiExampleStrategy.py` for a full example of `populate_any_indicators()`)
The `include_timeframes` from the example config above are the timeframes of each `populate_any_indicator()`
included metric for inclusion in the feature set. In the present case, the user is asking for the
`5m`, `15m`, and `4h` timeframes of the `rsi`, `mfi`, `roc`, and `bb_width` to be included
in the feature set.
In addition, the user can ask for each of these features to be included from In addition, the user can ask for each of these features to be included from
informative pairs using the `include_corr_pairlist`. This means that the present feature informative pairs using the `include_corr_pairlist`. This means that the present feature
@ -324,7 +290,40 @@ FreqAI to include the the past 2 candles for each of the features included in th
In total, the number of features the present user has created is: In total, the number of features the present user has created is:
length of `include_timeframes` * no. features in `populate_any_indicators()` * length of `include_corr_pairlist` * no. `include_shifted_candles` * length of `indicator_periods_candles` length of `include_timeframes` * no. features in `populate_any_indicators()` * length of `include_corr_pairlist` * no. `include_shifted_candles` * length of `indicator_periods_candles`
_3 * 3 * 3 * 2 * 2 = 108_. $3 * 3 * 3 * 2 * 2 = 108$.
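
The arithmetic above can be reproduced with a short sketch. The configuration values are copied from the example config; `n_base_features = 3` is taken from the worked figure in the text rather than counted from a real strategy, and the variable names are illustrative only (they are not part of FreqAI).

```python
# Illustrative only: reproduces the feature-count arithmetic from the text.
feature_parameters = {
    "include_timeframes": ["5m", "15m", "4h"],
    "include_corr_pairlist": ["ETH/USD", "LINK/USD", "BNB/USD"],
    "include_shifted_candles": 2,
    "indicator_periods_candles": [10, 20],
}

# Assumption: number of base features created in populate_any_indicators(),
# using the value from the worked example above.
n_base_features = 3

total_features = (
    len(feature_parameters["include_timeframes"])
    * n_base_features
    * len(feature_parameters["include_corr_pairlist"])
    * feature_parameters["include_shifted_candles"]
    * len(feature_parameters["indicator_periods_candles"])
)

print(total_features)  # 108
```

Because every factor multiplies the others, adding a single timeframe or correlated pair grows the feature set quickly.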
!!! Note
Features **must** be defined in `populate_any_indicators()`. Making features in `populate_indicators()`
will fail in live/dry mode. If the user wishes to add generalized features that are not associated with
a specific pair or timeframe, they should use the following structure inside `populate_any_indicators()`
(as exemplified in `freqtrade/templates/FreqaiExampleStrategy.py`:
```python
def populate_any_indicators(self, metadata, pair, df, tf, informative=None, coin="", set_generalized_indicators=False):
...
# Add generalized indicators here (because in live, it will call only this function to populate
# indicators for retraining). Notice how we ensure not to add them multiple times by associating
# these generalized indicators to the basepair/timeframe
if set_generalized_indicators:
df['%-day_of_week'] = (df["date"].dt.dayofweek + 1) / 7
df['%-hour_of_day'] = (df['date'].dt.hour + 1) / 25
# user adds targets here by prepending them with &- (see convention below)
# If user wishes to use multiple targets, a multioutput prediction model
# needs to be used such as templates/CatboostPredictionMultiModel.py
df["&-s_close"] = (
df["close"]
.shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
.rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
.mean()
/ df["close"]
- 1
)
```
(Please see the example script located in `freqtrade/templates/FreqaiExampleStrategy.py` for a full example of `populate_any_indicators()`)
### Deciding the sliding training window and backtesting duration ### Deciding the sliding training window and backtesting duration
@ -370,12 +369,11 @@ Backtesting mode requires the user to have the data pre-downloaded (unlike dry/l
If this command has never been executed with the existing config file, then it will train a new model If this command has never been executed with the existing config file, then it will train a new model
for each pair, for each backtesting window within the bigger `--timerange`. for each pair, for each backtesting window within the bigger `--timerange`.
---
!!! Note "Model reuse" !!! Note "Model reuse"
Once the training is completed, the user can execute this again with the same config file and Once the training is completed, the user can execute this again with the same config file and
FreqAI will find the trained models and load them instead of spending time training. This is useful FreqAI will find the trained models and load them instead of spending time training. This is useful
if the user wants to tweak (or even hyperopt) buy and sell criteria inside the strategy. IF the user if the user wants to tweak (or even hyperopt) buy and sell criteria inside the strategy. IF the user
*wants* to retrain a new model with the same config file, then he/she should simply change the `identifier`. *wants* to retrain a new model with the same config file, then he/she should simply change the `identifier`.
This way, the user can return to using any model they wish by simply changing the `identifier`. This way, the user can return to using any model they wish by simply changing the `identifier`.
--- ---
@ -430,7 +428,7 @@ The FreqAI strategy requires the user to include the following lines of code in
:param coin: the name of the coin which will modify the feature names. :param coin: the name of the coin which will modify the feature names.
""" """
coint = pair.split('/')[0] coin = pair.split('/')[0]
with self.freqai.lock: with self.freqai.lock:
if informative is None: if informative is None:
@ -480,7 +478,7 @@ The FreqAI strategy requires the user to include the following lines of code in
``` ```
Notice how the `populate_any_indicators()` is where the user adds their own features and labels (more information [here](#feature-engineering)). See a full example at `templates/FreqaiExampleStrategy.py`. Notice how the `populate_any_indicators()` is where the user adds their own features and labels ([more information](#feature-engineering)). See a full example at `templates/FreqaiExampleStrategy.py`.
### Setting classifier targets ### Setting classifier targets
@ -490,7 +488,6 @@ FreqAI includes a the `CatboostClassifier` via the flag `--freqaimodel CatboostC
df['&s-up_or_down'] = np.where( df["close"].shift(-100) > df["close"], 'up', 'down') df['&s-up_or_down'] = np.where( df["close"].shift(-100) > df["close"], 'up', 'down')
``` ```
### Running the model live ### Running the model live
FreqAI can be run dry/live using the following command FreqAI can be run dry/live using the following command
@ -499,8 +496,8 @@ FreqAI can be run dry/live using the following command
freqtrade trade --strategy FreqaiExampleStrategy --config config_freqai.example.json --freqaimodel LightGBMRegressor freqtrade trade --strategy FreqaiExampleStrategy --config config_freqai.example.json --freqaimodel LightGBMRegressor
``` ```
By default, FreqAI will not find find any existing models and will start by training a new one By default, FreqAI will not find any existing models and will start by training a new one
given the user configuration settings. Following training, it will use that model to make predictions on incoming candles until a new model is available. New models are typically generated as often as possible, with FreqAI managing an internal queue of the pairs to try and keep all models equally "young." FreqAI will always use the newest trained model to make predictions on incoming live data. If users do not want FreqAI to retrain new models as often as possible, they can set `live_retrain_hours` to tell FreqAI to wait at least that number of hours before retraining a new model. Additionally, users can set `expired_hours` to tell FreqAI to avoid making predictions on models aged over this number of hours. given the user configuration settings. Following training, it will use that model to make predictions on incoming candles until a new model is available. New models are typically generated as often as possible, with FreqAI managing an internal queue of the pairs to try and keep all models equally "young." FreqAI will always use the newest trained model to make predictions on incoming live data. If users do not want FreqAI to retrain new models as often as possible, they can set `live_retrain_hours` to tell FreqAI to wait at least that number of hours before retraining a new model. Additionally, users can set `expired_hours` to tell FreqAI to avoid making predictions on models aged over this number of hours.
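
The two age checks described above can be sketched as follows. This is a minimal illustration of the stated rules, not FreqAI's internal code: the helper names `should_retrain` and `is_expired` are hypothetical, and treating `expired_hours = 0` as "never expire" is an assumption.

```python
def should_retrain(model_age_hours: float, live_retrain_hours: float) -> bool:
    """Return True once at least `live_retrain_hours` have passed since training.

    With live_retrain_hours = 0 this is always True, i.e. retrain as often
    as possible.
    """
    return model_age_hours >= live_retrain_hours


def is_expired(model_age_hours: float, expired_hours: float) -> bool:
    """Return True when the model is older than `expired_hours`.

    Assumption: expired_hours = 0 disables the check (models never expire).
    """
    return expired_hours > 0 and model_age_hours > expired_hours


# A model trained 3 hours ago, with retraining allowed every 2 hours and
# predictions refused for models older than 6 hours.
print(should_retrain(3.0, live_retrain_hours=2.0))  # True
print(is_expired(3.0, expired_hours=6.0))           # False
```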
If the user wishes to start dry/live from a backtested saved model, the user only needs to reuse If the user wishes to start dry/live from a backtested saved model, the user only needs to reuse
the same `identifier` parameter the same `identifier` parameter
@ -514,7 +511,7 @@ the same `identifier` parameter
In this case, although FreqAI will initiate with a
pre-trained model, it will still check to see how much time has elapsed since the model was trained,
and if a full `live_retrain_hours` has elapsed since the end of the loaded model, FreqAI will self retrain.
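The timing rules described above can be sketched as follows. This is a minimal illustration and not FreqAI's internal code: the helper names `hours_since`, `should_retrain`, and `is_expired` are hypothetical, and treating an `expired_hours` of 0 as "never expire" is an assumption made for this sketch.

```python
from datetime import datetime, timedelta, timezone


def hours_since(trained_at: datetime, now: datetime) -> float:
    """Return the age of a model in hours."""
    return (now - trained_at).total_seconds() / 3600.0


def should_retrain(trained_at: datetime, now: datetime, live_retrain_hours: float) -> bool:
    # Retrain once at least `live_retrain_hours` have elapsed since training.
    # A value of 0 means "retrain as often as possible".
    return hours_since(trained_at, now) >= live_retrain_hours


def is_expired(trained_at: datetime, now: datetime, expired_hours: float) -> bool:
    # Assumption for this sketch: 0 disables expiry entirely.
    return expired_hours > 0 and hours_since(trained_at, now) > expired_hours


trained = datetime(2022, 8, 14, 0, 0, tzinfo=timezone.utc)
now = trained + timedelta(hours=5)
print(should_retrain(trained, now, live_retrain_hours=4))
print(is_expired(trained, now, expired_hours=6))
```

The same check applies to a model loaded through a reused `identifier`: its age is measured from when it was trained, not from when the bot was restarted.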
## Data analysis techniques
@ -522,7 +519,7 @@ and if a full `live_retrain_hours` has elapsed since the end of the loaded model
Model training parameters are unique to the ML library used by the user. FreqAI allows users to set any parameter for any library using the `model_training_parameters` dictionary in the user configuration file. The example configuration files show some example parameters associated with `Catboost` and `LightGBM`, but users can add any parameters available in those libraries.
Data split parameters are defined in `data_split_parameters` which can be any parameters associated with `Sklearn`'s `train_test_split()` function. FreqAI includes some additional parameters such as `weight_factor`, which allows the user to weight more recent data more strongly
than past data via an exponential function:
$$ W_i = \exp(\frac{-i}{\alpha*n}) $$
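The exponential weighting can be sketched in a few lines. The function name `recency_weights` is hypothetical, $\alpha$ is taken to be the `weight_factor` value, and the sketch assumes that $i = 0$ refers to the most recent data point while the training rows are ordered oldest first, so the weights are reversed before use.

```python
import math


def recency_weights(n: int, weight_factor: float) -> list:
    """Compute W_i = exp(-i / (alpha * n)) for i = 0..n-1, newest point last."""
    weights = [math.exp(-i / (weight_factor * n)) for i in range(n)]
    # Assumption: rows are ordered oldest -> newest, so the largest
    # weight (i = 0) must end up on the last row.
    weights.reverse()
    return weights


for w in recency_weights(n=5, weight_factor=0.9):
    print(round(w, 4))
```

A smaller `weight_factor` makes the weights decay faster, so older data contributes less to training.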
@ -545,8 +542,8 @@ data point and all other training data points:
$$ d_{ab} = \sqrt{\sum_{j=1}^p(X_{a,j}-X_{b,j})^2} $$
where $d_{ab}$ is the distance between the normalized points $a$ and $b$. $p$
is the number of features, i.e. the length of the vector $X$.
The characteristic distance, $\overline{d}$, for a set of training data points is simply the mean
of the average distances:
$$ \overline{d} = \sum_{a=1}^n(\sum_{b=1}^n(d_{ab}/n)/n) $$
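The two formulas above translate directly into code. The following is a plain-Python sketch for illustration only; the function names are hypothetical and the points are assumed to be already normalized. As in the formula as written, the double sum includes the zero distances where $a = b$.

```python
import math


def euclidean(a, b):
    """d_ab: Euclidean distance between two feature vectors of length p."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def characteristic_distance(points):
    """d-bar: mean over a of the mean over b of d_ab."""
    n = len(points)
    total = 0.0
    for a in points:
        for b in points:
            total += euclidean(a, b)
    return total / (n * n)


training_points = [[0.0, 0.0], [0.0, 2.0]]
print(characteristic_distance(training_points))
```

For large training sets this double loop is quadratic in the number of points, so a vectorized pairwise-distance routine would normally be used instead.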
@ -574,8 +571,7 @@ to low levels of certainty. Activating the Dissimilarity Index can be achieved w
}
```
The user can tweak the DI with `DI_threshold` to increase or decrease the extrapolation of the trained model.
### Reducing data dimensionality with Principal Component Analysis
@ -609,7 +605,7 @@ FreqAI will train an SVM on the training data (or components if the user activat
### Clustering the training data and removing outliers with DBSCAN
The user can configure FreqAI to use DBSCAN to cluster training data and remove outliers from the training data set. The user activates `use_DBSCAN_to_remove_outliers` to cluster training data for identification of outliers. DBSCAN is also used to detect outliers among incoming prediction data points.
```json
"freqai": {
@ -655,7 +651,7 @@ The follower will load models created by the leader and inference them to obtain
FreqAI stores new model files each time it retrains. These files become obsolete as new models
are trained and FreqAI adapts to the new market conditions. Users planning to leave FreqAI running
for extended periods of time with high-frequency retraining should set `purge_old_models` in their
config:
```json
@ -694,7 +690,7 @@ By default, FreqAI computes this based on training data and it assumes the label
These are big assumptions that the user should consider when creating their labels. If the user wants to consider the population
of *historical predictions* for creating the dynamic target instead of the trained labels, the user
can do so by setting `fit_live_prediction_candles` to the number of historical prediction candles
the user wishes to use to generate target statistics.
```json
"freqai": {
@ -703,17 +699,17 @@ the user wishes to use to generate target statistics.
```
If the user sets this value, FreqAI will initially use the predictions from the training data set
and then subsequently begin introducing real prediction data as it is generated. FreqAI will save
this historical data to be reloaded if the user stops and restarts with the same `identifier`.
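The behaviour described above can be pictured as a fixed-length window that starts out filled with training-set predictions and is gradually replaced by live ones. The sketch below is an illustration under stated assumptions: the class `PredictionHistory` and its methods are hypothetical, and the use of the population standard deviation is a choice made here, not something the text specifies.

```python
from collections import deque
from statistics import mean, pstdev


class PredictionHistory:
    """Rolling window of the last `fit_live_prediction_candles` predictions."""

    def __init__(self, training_predictions, fit_live_prediction_candles):
        # Seed the window with the most recent training-set predictions.
        self.window = deque(
            training_predictions[-fit_live_prediction_candles:],
            maxlen=fit_live_prediction_candles,
        )

    def add_live_prediction(self, value):
        # Appending to a full deque drops the oldest entry automatically.
        self.window.append(value)

    def target_stats(self):
        # Assumption: population standard deviation.
        return mean(self.window), pstdev(self.window)


history = PredictionHistory([0.0, 1.0, 2.0, 3.0], fit_live_prediction_candles=3)
history.add_live_prediction(4.0)
print(history.target_stats())
```

After enough live candles have arrived, the statistics depend only on real predictions and no longer on the training set.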
## Extra returns per train
Users may find that there are some important metrics that they'd like to return to the strategy at the end of each retrain.
Users can include these metrics by assigning them to `dk.data['extra_returns_per_train']['my_new_value'] = XYZ` inside their custom prediction
model class. FreqAI takes the `my_new_value` assigned in this dictionary and expands it to fit the return dataframe to the strategy.
The user can then use the value in the strategy with `dataframe['my_new_value']`. An example of how this is already used in FreqAI is
the `&*_mean` and `&*_std` values, which indicate the mean and standard deviation of that particular label during the most recent training.
Another example is shown below if the user wants to use live metrics from the trade database.
The user needs to set the standard dictionary in the config so FreqAI can return proper dataframe shapes:
@ -726,10 +722,9 @@ The user needs to set the standard dictionary in the config so FreqAI can return
These values will likely be overridden by the user prediction model, but if the user model has yet to set them, or needs
a default initial value, these are the values that will be returned.
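The interplay between the configured defaults and the values assigned during training can be sketched with a small stand-in object. Everything here is hypothetical and only mimics the `dk.data['extra_returns_per_train']` dictionary named in the text: `DataKitchenStandIn` is not the real `FreqaiDataKitchen`, and the key `my_new_value` and both helper functions are made up for illustration.

```python
class DataKitchenStandIn:
    """Hypothetical stand-in exposing only the `data` dictionary used above."""

    def __init__(self, config_defaults):
        # Defaults from the config are used until the model overrides them.
        self.data = {"extra_returns_per_train": dict(config_defaults)}


def record_metric(dk, value):
    # What a custom prediction model would do at the end of a retrain.
    dk.data["extra_returns_per_train"]["my_new_value"] = value


def expand_to_column(dk, key, n_rows):
    # One scalar per train is repeated to match the dataframe length.
    return [dk.data["extra_returns_per_train"][key]] * n_rows


dk = DataKitchenStandIn({"my_new_value": 0})
print(expand_to_column(dk, "my_new_value", 3))  # default value from config
record_metric(dk, 12.5)
print(expand_to_column(dk, "my_new_value", 3))  # value set by the model
```

In the strategy, the expanded column would then be read as `dataframe['my_new_value']`, as described above.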
## Building an IFreqaiModel
FreqAI has multiple example prediction models based on libraries such as `Catboost` regression (`freqai/prediction_models/CatboostRegressor.py`) and `LightGBM` regression.
However, users can customize and create their own prediction models using the `IFreqaiModel` class.
Users are encouraged to override `train()` and `predict()` to customize various aspects of their training procedures.
@ -755,8 +750,8 @@ This file structure is heavily controlled and read by the `FreqaiDataKitchen()`
and should therefore not be modified.
## Credits
FreqAI was developed by a group of individuals who all contributed specific skillsets to the project.
Conception and software development:
Robert Caulk @robcaulk