From f5535e780cb04247b28c43449a0fac01d37d4782 Mon Sep 17 00:00:00 2001 From: robcaulk Date: Sun, 25 Sep 2022 21:18:51 +0200 Subject: [PATCH] change wording, switch FreqAI word format --- docs/freqai-configuration.md | 26 +++++++++++------------ docs/freqai-developers.md | 6 +++--- docs/freqai-feature-engineering.md | 34 ++++++++++++++++-------------- docs/freqai-parameter-table.md | 19 +++++++++-------- docs/freqai-running.md | 2 +- docs/freqai.md | 23 +++++++++++--------- 6 files changed, 58 insertions(+), 52 deletions(-) diff --git a/docs/freqai-configuration.md b/docs/freqai-configuration.md index de77ce92b..68d34eddf 100644 --- a/docs/freqai-configuration.md +++ b/docs/freqai-configuration.md @@ -1,10 +1,10 @@ # Configuration -FreqAI is configured through the typical [Freqtrade config file](configuration.md) and the standard [Freqtrade strategy](strategy-customization.md). Examples of a FreqAI config and strategy file can be found in `config_examples/config_freqai.example.json` and `freqtrade/templates/FreqaiExampleStrategy.py`, respectively. +`FreqAI` is configured through the typical [Freqtrade config file](configuration.md) and the standard [Freqtrade strategy](strategy-customization.md). Examples of `FreqAI` config and strategy files can be found in `config_examples/config_freqai.example.json` and `freqtrade/templates/FreqaiExampleStrategy.py`, respectively. ## Setting up the configuration file - Although there are plenty of additional parameters to choose from, as highlighted in the [parameter table](freqai-parameter-table.md#parameter-table), a FreqAI config must at minimum include the following parameters (the parameter values are only examples): + Although there are plenty of additional parameters to choose from, as highlighted in the [parameter table](freqai-parameter-table.md#parameter-table), a `FreqAI` config must at minimum include the following parameters (the parameter values are only examples): ```json "freqai": { @@ -34,9 +34,9 @@ FreqAI is configured through the typical [Freqtrade config file](configuration.m ``` A full example config is available in `config_examples/config_freqai.example.json`. -## Building a FreqAI strategy +## Building a `FreqAI` strategy -The FreqAI strategy requires including the following lines of code in the standard [Freqtrade strategy](strategy-customization.md): +The `FreqAI` strategy requires including the following lines of code in the standard [Freqtrade strategy](strategy-customization.md): ```python # user should define the maximum startup candle count (the largest number of candles @@ -128,7 +128,7 @@ Notice also the location of the labels under `if set_generalized_indicators:` at The `self.freqai.start()` function cannot be called outside the `populate_indicators()`. !!! Note - Features **must** be defined in `populate_any_indicators()`. Defining FreqAI features in `populate_indicators()` + Features **must** be defined in `populate_any_indicators()`. Defining `FreqAI` features in `populate_indicators()` will cause the algorithm to fail in live/dry mode. In order to add generalized features that are not associated with a specific pair or timeframe, the following structure inside `populate_any_indicators()` should be used (as exemplified in `freqtrade/templates/FreqaiExampleStrategy.py`): @@ -165,14 +165,14 @@ Below are the values you can expect to include/use inside a typical strategy dat | DataFrame Key | Description | |------------|-------------| -| `df['&*']` | Any dataframe column prepended with `&` in `populate_any_indicators()` is treated as a training target (label) inside FreqAI (typically following the naming convention `&-s*`). The names of these dataframe columns are fed back as the predictions. For example, to predict the price change in the next 40 candles (similar to `templates/FreqaiExampleStrategy.py`), you would set `df['&-s_close']`. FreqAI makes the predictions and gives them back under the same key (`df['&-s_close']`) to be used in `populate_entry/exit_trend()`.
**Datatype:** Depends on the output of the model. +| `df['&*']` | Any dataframe column prepended with `&` in `populate_any_indicators()` is treated as a training target (label) inside `FreqAI` (typically following the naming convention `&-s*`). The names of these dataframe columns are fed back as the predictions. For example, to predict the price change in the next 40 candles (similar to `templates/FreqaiExampleStrategy.py`), you would set `df['&-s_close']`. `FreqAI` makes the predictions and gives them back under the same key (`df['&-s_close']`) to be used in `populate_entry/exit_trend()`.
**Datatype:** Depends on the output of the model. | `df['&*_std/mean']` | Standard deviation and mean values of the defined labels during training (or live tracking with `fit_live_predictions_candles`). Commonly used to understand the rarity of a prediction (use the z-score as shown in `templates/FreqaiExampleStrategy.py` and explained [here](#creating-a-dynamic-target-threshold) to evaluate how often a particular prediction was observed during training or historically with `fit_live_predictions_candles`).
**Datatype:** Float. -| `df['do_predict']` | Indication of an outlier data point. The return value is integer between -1 and 2, which lets you know if the prediction is trustworthy or not. `do_predict==1` means that the prediction is trustworthy. If the Dissimilarity Index (DI, see details [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di)) of the input data point is above the threshold defined in the config, FreqAI will subtract 1 from `do_predict`, resulting in `do_predict==0`. If `use_SVM_to_remove_outliers()` is active, the Support Vector Machine (SVM, see details [here](freqai-feature-engineering.md#identifying-outliers-using-a-support-vector-machine-svm)) may also detect outliers in training and prediction data. In this case, the SVM will also subtract 1 from `do_predict`. If the input data point was considered an outlier by the SVM but not by the DI, or vice versa, the result will be `do_predict==0`. If both the DI and the SVM considers the input data point to be an outlier, the result will be `do_predict==-1`. A particular case is when `do_predict == 2`, which means that the model has expired due to exceeding `expired_hours`.
**Datatype:** Integer between -1 and 2. -| `df['DI_values']` | Dissimilarity Index (DI) values are proxies for the level of confidence FreqAI has in the prediction. A lower DI means the prediction is close to the training data, i.e., higher prediction confidence. See details about the DI [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di).
**Datatype:** Float. -| `df['%*']` | Any dataframe column prepended with `%` in `populate_any_indicators()` is treated as a training feature. For example, you can include the RSI in the training feature set (similar to in `templates/FreqaiExampleStrategy.py`) by setting `df['%-rsi']`. See more details on how this is done [here](freqai-feature-engineering.md).
**Note:** Since the number of features prepended with `%` can multiply very quickly (10s of thousands of features is easily engineered using the multiplictative functionality described in the `feature_parameters` table shown above), these features are removed from the dataframe upon return from FreqAI. To keep a particular type of feature for plotting purposes, you would prepend it with `%%`.
**Datatype:** Depends on the output of the model. +| `df['do_predict']` | Indication of an outlier data point. The return value is integer between -1 and 2, which lets you know if the prediction is trustworthy or not. `do_predict==1` means that the prediction is trustworthy. If the Dissimilarity Index (DI, see details [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di)) of the input data point is above the threshold defined in the config, `FreqAI` will subtract 1 from `do_predict`, resulting in `do_predict==0`. If `use_SVM_to_remove_outliers()` is active, the Support Vector Machine (SVM, see details [here](freqai-feature-engineering.md#identifying-outliers-using-a-support-vector-machine-svm)) may also detect outliers in training and prediction data. In this case, the SVM will also subtract 1 from `do_predict`. If the input data point was considered an outlier by the SVM but not by the DI, or vice versa, the result will be `do_predict==0`. If both the DI and the SVM considers the input data point to be an outlier, the result will be `do_predict==-1`. A particular case is when `do_predict == 2`, which means that the model has expired due to exceeding `expired_hours`.
**Datatype:** Integer between -1 and 2. +| `df['DI_values']` | Dissimilarity Index (DI) values are proxies for the level of confidence `FreqAI` has in the prediction. A lower DI means the prediction is close to the training data, i.e., higher prediction confidence. See details about the DI [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di).
**Datatype:** Float. +| `df['%*']` | Any dataframe column prepended with `%` in `populate_any_indicators()` is treated as a training feature. For example, you can include the RSI in the training feature set (similar to in `templates/FreqaiExampleStrategy.py`) by setting `df['%-rsi']`. See more details on how this is done [here](freqai-feature-engineering.md).
**Note:** Since the number of features prepended with `%` can multiply very quickly (10s of thousands of features is easily engineered using the multiplictative functionality described in the `feature_parameters` table shown above), these features are removed from the dataframe upon return from `FreqAI`. To keep a particular type of feature for plotting purposes, you would prepend it with `%%`.
**Datatype:** Depends on the output of the model. ## Setting the `startup_candle_count` -The `startup_candle_count` in the FreqAI strategy needs to be set up in the same way as in the standard Freqtrade strategy (see details [here](strategy-customization.md#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling the `dataprovider`, to avoid any NaNs at the beginning of the first training. You can easily set this value by identifying the longest period (in candle units) which is passed to the indicator creation functions (e.g., talib functions). In the presented example, `startup_candle_count` is 20 since this is the maximum value in `indicators_periods_candles`. +The `startup_candle_count` in the `FreqAI` strategy needs to be set up in the same way as in the standard Freqtrade strategy (see details [here](strategy-customization.md#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling the `dataprovider`, to avoid any NaNs at the beginning of the first training. You can easily set this value by identifying the longest period (in candle units) which is passed to the indicator creation functions (e.g., talib functions). In the presented example, `startup_candle_count` is 20 since this is the maximum value in `indicators_periods_candles`. !!! Note There are instances where the talib functions actually require more data than just the passed `period` or else the feature dataset gets populated with NaNs. Anecdotally, multiplying the `startup_candle_count` by 2 always leads to a fully NaN free training dataset. Hence, it is typically safest to multiply the expected `startup_candle_count` by 2. Look out for this log message to confirm that the data is clean: @@ -183,7 +183,7 @@ The `startup_candle_count` in the FreqAI strategy needs to be set up in the same ## Creating a dynamic target threshold -Deciding when to enter or exit a trade can be done in a dynamic way to reflect current market conditions. FreqAI allows you to return additional information from the training of a model (more info [here](freqai-feature-engineering.md#returning-additional-info-from-training)). For example, the `&*_std/mean` return values describe the statistical distribution of the target/label *during the most recent training*. Comparing a given prediction to these values allows you to know the rarity of the prediction. In `templates/FreqaiExampleStrategy.py`, the `target_roi` and `sell_roi` are defined to be 1.25 z-scores away from the mean which causes predictions that are closer to the mean to be filtered out. +Deciding when to enter or exit a trade can be done in a dynamic way to reflect current market conditions. `FreqAI` allows you to return additional information from the training of a model (more info [here](freqai-feature-engineering.md#returning-additional-info-from-training)). For example, the `&*_std/mean` return values describe the statistical distribution of the target/label *during the most recent training*. Comparing a given prediction to these values allows you to know the rarity of the prediction. In `templates/FreqaiExampleStrategy.py`, the `target_roi` and `sell_roi` are defined to be 1.25 z-scores away from the mean which causes predictions that are closer to the mean to be filtered out. ```python dataframe["target_roi"] = dataframe["&-s_close_mean"] + dataframe["&-s_close_std"] * 1.25 @@ -198,11 +198,11 @@ To consider the population of *historical predictions* for creating the dynamic } ``` -If this value is set, FreqAI will initially use the predictions from the training data and subsequently begin introducing real prediction data as it is generated. FreqAI will save this historical data to be reloaded if you stop and restart a model with the same `identifier`. +If this value is set, `FreqAI` will initially use the predictions from the training data and subsequently begin introducing real prediction data as it is generated. `FreqAI` will save this historical data to be reloaded if you stop and restart a model with the same `identifier`. ## Using different prediction models -FreqAI has multiple example prediction model libraries that are ready to be used as is via the flag `--freqaimodel`. These libraries include `Catboost`, `LightGBM`, and `XGBoost` regression, classification, and multi-target models, and can be found in `freqai/prediction_models/`. However, it is possible to customize and create your own prediction models using the `IFreqaiModel` class. You are encouraged to inherit `fit()`, `train()`, and `predict()` to let these customize various aspects of the training procedures. +`FreqAI` has multiple example prediction model libraries that are ready to be used as is via the flag `--freqaimodel`. These libraries include `Catboost`, `LightGBM`, and `XGBoost` regression, classification, and multi-target models, and can be found in `freqai/prediction_models/`. However, it is possible to customize and create your own prediction models using the `IFreqaiModel` class. You are encouraged to inherit `fit()`, `train()`, and `predict()` to let these customize various aspects of the training procedures. ### Setting classifier targets diff --git a/docs/freqai-developers.md b/docs/freqai-developers.md index 97c0938b4..1635e785d 100644 --- a/docs/freqai-developers.md +++ b/docs/freqai-developers.md @@ -2,13 +2,13 @@ ## Project architecture -The architechture and functions of FreqAI are generalized to encourages development of unique features, functions, models, etc. +The architechture and functions of `FreqAI` are generalized to encourages development of unique features, functions, models, etc. The class structure and a detailed algorithmic overview is depicted in the following diagram: ![image](assets/freqai_algorithm-diagram.jpg) -As shown, there are three distinct objects comprising FreqAI: +As shown, there are three distinct objects comprising `FreqAI`: * **IFreqaiModel** - A singular persistent object containing all the necessary logic to collect, store, and process data, engineer features, run training, and inference models. * **FreqaiDataKitchen** - A non-persistent object which is created uniquely for each unique asset/model. Beyond metadata, it also contains a variety of data processing tools. @@ -27,7 +27,7 @@ The file structure is automatically generated based on the model `identifier` se | Structure | Description | |-----------|-------------| | `config_*.json` | A copy of the model specific configuration file. | -| `historic_predictions.pkl` | A file containing all historic predictions generated during the lifetime of the `identifier` model during live deployment. `historic_predictions.pkl` is used to reload the model after a crash or a config change. A backup file is always held incase of corruption on the main file. **FreqAI automatically detects corruption and replaces the corrupted file with the backup**. | +| `historic_predictions.pkl` | A file containing all historic predictions generated during the lifetime of the `identifier` model during live deployment. `historic_predictions.pkl` is used to reload the model after a crash or a config change. A backup file is always held incase of corruption on the main file. **`FreqAI` automatically detects corruption and replaces the corrupted file with the backup**. | | `pair_dictionary.json` | A file containing the training queue as well as the on disk location of the most recently trained model. | | `sub-train-*_TIMESTAMP` | A folder containing all the files associated with a single model, such as:
|| `*_metadata.json` - Metadata for the model, such as normalization max/mins, expected training feature list, etc.
diff --git a/docs/freqai-feature-engineering.md b/docs/freqai-feature-engineering.md index 7dd41fd88..34c9c8443 100644 --- a/docs/freqai-feature-engineering.md +++ b/docs/freqai-feature-engineering.md @@ -2,9 +2,11 @@ ## Defining the features -Feature engineering is handled within `"feature_parameters":{}` in the `FreqAI` config and strategy. All `base features` you wish to use, such as, e.g., `RSI`, `MFI`, `EMA`, `SMA`, etc., should be added to the strategy. The `base features` can be custom indicators or they can be imported from any technical-analysis library that you can find. The `base features` are added inside the `populate_any_indicators()` method of the strategy by prepending indicators with `%`, and labels with `&`. +Low level feature engineering is performed in the user strategy within a function called `populate_any_indicators()`. That function sets the `base features` such as, `RSI`, `MFI`, `EMA`, `SMA`, time of day, volume, etc. The `base features` can be custom indicators or they can be imported from any technical-analysis library that you can find. One important syntax rule is that all `base features` string names are prepended with `%`, while labels/targets are prepended with `&`. -It is advisable to start from the `populate_any_indicators()` in the example strategy (found in `templates/FreqaiExampleStrategy.py`) to ensure that the feature definitions are following the correct conventions. Here is an example of how to set the indicators and labels in the strategy: +Meanwhile, high level feature engineering is handled within `"feature_parameters":{}` in the `FreqAI` config. Within this file, it is possible to decide large scale feature expansions on top of the `base_features` such as "including correlated pairs" or "including informative timeframes" or even "including recent candles." + +It is advisable to start from the template `populate_any_indicators()` in the source provided example strategy (found in `templates/FreqaiExampleStrategy.py`) to ensure that the feature definitions are following the correct conventions. Here is an example of how to set the indicators and labels in the strategy: ```python def populate_any_indicators( @@ -129,7 +131,7 @@ In total, the number of features the user of the presented example strat has cre Important metrics can be returned to the strategy at the end of each model training by assigning them to `dk.data['extra_returns_per_train']['my_new_value'] = XYZ` inside the custom prediction model class. -FreqAI takes the `my_new_value` assigned in this dictionary and expands it to fit the dataframe that is returned to the strategy. You can then use the returned metrics in your strategy through `dataframe['my_new_value']`. An example of how return values can be used in FreqAI are the `&*_mean` and `&*_std` values that are used to [created a dynamic target threshold](freqai-configuration.md#creating-a-dynamic-target-threshold). +`FreqAI` takes the `my_new_value` assigned in this dictionary and expands it to fit the dataframe that is returned to the strategy. You can then use the returned metrics in your strategy through `dataframe['my_new_value']`. An example of how return values can be used in `FreqAI` are the `&*_mean` and `&*_std` values that are used to [created a dynamic target threshold](freqai-configuration.md#creating-a-dynamic-target-threshold). Another example, where the user wants to use live metrics from the trade database, is shown below: @@ -139,15 +141,15 @@ Another example, where the user wants to use live metrics from the trade databas } ``` -You need to set the standard dictionary in the config so that FreqAI can return proper dataframe shapes. These values will likely be overridden by the prediction model, but in the case where the model has yet to set them, or needs a default initial value, the preset values are what will be returned. +You need to set the standard dictionary in the config so that `FreqAI` can return proper dataframe shapes. These values will likely be overridden by the prediction model, but in the case where the model has yet to set them, or needs a default initial value, the preset values are what will be returned. ## Feature normalization -FreqAI is strict when it comes to data normalization. The train features, $X^{train}$, are always normalized to [-1, 1] using a shifted min-max normalization: +`FreqAI` is strict when it comes to data normalization. The train features, $X^{train}$, are always normalized to [-1, 1] using a shifted min-max normalization: $$X^{train}_{norm} = 2 * \frac{X^{train} - X^{train}.min()}{X^{train}.max() - X^{train}.min()} - 1$$ -All other data (test data and unseen prediction data in dry/live/backtest) is always automatically normalized to the training feature space according to industry standards. FreqAI stores all the metadata required to ensure that test and prediction features will be properly normalized and that predictions are properly denormalized. For this reason, it is not recommended to eschew industry standards and modify FreqAI internals - however - advanced users can do so by inheriting `train()` in their custom `IFreqaiModel` and using their own normalization functions. +All other data (test data and unseen prediction data in dry/live/backtest) is always automatically normalized to the training feature space according to industry standards. `FreqAI` stores all the metadata required to ensure that test and prediction features will be properly normalized and that predictions are properly denormalized. For this reason, it is not recommended to eschew industry standards and modify `FreqAI` internals - however - advanced users can do so by inheriting `train()` in their custom `IFreqaiModel` and using their own normalization functions. ## Data dimensionality reduction with Principal Component Analysis @@ -167,17 +169,17 @@ This will perform PCA on the features and reduce their dimensionality so that th The `inlier_metric` is a metric aimed at quantifying how similar a the features of a data point are to the most recent historic data points. -You define the lookback window by setting `inlier_metric_window` and FreqAI computes the distance between the present time point and each of the previous `inlier_metric_window` lookback points. A Weibull function is fit to each of the lookback distributions and its cumulative distribution function (CDF) is used to produce a quantile for each lookback point. The `inlier_metric` is then computed for each time point as the average of the corresponding lookback quantiles. The figure below explains the concept for an `inlier_metric_window` of 5. +You define the lookback window by setting `inlier_metric_window` and `FreqAI` computes the distance between the present time point and each of the previous `inlier_metric_window` lookback points. A Weibull function is fit to each of the lookback distributions and its cumulative distribution function (CDF) is used to produce a quantile for each lookback point. The `inlier_metric` is then computed for each time point as the average of the corresponding lookback quantiles. The figure below explains the concept for an `inlier_metric_window` of 5. ![inlier-metric](assets/freqai_inlier-metric.jpg) -FreqAI adds the `inlier_metric` to the training features and hence gives the model access to a novel type of temporal information. +`FreqAI` adds the `inlier_metric` to the training features and hence gives the model access to a novel type of temporal information. This function does **not** remove outliers from the data set. ## Weighting features for temporal importance -FreqAI allows you to set a `weight_factor` to weight recent data more strongly than past data via an exponential function: +`FreqAI` allows you to set a `weight_factor` to weight recent data more strongly than past data via an exponential function: $$ W_i = \exp(\frac{-i}{\alpha*n}) $$ @@ -187,13 +189,13 @@ where $W_i$ is the weight of data point $i$ in a total set of $n$ data points. B ## Outlier detection -Equity and crypto markets suffer from a high level of non-patterned noise in the form of outlier data points. FreqAI implements a variety of methods to identify such outliers and hence mitigate risk. +Equity and crypto markets suffer from a high level of non-patterned noise in the form of outlier data points. `FreqAI` implements a variety of methods to identify such outliers and hence mitigate risk. ### Identifying outliers with the Dissimilarity Index (DI) The Dissimilarity Index (DI) aims to quantify the uncertainty associated with each prediction made by the model. -You can tell FreqAI to remove outlier data points from the training/test data sets using the DI by including the following statement in the config: +You can tell `FreqAI` to remove outlier data points from the training/test data sets using the DI by including the following statement in the config: ```json "freqai": { @@ -203,7 +205,7 @@ You can tell FreqAI to remove outlier data points from the training/test data se } ``` - The DI allows predictions which are outliers (not existent in the model feature space) to be thrown out due to low levels of certainty. To do so, FreqAI measures the distance between each training data point (feature vector), $X_{a}$, and all other training data points: + The DI allows predictions which are outliers (not existent in the model feature space) to be thrown out due to low levels of certainty. To do so, `FreqAI` measures the distance between each training data point (feature vector), $X_{a}$, and all other training data points: $$ d_{ab} = \sqrt{\sum_{j=1}^p(X_{a,j}-X_{b,j})^2} $$ @@ -227,7 +229,7 @@ Below is a figure that describes the DI for a 3D data set. ### Identifying outliers using a Support Vector Machine (SVM) -You can tell FreqAI to remove outlier data points from the training/test data sets using a Support Vector Machine (SVM) by including the following statement in the config: +You can tell `FreqAI` to remove outlier data points from the training/test data sets using a Support Vector Machine (SVM) by including the following statement in the config: ```json "freqai": { @@ -239,7 +241,7 @@ You can tell FreqAI to remove outlier data points from the training/test data se The SVM will be trained on the training data and any data point that the SVM deems to be beyond the feature space will be removed. -FreqAI uses `sklearn.linear_model.SGDOneClassSVM` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDOneClassSVM.html) (external website)) and you can elect to provide additional parameters for the SVM, such as `shuffle`, and `nu`. +`FreqAI` uses `sklearn.linear_model.SGDOneClassSVM` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDOneClassSVM.html) (external website)) and you can elect to provide additional parameters for the SVM, such as `shuffle`, and `nu`. The parameter `shuffle` is by default set to `False` to ensure consistent results. If it is set to `True`, running the SVM multiple times on the same data set might result in different outcomes due to `max_iter` being to low for the algorithm to reach the demanded `tol`. Increasing `max_iter` solves this issue but causes the procedure to take longer time. @@ -247,7 +249,7 @@ The parameter `nu`, *very* broadly, is the amount of data points that should be ### Identifying outliers with DBSCAN -You can configure FreqAI to use DBSCAN to cluster and remove outliers from the training/test data set or incoming outliers from predictions, by activating `use_DBSCAN_to_remove_outliers` in the config: +You can configure `FreqAI` to use DBSCAN to cluster and remove outliers from the training/test data set or incoming outliers from predictions, by activating `use_DBSCAN_to_remove_outliers` in the config: ```json "freqai": { @@ -263,4 +265,4 @@ Given a number of data points $N$, and a distance $\varepsilon$, DBSCAN clusters ![dbscan](assets/freqai_dbscan.jpg) -FreqAI uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) (external website)) with `min_samples` ($N$) taken as 1/4 of the no. of time points in the feature set. `eps` ($\varepsilon$) is computed automatically as the elbow point in the *k-distance graph* computed from the nearest neighbors in the pairwise distances of all data points in the feature set. +`FreqAI` uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) (external website)) with `min_samples` ($N$) taken as 1/4 of the no. of time points in the feature set. `eps` ($\varepsilon$) is computed automatically as the elbow point in the *k-distance graph* computed from the nearest neighbors in the pairwise distances of all data points in the feature set. diff --git a/docs/freqai-parameter-table.md b/docs/freqai-parameter-table.md index 605892134..5969f43c6 100644 --- a/docs/freqai-parameter-table.md +++ b/docs/freqai-parameter-table.md @@ -1,13 +1,13 @@ # Parameter table -The table below will list all configuration parameters available for FreqAI. Some of the parameters are exemplified in `config_examples/config_freqai.example.json`. +The table below will list all configuration parameters available for `FreqAI`. Some of the parameters are exemplified in `config_examples/config_freqai.example.json`. Mandatory parameters are marked as **Required** and have to be set in one of the suggested ways. | Parameter | Description | |------------|-------------| | | **General configuration parameters** -| `freqai` | **Required.**
The parent dictionary containing all the parameters for controlling FreqAI.
**Datatype:** Dictionary. +| `freqai` | **Required.**
The parent dictionary containing all the parameters for controlling `FreqAI`.
**Datatype:** Dictionary. | `train_period_days` | **Required.**
Number of days to use for the training data (width of the sliding window).
**Datatype:** Positive integer. | `backtest_period_days` | **Required.**
Number of days to inference from the trained model before sliding the `train_period_days` window defined above, and retraining the model during backtesting (more info [here](freqai-running.md#backtesting)). This can be fractional days, but beware that the provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest.
**Datatype:** Float. | `identifier` | **Required.**
A unique ID for the current model. If models are saved to disk, the `identifier` allows for reloading specific pre-trained models/data.
**Datatype:** String. @@ -21,21 +21,22 @@ Mandatory parameters are marked as **Required** and have to be set in one of the | | **Feature parameters** | `feature_parameters` | A dictionary containing the parameters used to engineer the feature set. Details and examples are shown [here](freqai-feature-engineering.md).
**Datatype:** Dictionary. | `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for. The list is added as features to the base indicators dataset.
**Datatype:** List of timeframes (strings). -| `include_corr_pairlist` | A list of correlated coins that FreqAI will add as additional features to all `pair_whitelist` coins. All indicators set in `populate_any_indicators` during feature engineering (see details [here](freqai-feature-engineering.md)) will be created for each correlated coin. The correlated coins features are added to the base indicators dataset.
**Datatype:** List of assets (strings). +| `include_corr_pairlist` | A list of correlated coins that `FreqAI` will add as additional features to all `pair_whitelist` coins. All indicators set in `populate_any_indicators` during feature engineering (see details [here](freqai-feature-engineering.md)) will be created for each correlated coin. The correlated coins features are added to the base indicators dataset.
**Datatype:** List of assets (strings). | `label_period_candles` | Number of candles into the future that the labels are created for. This is used in `populate_any_indicators` (see `templates/FreqaiExampleStrategy.py` for detailed usage). You can create custom labels and choose whether to make use of this parameter or not.
**Datatype:** Positive integer. -| `include_shifted_candles` | Add features from previous candles to subsequent candles with the intent of adding historical information. If used, FreqAI will duplicate and shift all features from the `include_shifted_candles` previous candles so that the information is available for the subsequent candle.
**Datatype:** Positive integer. +| `include_shifted_candles` | Add features from previous candles to subsequent candles with the intent of adding historical information. If used, `FreqAI` will duplicate and shift all features from the `include_shifted_candles` previous candles so that the information is available for the subsequent candle.
**Datatype:** Positive integer. | `weight_factor` | Weight training data points according to their recency (see details [here](freqai-feature-engineering.md#weighting-features-for-temporal-importance)).
**Datatype:** Positive float (typically < 1). -| `indicator_max_period_candles` | **No longer used (#7325)**. Replaced by `startup_candle_count` which is set in the [strategy](freqai-configuration.md#building-a-freqai-strategy). `startup_candle_count` is timeframe independent and defines the maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this parameter together with the maximum timeframe in `include_time_frames` to calculate how many data points to download such that the first data point does not include a NaN
**Datatype:** Positive integer. +| `indicator_max_period_candles` | **No longer used (#7325)**. Replaced by `startup_candle_count` which is set in the [strategy](freqai-configuration.md#building-a-freqai-strategy). `startup_candle_count` is timeframe independent and defines the maximum *period* used in `populate_any_indicators()` for indicator creation. `FreqAI` uses this parameter together with the maximum timeframe in `include_time_frames` to calculate how many data points to download such that the first data point does not include a NaN
**Datatype:** Positive integer. | `indicator_periods_candles` | Time periods to calculate indicators for. The indicators are added to the base indicator dataset.
**Datatype:** List of positive integers. | `stratify_training_data` | Split the feature set into training and testing datasets. For example, `stratify_training_data: 2` would set every 2nd data point into a separate dataset to be pulled from during training/testing. See details about how it works [here](freqai-running.md#data-stratification-for-training-and-testing-the-model).
**Datatype:** Positive integer. -| `principal_component_analysis` | Reduce the dimensionality of the dataset using Principal Component Analysis. See details about how it works [here](freqai-feature-engineering.md#data-dimensionality-reduction-with-principal-component-analysis).
**Datatype:** Boolean. +| `principal_component_analysis` | Automatically reduce the dimensionality of the data set using Principal Component Analysis. See details about how it works [here](#reducing-data-dimensionality-with-principal-component-analysis)
**Datatype:** Boolean. defaults to `false`. +| `plot_feature_importances` | Create a feature importance plot for each model for the top/bottom `plot_feature_importances` number of features.
**Datatype:** Integer, defaults to `0`. | `DI_threshold` | Activates the use of the Dissimilarity Index for outlier detection when set to > 0. See details about how it works [here](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di).
**Datatype:** Positive float (typically < 1). | `use_SVM_to_remove_outliers` | Train a support vector machine to detect and remove outliers from the training dataset, as well as from incoming data points. See details about how it works [here](freqai-feature-engineering.md#identifying-outliers-using-a-support-vector-machine-svm).
**Datatype:** Boolean. | `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. See details about some select parameters [here](freqai-feature-engineering.md#identifying-outliers-using-a-support-vector-machine-svm).
**Datatype:** Dictionary. | `use_DBSCAN_to_remove_outliers` | Cluster data using the DBSCAN algorithm to identify and remove outliers from training and prediction data. See details about how it works [here](freqai-feature-engineering.md#identifying-outliers-with-dbscan).
**Datatype:** Boolean. -| `inlier_metric_window` | If set, FreqAI adds an `inlier_metric` to the training feature set and set the lookback to be the `inlier_metric_window`, i.e., the number of previous time points to compare the current candle to. Details of how the `inlier_metric` is computed can be found [here](freqai-feature-engineering.md#inlier-metric).
**Datatype:** Integer.
Default: 0. -| `noise_standard_deviation` | If set, FreqAI adds noise to the training features with the aim of preventing overfitting. FreqAI generates random deviates from a gaussian distribution with a standard deviation of `noise_standard_deviation` and adds them to all data points. `noise_standard_deviation` should be kept relative to the normalized space, i.e., between -1 and 1. In other words, since data in FreqAI is always normalized to be between -1 and 1, `noise_standard_deviation: 0.05` would result in 32% of the data being randomly increased/decreased by more than 2.5% (i.e., the percent of data falling within the first standard deviation).
**Datatype:** Integer.
Default: 0. -| `outlier_protection_percentage` | Enable to prevent outlier detection methods from discarding too much data. If more than `outlier_protection_percentage` % of points are detected as outliers by the SVM or DBSCAN, FreqAI will log a warning message and ignore outlier detection, i.e., the original dataset will be kept intact. If the outlier protection is triggered, no predictions will be made based on the training dataset.
**Datatype:** Float.
Default: `30`. +| `inlier_metric_window` | If set, `FreqAI` adds an `inlier_metric` to the training feature set and set the lookback to be the `inlier_metric_window`, i.e., the number of previous time points to compare the current candle to. Details of how the `inlier_metric` is computed can be found [here](freqai-feature-engineering.md#inlier-metric).
**Datatype:** Integer.
Default: 0. +| `noise_standard_deviation` | If set, `FreqAI` adds noise to the training features with the aim of preventing overfitting. `FreqAI` generates random deviates from a gaussian distribution with a standard deviation of `noise_standard_deviation` and adds them to all data points. `noise_standard_deviation` should be kept relative to the normalized space, i.e., between -1 and 1. In other words, since data in `FreqAI` is always normalized to be between -1 and 1, `noise_standard_deviation: 0.05` would result in 32% of the data being randomly increased/decreased by more than 2.5% (i.e., the percent of data falling within the first standard deviation).
**Datatype:** Integer.
Default: 0. +| `outlier_protection_percentage` | Enable to prevent outlier detection methods from discarding too much data. If more than `outlier_protection_percentage` % of points are detected as outliers by the SVM or DBSCAN, `FreqAI` will log a warning message and ignore outlier detection, i.e., the original dataset will be kept intact. If the outlier protection is triggered, no predictions will be made based on the training dataset.
**Datatype:** Float.
Default: `30`. | `reverse_train_test_order` | Split the feature dataset (see below) and use the latest data split for training and test on historical split of the data. This allows the model to be trained up to the most recent data point, while avoiding overfitting. However, you should be careful to understand the unorthodox nature of this parameter before employing it.
**Datatype:** Boolean.
Default: `False` (no reversal). | | **Data split parameters** | `data_split_parameters` | Include any additional parameters available from Scikit-learn `test_train_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) (external website).
**Datatype:** Dictionary. diff --git a/docs/freqai-running.md b/docs/freqai-running.md index ff9079397..4d95f33e1 100644 --- a/docs/freqai-running.md +++ b/docs/freqai-running.md @@ -1,6 +1,6 @@ # Running FreqAI -There are two ways to train and deploy an adaptive machine learning model - live deployment and deployment for backtesting analysis. FreqAI can be used for both and allows for periodic training of models in live and backtesting, as shown in the following figure. +There are two ways to train and deploy an adaptive machine learning model - live deployment and historical backtesting. In both cases, `FreqAI` runs/simulates periodic retraining of models as shown in the following figure: ![freqai-window](assets/freqai_moving-window.jpg) diff --git a/docs/freqai.md b/docs/freqai.md index 4c95000d1..748820914 100644 --- a/docs/freqai.md +++ b/docs/freqai.md @@ -1,10 +1,10 @@ ![freqai-logo](assets/freqai_doc_logo.svg) -# FreqAI +# `FreqAI` ##Introduction -FreqAI is a software designed to automate a variety of tasks associated with training a predictive machine learning model to generate market forecasts given a set of input features. +`FreqAI` is a software designed to automate a variety of tasks associated with training a predictive machine learning model to generate market forecasts given a set of input features. Features include: @@ -23,7 +23,7 @@ Features include: ## Quick start -The easiest way to quickly test FreqAI is to run it in dry mode with the following command: +The easiest way to quickly test `FreqAI` is to run it in dry mode with the following command: ```bash freqtrade trade --config config_examples/config_freqai.example.json --strategy FreqaiExampleStrategy --freqaimodel LightGBMRegressor --strategy-path freqtrade/templates @@ -37,7 +37,7 @@ An example strategy, prediction model, and config to use as a starting points ca ## General approach -You provide FreqAI with a set of custom *base indicators* (the same way as in a [typical Freqtrade strategy](strategy-customization.md)) as well as target values (*labels*). For each pair in the whitelist, FreqAI trains a model to predict the target values based on the input of custom indicators. The models are then consistently retrained, with a predetermined frequency, to adapt to market conditions. FreqAI offers the ability to both backtest strategies (emulating reality with periodic retraining on historic data) and deploy dry/live runs. In dry/live conditions, FreqAI can be set to constant retraining in a background thread to keep models as up to date as possible. +You provide `FreqAI` with a set of custom *base indicators* (the same way as in a [typical Freqtrade strategy](strategy-customization.md)) as well as target values (*labels*). For each pair in the whitelist, `FreqAI` trains a model to predict the target values based on the input of custom indicators. The models are then consistently retrained, with a predetermined frequency, to adapt to market conditions. `FreqAI` offers the ability to both backtest strategies (emulating reality with periodic retraining on historic data) and deploy dry/live runs. In dry/live conditions, `FreqAI` can be set to constant retraining in a background thread to keep models as up to date as possible. An overview of the algorithm, explaining the data processing pipeline and model usage, is shown below. @@ -45,7 +45,7 @@ An overview of the algorithm, explaining the data processing pipeline and model ### Important machine learning vocabulary -**Features** - the parameters, based on historic data, on which a model is trained. All features for a single candle is stored as a vector. In FreqAI, you build a feature data sets from anything you can construct in the strategy. +**Features** - the parameters, based on historic data, on which a model is trained. All features for a single candle is stored as a vector. In `FreqAI`, you build a feature data sets from anything you can construct in the strategy. **Labels** - the target values that a model is trained toward. Each feature vector is associated with a single label that is defined by you within your strategy. These labels intentionally look into the future, and are not available to the model during dry/live/backtesting. @@ -59,7 +59,7 @@ An overview of the algorithm, explaining the data processing pipeline and model ## Install prerequisites -The normal Freqtrade install process will ask if you wish to install FreqAI dependencies. You should reply "yes" to this question if you wish to use FreqAI. If you did not reply yes, you can manually install these dependencies after the install with: +The normal Freqtrade install process will ask if you wish to install `FreqAI` dependencies. You should reply "yes" to this question if you wish to use `FreqAI`. If you did not reply yes, you can manually install these dependencies after the install with: ``` bash pip install -r requirements-freqai.txt @@ -70,15 +70,18 @@ pip install -r requirements-freqai.txt ### Usage with docker -If you are using docker, a dedicated tag with FreqAI dependencies is available as `:freqai`. As such - you can replace the image line in your docker-compose file with `image: freqtradeorg/freqtrade:develop_freqai`. This image contains the regular FreqAI dependencies. Similar to native installs, Catboost will not be available on ARM based devices. +If you are using docker, a dedicated tag with `FreqAI` dependencies is available as `:freqai`. As such - you can replace the image line in your docker-compose file with `image: freqtradeorg/freqtrade:develop_freqai`. This image contains the regular `FreqAI` dependencies. Similar to native installs, Catboost will not be available on ARM based devices. -## Common pitfalls +### Common pitfalls -FreqAI cannot be combined with dynamic `VolumePairlists` (or any pairlist filter that adds and removes pairs dynamically). This is for performance reasons - FreqAI relies on making quick predictions/retrains. To do this effectively, it needs to download all the training data at the beginning of a dry/live instance. FreqAI stores and appends new candles automatically for future retrains. This means that if new pairs arrive later in the dry run due to a volume pairlist, it will not have the data ready. However, FreqAI does work with the `ShufflePairlist` or a `VolumePairlist` which keeps the total pairlist constant (but reorders the pairs according to volume). +`FreqAI` cannot be combined with dynamic `VolumePairlists` (or any pairlist filter that adds and removes pairs dynamically). +This is for performance reasons - `FreqAI` relies on making quick predictions/retrains. To do this effectively, +it needs to download all the training data at the beginning of a dry/live instance. `FreqAI` stores and appends +new candles automatically for future retrains. This means that if new pairs arrive later in the dry run due to a volume pairlist, it will not have the data ready. However, `FreqAI` does work with the `ShufflePairlist` or a `VolumePairlist` which keeps the total pairlist constant (but reorders the pairs according to volume). ## Credits -FreqAI is developed by a group of individuals who all contribute specific skillsets to the project. +`FreqAI` is developed by a group of individuals who all contribute specific skillsets to the project. Conception and software development: Robert Caulk @robcaulk