diff --git a/docs/freqai-configuration.md b/docs/freqai-configuration.md index 68d34eddf..50e75b658 100644 --- a/docs/freqai-configuration.md +++ b/docs/freqai-configuration.md @@ -32,7 +32,8 @@ }, } ``` -A full example config is available in `config_examples/config_freqai.example.json`. + +A full example config is available in `config_examples/config_freqai.example.json`. ## Building a `FreqAI` strategy @@ -120,7 +121,7 @@ The `FreqAI` strategy requires including the following lines of code in the stan ``` -Notice how the `populate_any_indicators()` is where [features](freqai-feature-engineering.md#feature-engineering) and labels/tragets are added. A full example strategy is available in `templates/FreqaiExampleStrategy.py`. +Notice how the `populate_any_indicators()` is where [features](freqai-feature-engineering.md#feature-engineering) and labels/targets are added. A full example strategy is available in `templates/FreqaiExampleStrategy.py`. Notice also the location of the labels under `if set_generalized_indicators:` at the bottom of the example. This is where single features and labels/targets should be added to the feature set to avoid duplication of them from various configuration parameters that multiply the feature set, such as `include_timeframes`. @@ -172,10 +173,11 @@ Below are the values you can expect to include/use inside a typical strategy dat | `df['%*']` | Any dataframe column prepended with `%` in `populate_any_indicators()` is treated as a training feature. For example, you can include the RSI in the training feature set (similar to in `templates/FreqaiExampleStrategy.py`) by setting `df['%-rsi']`. See more details on how this is done [here](freqai-feature-engineering.md).
**Note:** Since the number of features prepended with `%` can multiply very quickly (10s of thousands of features is easily engineered using the multiplictative functionality described in the `feature_parameters` table shown above), these features are removed from the dataframe upon return from `FreqAI`. To keep a particular type of feature for plotting purposes, you would prepend it with `%%`.
**Datatype:** Depends on the output of the model. ## Setting the `startup_candle_count` -The `startup_candle_count` in the `FreqAI` strategy needs to be set up in the same way as in the standard Freqtrade strategy (see details [here](strategy-customization.md#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling the `dataprovider`, to avoid any NaNs at the beginning of the first training. You can easily set this value by identifying the longest period (in candle units) which is passed to the indicator creation functions (e.g., talib functions). In the presented example, `startup_candle_count` is 20 since this is the maximum value in `indicators_periods_candles`. + +The `startup_candle_count` in the `FreqAI` strategy needs to be set up in the same way as in the standard Freqtrade strategy (see details [here](strategy-customization.md#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling the `dataprovider`, to avoid any NaNs at the beginning of the first training. You can easily set this value by identifying the longest period (in candle units) which is passed to the indicator creation functions (e.g., Ta-Lib functions). In the presented example, `startup_candle_count` is 20 since this is the maximum value in `indicators_periods_candles`. !!! Note - There are instances where the talib functions actually require more data than just the passed `period` or else the feature dataset gets populated with NaNs. Anecdotally, multiplying the `startup_candle_count` by 2 always leads to a fully NaN free training dataset. Hence, it is typically safest to multiply the expected `startup_candle_count` by 2. Look out for this log message to confirm that the data is clean: + There are instances where the Ta-Lib functions actually require more data than just the passed `period` or else the feature dataset gets populated with NaNs. Anecdotally, multiplying the `startup_candle_count` by 2 always leads to a fully NaN free training dataset. Hence, it is typically safest to multiply the expected `startup_candle_count` by 2. Look out for this log message to confirm that the data is clean: ``` 2022-08-31 15:14:04 - freqtrade.freqai.data_kitchen - INFO - dropped 0 training points due to NaNs in populated dataset 4319. diff --git a/docs/freqai-developers.md b/docs/freqai-developers.md index 1635e785d..4bff46f2f 100644 --- a/docs/freqai-developers.md +++ b/docs/freqai-developers.md @@ -1,24 +1,24 @@ -# Development +# Development ## Project architecture -The architechture and functions of `FreqAI` are generalized to encourages development of unique features, functions, models, etc. +The architecture and functions of `FreqAI` are generalized to encourages development of unique features, functions, models, etc. -The class structure and a detailed algorithmic overview is depicted in the following diagram: +The class structure and a detailed algorithmic overview is depicted in the following diagram: ![image](assets/freqai_algorithm-diagram.jpg) As shown, there are three distinct objects comprising `FreqAI`: -* **IFreqaiModel** - A singular persistent object containing all the necessary logic to collect, store, and process data, engineer features, run training, and inference models. -* **FreqaiDataKitchen** - A non-persistent object which is created uniquely for each unique asset/model. Beyond metadata, it also contains a variety of data processing tools. -* **FreqaiDataDrawer** - A singular persistent object containing all the historical predictions, models, and save/load methods. +* **IFreqaiModel** - A singular persistent object containing all the necessary logic to collect, store, and process data, engineer features, run training, and inference models. +* **FreqaiDataKitchen** - A non-persistent object which is created uniquely for each unique asset/model. Beyond metadata, it also contains a variety of data processing tools. +* **FreqaiDataDrawer** - A singular persistent object containing all the historical predictions, models, and save/load methods. -There are a variety of built-in [prediction models](freqai-configuration.md#using-different-prediction-models) which inherit directly from `IFreqaiModel`. Each of these models have full access to all methods in `IFreqaiModel` and can therefore override any of those functions at will. However, advanced users will likely stick to overriding `fit()`, `train()`, `predict()`, and `data_cleaning_train/predict()`. +There are a variety of built-in [prediction models](freqai-configuration.md#using-different-prediction-models) which inherit directly from `IFreqaiModel`. Each of these models have full access to all methods in `IFreqaiModel` and can therefore override any of those functions at will. However, advanced users will likely stick to overriding `fit()`, `train()`, `predict()`, and `data_cleaning_train/predict()`. ## Data handling -`FreqAI` aims to organize model files, prediction data, and meta data in a way that simplifies post-processing and enhances crash recililence by automatic data reloading. The data is saved in a file structure,`user_data_dir/models/`, which contains all the data associated with the trainings and backtests. The `FreqaiDataKitchen()` relies heavily on the file structure for proper training and inferencing and should therefore not be manually modified. +`FreqAI` aims to organize model files, prediction data, and meta data in a way that simplifies post-processing and enhances crash resilience by automatic data reloading. The data is saved in a file structure,`user_data_dir/models/`, which contains all the data associated with the trainings and backtests. The `FreqaiDataKitchen()` relies heavily on the file structure for proper training and inferencing and should therefore not be manually modified. ### File structure diff --git a/docs/freqai-feature-engineering.md b/docs/freqai-feature-engineering.md index 34c9c8443..8f061b9fd 100644 --- a/docs/freqai-feature-engineering.md +++ b/docs/freqai-feature-engineering.md @@ -1,6 +1,6 @@ # Feature engineering -## Defining the features +## Defining the features Low level feature engineering is performed in the user strategy within a function called `populate_any_indicators()`. That function sets the `base features` such as, `RSI`, `MFI`, `EMA`, `SMA`, time of day, volume, etc. The `base features` can be custom indicators or they can be imported from any technical-analysis library that you can find. One important syntax rule is that all `base features` string names are prepended with `%`, while labels/targets are prepended with `&`. @@ -102,7 +102,7 @@ After having defined the `base features`, the next step is to expand upon them u ```json "freqai": { - ... + //... "feature_parameters" : { "include_timeframes": ["5m","15m","4h"], "include_corr_pairlist": [ @@ -114,7 +114,7 @@ After having defined the `base features`, the next step is to expand upon them u "include_shifted_candles": 2, "indicator_periods_candles": [10, 20] }, - ... + //... } ``` diff --git a/docs/freqai-running.md b/docs/freqai-running.md index 4d95f33e1..6c7b56da1 100644 --- a/docs/freqai-running.md +++ b/docs/freqai-running.md @@ -58,7 +58,7 @@ freqtrade backtesting --strategy FreqaiExampleStrategy --strategy-path freqtrade If this command has never been executed with the existing config file, FreqAI will train a new model for each pair, for each backtesting window within the expanded `--timerange`. -Backtesting mode requires [downloading the necessary data](#downloading-data-to-cover-the-full-backtest-period) before deployment (unlike in dry/live mode where FreqAI handles the data downloading automatically). You should be careful to consider that the time range of the downloaded data is more than the backtesting time range. This is because FreqAI needs data prior to the desired backtesting time range in order to train a model to be ready to make predictions on the first candle of the set backtesting time range. More details on how to calculate the data to download can be found [here](#deciding-the-size-of-the-sliding-training-window-and-backtesting-duration). +Backtesting mode requires [downloading the necessary data](#downloading-data-to-cover-the-full-backtest-period) before deployment (unlike in dry/live mode where FreqAI handles the data downloading automatically). You should be careful to consider that the time range of the downloaded data is more than the backtesting time range. This is because FreqAI needs data prior to the desired backtesting time range in order to train a model to be ready to make predictions on the first candle of the set backtesting time range. More details on how to calculate the data to download can be found [here](#deciding-the-size-of-the-sliding-training-window-and-backtesting-duration). !!! Note "Model reuse" Once the training is completed, you can execute the backtesting again with the same config file and @@ -71,11 +71,11 @@ Backtesting mode requires [downloading the necessary data](#downloading-data-to- ### Saving prediction data -To allow for tweaking your strategy (**not** the features!), FreqAI will automatically save the predictions during backtesting so that they can be reused for future backtests and live runs using the same `identifier` model. This provides a performance enhancement geared towards enabling **high-level hyperopting** of entry/exit criteria. +To allow for tweaking your strategy (**not** the features!), FreqAI will automatically save the predictions during backtesting so that they can be reused for future backtests and live runs using the same `identifier` model. This provides a performance enhancement geared towards enabling **high-level hyperopting** of entry/exit criteria. -An additional directory called `predictions`, which contains all the predictions stored in `hdf` format, will be created in the `unique-id` folder. +An additional directory called `predictions`, which contains all the predictions stored in `hdf` format, will be created in the `unique-id` folder. -To change your **features**, you **must** set a new `identifier` in the config to signal to `FreqAI` to train new models. +To change your **features**, you **must** set a new `identifier` in the config to signal to `FreqAI` to train new models. To save the models generated during a particular backtest so that you can start a live deployment from one of them instead of training a new model, you must set `save_backtest_models` to `True` in the config. @@ -132,7 +132,7 @@ The FreqAI specific parameter `label_period_candles` defines the offset (number ## Continual learning -You can choose to adopt a continual learning scheme by setting `"continual_learning": true` in the config. By enabling `continual_learning`, after training an initial model from scratch, subsequent trainings will start from the final model state of the preceding training. This gives the new model a "memory" of the previous state. By default, this is set to `false` which means that all new models are trained from scratch, without input from previous models. +You can choose to adopt a continual learning scheme by setting `"continual_learning": true` in the config. By enabling `continual_learning`, after training an initial model from scratch, subsequent trainings will start from the final model state of the preceding training. This gives the new model a "memory" of the previous state. By default, this is set to `false` which means that all new models are trained from scratch, without input from previous models. ## Hyperopt @@ -148,7 +148,7 @@ freqtrade hyperopt --hyperopt-loss SharpeHyperOptLoss --strategy FreqaiExampleSt - It's not possible to hyperopt indicators in the `populate_any_indicators()` function. This means that you cannot optimize model parameters using hyperopt. Apart from this exception, it is possible to optimize all other [spaces](hyperopt.md#running-hyperopt-with-smaller-search-space). - The backtesting instructions also apply to hyperopt. -The best method for combining hyperopt and FreqAI is to focus on hyperopting entry/exit thresholds/criteria. You need to focus on hyperopting parameters that are not used in your features. For example, you should not try to hyperopt rolling window lengths in the feature creation, or any part of the FreqAI config which changes predictions. In order to efficiently hyperopt the FreqAI strategy, FreqAI stores predictions as dataframes and reuses them. Hence the requirement to hyperopt entry/exit thresholds/criteria only. +The best method for combining hyperopt and FreqAI is to focus on hyperopting entry/exit thresholds/criteria. You need to focus on hyperopting parameters that are not used in your features. For example, you should not try to hyperopt rolling window lengths in the feature creation, or any part of the FreqAI config which changes predictions. In order to efficiently hyperopt the FreqAI strategy, FreqAI stores predictions as dataframes and reuses them. Hence the requirement to hyperopt entry/exit thresholds/criteria only. A good example of a hyperoptable parameter in FreqAI is a threshold for the [Dissimilarity Index (DI)](freqai-feature-engineering.md#identifying-outliers-with-the-dissimilarity-index-di) `DI_values` beyond which we consider data points as outliers: diff --git a/docs/freqai.md b/docs/freqai.md index 748820914..91adbf7ef 100644 --- a/docs/freqai.md +++ b/docs/freqai.md @@ -1,8 +1,8 @@ ![freqai-logo](assets/freqai_doc_logo.svg) -# `FreqAI` +# `FreqAI` -##Introduction +## Introduction `FreqAI` is a software designed to automate a variety of tasks associated with training a predictive machine learning model to generate market forecasts given a set of input features. @@ -47,7 +47,7 @@ An overview of the algorithm, explaining the data processing pipeline and model **Features** - the parameters, based on historic data, on which a model is trained. All features for a single candle is stored as a vector. In `FreqAI`, you build a feature data sets from anything you can construct in the strategy. -**Labels** - the target values that a model is trained toward. Each feature vector is associated with a single label that is defined by you within your strategy. These labels intentionally look into the future, and are not available to the model during dry/live/backtesting. +**Labels** - the target values that a model is trained toward. Each feature vector is associated with a single label that is defined by you within the strategy. These labels intentionally look into the future, and are not available to the model during dry/live/backtesting. **Training** - the process of "teaching" the model to match the feature sets to the associated labels. Different types of models "learn" in different ways. More information about the different models can be found [here](freqai-configuration.md#using-different-prediction-models). @@ -72,7 +72,7 @@ pip install -r requirements-freqai.txt If you are using docker, a dedicated tag with `FreqAI` dependencies is available as `:freqai`. As such - you can replace the image line in your docker-compose file with `image: freqtradeorg/freqtrade:develop_freqai`. This image contains the regular `FreqAI` dependencies. Similar to native installs, Catboost will not be available on ARM based devices. -### Common pitfalls +## Common pitfalls `FreqAI` cannot be combined with dynamic `VolumePairlists` (or any pairlist filter that adds and removes pairs dynamically). This is for performance reasons - `FreqAI` relies on making quick predictions/retrains. To do this effectively,