FreqAI is a module designed to automate a variety of tasks associated with training a predictive model to generate market forecasts given a set of input features.
* **High performance**: adaptive retraining occurs on a separate thread (or on GPU if available) from inferencing and bot trade operations. The newest models and data are kept in memory for rapid inferencing.
* **Realistic backtesting**: emulate self-adaptive retraining with a backtesting module that automates past retraining.
* **Modifiable**: use the generalized and robust architecture to incorporate any machine learning library/method available in Python. Seven examples are available.
FreqAI trains a model to predict the target value based on the input of custom indicators for each pair in the whitelist. These models are consistently retrained to adapt to market conditions. FreqAI offers the ability to both backtest strategies (emulating reality with periodic retraining) and deploy dry/live. In dry/live conditions, FreqAI can be set to constant retraining in a background thread in an effort to keep models as young as possible.
The normal Freqtrade install process will ask the user if they wish to install FreqAI dependencies. The user should reply "yes" to this question if they wish to use FreqAI. If the user did not reply yes, they can manually install these dependencies after the install with:
| `train_period_days` | **Required.** Number of days to use for the training data (width of the sliding window). <br>**Datatype:** positive integer.
| `backtest_period_days` | **Required.** Number of days to run inference with the trained model before sliding the window and retraining. This can be fractional days, but beware that the user-provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest. <br>**Datatype:** Float.
| `live_retrain_hours` | Frequency of retraining during dry/live runs. Default set to 0, which means it will retrain as often as possible. <br>**Datatype:** Float >= 0.
| `follow_mode` | If true, this instance of FreqAI will look for models associated with `identifier` and load those for inferencing. A `follower` will **not** train new models. `False` by default. <br>**Datatype:** boolean.
| `startup_candles` | Number of candles needed for *backtesting only* to ensure that no indicator is NaN at the start of the first train period. <br>**Datatype:** positive integer.
| `fit_live_predictions_candles` | Computes target (label) statistics from prediction data, instead of from the training data set. Number of candles is the number of historical candles it uses to generate the statistics. <br>**Datatype:** positive integer.
| `purge_old_models` | Tell FreqAI to delete obsolete models. Otherwise, all historic models will remain on disk. Defaults to `False`. <br>**Datatype:** boolean.
| `expiration_hours` | Ask FreqAI to avoid making predictions if a model is more than `expiration_hours` old. Defaults to 0 which means models never expire. <br>**Datatype:** positive integer.
| `feature_parameters` | A dictionary containing the parameters used to engineer the feature set. Details and examples are shown [here](#feature-engineering). <br>**Datatype:** dictionary.
| `include_corr_pairlist` | A list of correlated coins that FreqAI will add as additional features to all `pair_whitelist` coins. All indicators set in `populate_any_indicators` will be created for each coin in this list, and that set of features is added to the base asset feature set. <br>**Datatype:** list of assets (strings).
| `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for and added as features to the base asset feature set. <br>**Datatype:** list of timeframes (strings).
| `label_period_candles` | Number of candles into the future that the labels are created for. This is used in `populate_any_indicators`, refer to `templates/FreqaiExampleStrategy.py` for detailed usage. The user can create custom labels, with or without this parameter. <br>**Datatype:** positive integer.
| `include_shifted_candles` | Parameter used to add a sense of temporal recency to flattened regression type input data. `include_shifted_candles` takes all features, duplicates and shifts them by the number indicated by user. <br>**Datatype:** positive integer.
| `DI_threshold` | Activates the Dissimilarity Index for outlier detection when above 0, explained in detail [here](#removing-outliers-with-the-dissimilarity-index). <br>**Datatype:** positive float (typically below 1).
| `weight_factor` | Used to set weights for training data points according to their recency, see details and a figure of how it works [here](#controlling-the-model-learning-process). <br>**Datatype:** positive float (typically below 1).
| `principal_component_analysis` | Ask FreqAI to automatically reduce the dimensionality of the data set using PCA. <br>**Datatype:** boolean.
| `use_SVM_to_remove_outliers` | Ask FreqAI to train a support vector machine to detect and remove outliers from the training data set as well as from incoming data points. <br>**Datatype:** boolean.
| `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. For example, `nu` is, *very* broadly, the percentage of data points that should be considered outliers, and `shuffle` is false by default to maintain reproducibility. These and all other parameters can be added/changed in this dictionary. <br>**Datatype:** dictionary.
| `stratify_training_data` | This value is used to indicate the stratification of the data. e.g. 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. <br>**Datatype:** positive integer.
| `indicator_max_period_candles` | The maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not have a NaN. <br>**Datatype:** positive integer.
| `indicator_periods_candles` | A list of integers used to duplicate all indicators according to a set of periods and add them to the feature set. <br>**Datatype:** list of positive integers.
| `use_DBSCAN_to_remove_outliers` | Inactive by default. If true, FreqAI clusters data using DBSCAN to identify and remove outliers from training and prediction data. <br>**Datatype:** boolean.
| `data_split_parameters` | Include any additional parameters available from Scikit-learn `train_test_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). <br>**Datatype:** dictionary.
| `shuffle` | Shuffle the training data points during training. Typically for time-series forecasting, this is set to False. <br>**Datatype:** boolean.
| `model_training_parameters` | A flexible dictionary that includes all parameters available in the user-selected library. For example, if the user uses `LightGBMRegressor`, then this dictionary can contain any parameter available in `LightGBMRegressor`, listed [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html). If the user selects a different model, then this dictionary can contain any parameter from that different model. <br>**Datatype:** dictionary.
| `n_estimators` | A common parameter among regressors which sets the number of boosted trees to fit. <br>**Datatype:** integer.
| `learning_rate` | A common parameter among regressors which sets the boosting learning rate. <br>**Datatype:** float.
| `n_jobs`, `thread_count`, `task_type` | Different libraries use different parameter names to control the number of threads used for parallel processing (`n_jobs`, `thread_count`) or whether training runs with a `task_type` of `gpu` or `cpu`. <br>**Datatype:** integer for the thread counts, string for `task_type`.
| `keras` | If your model makes use of keras (typical of Tensorflow based prediction models), activate this flag so that the model save/loading follows keras standards. Defaults to `false`. <br>**Datatype:** boolean.
| `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for `shift` by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Defaults to 2. <br>**Datatype:** integer.
| `df['&*']` | Any dataframe column prepended with `&` in `populate_any_indicators()` is treated as a training target inside FreqAI (typically following the naming convention `&-s*`). These same dataframe column names are fed back to the user as the predictions. For example, to predict the price change in the next 40 candles (similar to `templates/FreqaiExampleStrategy.py`), the user sets `df['&-s_close']`. FreqAI makes the predictions and gives them back to the user under the same key (`df['&-s_close']`) to be used in `populate_entry/exit_trend()`. <br>**Datatype:** depends on the output of the model.
| `df['&*_std/mean']` | The standard deviation and mean values of the user-defined labels during training (or live tracking with `fit_live_predictions_candles`). Commonly used to understand the rarity of a prediction: use the z-score as shown in `templates/FreqaiExampleStrategy.py` to evaluate how often a particular prediction was observed during training (or historically with `fit_live_predictions_candles`). <br>**Datatype:** float.
| `df['do_predict']` | An indication of an outlier. This return value is an integer between -1 and 2 that tells the user whether the prediction is trustworthy. `do_predict==1` means the prediction is trustworthy. If the [Dissimilarity Index](#removing-outliers-with-the-dissimilarity-index) is above the user-defined threshold, 1 is subtracted from `do_predict`. If `use_SVM_to_remove_outliers` is active, the Support Vector Machine (SVM) may also detect outliers in training and prediction data, in which case it also subtracts 1 from `do_predict`. A special case is `do_predict == 2`, which means that the model has expired due to `expiration_hours`. <br>**Datatype:** integer between -1 and 2.
| `df['DI_values']` | The raw Dissimilarity Index values to give the user a sense of confidence in the prediction. Lower DI means the data point is closer to the trained parameter space. <br>**Datatype:** float.
| `df['%*']` | Any dataframe column prepended with `%` in `populate_any_indicators()` is treated as a training feature inside FreqAI. For example, the user can include the rsi in the training feature set (similar to `templates/FreqaiExampleStrategy.py`) by setting `df['%-rsi']`. See more details on how this is done [here](#building-the-feature-set). Note: the number of features prepended with `%` can multiply very quickly (tens of thousands of features are easily engineered using the multiplicative functionality described in the `feature_parameters` table), so these features are removed from the dataframe upon return from FreqAI. If the user wishes to keep a particular type of feature for plotting purposes, they can prepend it with `%%`. <br>**Datatype:** depends on the output of the model.
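The way `do_predict` is assembled, as described in the table above, can be sketched in a few lines. This is only an illustration of the documented rules, not FreqAI's internal implementation; the function name `sketch_do_predict` and its arguments are hypothetical.

```python
def sketch_do_predict(di_value, di_threshold, svm_flags_outlier, model_expired):
    """Return a do_predict-style integer following the rules in the table."""
    if model_expired:
        # Special case from the table: an expired model is reported as 2.
        return 2
    do_predict = 1  # start from "trustworthy"
    if di_threshold > 0 and di_value > di_threshold:
        do_predict -= 1  # Dissimilarity Index above the user-defined threshold
    if svm_flags_outlier:
        do_predict -= 1  # SVM outlier detection flagged this data point
    return do_predict


print(sketch_do_predict(0.4, 1.0, False, False))  # 1
print(sketch_do_predict(1.5, 1.0, True, False))   # -1
```

A strategy would typically act only on rows where the value is exactly 1 and ignore the rest.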
length of `include_timeframes` * no. features in `populate_any_indicators()` * length of `include_corr_pairlist` * no. `include_shifted_candles` * length of `indicator_periods_candles`
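To get a feel for how quickly this product grows, the formula can be evaluated directly. The sketch below simply multiplies the five factors; the function name and the example numbers are hypothetical and not taken from any FreqAI configuration.

```python
from math import prod


def total_feature_count(n_timeframes, n_base_features, n_corr_pairs,
                        n_shifted_candles, n_indicator_periods):
    """Multiply the five factors of the feature-count formula."""
    return prod([n_timeframes, n_base_features, n_corr_pairs,
                 n_shifted_candles, n_indicator_periods])


# Hypothetical setup: 3 timeframes, 3 indicators, 2 correlated pairs,
# 2 shifted candles and 2 indicator periods.
print(total_feature_count(3, 3, 2, 2, 2))  # 72
```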
Although fractional `backtest_period_days` is allowed, the user should be aware that the `--timerange` is divided by this value to determine the number of models that FreqAI will need to train in order to backtest the full range. For example, if the user wants to set a `--timerange` of 10 days, and asks for a `backtest_period_days` of 0.1, FreqAI will need to train 100 models per pair to complete the full backtest. This is why it is physically impossible to truly backtest FreqAI adaptive training. The best way to fully test a model is to run it dry and let it constantly train. In this case, backtesting would take the exact same amount of time as a dry run.
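The arithmetic behind this example is a single division. The sketch below reproduces it; the function name is hypothetical, and rounding up for a partial final window is an assumption rather than documented FreqAI behavior.

```python
import math


def models_to_train(timerange_days, backtest_period_days):
    """Number of trainings per pair needed to cover the backtest timerange."""
    if backtest_period_days <= 0:
        raise ValueError("backtest_period_days must be positive")
    # Round first so floating-point noise does not push the count up by one.
    ratio = round(timerange_days / backtest_period_days, 9)
    # Assumption: a partial final window still needs its own model.
    return math.ceil(ratio)


print(models_to_train(10, 0.1))  # 100
```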
Backtesting mode requires the user to have the data pre-downloaded (unlike dry/live, where FreqAI automatically downloads the necessary data). The user should make sure that the downloaded data covers more than the backtesting range. This is because FreqAI needs data prior to the desired backtesting range in order to train a model that is ready to make predictions on the first candle of the user-set backtesting range. More details on how to calculate the data download timerange can be found [here](#deciding-the-sliding-training-window-and-backtesting-duration).
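As a rough sketch of that calculation, the download range can be estimated by moving the backtest start back by `train_period_days`. The helper name is hypothetical, the `YYYYMMDD-YYYYMMDD` timerange format is assumed, and the extra candles needed for indicator startup are deliberately ignored here, so treat the result as a lower bound.

```python
from datetime import datetime, timedelta


def download_timerange(backtest_start, backtest_end, train_period_days):
    """Extend the start of a YYYYMMDD-YYYYMMDD range by the training window."""
    start = datetime.strptime(backtest_start, "%Y%m%d")
    # Data before the backtest start is needed to train the first model.
    download_start = start - timedelta(days=train_period_days)
    return f"{download_start:%Y%m%d}-{backtest_end}"


print(download_timerange("20220301", "20220401", 30))  # 20220130-20220401
```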
Notice that `populate_any_indicators()` is where the user adds their own features and labels ([more information](#feature-engineering)). See a full example at `templates/FreqaiExampleStrategy.py`.
By default, FreqAI will not find any existing models and will start by training a new one given the user configuration settings. Following training, it will use that model to make predictions on incoming candles until a new model is available. New models are typically generated as often as possible, with FreqAI managing an internal queue of the pairs to try to keep all models equally "young." FreqAI will always use the newest trained model to make predictions on incoming live data. If users do not want FreqAI to retrain new models as often as possible, they can set `live_retrain_hours` to tell FreqAI to wait at least that number of hours before retraining a new model. Additionally, users can set `expiration_hours` to tell FreqAI to avoid making predictions with models older than this number of hours.
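The two age-based settings can be read as simple checks on model age. The sketch below is an illustration of the behavior described above, not FreqAI's scheduling code; both function names are hypothetical, and the expiration setting is the one listed as `expiration_hours` in the parameter table, where 0 means that models never expire.

```python
def should_retrain(model_age_hours, live_retrain_hours=0.0):
    """True when a new training run is allowed for this model."""
    # 0 means "retrain as often as possible".
    return live_retrain_hours <= 0 or model_age_hours >= live_retrain_hours


def model_is_expired(model_age_hours, expiration_hours=0):
    """True when the model is too old to be used for predictions."""
    # 0 means models never expire.
    return expiration_hours > 0 and model_age_hours > expiration_hours


print(should_retrain(0.5, live_retrain_hours=1))    # False
print(model_is_expired(6.0, expiration_hours=4))    # True
```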
Model training parameters are unique to the ML library used by the user. FreqAI allows users to set any parameter for any library using the `model_training_parameters` dictionary in the user configuration file. The example configuration files show some of the example parameters associated with `Catboost` and `LightGBM`, but users can add any parameters available in those libraries.
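As a sketch, the dictionary might look like the following when written as a Python dict and dumped to JSON. The values are hypothetical, and placing the dictionary under a top-level `freqai` key is an assumption about the configuration layout. The three parameter names are ones that LightGBM's `LGBMRegressor` accepts.

```python
import json

# Hypothetical values, chosen only to illustrate the shape of the dictionary.
freqai_section = {
    "model_training_parameters": {
        "n_estimators": 1000,
        "learning_rate": 0.02,
        "n_jobs": 4,
    }
}

# Assumption: the dictionary lives under a top-level "freqai" key.
print(json.dumps({"freqai": freqai_section}, indent=4))
```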
Data split parameters are defined in `data_split_parameters`, which can contain any parameters associated with `Sklearn`'s `train_test_split()` function. FreqAI includes some additional parameters such as `weight_factor`, which allows the user to weight more recent data more strongly than older data.
`train_test_split()` has a parameter called `shuffle`, which users also have access to in FreqAI, that allows them to keep the data unshuffled. This is particularly useful to avoid biasing training with temporally auto-correlated data.
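The effect of leaving the data unshuffled can be shown with a small standard-library sketch: the most recent samples become the test set and everything earlier is used for training. This mimics the idea only; `chronological_split` is a hypothetical helper, not the Scikit-learn function itself.

```python
import math


def chronological_split(samples, test_size=0.25):
    """Split ordered samples so the newest ones form the test set."""
    n_test = math.ceil(len(samples) * test_size)
    split_at = len(samples) - n_test
    # No shuffling: order in time is preserved on both sides of the split.
    return samples[:split_at], samples[split_at:]


train, test = chronological_split(list(range(10)), test_size=0.25)
print(train)  # [0, 1, 2, 3, 4, 5, 6]
print(test)   # [7, 8, 9]
```

Because no test sample precedes a training sample in time, the test score is not inflated by neighboring, highly correlated candles.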
The user can configure FreqAI to use DBSCAN to cluster training data and remove outliers from the training data set. To do so, the user activates `use_DBSCAN_to_remove_outliers`. The same clustering is also used to detect outliers among incoming prediction data points.
FreqAI includes multiple example prediction models based on libraries such as `Catboost` regression (`freqai/prediction_models/CatboostRegressor.py`) and `LightGBM` regression.