improve parameter table, add better documentation for custom calculate_reward, add various helpful notes in docstrings etc

2022-11-26 11:32:39 +01:00
commit 9f13d99b99
parent 2e82e6784a
2 changed files with 97 additions and 51 deletions
--- a/docs/freqai-parameter-table.md
+++ b/docs/freqai-parameter-table.md
@ -6,7 +6,7 @@ Mandatory parameters are marked as **Required** and have to be set in one of the

 |  Parameter | Description |
 |------------|-------------|
-|  |  **General configuration parameters**
+|  |  **General configuration parameters within the `config.freqai` tree**
 | `freqai` | **Required.** <br> The parent dictionary containing all the parameters for controlling FreqAI. <br> **Datatype:** Dictionary.
 | `train_period_days` | **Required.** <br> Number of days to use for the training data (width of the sliding window). <br> **Datatype:** Positive integer.
 | `backtest_period_days` | **Required.** <br> Number of days to inference from the trained model before sliding the `train_period_days` window defined above, and retraining the model during backtesting (more info [here](freqai-running.md#backtesting)). This can be fractional days, but beware that the provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest. <br> **Datatype:** Float.
@ -20,7 +20,11 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
 | `continual_learning` | Use the final state of the most recently trained model as starting point for the new model, allowing for incremental learning (more information can be found [here](freqai-running.md#continual-learning)). <br> **Datatype:** Boolean. <br> Default: `False`.
 | `write_metrics_to_disk` | Collect train timings, inference timings and cpu usage in json file. <br> **Datatype:** Boolean. <br> Default: `False`
 | `data_kitchen_thread_count` | <br> Designate the number of threads you want to use for data processing (outlier methods, normalization, etc.). This has no impact on the number of threads used for training. If user does not set it (default), FreqAI will use max number of threads - 2 (leaving 1 physical core available for Freqtrade bot and FreqUI) <br> **Datatype:** Positive integer.
-|  |  **Feature parameters**
+
+
+|  Parameter | Description |
+|------------|-------------|
+|  |  **Feature parameters within the `freqai.feature_parameters` sub dictionary**
 | `feature_parameters` | A dictionary containing the parameters used to engineer the feature set. Details and examples are shown [here](freqai-feature-engineering.md). <br> **Datatype:** Dictionary.
 | `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for. The list is added as features to the base indicators dataset. <br> **Datatype:** List of timeframes (strings).
 | `include_corr_pairlist` | A list of correlated coins that FreqAI will add as additional features to all `pair_whitelist` coins. All indicators set in `populate_any_indicators` during feature engineering (see details [here](freqai-feature-engineering.md)) will be created for each correlated coin. The correlated coins features are added to the base indicators dataset. <br> **Datatype:** List of assets (strings).
@ -39,16 +43,28 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
 | `noise_standard_deviation` | If set, FreqAI adds noise to the training features with the aim of preventing overfitting. FreqAI generates random deviates from a Gaussian distribution with a standard deviation of `noise_standard_deviation` and adds them to all data points. `noise_standard_deviation` should be kept relative to the normalized space, i.e., between -1 and 1. In other words, since data in FreqAI is always normalized to be between -1 and 1, `noise_standard_deviation: 0.05` would result in 32% of the data being randomly increased/decreased by more than 2.5% (i.e., the percent of data falling outside the first standard deviation). <br> **Datatype:** Integer. <br> Default: `0`.
 | `outlier_protection_percentage` | Enable to prevent outlier detection methods from discarding too much data. If more than `outlier_protection_percentage` % of points are detected as outliers by the SVM or DBSCAN, FreqAI will log a warning message and ignore outlier detection, i.e., the original dataset will be kept intact. If the outlier protection is triggered, no predictions will be made based on the training dataset. <br> **Datatype:** Float. <br> Default: `30`.
 | `reverse_train_test_order` | Split the feature dataset (see below) and use the latest data split for training and test on historical split of the data. This allows the model to be trained up to the most recent data point, while avoiding overfitting. However, you should be careful to understand the unorthodox nature of this parameter before employing it. <br> **Datatype:** Boolean. <br> Default: `False` (no reversal).
-|  |  **Data split parameters**
+
+
+|  Parameter | Description |
+|------------|-------------|
+|  |  **Data split parameters within the `freqai.data_split_parameters` sub dictionary**
 | `data_split_parameters` | Include any additional parameters available from Scikit-learn `test_train_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) (external website). <br> **Datatype:** Dictionary.
 | `test_size` | The fraction of data that should be used for testing instead of training. <br> **Datatype:** Positive float < 1.
 | `shuffle` | Shuffle the training data points during training. Typically, to preserve the chronological order of data in time-series forecasting, this is set to `False`. <br> **Datatype:** Boolean. <br> Default: `False`.
-|  |  **Model training parameters**
+
+
+|  Parameter | Description |
+|------------|-------------|
+|  |  **Model training parameters within the `freqai.model_training_parameters` sub dictionary**
 | `model_training_parameters` | A flexible dictionary that includes all parameters available by the selected model library. For example, if you use `LightGBMRegressor`, this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) (external website). If you select a different model, this dictionary can contain any parameter from that model. A list of the currently available models can be found [here](freqai-configuration.md#using-different-prediction-models).  <br> **Datatype:** Dictionary.
 | `n_estimators` | The number of boosted trees to fit in the training of the model. <br> **Datatype:** Integer.
 | `learning_rate` | Boosting learning rate during training of the model. <br> **Datatype:** Float.
 | `n_jobs`, `thread_count`, `task_type` | Set the number of threads for parallel processing and the `task_type` (`gpu` or `cpu`). Different model libraries use different parameter names. <br> **Datatype:** Float.
-|  |  **Reinforcement Learning Parameters**
+
+
+|  Parameter | Description |
+|------------|-------------|
+|  |  **Reinforcement Learning Parameters within the `freqai.rl_config` sub dictionary**
 | `rl_config` | A dictionary containing the control parameters for a Reinforcement Learning model. <br> **Datatype:** Dictionary.
 | `train_cycles` | Training time steps will be set based on `train_cycles * number of training data points`. <br> **Datatype:** Integer.
 | `cpu_count` | Number of processors to dedicate to the Reinforcement Learning training process. <br> **Datatype:** int.
@ -56,10 +72,13 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
 | `model_type` | Model string from stable_baselines3 or SBcontrib. Available strings include: `'TRPO', 'ARS', 'RecurrentPPO', 'MaskablePPO', 'PPO', 'A2C', 'DQN'`. User should ensure that `model_training_parameters` match those available to the corresponding stable_baselines3 model by visiting their documentation. [PPO doc](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (external website) <br> **Datatype:** string.
 | `policy_type` | One of the available policy types from stable_baselines3 <br> **Datatype:** string.
 | `max_training_drawdown_pct` | The maximum drawdown that the agent is allowed to experience during training. <br> **Datatype:** float. <br> Default: 0.8
-| `cpu_count` | Number of threads/cpus to dedicate to the Reinforcement Learning training process (depending on if `ReinforcementLearning_multiproc` is selected or not). <br> **Datatype:** int. 
+| `cpu_count` | Number of threads/cpus to dedicate to the Reinforcement Learning training process (depending on whether `ReinforcementLearning_multiproc` is selected or not). It is recommended to leave this untouched; by default, this value is set to the total number of physical cores minus 1. <br> **Datatype:** int.
 | `model_reward_parameters` | Parameters used inside the customizable `calculate_reward()` function in `ReinforcementLearner.py` <br> **Datatype:** int.
 | `add_state_info` | Tell FreqAI to include state information in the feature set for training and inferencing. The current state variables include trade duration, current profit, trade position. This is only available in dry/live runs, and is automatically switched to false for backtesting. <br> **Datatype:** bool. <br> Default: `False`.
+
+|  Parameter | Description |
+|------------|-------------|
 |  |  **Extraneous parameters**
-| `keras` | If the selected model makes use of Keras (typical for Tensorflow-based prediction models), this flag needs to be activated so that the model save/loading follows Keras standards. <br> **Datatype:** Boolean. <br> Default: `False`.
-| `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for shifting candles (`include_shifted_candles`) by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. <br> **Datatype:** Integer. <br> Default: `2`.
-| `reduce_df_footprint` | Recast all numeric columns to float32/int32, with the objective of reducing ram/disk usage and decreasing train/inference timing. This parameter is set in the main level of the Freqtrade configuration file (not inside FreqAI). <br> **Datatype:** Boolean. <br> Default: `False`.
+| `freqai.keras` | If the selected model makes use of Keras (typical for Tensorflow-based prediction models), this flag needs to be activated so that the model save/loading follows Keras standards. <br> **Datatype:** Boolean. <br> Default: `False`.
+| `freqai.conv_width` | The width of a convolutional neural network input tensor. This replaces the need for shifting candles (`include_shifted_candles`) by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. <br> **Datatype:** Integer. <br> Default: `2`.
+| `freqai.reduce_df_footprint` | Recast all numeric columns to float32/int32, with the objective of reducing ram/disk usage and decreasing train/inference timing. This parameter is set in the main level of the Freqtrade configuration file (not inside FreqAI). <br> **Datatype:** Boolean. <br> Default: `False`.
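
The nesting that the new table headers describe can be sketched as a Python dictionary. This is only an illustration: the key names and their sub dictionaries come from the parameter table in this diff, while every value, the `REQUIRED_FREQAI_KEYS` tuple, and the `missing_required()` helper are hypothetical and not part of FreqAI. The helper only knows about the parameters marked **Required** in the rows shown in this diff.

```python
# Illustrative layout only: values are placeholders, not recommendations.
freqai_config = {
    "freqai": {
        "train_period_days": 30,
        "backtest_period_days": 7,
        "continual_learning": False,
        "feature_parameters": {
            "include_timeframes": ["5m", "1h"],
            "include_corr_pairlist": ["BTC/USDT"],
        },
        "data_split_parameters": {"test_size": 0.25, "shuffle": False},
        # Must match the selected model; here a PPO hyperparameter is assumed.
        "model_training_parameters": {"learning_rate": 0.0003},
        "rl_config": {
            "train_cycles": 25,
            "model_type": "PPO",
            "policy_type": "MlpPolicy",
            "max_training_drawdown_pct": 0.8,
            "add_state_info": False,
            "model_reward_parameters": {"win_reward_factor": 2},
        },
    }
}

# Hypothetical helper: keys marked Required in the rows shown in this diff.
REQUIRED_FREQAI_KEYS = ("train_period_days", "backtest_period_days")


def missing_required(config: dict) -> list:
    """Return the required keys that are absent from a config dictionary."""
    if "freqai" not in config:
        return ["freqai"]
    return [key for key in REQUIRED_FREQAI_KEYS if key not in config["freqai"]]


print(missing_required(freqai_config))
```

A check like this may help catch a sub dictionary that was accidentally placed at the wrong level of the configuration file before a run is started.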
--- a/docs/freqai-reinforcement-learning.md
+++ b/docs/freqai-reinforcement-learning.md
@ -154,55 +154,82 @@ In order to configure the `Reinforcement Learner` the following dictionary must

 Parameter details can be found [here](freqai-parameter-table.md), but in general the `train_cycles` decides how many times the agent should cycle through the candle data in its artificial environment to train weights in the model. `model_type` is a string which selects one of the available models in [stable_baselines](https://stable-baselines3.readthedocs.io/en/master/)(external link).

+!!! Note
+    If you would like to experiment with `continual_learning`, then you should set that value to `true` in the main `freqai` configuration dictionary. This will tell the Reinforcement Learning library to continue training new models from the final state of previous models, instead of retraining new models from scratch each time a retrain is initiated.
+
 !!! Note
    Remember that the general `model_training_parameters` dictionary should contain all the model hyperparameter customizations for the particular `model_type`. For example, `PPO` parameters can be found [here](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html).

-## Creating the reward
+## Creating a custom reward function

-As you begin to modify the strategy and the prediction model, you will quickly realize some important differences between the Reinforcement Learner and the Regressors/Classifiers. Firstly, the strategy does not set a target value (no labels!). Instead, you set the `calculate_reward()` function inside the `ReinforcementLearner.py` file. A default `calculate_reward()` is provided inside `prediction_models/ReinforcementLearner.py` to demonstrate the necessary building blocks for creating rewards. It is inside the `calculate_reward()` where creative theories about the market can be expressed. For example, you can reward your agent when it makes a winning trade, and penalize the agent when it makes a losing trade. Or perhaps, you wish to reward the agent for entering trades, and penalize the agent for sitting in trades too long. Below we show examples of how these rewards are all calculated:
+As you begin to modify the strategy and the prediction model, you will quickly notice some important differences between the Reinforcement Learner and the Regressors/Classifiers. Firstly, the strategy does not set a target value (no labels!). Instead, you set the `calculate_reward()` function inside the `MyRLEnv` class (see below). A default `calculate_reward()` is provided inside `prediction_models/ReinforcementLearner.py` to demonstrate the necessary building blocks for creating rewards, but users are encouraged to create their own custom reinforcement learning model class (see below) and save it to `user_data/freqaimodels`. `calculate_reward()` is where creative theories about the market can be expressed. For example, you can reward your agent when it makes a winning trade and penalize it when it makes a losing trade. Or perhaps you wish to reward the agent for entering trades and penalize it for sitting in trades too long. The example below shows how these rewards are calculated:

 ```python
-    class MyRLEnv(Base5ActionRLEnv):
-        """
-        User made custom environment. This class inherits from BaseEnvironment and gym.env.
-        Users can override any functions from those parent classes. Here is an example
-        of a user customized `calculate_reward()` function.
-        """
-        def calculate_reward(self, action):
-            # first, penalize if the action is not valid
-            if not self._is_valid(action):
-                return -2
-            pnl = self.get_unrealized_profit()
+    from freqtrade.freqai.prediction_models.ReinforcementLearner import ReinforcementLearner
+    from freqtrade.freqai.RL.Base5ActionRLEnv import Actions, Base5ActionRLEnv, Positions

-            factor = 100
-            # reward agent for entering trades
-            if action in (Actions.Long_enter.value, Actions.Short_enter.value) \
-                    and self._position == Positions.Neutral:
-                return 25
-            # discourage agent from not entering trades
-            if action == Actions.Neutral.value and self._position == Positions.Neutral:
-                return -1
-            max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300)
-            trade_duration = self._current_tick - self._last_trade_tick
-            if trade_duration <= max_trade_duration:
-                factor *= 1.5
-            elif trade_duration > max_trade_duration:
-                factor *= 0.5
-            # discourage sitting in position
-            if self._position in (Positions.Short, Positions.Long) and \
-               action == Actions.Neutral.value:
-                return -1 * trade_duration / max_trade_duration
-            # close long
-            if action == Actions.Long_exit.value and self._position == Positions.Long:
-                if pnl > self.profit_aim * self.rr:
-                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
-                return float(pnl * factor)
-            # close short
-            if action == Actions.Short_exit.value and self._position == Positions.Short:
-                if pnl > self.profit_aim * self.rr:
-                    factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
-                return float(pnl * factor)
-            return 0.
+    class MyCoolRLModel(ReinforcementLearner):
+        """
+        User created RL prediction model. 
+
+        Save this file to `freqtrade/user_data/freqaimodels`
+
+        then use it with:
+
+        freqtrade trade --freqaimodel MyCoolRLModel --config config.json --strategy SomeCoolStrat
+        
+        Here the users can override any of the functions 
+        available in the `IFreqaiModel` inheritance tree. Most importantly for RL, this 
+        is where the user overrides `MyRLEnv` (see below), to define custom
+        `calculate_reward()` function, or to override any other parts of the environment.
+        
+        This class also allows users to override any other part of the IFreqaiModel tree.
+        For example, the user can override `def fit()` or `def train()` or `def predict()` 
+        to take fine-tuned control over these processes.
+
+        Another common override may be `def data_cleaning_predict()` where the user can
+        take fine-tuned control over the data handling pipeline.
+        """
+        class MyRLEnv(Base5ActionRLEnv):
+            """
+            User made custom environment. This class inherits from BaseEnvironment and gym.env.
+            Users can override any functions from those parent classes. Here is an example
+            of a user customized `calculate_reward()` function.
+            """
+            def calculate_reward(self, action):
+                # first, penalize if the action is not valid
+                if not self._is_valid(action):
+                    return -2
+                pnl = self.get_unrealized_profit()
+
+                factor = 100
+                # reward agent for entering trades
+                if action in (Actions.Long_enter.value, Actions.Short_enter.value) \
+                        and self._position == Positions.Neutral:
+                    return 25
+                # discourage agent from not entering trades
+                if action == Actions.Neutral.value and self._position == Positions.Neutral:
+                    return -1
+                max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300)
+                trade_duration = self._current_tick - self._last_trade_tick
+                if trade_duration <= max_trade_duration:
+                    factor *= 1.5
+                elif trade_duration > max_trade_duration:
+                    factor *= 0.5
+                # discourage sitting in position
+                if self._position in (Positions.Short, Positions.Long) and \
+                        action == Actions.Neutral.value:
+                    return -1 * trade_duration / max_trade_duration
+                # close long
+                if action == Actions.Long_exit.value and self._position == Positions.Long:
+                    if pnl > self.profit_aim * self.rr:
+                        factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
+                    return float(pnl * factor)
+                # close short
+                if action == Actions.Short_exit.value and self._position == Positions.Short:
+                    if pnl > self.profit_aim * self.rr:
+                        factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2)
+                    return float(pnl * factor)
+                return 0.
 ```
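
To experiment with the reward branches without a running FreqAI environment, the logic of the example can be approximated as a plain function. This is a minimal sketch under stated assumptions: the `Actions` and `Positions` enums here are local stand-ins rather than the FreqAI classes, `sketch_reward()` and its `profit_target` argument (standing in for `self.profit_aim * self.rr`) are hypothetical, and the validity check done by `self._is_valid(action)` is left out because it depends on the environment.

```python
from enum import Enum


class Actions(Enum):
    # Local stand-in for the five actions used in the example above.
    Neutral = 0
    Long_enter = 1
    Long_exit = 2
    Short_enter = 3
    Short_exit = 4


class Positions(Enum):
    # Local stand-in; only the member names matter for this sketch.
    Short = 0
    Long = 1
    Neutral = 0.5


def sketch_reward(action, position, pnl, trade_duration,
                  max_trade_duration=300, profit_target=0.02,
                  win_reward_factor=2):
    """Approximate the reward branches of the example (not FreqAI code)."""
    factor = 100.0
    # Flat bonus for entering a trade from a neutral position.
    if action in (Actions.Long_enter.value, Actions.Short_enter.value) \
            and position == Positions.Neutral:
        return 25.0
    # Small penalty for staying out of the market.
    if action == Actions.Neutral.value and position == Positions.Neutral:
        return -1.0
    # Trades closed within the duration limit get a larger factor.
    factor *= 1.5 if trade_duration <= max_trade_duration else 0.5
    # Penalty for holding a position grows with the trade duration.
    if position in (Positions.Short, Positions.Long) \
            and action == Actions.Neutral.value:
        return -1.0 * trade_duration / max_trade_duration
    closing_long = action == Actions.Long_exit.value and position == Positions.Long
    closing_short = action == Actions.Short_exit.value and position == Positions.Short
    if closing_long or closing_short:
        # Extra multiplier when the profit exceeds the target.
        if pnl > profit_target:
            factor *= win_reward_factor
        return float(pnl * factor)
    return 0.0


# Holding a long position for half of the allowed duration.
print(sketch_reward(Actions.Neutral.value, Positions.Long, 0.01, 150))
```

Calling the function with different combinations of action, position, and profit shows how each branch shapes the agent's incentive before the same ideas are moved into a custom `calculate_reward()`.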

 ### Using Tensorboard