update docs, fix bug in environment

This commit is contained in:
robcaulk 2022-11-13 16:56:31 +01:00
parent 3c249ba994
commit 388ca21200
4 changed files with 33 additions and 6 deletions

View File

@ -58,6 +58,7 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
| `max_training_drawdown_pct` | The maximum drawdown that the agent is allowed to experience during training. <br> **Datatype:** float. <br> Default: 0.8
| `cpu_count` | Number of threads/cpus to dedicate to the Reinforcement Learning training process (depending on if `ReinforcementLearning_multiproc` is selected or not). <br> **Datatype:** int.
| `model_reward_parameters` | Parameters used inside the user customizable `calculate_reward()` function in `ReinforcementLearner.py` <br> **Datatype:** int.
| `add_state_info` | Tell FreqAI to include state information in the feature set for training and inferencing. The current state variables include trade duration, current profit, trade position. This is only available in dry/live runs, and is automatically switched to false for backtesting. <br> **Datatype:** bool. <br> Default: `False`.
| | **Extraneous parameters**
| `keras` | If the selected model makes use of Keras (typical for Tensorflow-based prediction models), this flag needs to be activated so that the model save/loading follows Keras standards. <br> **Datatype:** Boolean. <br> Default: `False`.
| `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for shifting candles (`include_shifted_candles`) by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. <br> **Datatype:** Integer. <br> Default: `2`.

View File

@ -16,7 +16,10 @@ Reinforcement learning is a natural progression for FreqAI, since it adds a new
### The RL interface
With the current framework, we aim to expose the training environment to the user via the common "prediction model" file (i.e. CatboostClassifier, LightGBMRegressor, etc.). Users inherit our base environment in this file, which allows them to override as much or as little of the environment as they wish.
With the current framework, we aim to expose the training environment via the common "prediction model" file, which is a user inherited `BaseReinforcementLearner` object (e.g. `freqai/prediction_models/ReinforcementLearner`). Inside this user class, the RL environment is available and customized via `MyRLEnv`:
We envision the majority of users focusing their effort on creative design of the `calculate_reward()` function [details here](#creating-the-reward), while leaving the rest of the environment untouched. Other users may not touch the environment at all, and they will only play with the configruation settings and the powerful feature engineering that already exists in FreqAI. Meanwhile, we enable advanced users to create their own model classes entirely.
@ -49,7 +52,7 @@ where `ReinforcementLearner` will use the templated `ReinforcementLearner` from
informative[f"%-{pair}mfi-period_{t}"] = ta.MFI(informative, timeperiod=t)
informative[f"%-{pair}adx-period_{t}"] = ta.ADX(informative, window=t)
# The following features are necessary for RL models
# The following raw price values are necessary for RL models
informative[f"%-{pair}raw_close"] = informative["close"]
informative[f"%-{pair}raw_open"] = informative["open"]
informative[f"%-{pair}raw_high"] = informative["high"]
@ -131,11 +134,12 @@ It is important to consider that `&-action` depends on which environment they ch
## Configuring the Reinforcement Learner
In order to configure the `Reinforcement Learner` the following dictionary to their `freqai` config:
In order to configure the `Reinforcement Learner` the following dictionary must exist in the `freqai` config:
```json
"rl_config": {
"train_cycles": 25,
"add_state_info": true,
"max_trade_duration_candles": 300,
"max_training_drawdown_pct": 0.02,
"cpu_count": 8,
@ -148,11 +152,14 @@ In order to configure the `Reinforcement Learner` the following dictionary to th
}
```
Parameter details can be found [here](freqai-parameter-table.md), but in general the `train_cycles` decides how many times the agent should cycle through the candle data in its artificial environemtn to train weights in the model. `model_type` is a string which selects one of the available models in [stable_baselines](https://stable-baselines3.readthedocs.io/en/master/)(external link).
Parameter details can be found [here](freqai-parameter-table.md), but in general the `train_cycles` decides how many times the agent should cycle through the candle data in its artificial environment to train weights in the model. `model_type` is a string which selects one of the available models in [stable_baselines](https://stable-baselines3.readthedocs.io/en/master/)(external link).
!!! Note
Remember that the general `model_training_parameters` dictionary should contain all the model hyperparameter customizations for the particular `model_type`. For example, `PPO` parameters can be found [here](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html).
## Creating the reward
As users begin to modify the strategy and the prediction model, they will quickly realize some important differences between the Reinforcement Learner and the Regressors/Classifiers. Firstly, the strategy does not set a target value (no labels!). Instead, the user sets a `calculate_reward()` function inside their custom `ReinforcementLearner.py` file. A default `calculate_reward()` is provided inside `prediction_models/ReinforcementLearner.py` to give users the necessary building blocks to start their own models. It is inside the `calculate_reward()` where users express their creative theories about the market. For example, the user wants to reward their agent when it makes a winning trade, and penalize the agent when it makes a losing trade. Or perhaps, the user wishes to reward the agnet for entering trades, and penalize the agent for sitting in trades too long. Below we show examples of how these rewards are all calculated:
As you begin to modify the strategy and the prediction model, you will quickly realize some important differences between the Reinforcement Learner and the Regressors/Classifiers. Firstly, the strategy does not set a target value (no labels!). Instead, you set the `calculate_reward()` function inside the `ReinforcementLearner.py` file. A default `calculate_reward()` is provided inside `prediction_models/ReinforcementLearner.py` to demonstrate the necessary building blocks for creating rewards. It is inside the `calculate_reward()` where creative theories about the market can be expressed. For example, you can reward your agent when it makes a winning trade, and penalize the agent when it makes a losing trade. Or perhaps, the user wishes to reward the agnet for entering trades, and penalize the agent for sitting in trades too long. Below we show examples of how these rewards are all calculated:
```python
class MyRLEnv(Base5ActionRLEnv):

View File

@ -578,6 +578,25 @@ CONF_SCHEMA = {
"model_training_parameters": {
"type": "object"
},
"rl_config": {
"type": "object",
"properties": {
"train_cycles": {"type": "integer"},
"max_trade_duration_candles": {"type": "integer"},
"add_state_info": {"type": "boolean", "default": False},
"max_training_drawdown_pct": {"type": "number", "default": 0.02},
"cpu_count": {"type": "integer", "default": 1},
"model_type": {"type": "string", "default": "PPO"},
"policy_type": {"type": "string", "default": "MlpPolicy"},
"model_reward_parameters": {
"type": "object",
"properties": {
"rr": {"type": "number", "default": 1},
"profit_aim": {"type": "number", "default": 0.025}
}
}
},
},
},
"required": [
"enabled",

View File

@ -59,7 +59,7 @@ class BaseEnvironment(gym.Env):
if self.config.get('fee', None) is not None:
self.fee = self.config['fee']
elif dp is not None:
self.fee = self.dp.exchange.get_fee(symbol=dp.current_whitelist()[0])
self.fee = dp._exchange.get_fee(symbol=dp.current_whitelist()[0]) # type: ignore
else:
self.fee = 0.0015