add rl_config.device param to support using the GPU on Apple chips

zhangzhichao 2022-12-23 19:51:33 +08:00
parent 3012c55ec5
commit fed9220b55
3 changed files with 23 additions and 15 deletions

View File

@@ -68,20 +68,21 @@ Mandatory parameters are marked as **Required** and have to be set in one of the
### Reinforcement Learning parameters

| Parameter | Description |
|-------------------------------|-------------|
| | **Reinforcement Learning Parameters within the `freqai.rl_config` sub dictionary** |
| `rl_config` | A dictionary containing the control parameters for a Reinforcement Learning model. <br> **Datatype:** Dictionary. |
| `device` | The device to run training on: `cpu`, `mps`, or `cuda`. For example, specify `mps` to use the GPU on an Apple silicon chip. <br> **Datatype:** string. |
| `train_cycles` | Training time steps will be set based on `train_cycles * number of training data points`. <br> **Datatype:** Integer. |
| `max_trade_duration_candles` | Guides the agent training to keep trades below the desired length. Example usage shown in `prediction_models/ReinforcementLearner.py` within the customizable `calculate_reward()` function. <br> **Datatype:** int. |
| `model_type` | Model string from stable_baselines3 or SBcontrib. Available strings include: `'TRPO', 'ARS', 'RecurrentPPO', 'MaskablePPO', 'PPO', 'A2C', 'DQN'`. User should ensure that `model_training_parameters` match those available to the corresponding stable_baselines3 model by visiting their documentation. [PPO doc](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (external website) <br> **Datatype:** string. |
| `policy_type` | One of the available policy types from stable_baselines3. <br> **Datatype:** string. |
| `max_training_drawdown_pct` | The maximum drawdown that the agent is allowed to experience during training. <br> **Datatype:** float. <br> Default: 0.8 |
| `cpu_count` | Number of threads/cpus to dedicate to the Reinforcement Learning training process (depending on whether `ReinforcementLearning_multiproc` is selected or not). Recommended to leave this untouched; by default, this value is set to the total number of physical cores minus 1. <br> **Datatype:** int. |
| `model_reward_parameters` | Parameters used inside the customizable `calculate_reward()` function in `ReinforcementLearner.py`. <br> **Datatype:** dictionary. |
| `add_state_info` | Tell FreqAI to include state information in the feature set for training and inferencing. The current state variables include trade duration, current profit, trade position. This is only available in dry/live runs, and is automatically switched to false for backtesting. <br> **Datatype:** bool. <br> Default: `False`. |
| `net_arch` | Network architecture which is well described in the [`stable_baselines3` doc](https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#examples). In summary: `[<shared layers>, dict(vf=[<non-shared value network layers>], pi=[<non-shared policy network layers>])]`. By default this is set to `[128, 128]`, which defines 2 shared hidden layers with 128 units each. |
| `randomize_starting_position` | Randomize the starting point of each episode to avoid overfitting. <br> **Datatype:** bool. <br> Default: `False`. |
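For illustration, a minimal sketch of what `self.freqai_info['rl_config']` might contain at runtime with the new option set. All values below are placeholders, not recommendations; only the `device` key is the subject of this change, and in a real freqtrade `config.json` the same keys appear as JSON rather than a Python dict.

```python
# Hypothetical contents of self.freqai_info['rl_config'] at runtime.
# All values are placeholders; only the new "device" key is specific to this change.
rl_config = {
    "train_cycles": 25,
    "max_trade_duration_candles": 300,
    "max_training_drawdown_pct": 0.02,
    "cpu_count": 8,
    "model_type": "PPO",
    "policy_type": "MlpPolicy",
    "device": "mps",  # use the Apple-silicon GPU; "cpu" and "cuda" are also accepted
    "model_reward_parameters": {"rr": 1, "profit_aim": 0.025},
}
```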
### Additional parameters

View File

@@ -44,6 +44,7 @@ class BaseReinforcementLearningModel(IFreqaiModel):
        self.max_threads = min(self.freqai_info['rl_config'].get(
            'cpu_count', 1), max(int(self.max_system_threads / 2), 1))
        th.set_num_threads(self.max_threads)
        self.device = self.freqai_info['rl_config'].get('device', '')
        self.reward_params = self.freqai_info['rl_config']['model_reward_parameters']
        self.train_env: Union[SubprocVecEnv, Type[gym.Env]] = gym.Env()
        self.eval_env: Union[SubprocVecEnv, Type[gym.Env]] = gym.Env()
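The empty-string default means no device is forwarded unless the user configures one, leaving device selection to stable_baselines3. If a guard against unavailable backends were wanted, a hypothetical helper (not part of this commit) could use PyTorch's availability checks, roughly as in this sketch:

```python
import torch as th


def resolve_device(requested: str) -> str:
    # Hypothetical helper, not part of this commit: fall back to the CPU
    # when the requested backend is unavailable in the installed torch build.
    if requested == "mps" and not th.backends.mps.is_available():
        return "cpu"
    if requested == "cuda" and not th.cuda.is_available():
        return "cpu"
    return requested  # '' means "let stable_baselines3 decide"
```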

View File

@@ -58,10 +58,16 @@ class ReinforcementLearner(BaseReinforcementLearningModel):
                             net_arch=self.net_arch)

        if dk.pair not in self.dd.model_dictionary or not self.continual_learning:
            kwargs = self.freqai_info.get('model_training_parameters', {})
            # set the device explicitly if one was configured; an empty string
            # keeps the stable_baselines3 default device selection
            if self.device != '':
                kwargs['device'] = self.device
            model = self.MODELCLASS(self.policy_type, self.train_env, policy_kwargs=policy_kwargs,
                                    tensorboard_log=Path(
                                        dk.full_path / "tensorboard" / dk.pair.split('/')[0]),
                                    **kwargs
                                    )
        else:
            logger.info('Continual training activated - starting training from previously '
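For context on where the forwarded value ends up: stable_baselines3 model constructors accept a `device` argument (defaulting to `'auto'`), so the conditional above is equivalent to passing the device directly. A standalone sketch with a toy environment, assuming a PyTorch build with MPS support (otherwise use `'cpu'` or `'auto'`):

```python
import gym
from stable_baselines3 import PPO

# Standalone sketch: the configured device string is passed straight to the
# stable_baselines3 constructor, here targeting the Apple-silicon GPU.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, device="mps")
model.learn(total_timesteps=1_000)
```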