add documentation for tensorboard_log, change how users interact with tensorboard_log

2022-12-11 15:31:29 +01:00
parent cb8fc3c8c7
commit 0fd8e214e4
5 changed files with 57 additions and 7 deletions
@@ -247,6 +247,32 @@ where `unique-id` is the `identifier` set in the `freqai` configuration file. Th

 ![tensorboard](assets/tensorboard.jpg)

+
+### Custom logging
+
+FreqAI also provides a built-in episodic summary logger called `self.tensorboard_log` for adding custom information to the Tensorboard log. By default, this function is called once per step inside the environment to record the agent actions. All values accumulated over the steps of an episode are reported at the end of that episode, after which all metrics are reset to 0 in preparation for the next episode.
+
+
+`self.tensorboard_log` can also be called anywhere inside the environment. For example, it can be added to the `calculate_reward` function to collect more detailed information about how often various parts of the reward were triggered:
+
+```py
+        class MyRLEnv(Base5ActionRLEnv):
+            """
+            User-made custom environment. This class inherits from BaseEnvironment and gym.Env.
+            Users can override any functions from those parent classes. Here is an example
+            of a user customized `calculate_reward()` function.
+            """
+            def calculate_reward(self, action: int) -> float:
+                if not self._is_valid(action):
+                    self.tensorboard_log("is_valid")
+                    return -2
+
+```
+
+!!! Note
+    The `self.tensorboard_log()` function is designed for tracking incremented objects only, i.e. events and actions inside the training environment. If the event of interest is a float, the float can be passed as the second argument, e.g. `self.tensorboard_log("float_metric1", 0.23)` would add 0.23 to `float_metric1`.
+
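The accumulate-then-reset behavior described above can be illustrated with a small standalone sketch. This is not the FreqAI implementation: the `EpisodeMetrics` class and its `log` and `end_episode` methods are hypothetical names chosen for illustration, and the default increment of 1 is an assumption inferred from the single-argument call `self.tensorboard_log("is_valid")`.

```python
from collections import defaultdict
from typing import Dict


class EpisodeMetrics:
    """Hypothetical stand-in for an episodic summary logger (not FreqAI code)."""

    def __init__(self) -> None:
        # Running totals for the current episode, keyed by metric name.
        self.totals: Dict[str, float] = defaultdict(float)

    def log(self, metric: str, value: float = 1.0) -> None:
        # Each call adds `value` to the running total for `metric`.
        # The default of 1.0 is an assumption, so a bare call counts one event.
        self.totals[metric] += value

    def end_episode(self) -> Dict[str, float]:
        # Report the accumulated totals, then reset every metric to 0.
        report = dict(self.totals)
        for metric in self.totals:
            self.totals[metric] = 0.0
        return report


metrics = EpisodeMetrics()

# Simulate three steps, two of which are invalid actions.
for step_is_valid in (True, False, False):
    if not step_is_valid:
        metrics.log("is_valid")

# A float passed as the second argument is added to the named metric.
metrics.log("float_metric1", 0.23)

first_episode = metrics.end_episode()
second_episode = metrics.end_episode()
```

Here `first_episode` holds the totals gathered during the simulated steps, while `second_episode` shows every metric back at 0 because nothing was logged after the reset.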
+
 ### Choosing a base environment

 FreqAI provides two base environments, `Base4ActionEnvironment` and `Base5ActionEnvironment`. As the names imply, the environments are customized for agents that can select from 4 or 5 actions. In the `Base4ActionEnvironment`, the agent can enter long, enter short, hold neutral, or exit position. Meanwhile, in the `Base5ActionEnvironment`, the agent has the same actions as Base4, but instead of a single exit action, it separates exit long and exit short. The main changes stemming from the environment selection include: