Commit Graph

4 Commits

Author        SHA1        Message                                                         Date
MukavaValkku  13cd18dc9a  PPO policy change + verbose=1                                   2022-08-24 13:00:55 +02:00
robcaulk      926023935f  make base 3ac and base 5ac environments. TDQN defaults to 3AC.  2022-08-24 13:00:55 +02:00
robcaulk      9c78e6c26f  base PPO model only customizes reward for 3AC                   2022-08-24 13:00:55 +02:00
robcaulk      91683e1dca  restructure RL so that user can customize environment           2022-08-24 13:00:55 +02:00