Commit Graph

26 Commits

Author SHA1 Message Date
robcaulk 4fc0edb8b7 add pair to environment for access inside calculate_reward 2023-02-10 14:45:50 +01:00
robcaulk 7b4abd5ef5 use a dictionary to make code more readable 2022-12-15 12:25:33 +01:00
Emre 2018da0767 Add env_info dict to base environment 2022-12-14 22:03:05 +03:00
robcaulk 2285ca7d2a add dp to multiproc 2022-12-14 18:22:20 +01:00
robcaulk 24766928ba reorganize/generalize tensorboard callback 2022-12-04 13:54:30 +01:00
smarmau d6f45a12ae add multiproc fix flake8 2022-12-03 22:30:04 +11:00
robcaulk 8dbfd2cacf improve docstring clarity about how to inherit from ReinforcementLearner, demonstrate inherittance with ReinforcementLearner_multiproc 2022-11-26 11:51:08 +01:00
robcaulk 6394ef4558 fix docstrings 2022-11-13 17:43:52 +01:00
robcaulk 8d7adfabe9 clean RL tests to avoid dir pollution and increase speed 2022-10-08 12:10:38 +02:00
robcaulk 83343dc2f1 control number of threads, update doc 2022-09-29 00:10:18 +02:00
Timothy Pogue 099137adac remove hasattr calls 2022-09-27 22:35:15 -06:00
Timothy Pogue 9e36b0d2ea fix formatting 2022-09-27 22:02:33 -06:00
Timothy Pogue caa47a2f47 close subproc env on shutdown 2022-09-28 03:06:05 +00:00
robcaulk 647200e8a7 isort 2022-09-23 19:30:56 +02:00
robcaulk 77c360b264 improve typing, improve docstrings, ensure global tests pass 2022-09-23 19:17:27 +02:00
robcaulk 8aac644009 add tests. add guardrails. 2022-09-15 00:46:35 +02:00
robcaulk 240b529533 fix tensorboard path so that users can track all historical models 2022-08-31 16:50:39 +02:00
robcaulk 7766350c15 refactor environment inheritence tree to accommodate flexible action types/counts. fix bug in train profit handling 2022-08-28 19:21:57 +02:00
robcaulk 3199eb453b reduce code for base use-case, ensure multiproc inherits custom env, add ability to limit ram use. 2022-08-25 19:05:51 +02:00
robcaulk 05ccebf9a1 automate eval freq in multiproc 2022-08-25 12:29:48 +02:00
robcaulk 94cfc8e63f fix multiproc callback, add continual learning to multiproc, fix totalprofit bug in env, set eval_freq automatically, improve default reward 2022-08-25 11:46:18 +02:00
robcaulk bd870e2331 fix monitor bug, set default values in case user doesnt set params 2022-08-24 16:32:14 +02:00
robcaulk b708134c1a switch multiproc thread count to rl_config definition 2022-08-24 13:00:55 +02:00
robcaulk b26ed7dea4 fix generic reward, add time duration to reward 2022-08-24 13:00:55 +02:00
robcaulk 29f0e01c4a expose environment reward parameters to the user config 2022-08-24 13:00:55 +02:00
robcaulk 3eb897c2f8 reuse callback, allow user to acces all stable_baselines3 agents via config 2022-08-24 13:00:55 +02:00