stable

Author	SHA1	Message	Date
robcaulk	cb80d7c26f	close the multi_proc env before creating new ones in an attempt to avoid increasing processes	2023-02-24 11:19:54 +01:00
robcaulk	4fc0edb8b7	add pair to environment for access inside calculate_reward	2023-02-10 14:45:50 +01:00
robcaulk	7b4abd5ef5	use a dictionary to make code more readable	2022-12-15 12:25:33 +01:00
Emre	2018da0767	Add env_info dict to base environment	2022-12-14 22:03:05 +03:00
robcaulk	2285ca7d2a	add dp to multiproc	2022-12-14 18:22:20 +01:00
robcaulk	24766928ba	reorganize/generalize tensorboard callback	2022-12-04 13:54:30 +01:00
smarmau	d6f45a12ae	add multiproc fix flake8	2022-12-03 22:30:04 +11:00
robcaulk	8dbfd2cacf	improve docstring clarity about how to inherit from ReinforcementLearner, demonstrate inheritance with ReinforcementLearner_multiproc	2022-11-26 11:51:08 +01:00
robcaulk	6394ef4558	fix docstrings	2022-11-13 17:43:52 +01:00
robcaulk	8d7adfabe9	clean RL tests to avoid dir pollution and increase speed	2022-10-08 12:10:38 +02:00
robcaulk	83343dc2f1	control number of threads, update doc	2022-09-29 00:10:18 +02:00
Timothy Pogue	099137adac	remove hasattr calls	2022-09-27 22:35:15 -06:00
Timothy Pogue	9e36b0d2ea	fix formatting	2022-09-27 22:02:33 -06:00
Timothy Pogue	caa47a2f47	close subproc env on shutdown	2022-09-28 03:06:05 +00:00
robcaulk	647200e8a7	isort	2022-09-23 19:30:56 +02:00
robcaulk	77c360b264	improve typing, improve docstrings, ensure global tests pass	2022-09-23 19:17:27 +02:00
robcaulk	8aac644009	add tests. add guardrails.	2022-09-15 00:46:35 +02:00
robcaulk	240b529533	fix tensorboard path so that users can track all historical models	2022-08-31 16:50:39 +02:00
robcaulk	7766350c15	refactor environment inheritance tree to accommodate flexible action types/counts. fix bug in train profit handling	2022-08-28 19:21:57 +02:00
robcaulk	3199eb453b	reduce code for base use-case, ensure multiproc inherits custom env, add ability to limit ram use.	2022-08-25 19:05:51 +02:00
robcaulk	05ccebf9a1	automate eval freq in multiproc	2022-08-25 12:29:48 +02:00
robcaulk	94cfc8e63f	fix multiproc callback, add continual learning to multiproc, fix totalprofit bug in env, set eval_freq automatically, improve default reward	2022-08-25 11:46:18 +02:00
robcaulk	bd870e2331	fix monitor bug, set default values in case user doesn't set params	2022-08-24 16:32:14 +02:00
robcaulk	b708134c1a	switch multiproc thread count to rl_config definition	2022-08-24 13:00:55 +02:00
robcaulk	b26ed7dea4	fix generic reward, add time duration to reward	2022-08-24 13:00:55 +02:00
robcaulk	29f0e01c4a	expose environment reward parameters to the user config	2022-08-24 13:00:55 +02:00
robcaulk	3eb897c2f8	reuse callback, allow user to access all stable_baselines3 agents via config	2022-08-24 13:00:55 +02:00

27 Commits