From af8f308584a270c4e35d2ad6d768099459975cb1 Mon Sep 17 00:00:00 2001
From: robcaulk
Date: Sun, 28 Aug 2022 20:52:03 +0200
Subject: [PATCH] start the reinforcement learning doc

---
 docs/assets/tensorboard.png |  Bin 0 -> 9273 bytes
 docs/freqai.md              |  101 +++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 2 deletions(-)
 create mode 100644 docs/assets/tensorboard.png

diff --git a/docs/assets/tensorboard.png b/docs/assets/tensorboard.png
new file mode 100644
index 0000000000000000000000000000000000000000..b986900435b28c89e9d9e8d1bdb5413d4411f913
GIT binary patch
literal 9273
[binary PNG data omitted]

diff --git a/docs/freqai.md b/docs/freqai.md
--- a/docs/freqai.md
+++ b/docs/freqai.md
 **Datatype:** Positive float < 1.
 | `shuffle` | Shuffle the training data points during training. Typically, for time-series forecasting, this is set to `False`. <br> **Datatype:** Boolean.
 |  |  **Model training parameters**
-| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected model library. For example, if the user uses `LightGBMRegressor`, this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) (external website). If the user selects a different model, this dictionary can contain any parameter from that model. <br> **Datatype:** Dictionary.
+| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected model library. For example, if the user uses `LightGBMRegressor`, this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) (external website). If the user selects a different model, such as `PPO` from stable_baselines3, this dictionary can contain any parameter from that model. <br> **Datatype:** Dictionary.
 | `n_estimators` | The number of boosted trees to fit in regression. <br> **Datatype:** Integer.
 | `learning_rate` | Boosting learning rate during regression. <br> **Datatype:** Float.
 | `n_jobs`, `thread_count`, `task_type` | Set the number of threads for parallel processing and the `task_type` (`gpu` or `cpu`). Different model libraries use different parameter names. <br> **Datatype:** Float.
+|  |  **Reinforcement Learning Parameters**
+| `rl_config` | A dictionary containing the control parameters for a Reinforcement Learning model (see the sketch after this table). <br> **Datatype:** Dictionary.
+| `train_cycles` | Training time steps will be set based on `train_cycles` * number of training data points. <br> **Datatype:** Integer.
+| `thread_count` | Number of threads to dedicate to the Reinforcement Learning training process. <br> **Datatype:** Integer.
+| `max_trade_duration_candles` | Guides the agent training to keep trades below the desired length. Example usage is shown in `prediction_models/ReinforcementLearner.py` within the user customizable `calculate_reward()`. <br> **Datatype:** Integer.
+| `model_type` | Model string from stable_baselines3 or SBcontrib. Available strings include: `'TRPO', 'ARS', 'RecurrentPPO', 'MaskablePPO', 'PPO', 'A2C', 'DQN'`. Users should ensure that `model_training_parameters` match those available to the corresponding stable_baselines3 model by visiting its documentation. [PPO doc](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (external website) <br> **Datatype:** String.
+| `policy_type` | One of the available policy types from stable_baselines3. <br> **Datatype:** String.
+| `continual_learning` | If true, the agent will start new trainings from the model selected during the previous training. If false, a new agent is trained from scratch for each training. <br> **Datatype:** Boolean.
+| `model_reward_parameters` | Parameters used inside the user customizable `calculate_reward()` function in `ReinforcementLearner.py`. <br> **Datatype:** Dictionary.
 |  |  **Extraneous parameters**
 | `keras` | If your model makes use of keras (typical of Tensorflow based prediction models), activate this flag so that the model save/loading follows keras standards. Default value `false` <br> **Datatype:** boolean.
 | `conv_width` | The width of a convolutional neural network input tensor or the `ReinforcementLearningModel` `window_size`. This replaces the need for `shift` by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Default value, 2 <br> **Datatype:** integer.
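
For orientation, the following is a minimal sketch of how the Reinforcement Learning keys from the table above might sit inside the `freqai` section of the configuration file. The key names come from the table; the specific values, the `MlpPolicy` string, and the PPO keyword arguments are illustrative assumptions rather than defaults taken from `config_freqai-rl.example.json`, and the usual feature/data keys of a `freqai` configuration are omitted:

```json
"freqai": {
    "model_training_parameters": {
        "learning_rate": 0.00025,
        "gamma": 0.9,
        "verbose": 1
    },
    "rl_config": {
        "train_cycles": 25,
        "thread_count": 4,
        "max_trade_duration_candles": 300,
        "model_type": "PPO",
        "policy_type": "MlpPolicy",
        "continual_learning": false,
        "model_reward_parameters": {
            "win_reward_factor": 2
        }
    }
}
```

The `max_trade_duration_candles` and `win_reward_factor` values simply echo the fallback defaults visible in the `calculate_reward()` example further below.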
@@ -731,6 +741,93 @@ Given a number of data points $N$, and a distance $\varepsilon$, DBSCAN clusters
 FreqAI uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's webpage [here](#https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)) with `min_samples` ($N$) taken as double the no. of user-defined features, and `eps` ($\varepsilon$) taken as the longest distance in the *k-distance graph* computed from the nearest neighbors in the pairwise distances of all data points in the feature set.

+## Reinforcement Learning
+
+Setting up and running a Reinforcement Learning model is as quick and simple as running a Regressor. Users can start training and trading live from example files using:
+
+```bash
+freqtrade trade --freqaimodel ReinforcementLearner --strategy ReinforcementLearningExample5ac --strategy-path freqtrade/freqai/example_strats --config config_examples/config_freqai-rl.example.json
+```
+
+As users begin to modify the strategy and the prediction model, they will quickly realize some important differences between the Reinforcement Learner and the Regressors/Classifiers. Firstly, the strategy does not set a target value (no labels!). Instead, the user sets a `calculate_reward()` function inside their custom `ReinforcementLearner.py` file. A default `calculate_reward()` is provided inside `prediction_models/ReinforcementLearner.py` to give users the necessary building blocks to start their own models. It is inside `calculate_reward()` that users express their creative theories about the market. For example, the user may want to reward their agent when it makes a winning trade and penalize the agent when it makes a losing trade. Or perhaps the user wishes to reward the agent for entering trades and penalize the agent for sitting in a trade too long. Below we show examples of how these rewards are calculated:
+
+```python
+class MyRLEnv(Base5ActionRLEnv):
+    """
+    User made custom environment. This class inherits from BaseEnvironment and gym.Env.
+    Users can override any functions from those parent classes. Here is an example
+    of a user customized `calculate_reward()` function.
+ """ + + def calculate_reward(self, action): + + # first, penalize if the action is not valid + if not self._is_valid(action): + return -2 + + pnl = self.get_unrealized_profit() + rew = np.sign(pnl) * (pnl + 1) + factor = 100 + + # reward agent for entering trades + if action in (Actions.Long_enter.value, Actions.Short_enter.value) \ + and self._position == Positions.Neutral: + return 25 + # discourage agent from not entering trades + if action == Actions.Neutral.value and self._position == Positions.Neutral: + return -1 + + max_trade_duration = self.rl_config.get('max_trade_duration_candles', 300) + trade_duration = self._current_tick - self._last_trade_tick + + if trade_duration <= max_trade_duration: + factor *= 1.5 + elif trade_duration > max_trade_duration: + factor *= 0.5 + + # discourage sitting in position + if self._position in (Positions.Short, Positions.Long) and \ + action == Actions.Neutral.value: + return -1 * trade_duration / max_trade_duration + + # close long + if action == Actions.Long_exit.value and self._position == Positions.Long: + if pnl > self.profit_aim * self.rr: + factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2) + return float(rew * factor) + + # close short + if action == Actions.Short_exit.value and self._position == Positions.Short: + if pnl > self.profit_aim * self.rr: + factor *= self.rl_config['model_reward_parameters'].get('win_reward_factor', 2) + return float(rew * factor) + + return 0. + +``` + +After users realize there are no labels to set, they will soon understand that the agent is making its "own" entry and exit decisions. This makes strategy construction rather simple (as shown in `example_strats/ReinforcementLearningExample5ac.py`). The entry and exit signals come from the agent in the form of an integer - which are used directly to decide entries and exits in the strategy. + + +### Using Tensorboard + +Reinforcement Learning models benefit from tracking training metrics. FreqAI has integrated Tensorboard to allow users to track training and evaluation performance across all coins and across all retrainings. To start, the user should ensure Tensorboard is installed on their computer: + +```bash +pip3 install tensorboard +``` + +Next, the user can activate Tensorboard with the following command: + +```bash +cd freqtrade +tensorboard --logdir user_data/models/unique-id +``` + +where `unique-id` is the `identifier` set in the `freqai` configuration file. + +![tensorboard](assets/tensorboard.png) + ## Additional information ### Common pitfalls @@ -738,7 +835,7 @@ FreqAI uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's we FreqAI cannot be combined with dynamic `VolumePairlists` (or any pairlist filter that adds and removes pairs dynamically). This is for performance reasons - FreqAI relies on making quick predictions/retrains. To do this effectively, it needs to download all the training data at the beginning of a dry/live instance. FreqAI stores and appends -new candles automatically for future retrains. This means that if new pairs arrive later in the dry run due to a volume pairlist, it will not have the data ready. However, FreqAI does work with the `ShufflePairlist` or a `VolumePairlist` which keeps the total pairlist constant (but reorders the pairs according to volume). +new candles automatically for future retrains. This means that if new pairs arrive later in the dry run due to a volume pairlist, it will not have the data ready. 

 ## Credits