slightly update doc wording

2022-08-14 17:08:29 +02:00
parent c9c128f781
commit a5e96881f4
1 changed files with 103 additions and 108 deletions
@@ -66,7 +66,7 @@ directly influence nodal weights within the model.

 ## Install prerequisites

-The normal Freqtrade install process will ask the user if they wish to install `FreqAI` dependencies. The user should reply "yes" to this question if they wish to use FreqAI. If the user did not reply yes, they can manually install these dependencies after the install with:
+The normal Freqtrade install process will ask the user whether they wish to install FreqAI dependencies. The user should reply "yes" to this question if they wish to use FreqAI. If the user did not reply "yes", they can install these dependencies manually after installation with:

 ``` bash
 pip install -r requirements-freqai.txt
@@ -75,21 +75,21 @@ pip install -r requirements-freqai.txt
 !!! Note
    Catboost will not be installed on ARM devices (Raspberry Pi, Mac M1, ARM-based VPS, ...), since Catboost does not provide wheels for these platforms.

-## Configuring the bot
+## Configuring FreqAI

 ### Parameter table

-The table below will list all configuration parameters available for `FreqAI`.
+The table below lists all configuration parameters available for FreqAI.

 Mandatory parameters are marked as **Required**, which means that they are required to be set in one of the possible ways.

 |  Parameter | Description |
 |------------|-------------|
-| `freqai` | **Required.** The dictionary containing all the parameters for controlling FreqAI. <br> **Datatype:** dictionary.
+| `freqai` | **Required.** The parent dictionary containing all the parameters below for controlling FreqAI. <br> **Datatype:** dictionary.
 | `identifier` | **Required.** A unique name for the current model. This can be reused to reload pre-trained models/data. <br> **Datatype:** string.
 | `train_period_days` | **Required.** Number of days to use for the training data (width of the sliding window). <br> **Datatype:** positive integer.
 | `backtest_period_days` | **Required.** Number of days to inference into the trained model before sliding the window and retraining. This can be fractional days, but beware that the user provided `timerange` will be divided by this number to yield the number of trainings necessary to complete the backtest. <br> **Datatype:** Float.
-| `live_retrain_hours` | Frequency of retraining during dry/live runs. Default set to 0, which means it will retrain as often as possible. **Datatype:** Float > 0.
+| `live_retrain_hours` | Frequency of retraining during dry/live runs, in hours. Defaults to 0, which means it will retrain as often as possible. <br> **Datatype:** Float >= 0.
 | `follow_mode` | If true, this instance of FreqAI will look for models associated with `identifier` and load those for inferencing. A `follower` will **not** train new models. `False` by default. <br> **Datatype:** boolean.
 | `startup_candles` | Number of candles needed for *backtesting only* to ensure that no indicator values are NaN at the start of the first training period. <br> **Datatype:** positive integer.
 | `fit_live_predictions_candles` | Computes target (label) statistics from prediction data, instead of from the training data set. Number of candles is the number of historical candles it uses to generate the statistics. <br> **Datatype:** positive integer.
@@ -101,11 +101,11 @@ Mandatory parameters are marked as **Required**, which means that they are requi
 | `include_timeframes` | A list of timeframes that all indicators in `populate_any_indicators` will be created for and added as features to the base asset feature set. <br> **Datatype:** list of timeframes (strings).
 | `label_period_candles` | Number of candles into the future that the labels are created for. This is used in `populate_any_indicators`; refer to `templates/FreqaiExampleStrategy.py` for detailed usage. The user can create custom labels, with or without making use of this parameter. <br> **Datatype:** positive integer.
 | `include_shifted_candles` | Parameter used to add a sense of temporal recency to flattened regression type input data. `include_shifted_candles` takes all features, duplicates them, and shifts the duplicates so that the number of past candles indicated by the user is also included. <br> **Datatype:** positive integer.
-| `DI_threshold` | Activates the Dissimilarity Index for outlier detection when above 0, explained more [here](#removing-outliers-with-the-dissimilarity-index). <br> **Datatype:** positive float (typically below 1).
+| `DI_threshold` | Activates the Dissimilarity Index for outlier detection when above 0, explained in detail [here](#removing-outliers-with-the-dissimilarity-index). <br> **Datatype:** positive float (typically below 1).
 | `weight_factor` | Used to set weights for training data points according to their recency, see details and a figure of how it works [here](#controlling-the-model-learning-process). <br> **Datatype:** positive float (typically below 1).
 | `principal_component_analysis` | Ask FreqAI to automatically reduce the dimensionality of the data set using PCA. <br> **Datatype:** boolean.
 | `use_SVM_to_remove_outliers` | Ask FreqAI to train a support vector machine to detect and remove outliers from the training data set as well as from incoming data points. <br> **Datatype:** boolean.
-| `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. E.g. `nu` *Very* broadly, is the percentage of data points that should be considered outliers. `shuffle` is by default false to maintain reprodicibility. But these and all others can be added/changed in this dictionary. <br> **Datatype:** dictionary.
+| `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. For example, `nu` is, *very* broadly, the fraction of data points that should be considered outliers, and `shuffle` is false by default to maintain reproducibility. These and all other `SGDOneClassSVM()` parameters can be added or changed in this dictionary. <br> **Datatype:** dictionary.
 | `stratify_training_data` | This value is used to indicate the stratification of the data. e.g. 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. <br> **Datatype:** positive integer.
 | `indicator_max_period_candles` | The maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not contain a NaN. <br> **Datatype:** positive integer.
 | `indicator_periods_candles` | A list of integers used to duplicate all indicators according to a set of periods and add them to the feature set. <br> **Datatype:** list of positive integers.
@@ -113,7 +113,7 @@ Mandatory parameters are marked as **Required**, which means that they are requi
 |  |  **Data split parameters**
 | `data_split_parameters` | Include any additional parameters available from Scikit-learn `train_test_split()`, which are listed [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). <br> **Datatype:** dictionary.
 | `test_size` | Fraction of data that should be used for testing instead of training. <br> **Datatype:** positive float below 1.
-| `shuffle` | Shuffle the training data points during training. Typically for time-series forecasting, this is set to False. **Datatype:** boolean.
+| `shuffle` | Shuffle the training data points during training. Typically for time-series forecasting, this is set to False. <br> **Datatype:** boolean.
 |  |  **Model training parameters**
 | `model_training_parameters` | A flexible dictionary that can include any parameter available in the user-selected library. For example, if the user uses `LightGBMRegressor`, then this dictionary can contain any parameter accepted by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html). If the user selects a different model, then this dictionary can contain any parameter from that different model. <br> **Datatype:** dictionary.
 | `n_estimators` | A common parameter among regressors which sets the number of boosted trees to fit. <br> **Datatype:** integer.
@@ -123,8 +123,8 @@ Mandatory parameters are marked as **Required**, which means that they are requi
 | `keras` | If your model makes use of keras (typical of Tensorflow based prediction models), activate this flag so that the model save/loading follows keras standards. Default value `false`  <br> **Datatype:** boolean.
 | `conv_width` | The width of a convolutional neural network input tensor. This replaces the need for `shift` by feeding in historical data points as the second dimension of the tensor. Technically, this parameter can also be used for regressors, but it only adds computational overhead and does not change the model training/prediction. Default value, 2 <br> **Datatype:** integer.

-
 ### Important FreqAI dataframe key patterns
+
 Here are the values the user can expect to include/use inside the typical strategy dataframe (`df[]`):

 |  DataFrame Key | Description |
@@ -137,40 +137,39 @@ Here are the values the user can expect to include/use inside the typical strate

 ### Example config file

-The user interface is isolated to the typical config file. A typical FreqAI
-config setup could include:
+The FreqAI user interface is confined to the config file. A typical FreqAI config setup could include:

 ```json
    "freqai": {
-                "startup_candles": 10000,
-                "purge_old_models": true,
-                "train_period_days" : 30,
-                "backtest_period_days" : 7,
-                "identifier" :  "unique-id",
-                "feature_parameters" : {
-                    "include_timeframes" : ["5m","15m","4h"],
-                    "include_corr_pairlist": [
-                            "ETH/USD",
-                            "LINK/USD",
-                            "BNB/USD"
-                    ],
-                    "label_period_candles": 24,
-                    "include_shifted_candles": 2,
-                    "weight_factor":  0,
-                    "indicator_max_period_candles": 20,
-                    "indicator_periods_candles": [10, 20]
-                },
-                "data_split_parameters" : {
-                    "test_size": 0.25,
-                    "random_state": 42
-                },
-                "model_training_parameters" : {
-                    "n_estimators": 100,
-                    "random_state": 42,
-                    "learning_rate": 0.02,
-                    "task_type": "CPU",
-                },
-        }
+        "startup_candles": 10000,
+        "purge_old_models": true,
+        "train_period_days": 30,
+        "backtest_period_days": 7,
+        "identifier": "unique-id",
+        "feature_parameters": {
+            "include_timeframes": ["5m", "15m", "4h"],
+            "include_corr_pairlist": [
+                "ETH/USD",
+                "LINK/USD",
+                "BNB/USD"
+            ],
+            "label_period_candles": 24,
+            "include_shifted_candles": 2,
+            "weight_factor": 0,
+            "indicator_max_period_candles": 20,
+            "indicator_periods_candles": [10, 20]
+        },
+        "data_split_parameters": {
+            "test_size": 0.25,
+            "random_state": 42
+        },
+        "model_training_parameters": {
+            "n_estimators": 100,
+            "random_state": 42,
+            "learning_rate": 0.02,
+            "task_type": "CPU"
+        },
+    }
 ```
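As a sanity check, a config fragment like the one above can be parsed and turned into a training/backtesting schedule. The sketch below is illustrative only: it wraps a trimmed copy of the fragment in braces so it becomes a complete JSON document, checks the keys marked **Required** in the table rows shown above, and splits a hypothetical 30-day backtest `timerange` into windows of `backtest_period_days`. It assumes each training window of `train_period_days` ends exactly where its backtest window starts and that a fractional last window is rounded up; `plan_windows` is a hypothetical helper, not part of FreqAI.

```python
import json
import math
from datetime import datetime, timedelta

# Trimmed copy of the "freqai" fragment, wrapped in braces so that
# json.loads() accepts it as a complete document.
CONFIG_TEXT = """
{
    "freqai": {
        "identifier": "unique-id",
        "train_period_days": 30,
        "backtest_period_days": 7,
        "feature_parameters": {"include_timeframes": ["5m", "15m", "4h"]}
    }
}
"""

REQUIRED_KEYS = ("identifier", "train_period_days", "backtest_period_days")


def plan_windows(freqai_cfg: dict, start: datetime, end: datetime) -> list:
    """Hypothetical helper: list (train_start, train_end, backtest_end) tuples."""
    missing = [key for key in REQUIRED_KEYS if key not in freqai_cfg]
    if missing:
        raise KeyError(f"missing required freqai keys: {missing}")

    train = timedelta(days=freqai_cfg["train_period_days"])
    step = timedelta(days=freqai_cfg["backtest_period_days"])
    # Number of trainings = timerange / backtest_period_days, rounded up (assumption).
    n_trainings = math.ceil((end - start) / step)

    windows = []
    for k in range(n_trainings):
        train_end = start + k * step  # training stops where inference begins
        backtest_end = min(train_end + step, end)
        windows.append((train_end - train, train_end, backtest_end))
    return windows


freqai_cfg = json.loads(CONFIG_TEXT)["freqai"]
windows = plan_windows(freqai_cfg, datetime(2022, 6, 1), datetime(2022, 7, 1))
for train_start, train_end, backtest_end in windows:
    print(train_start.date(), train_end.date(), backtest_end.date())
```

With a 30-day range and 7-day backtest windows this yields five trainings, the last one covering only the remaining two days.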

 ### Feature engineering
@@ -275,42 +274,9 @@ The user of the present example does not wish to pass the `bb_lowerband` as a fe
 and has therefore not prepended it with `%`. The user does, however, wish to pass `bb_width` to the
 model for training/prediction and has therefore prepended it with `%`.

-Note: features **must** be defined in `populate_any_indicators()`. Making features in `populate_indicators()`
-will fail in live/dry mode. If the user wishes to add generalized features that are not associated with 
-a specific pair or timeframe, they should use the following structure inside `populate_any_indicators()`
-(as exemplified in `freqtrade/templates/FreqaiExampleStrategy.py`:
-
-```python
-    def populate_any_indicators(self, metadata, pair, df, tf, informative=None, coin="", set_generalized_indicators=False):
-
-        ...
-
-        # Add generalized indicators here (because in live, it will call only this function to populate 
-        # indicators for retraining). Notice how we ensure not to add them multiple times by associating
-        # these generalized indicators to the basepair/timeframe
-        if set_generalized_indicators:
-            df['%-day_of_week'] = (df["date"].dt.dayofweek + 1) / 7
-            df['%-hour_of_day'] = (df['date'].dt.hour + 1) / 25
-
-            # user adds targets here by prepending them with &- (see convention below)
-            # If user wishes to use multiple targets, a multioutput prediction model
-            # needs to be used such as templates/CatboostPredictionMultiModel.py
-            df["&-s_close"] = (
-                df["close"]
-                .shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
-                .rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
-                .mean()
-                / df["close"]
-                - 1
-                )
-```
-
-(Please see the example script located in `freqtrade/templates/FreqaiExampleStrategy.py` for a full example of `populate_any_indicators()`)
-
-The `include_timeframes` from the example config above are the timeframes of each `populate_any_indicator()`
- included metric for inclusion in the feature set. In the present case, the user is asking for the
-`5m`, `15m`, and `4h` timeframes of the `rsi`, `mfi`, `roc`, and `bb_width` to be included
-in the feature set.
+The `include_timeframes` in the example config above are the timeframes (`tf`) passed to each call of `populate_any_indicators()`,
+so every metric created there is added to the feature set once per timeframe. In the present case, the user is asking for the
+`5m`, `15m`, and `4h` timeframes of the `rsi`, `mfi`, `roc`, and `bb_width` to be included in the feature set.

 In addition, the user can ask for each of these features to be included from
 informative pairs using the `include_corr_pairlist`. This means that the present feature
@@ -324,7 +290,40 @@ FreqAI to include the the past 2 candles for each of the features included in th
 In total, the number of features the present user has created is:

 length of `include_timeframes` * no. features in `populate_any_indicators()` * length of `include_corr_pairlist` * no. `include_shifted_candles` * length of `indicator_periods_candles`  
-_3 * 3 * 3 * 2 * 2 = 108_.
+$3 \times 3 \times 3 \times 2 \times 2 = 108$.
+
+!!! Note
+    Features **must** be defined in `populate_any_indicators()`. Making features in `populate_indicators()`
+    will fail in live/dry mode. If the user wishes to add generalized features that are not associated with
+    a specific pair or timeframe, they should use the following structure inside `populate_any_indicators()`
+    (as exemplified in `freqtrade/templates/FreqaiExampleStrategy.py`):
+
+    ```python
+        def populate_any_indicators(self, metadata, pair, df, tf, informative=None, coin="", set_generalized_indicators=False):
+
+            ...
+
+            # Add generalized indicators here (because in live, it will call only this function to populate 
+            # indicators for retraining). Notice how we ensure not to add them multiple times by associating
+            # these generalized indicators to the basepair/timeframe
+            if set_generalized_indicators:
+                df['%-day_of_week'] = (df["date"].dt.dayofweek + 1) / 7
+                df['%-hour_of_day'] = (df['date'].dt.hour + 1) / 25
+
+                # user adds targets here by prepending them with &- (see convention below)
+                # If user wishes to use multiple targets, a multioutput prediction model
+                # needs to be used such as templates/CatboostPredictionMultiModel.py
+                df["&-s_close"] = (
+                    df["close"]
+                    .shift(-self.freqai_info["feature_parameters"]["label_period_candles"])
+                    .rolling(self.freqai_info["feature_parameters"]["label_period_candles"])
+                    .mean()
+                    / df["close"]
+                    - 1
+                    )
+    ```
+
+    (Please see the example script located in `freqtrade/templates/FreqaiExampleStrategy.py` for a full example of `populate_any_indicators()`)
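To see what the `&-s_close` target in the example above contains, the same pandas expression can be run on a toy close series. This is a minimal sketch: `make_s_close_label` is a hypothetical helper that takes `label_period_candles` as a plain argument instead of reading `self.freqai_info`. With `label_period_candles = N`, each row's label is the mean of the next `N` closes relative to the current close, minus 1. The last `N` rows are NaN because their future is unknown, and, because `rolling()` needs a full window by default, so are the first `N - 1` rows.

```python
import pandas as pd


def make_s_close_label(df: pd.DataFrame, label_period_candles: int) -> pd.Series:
    """Hypothetical helper mirroring the &-s_close expression above."""
    return (
        df["close"]
        .shift(-label_period_candles)   # row t now holds the close N candles ahead
        .rolling(label_period_candles)  # after the shift, the window spans closes t+1 .. t+N
        .mean()
        / df["close"]
        - 1
    )


df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})
df["&-s_close"] = make_s_close_label(df, label_period_candles=2)
print(df)
```

For example, row 2 averages the closes 13 and 14 (13.5) and divides by its own close of 12, giving a label of 0.125.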

 ### Deciding the sliding training window and backtesting duration

@@ -370,7 +369,6 @@ Backtesting mode requires the user to have the data pre-downloaded (unlike dry/l
 If this command has never been executed with the existing config file, then it will train a new model
 for each pair, for each backtesting window within the bigger `--timerange`.

---
 !!! Note "Model reuse"
    Once the training is completed, the user can execute this again with the same config file and
    FreqAI will find the trained models and load them instead of spending time training. This is useful
@@ -430,7 +428,7 @@ The FreqAI strategy requires the user to include the following lines of code in
        :param coin: the name of the coin which will modify the feature names.
        """

-        coint = pair.split('/')[0]
+        coin = pair.split('/')[0]

        with self.freqai.lock:
            if informative is None:
@@ -480,7 +478,7 @@ The FreqAI strategy requires the user to include the following lines of code in

 ```

-Notice how the `populate_any_indicators()` is where the user adds their own features and labels (more information [here](#feature-engineering)). See a full example at `templates/FreqaiExampleStrategy.py`.
+Notice that `populate_any_indicators()` is where the user adds their own features and labels ([more information](#feature-engineering)). See a full example at `templates/FreqaiExampleStrategy.py`.

 ### Setting classifier targets

@@ -490,7 +488,6 @@ FreqAI includes a the `CatboostClassifier` via the flag `--freqaimodel CatboostC
 df['&s-up_or_down'] = np.where( df["close"].shift(-100) > df["close"], 'up', 'down')
 ```
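One detail of the classifier target above is easy to miss: the last rows have no future close, and since a comparison with NaN is False, `np.where` labels them `'down'` regardless of what actually happens. The toy sketch below shows this with a horizon of 1 instead of 100 so the output stays readable; masking the unknown rows with `.where()` is one possible option and an assumption of this example, not documented FreqAI behaviour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": [1.0, 3.0, 2.0, 5.0]})
horizon = 1  # the snippet above uses 100; 1 keeps the toy example readable

future_close = df["close"].shift(-horizon)
df["&s-up_or_down"] = np.where(future_close > df["close"], "up", "down")
# The last `horizon` rows have no future close: NaN > x is False, so they are
# labelled "down" even though their direction is unknown.
print(df["&s-up_or_down"].tolist())

# Assumed option, not FreqAI behaviour: blank out rows whose future is unknown.
masked = df["&s-up_or_down"].where(future_close.notna())
print(masked.tolist())
```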

-
 ### Running the model live

 FreqAI can be run dry/live using the following command:
@@ -499,7 +496,7 @@ FreqAI can be run dry/live using the following command
 freqtrade trade --strategy FreqaiExampleStrategy --config config_freqai.example.json --freqaimodel LightGBMRegressor
 ```

-By default, FreqAI will not find find any existing models and will start by training a new one 
+On a fresh start, FreqAI will not find any existing models and will start by training a new one
 given the user configuration settings. Following training, it will use that model to make predictions on incoming candles until a new model is available. New models are typically generated as often as possible, with FreqAI managing an internal queue of the pairs to try and keep all models equally "young." FreqAI will always use the newest trained model to make predictions on incoming live data. If users do not want FreqAI to retrain new models as often as possible, they can set `live_retrain_hours` to tell FreqAI to wait at least that number of hours before retraining a new model. Additionally, users can set `expired_hours` to tell FreqAI to avoid making predictions on models aged over this number of hours.
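The retraining and expiry rules described above reduce to two time comparisons. The sketch below only illustrates those rules as stated in this paragraph; it is not FreqAI's internal queueing code. The helper names are hypothetical, and treating `expired_hours=None` as "never expire" is an assumption made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_retrain(trained_at: datetime, now: datetime, live_retrain_hours: float) -> bool:
    """Hypothetical: allow retraining once at least `live_retrain_hours` have passed.

    With the default of 0 the condition is always true, i.e. retrain as often as possible.
    """
    return now - trained_at >= timedelta(hours=live_retrain_hours)


def can_predict(trained_at: datetime, now: datetime, expired_hours: Optional[float]) -> bool:
    """Hypothetical: refuse predictions from models aged over `expired_hours`."""
    if expired_hours is None:  # assumption: no expiry configured
        return True
    return now - trained_at <= timedelta(hours=expired_hours)


trained_at = datetime(2022, 8, 14, 12, 0)
now = datetime(2022, 8, 14, 15, 30)  # the model is 3.5 hours old
print(should_retrain(trained_at, now, live_retrain_hours=0))  # True
print(should_retrain(trained_at, now, live_retrain_hours=4))  # False
print(can_predict(trained_at, now, expired_hours=3))          # False
```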

 If the user wishes to start dry/live from a backtested saved model, the user only needs to reuse
@@ -522,7 +519,7 @@ and if a full `live_retrain_hours` has elapsed since the end of the loaded model

 Model training parameters are unique to the ML library used by the user. FreqAI allows users to set any parameter for any library using the `model_training_parameters` dictionary in the user configuration file. The example configuration files show some of the example parameters associated with `Catboost` and `LightGBM`, but users can add any parameters available in those libraries.

-Data split parameters are defined in `data_split_parameters` which can be any parameters associated with `Sklearn`'s `train_test_split()` function. Meanwhile, FreqAI includes some additional parameters such `weight_factor` which allows the user to weight more recent data more strongly
+Data split parameters are defined in `data_split_parameters`, which can contain any parameters associated with `Sklearn`'s `train_test_split()` function. FreqAI includes some additional parameters, such as `weight_factor`, which allows the user to weight more recent data more strongly
 than past data via an exponential function:

 $$ W_i = \exp(\frac{-i}{\alpha*n}) $$
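A minimal NumPy sketch of this weighting follows. It assumes that $i$ counts back from the most recent training point ($i = 0$ is the newest), that $n$ is the number of training points, and that $\alpha$ is `weight_factor`; `recency_weights` is a hypothetical helper. A `weight_factor` of 0, as in the example config, would divide by zero in this formula and presumably means "no weighting"; the sketch simply rejects it.

```python
import numpy as np


def recency_weights(n: int, weight_factor: float) -> np.ndarray:
    """Hypothetical helper: W_i = exp(-i / (alpha * n)), returned oldest-first."""
    if weight_factor <= 0:
        raise ValueError("weight_factor must be > 0 for this formula")
    i = np.arange(n)                            # i = 0 is the most recent point
    weights = np.exp(-i / (weight_factor * n))
    return weights[::-1]                        # align with chronological (oldest-first) rows


w = recency_weights(n=5, weight_factor=0.5)
print(np.round(w, 3))
```

The newest row always receives weight 1 and the oldest receives $\exp(-(n-1)/(\alpha n))$, so a smaller `weight_factor` makes older data fade faster.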
@@ -545,8 +542,8 @@ data point and all other training data points:
 $$ d_{ab} = \sqrt{\sum_{j=1}^p(X_{a,j}-X_{b,j})^2} $$

 where $d_{ab}$ is the distance between the normalized points $a$ and $b$. $p$
-is the number of features i.e. the length of the vector $X$. The
-characteristic distance, $\overline{d}$ for a set of training data points is simply the mean
+is the number of features, i.e. the length of the vector $X$.
+The characteristic distance, $\overline{d}$, for a set of training data points is simply the mean
 of the average distances:

 $$ \overline{d} = \sum_{a=1}^n(\sum_{b=1}^n(d_{ab}/n)/n) $$
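The two formulas above translate directly into NumPy. This sketch assumes `X` already holds the normalized training points as an $n \times p$ array; it builds the full $n \times n$ distance matrix (so memory grows as $n^2$) and includes the zero self-distances $d_{aa}$ exactly as the double sum does. `characteristic_distance` is a hypothetical name; for large data sets a library routine such as `scipy.spatial.distance.pdist` would be more economical.

```python
import numpy as np


def characteristic_distance(X: np.ndarray) -> tuple:
    """Return the pairwise distance matrix d_ab and d-bar for normalized data X (n x p)."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]        # shape (n, n, p): X_{a,j} - X_{b,j}
    d = np.sqrt((diff ** 2).sum(axis=-1))       # Euclidean distance d_ab, shape (n, n)
    d_bar = np.sum(np.sum(d / n, axis=1) / n)   # sum_a ( sum_b (d_ab / n) / n )
    return d, float(d_bar)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
d, d_bar = characteristic_distance(X)
print(d)
print(d_bar)
```

Because the double sum divides by $n$ twice, $\overline{d}$ is just the mean of all $n^2$ entries of the distance matrix, i.e. `d.mean()`.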
@@ -574,8 +571,7 @@ to low levels of certainty. Activating the Dissimilarity Index can be achieved w
    }
 ```

-The user can tweak the DI with `DI_threshold` to increase or decrease the extrapolation of the 
-trained model.
+The user can adjust `DI_threshold` to allow more or less extrapolation by the trained model.

 ### Reducing data dimensionality with Principal Component Analysis

@@ -609,7 +605,7 @@ FreqAI will train an SVM on the training data (or components if the user activat

 ### Clustering the training data and removing outliers with DBSCAN

-The user can tell FreqAI to use DBSCAN to cluster training data and remove outliers from the training data set. The user activates `use_DBSCAN_to_remove_outliers` to cluster training data for identification of outliers. Also used to detect incoming outliers for prediction data points.
+The user can configure FreqAI to use DBSCAN to cluster the training data and remove outliers from the training data set. Setting `use_DBSCAN_to_remove_outliers` activates this clustering, which is also used to detect outliers among incoming prediction data points.

 ```json
    "freqai": {
@@ -709,11 +705,11 @@ this historical data to be reloaded if the user stops and restarts with the same
 ## Extra returns per train

 Users may find that there are some important metrics that they'd like to return to the strategy at the end of each retrain.
-Users can include these metrics by assigining them to `dk.data['extra_returns_per_train']['my_new_value'] = XYZ` inside their custom prediction
+Users can include these metrics by assigning them to `dk.data['extra_returns_per_train']['my_new_value'] = XYZ` inside their custom prediction
 model class. FreqAI takes the `my_new_value` assigned in this dictionary and expands it to fit the return dataframe to the strategy.
 The user can then use the value in the strategy with `dataframe['my_new_value']`. An example of how this is already used in FreqAI is
 the `&*_mean` and `&*_std` values, which indicate the mean and standard deviation of that particular label during the most recent training.
-Another example is shown below if the user wants to use live metrics from the trade databse.
+Another example is shown below if the user wants to use live metrics from the trade database.

 The user needs to set the standard dictionary in the config so FreqAI can return proper dataframe shapes:

@@ -726,7 +722,6 @@ The user needs to set the standard dictionary in the config so FreqAI can return
 These values will likely be overridden by the user prediction model, but if the user model has not set them yet, or needs
 a default initial value, these are the values that will be returned.

-
 ## Building an IFreqaiModel

 FreqAI includes multiple example prediction models based on different libraries, such as `Catboost` regression (`freqai/prediction_models/CatboostRegressor.py`) and `LightGBM` regression.
@@ -755,8 +750,8 @@ This file structure is heavily controlled and read by the `FreqaiDataKitchen()`
 and should therefore not be modified.

 ## Credits
-FreqAI was developed by a group of individuals who all contributed specific skillsets to the 
-project.
+
+FreqAI was developed by a group of individuals who all contributed specific skillsets to the project.

 Conception and software development:
 Robert Caulk @robcaulk