expose full parameter set for SVM outlier detection. Set default shuffle to false to improve reproducibility

2022-07-30 13:40:05 +02:00
parent f22b140782
commit dd8288c090
2 changed files with 4 additions and 3 deletions
@@ -97,7 +97,7 @@ Mandatory parameters are marked as **Required**, which means that they are requi
 | `weight_factor` | Used to set weights for training data points according to their recency, see details and a figure of how it works [here](##controlling-the-model-learning-process). <br> **Datatype:** positive float (typically below 1).
 | `principal_component_analysis` | Ask FreqAI to automatically reduce the dimensionality of the data set using PCA. <br> **Datatype:** boolean.
 | `use_SVM_to_remove_outliers` | Ask FreqAI to train a support vector machine to detect and remove outliers from the training data set as well as from incoming data points. <br> **Datatype:** boolean.
-| `svm_nu` | The `nu` parameter for the support vector machine. *Very* broadly, this is the percentage of data points that should be considered outliers. <br> **Datatype:** float between 0 and 1.
+| `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. For example, `nu` is, *very* broadly, the fraction of data points that should be considered outliers, and `shuffle` is false by default to maintain reproducibility. These and all other parameters can be added or changed in this dictionary. <br> **Datatype:** dictionary.
 | `stratify_training_data` | This value is used to indicate the stratification of the data. e.g. 2 would set every 2nd data point into a separate dataset to be pulled from during training/testing. <br> **Datatype:** positive integer.
 | `indicator_max_period_candles` | The maximum *period* used in `populate_any_indicators()` for indicator creation. FreqAI uses this information in combination with the maximum timeframe to calculate how many data points it should download so that the first data point does not have a NaN <br> **Datatype:** positive integer.
 | `indicator_periods_candles` | A list of integers used to duplicate all indicators according to a set of periods and add them to the feature set. <br> **Datatype:** list of positive integers.
@@ -530,8 +530,9 @@ class FreqaiDataKitchen:
        else:
            # use SGDOneClassSVM to increase speed?
-            nu = self.freqai_config["feature_parameters"].get("svm_nu", 0.2)
-            self.svm_model = linear_model.SGDOneClassSVM(nu=nu).fit(
+            svm_params = self.freqai_config["feature_parameters"].get(
+                "svm_params", {"shuffle": False, "nu": 0.1})
+            self.svm_model = linear_model.SGDOneClassSVM(**svm_params).fit(
                self.data_dictionary["train_features"]
            )
            y_pred = self.svm_model.predict(self.data_dictionary["train_features"])
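
Note on the default handling in this hunk: `dict.get("svm_params", default)` returns the default dictionary only when the `svm_params` key is absent. If a configuration supplies `svm_params` without a `shuffle` entry, the `{"shuffle": False, "nu": 0.1}` defaults are not applied at all, and scikit-learn's own default for `SGDOneClassSVM` (`shuffle=True`) takes effect. The sketch below shows one possible way to merge user overrides on top of the defaults. The names `DEFAULT_SVM_PARAMS` and `resolve_svm_params` are hypothetical and are not part of this commit.

```python
# Hypothetical helper, not part of the commit: merge user overrides on top of
# the defaults instead of replacing the defaults wholesale.
DEFAULT_SVM_PARAMS = {"shuffle": False, "nu": 0.1}


def resolve_svm_params(feature_parameters: dict) -> dict:
    """Return the defaults updated with any user-supplied `svm_params`."""
    user_params = feature_parameters.get("svm_params", {})
    return {**DEFAULT_SVM_PARAMS, **user_params}


# Lookup as written in the commit: a user dictionary replaces the default
# dictionary entirely, so `shuffle` is no longer set explicitly.
as_committed = {"svm_params": {"nu": 0.2}}.get(
    "svm_params", {"shuffle": False, "nu": 0.1}
)

# Merged lookup: `nu` is overridden while `shuffle` keeps its default.
merged = resolve_svm_params({"svm_params": {"nu": 0.2}})
```

With a merge like this, the resulting dictionary could be passed to `linear_model.SGDOneClassSVM(**params)` in the same way as `svm_params` above, assuming every key is a valid constructor argument.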