Feature engineering is handled within `"feature_parameters":{}` in the `FreqAI` config file and in the user strategy. The user adds all their `base features`, such as, e.g., `RSI`, `MFI`, `EMA`, `SMA`, etc., to their strategy. The `base features` can be custom indicators or they can be imported from any technical-analysis library that the user can find. The `base features` are added by the user inside the `populate_any_indicators()` method of the strategy by prepending indicators with `%`, and labels with `&`.
It is advisable for the user to start from the `populate_any_indicators()` in the example strategy (found in `templates/FreqaiExampleStrategy.py`) to ensure that they are following the correct conventions. Here is an example of how to set the indicators and labels in the strategy:
The `include_timeframes` in the config above are the timeframes (`tf`) of each call to `populate_any_indicators()` in the strategy. In the presented case, the user is asking for the `5m`, `15m`, and `4h` timeframes of the `rsi`, `mfi`, `roc`, and `bb_width` to be included in the feature set.
The user can ask for each of the defined features to be included also for informative pairs using the `include_corr_pairlist`. This means that the feature set will include all the features from `populate_any_indicators` on all the `include_timeframes` for each of the correlated pairs defined in the config (`ETH/USD`, `LINK/USD`, and `BNB/USD` in the presented example).
`include_shifted_candles` indicates the number of previous candles to include in the feature set. For example, `include_shifted_candles: 2` tells `FreqAI` to include the past 2 candles for each of the features in the feature set.
In total, the number of features the user of the presented example strat has created is: length of `include_timeframes`* no. features in `populate_any_indicators()` * length of `include_corr_pairlist`* no. `include_shifted_candles` * length of `indicator_periods_candles`
FreqAI is strict when it comes to data normalization. The train features are always normalized to [-1, 1] and all other data (test data and unseen prediction data in dry/live/backtest) is always automatically normalized to the training feature space according to industry standards. FreqAI stores all the metadata required to ensure that test and prediction features will be properly normalized and that predictions are properly denormalized. For this reason, it is not recommended to eschew industry standards and modify FreqAI internals - however - advanced users can do so by inheriting `train()` in their custom `IFreqaiModel` and using their own normalization functions.
This will perform PCA on the features and reduce their dimensionality so that the explained variance of the data set is >= 0.999. Reducing data dimensionality makes training the model faster and hence allows for more up-to-date models.
The user defines the lookback window by setting `inlier_metric_window` and FreqAI computes the distance between the present time point and each of the previous `inlier_metric_window` lookback points. A Weibull function is fit to each of the lookback distributions and its cumulative distribution function is used to produce a quantile for each lookback point. The `inlier_metric` is then computed for each time point as the average of the corresponding lookback quantiles.
where $W_i$ is the weight of data point $i$ in a total set of $n$ data points. Below is a figure showing the effect of different weight factors on the data points in a feature set.