Merge branch 'develop' into feat/externalsignals

2022-09-06 13:02:36 -06:00
parent b1c0267449 dc4a4bdf09
commit 8bfaf0a998
11 changed files with 90 additions and 38 deletions
@@ -113,14 +113,14 @@ Mandatory parameters are marked as **Required**, which means that they are requi
 | `use_SVM_to_remove_outliers` | Train a support vector machine to detect and remove outliers from the training data set, as well as from incoming data points. See details about how it works [here](#removing-outliers-using-a-support-vector-machine-svm). <br> **Datatype:** Boolean.
 | `svm_params` | All parameters available in Sklearn's `SGDOneClassSVM()`. See details about some select parameters [here](#removing-outliers-using-a-support-vector-machine-svm). <br> **Datatype:** Dictionary.
 | `use_DBSCAN_to_remove_outliers` | Cluster data using DBSCAN to identify and remove outliers from training and prediction data. See details about how it works [here](#removing-outliers-with-dbscan). <br> **Datatype:** Boolean. 
-| `outlier_protection_percentage` | If more than `outlier_protection_percentage` fraction of points are removed as outliers, FreqAI will log a warning message and ignore outlier detection while keeping the original dataset intact. <br> **Datatype:** float. Default: `30`
-| `reverse_train_test_order` | If true, FreqAI will train on the latest data split and test on historical split of the data. This allows the model to be trained up to the most recent data point, while avoiding overfitting. However, users should be careful to understand unorthodox nature of this parameter before employing it. <br> **Datatype:** bool. Default: False
+| `outlier_protection_percentage` | If more than `outlier_protection_percentage` % of points are detected as outliers by the SVM or DBSCAN, FreqAI will log a warning message and ignore outlier detection while keeping the original dataset intact. If the outlier protection is triggered, no predictions will be made based on the training data. <br> **Datatype:** Float. Default: `30`
+| `reverse_train_test_order` | If true, FreqAI will train on the latest data split and test on the historical split of the data. This allows the model to be trained up to the most recent data point, while avoiding overfitting. However, users should be careful to understand the unorthodox nature of this parameter before employing it. <br> **Datatype:** Boolean. Default: False
 |  |  **Data split parameters**
 | `data_split_parameters` | Include any additional parameters available from Scikit-learn `test_train_split()`, which are shown [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) (external website). <br> **Datatype:** Dictionary.
 | `test_size` | Fraction of data that should be used for testing instead of training. <br> **Datatype:** Positive float < 1.
-| `shuffle` | Shuffle the training data points during training. Typically, for time-series forecasting, this is set to `False`. <br>
+| `shuffle` | Shuffle the training data points during training. Typically, for time-series forecasting, this is set to `False`. <br> **Datatype:** Boolean.
 |  |  **Model training parameters**
-| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected model library. For example, if the user uses `LightGBMRegressor`, this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) (external website). If the user selects a different model, this dictionary can contain any parameter from that model.  <br> **Datatype:** Dictionary.**Datatype:** Boolean.
+| `model_training_parameters` | A flexible dictionary that includes all parameters available by the user selected model library. For example, if the user uses `LightGBMRegressor`, this dictionary can contain any parameter available by the `LightGBMRegressor` [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html) (external website). If the user selects a different model, this dictionary can contain any parameter from that model.  <br> **Datatype:** Dictionary.
 | `n_estimators` | The number of boosted trees to fit in regression. <br> **Datatype:** Integer.
 | `learning_rate` | Boosting learning rate during regression. <br> **Datatype:** Float.
 | `n_jobs`, `thread_count`, `task_type` | Set the number of threads for parallel processing and the `task_type` (`gpu` or `cpu`). Different model libraries use different parameter names. <br> **Datatype:** Float.
@@ -280,7 +280,7 @@ The FreqAI strategy requires the user to include the following lines of code in
 Notice how the `populate_any_indicators()` is where the user adds their own features ([more information](#feature-engineering)) and labels ([more information](#setting-classifier-targets)). See a full example at `templates/FreqaiExampleStrategy.py`.

 ### Setting the `startup_candle_count`
-Users need to take care to set the `startup_candle_count` in their strategy the same way they would for any normal Freqtrade strategy (see details [here](strategy-customization.md/#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling on the `dataprovider` to avoid any NaNs at the beginning of the first training. Users can easily set this value by identifying the longest period (in candle units) that they pass to their indicator creation functions (e.g. talib functions). In the present example, the user would pass 20 to as this value (since it is the maximum value in their `indicators_periods_candles`).
+Users need to take care to set the `startup_candle_count` in their strategy the same way they would for any normal Freqtrade strategy (see details [here](strategy-customization.md#strategy-startup-period)). This value is used by Freqtrade to ensure that a sufficient amount of data is provided when calling on the `dataprovider` to avoid any NaNs at the beginning of the first training. Users can easily set this value by identifying the longest period (in candle units) that they pass to their indicator creation functions (e.g. talib functions). In the present example, the user would pass 20 as this value (since it is the maximum value in their `indicators_periods_candles`).

 !!! Note
    Typically it is best for users to be safe and multiply their expected `startup_candle_count` by 2. There are instances where the talib functions actually require more data than just the passed `period`. Anecdotally, multiplying the `startup_candle_count` by 2 always leads to a fully NaN free training dataset. Look out for this log message to confirm that your data is clean:
@@ -515,10 +515,10 @@ and if a full `live_retrain_hours` has elapsed since the end of the loaded model
 The FreqAI backtesting module can be executed with the following command:

 ```bash
-freqtrade backtesting --strategy FreqaiExampleStrategy --config config_examples/config_freqai.example.json --freqaimodel LightGBMRegressor --timerange 20210501-20210701
+freqtrade backtesting --strategy FreqaiExampleStrategy --strategy-path freqtrade/templates --config config_examples/config_freqai.example.json --freqaimodel LightGBMRegressor --timerange 20210501-20210701
 ```

-Backtesting mode requires the user to have the data pre-downloaded (unlike in dry/live mode where FreqAI automatically downloads the necessary data). The user should be careful to consider that the time range of the downloaded data is more than the backtesting time range. This is because FreqAI needs data prior to the desired backtesting time range in order to train a model to be ready to make predictions on the first candle of the user-set backtesting time range. More details on how to calculate the data to download can be found [here](#deciding-the-sliding-training-window-and-backtesting-duration).
+Backtesting mode requires the user to have the data [pre-downloaded](#downloading-data-for-backtesting) (unlike in dry/live mode where FreqAI automatically downloads the necessary data). The user should be careful to consider that the time range of the downloaded data is more than the backtesting time range. This is because FreqAI needs data prior to the desired backtesting time range in order to train a model to be ready to make predictions on the first candle of the user-set backtesting time range. More details on how to calculate the data to download can be found [here](#deciding-the-sliding-training-window-and-backtesting-duration). 

 If this command has never been executed with the existing config file, it will train a new model
 for each pair, for each backtesting window within the expanded `--timerange`.
@@ -546,7 +546,7 @@ FreqAI will train have trained 8 separate models at the end of `--timerange` (be
    Although fractional `backtest_period_days` is allowed, the user should be aware that the `--timerange` is divided by this value to determine the number of models that FreqAI will need to train in order to backtest the full range. For example, if the user wants to set a `--timerange` of 10 days, and asks for a `backtest_period_days` of 0.1, FreqAI will need to train 100 models per pair to complete the full backtest. Because of this, a true backtest of FreqAI adaptive training would take a *very* long time. The best way to fully test a model is to run it dry and let it constantly train. In this case, backtesting would take the exact same amount of time as a dry run.

 ### Downloading data for backtesting
-Live/dry instances will download the data automatically for the user, but users who wish to use backtesting functionality still need to download the necessary data using `download-data` (details [here](data-download/#data-downloading)). FreqAI users need to pay careful attention to understanding how much *additional* data needs to be downloaded to ensure that they have a sufficient amount of training data *before* the start of their backtesting timerange. The amount of additional data can be roughly estimated by taking subtracting `train_period_days` and the `startup_candle_count` ([details](#setting-the-startupcandlecount)) from the beginning of the desired backtesting timerange. 
+Live/dry instances will download the data automatically for the user, but users who wish to use backtesting functionality still need to download the necessary data using `download-data` (details [here](data-download.md#data-downloading)). FreqAI users need to pay careful attention to how much *additional* data needs to be downloaded to ensure that they have a sufficient amount of training data *before* the start of their backtesting timerange. The amount of additional data can be roughly estimated by moving the start of the desired backtesting timerange backwards by `train_period_days` plus the duration of the `startup_candle_count` ([details](#setting-the-startupcandlecount)).

 As an example, if we wish to backtest the `--timerange` above of `20210501-20210701`, and we use the example config which sets `train_period_days` to 15. The startup candle count is 40 on a maximum `include_timeframes` of 1h. We would need 20210501 - 15 days - 40 * 1h / 24 hours = 20210414 (16.7 days earlier than the start of the desired training timerange).
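
The arithmetic in the example above can be checked with a few lines of Python. This is only a sketch using the standard library; the helper name `download_start` and its parameters are hypothetical and are not part of FreqAI or Freqtrade.

```python
from datetime import datetime, timedelta


def download_start(backtest_start: datetime, train_period_days: float,
                   startup_candle_count: int, candle_hours: float) -> datetime:
    """Hypothetical helper: earliest timestamp the downloaded data should cover."""
    # Data needed to train the first model before the backtest starts.
    training = timedelta(days=train_period_days)
    # Extra candles consumed by indicator warm-up on the largest timeframe.
    startup = timedelta(hours=startup_candle_count * candle_hours)
    return backtest_start - training - startup


# Values from the example: 15 training days, 40 startup candles on 1h.
start = download_start(datetime(2021, 5, 1), train_period_days=15,
                       startup_candle_count=40, candle_hours=1)
print(start.strftime("%Y%m%d"))  # 20210414
```

Under these assumptions the downloaded data would need to begin no later than 20210414 to cover the backtesting `--timerange` of `20210501-20210701`.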

@@ -738,7 +738,7 @@ Given a number of data points $N$, and a distance $\varepsilon$, DBSCAN clusters

 ![dbscan](assets/freqai_dbscan.jpg)

-FreqAI uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's webpage [here](#https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)) with `min_samples` ($N$) taken as double the no. of user-defined features, and `eps` ($\varepsilon$) taken as the longest distance in the *k-distance graph* computed from the nearest neighbors in the pairwise distances of all data points in the feature set.
+FreqAI uses `sklearn.cluster.DBSCAN` (details are available on scikit-learn's webpage [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)) with `min_samples` ($N$) taken as 1/4 of the number of time points in the feature set, and `eps` ($\varepsilon$) taken as the elbow point in the *k-distance graph* computed from the nearest neighbors in the pairwise distances of all data points in the feature set.

 ## Additional information

@@ -763,5 +763,5 @@ Code review, software architecture brainstorming:
@xmatthias

 Beta testing and bug reporting:
-@bloodhunter4rc, Salah Lamkadem @ikonx, @ken11o2, @longyu, @paranoidandy, @smidelis, @smarm
+@bloodhunter4rc, Salah Lamkadem @ikonx, @ken11o2, @longyu, @paranoidandy, @smidelis, @smarm,
 Juha Nykänen @suikula, Wagner Costa @wagnercosta
@@ -824,6 +824,8 @@ Options:
 - Merge the dataframe without lookahead bias
 - Forward-fill (optional)

+For a full sample, please refer to the [complete data provider example](#complete-data-provider-sample) below.
+
 All columns of the informative dataframe will be available on the returning dataframe in a renamed fashion:

 !!! Example "Column renaming"
@@ -147,13 +147,16 @@ class FreqtradeBot(LoggingMixin):
        :return: None
        """
        logger.info('Cleaning up modules ...')
+        try:
+            # Wrap db activities in shutdown to avoid problems if the database is gone
+            # and further exceptions are raised.
+            if self.config['cancel_open_orders_on_exit']:
+                self.cancel_all_open_orders()

-        if self.config['cancel_open_orders_on_exit']:
-            self.cancel_all_open_orders()
+            self.check_for_open_trades()

-        self.check_for_open_trades()
-
-        self.strategy.ft_bot_cleanup()
+        finally:
+            self.strategy.ft_bot_cleanup()

        self.rpc.cleanup()
        if self.emc:
@@ -296,7 +299,7 @@ class FreqtradeBot(LoggingMixin):
                    pair=trade.pair,
                    amount=trade.amount,
                    is_short=trade.is_short,
-                    open_date=trade.open_date_utc
+                    open_date=trade.date_last_filled_utc
                )
                trade.funding_fees = funding_fees
        else:
@@ -741,10 +744,11 @@ class FreqtradeBot(LoggingMixin):
        fee = self.exchange.get_fee(symbol=pair, taker_or_maker='maker')
        base_currency = self.exchange.get_pair_base_currency(pair)
        open_date = datetime.now(timezone.utc)
-        funding_fees = self.exchange.get_funding_fees(
-            pair=pair, amount=amount, is_short=is_short, open_date=open_date)
+
        # This is a new trade
        if trade is None:
+            funding_fees = self.exchange.get_funding_fees(
+                pair=pair, amount=amount, is_short=is_short, open_date=open_date)
            trade = Trade(
                pair=pair,
                base_currency=base_currency,
@@ -1499,7 +1503,7 @@ class FreqtradeBot(LoggingMixin):
            pair=trade.pair,
            amount=trade.amount,
            is_short=trade.is_short,
-            open_date=trade.open_date_utc,
+            open_date=trade.date_last_filled_utc,
        )
        exit_type = 'exit'
        exit_reason = exit_tag or exit_check.exit_reason
@@ -686,7 +686,7 @@ class Backtesting:
                self.futures_data[trade.pair],
                amount=trade.amount,
                is_short=trade.is_short,
-                open_date=trade.open_date_utc,
+                open_date=trade.date_last_filled_utc,
                close_date=exit_candle_time,
            )

@@ -421,9 +421,10 @@ class Hyperopt:
        preprocessed = self.backtesting.strategy.advise_all_indicators(data)

        # Trim startup period from analyzed dataframe to get correct dates for output.
-        processed = trim_dataframes(preprocessed, self.timerange, self.backtesting.required_startup)
-        self.min_date, self.max_date = get_timerange(processed)
-        return processed
+        trimmed = trim_dataframes(preprocessed, self.timerange, self.backtesting.required_startup)
+        self.min_date, self.max_date = get_timerange(trimmed)
+        # Real trimming will happen as part of backtesting.
+        return preprocessed

    def prepare_hyperopt_data(self) -> None:
        HyperoptStateContainer.set_state(HyperoptState.DATALOAD)
@@ -212,17 +212,18 @@ def migrate_orders_table(engine, table_back_name: str, cols_order: List):
    ft_fee_base = get_column_def(cols_order, 'ft_fee_base', 'null')
    average = get_column_def(cols_order, 'average', 'null')
    stop_price = get_column_def(cols_order, 'stop_price', 'null')
+    funding_fee = get_column_def(cols_order, 'funding_fee', '0.0')

    # sqlite does not support literals for booleans
    with engine.begin() as connection:
        connection.execute(text(f"""
            insert into orders (id, ft_trade_id, ft_order_side, ft_pair, ft_is_open, order_id,
            status, symbol, order_type, side, price, amount, filled, average, remaining, cost,
-            stop_price, order_date, order_filled_date, order_update_date, ft_fee_base)
+            stop_price, order_date, order_filled_date, order_update_date, ft_fee_base, funding_fee)
            select id, ft_trade_id, ft_order_side, ft_pair, ft_is_open, order_id,
            status, symbol, order_type, side, price, amount, filled, {average} average, remaining,
            cost, {stop_price} stop_price, order_date, order_filled_date,
-            order_update_date, {ft_fee_base} ft_fee_base
+            order_update_date, {ft_fee_base} ft_fee_base, {funding_fee} funding_fee
            from {table_back_name}
            """))

@@ -307,9 +308,10 @@ def check_migrate(engine, decl_base, previous_tables) -> None:
    # Check if migration necessary
    # Migrates both trades and orders table!
    # if ('orders' not in previous_tables
-    # or not has_column(cols_orders, 'stop_price')):
+    # or not has_column(cols_orders, 'funding_fee')):
    migrating = False
-    if not has_column(cols_trades, 'contract_size'):
+    # if not has_column(cols_trades, 'contract_size'):
+    if not has_column(cols_orders, 'funding_fee'):
        migrating = True
        logger.info(f"Running database migration for trades - "
                    f"backup: {table_back_name}, {order_table_bak_name}")
@@ -65,6 +65,8 @@ class Order(_DECL_BASE):
    order_filled_date = Column(DateTime, nullable=True)
    order_update_date = Column(DateTime, nullable=True)

+    funding_fee = Column(Float, nullable=True)
+
    ft_fee_base = Column(Float, nullable=True)

    @property
@@ -72,6 +74,13 @@ class Order(_DECL_BASE):
        """ Order-date with UTC timezoneinfo"""
        return self.order_date.replace(tzinfo=timezone.utc)

+    @property
+    def order_filled_utc(self) -> Optional[datetime]:
+        """ last order-date with UTC timezoneinfo"""
+        return (
+            self.order_filled_date.replace(tzinfo=timezone.utc) if self.order_filled_date else None
+        )
+
    @property
    def safe_price(self) -> float:
        return self.average or self.price
@@ -119,6 +128,10 @@ class Order(_DECL_BASE):
        self.ft_is_open = True
        if self.status in NON_OPEN_EXCHANGE_STATES:
            self.ft_is_open = False
+            if self.trade:
+                # Assign funding fee up to this point
+                # (represents the funding fee since the last order)
+                self.funding_fee = self.trade.funding_fees
            if (order.get('filled', 0.0) or 0.0) > 0:
                self.order_filled_date = datetime.now(timezone.utc)
        self.order_update_date = datetime.now(timezone.utc)
@@ -179,6 +192,10 @@ class Order(_DECL_BASE):
        self.remaining = 0
        self.status = 'closed'
        self.ft_is_open = False
+        # Assign funding fees to Order.
+        # Assumes backtesting will use date_last_filled_utc to calculate future funding fees.
+        self.funding_fee = trade.funding_fees
+
        if (self.ft_order_side == trade.entry_side):
            trade.open_rate = self.price
            trade.recalc_trade_from_orders()
@@ -346,6 +363,15 @@ class LocalTrade():
        else:
            return self.amount

+    @property
+    def date_last_filled_utc(self) -> datetime:
+        """ Date of the last filled order"""
+        orders = self.select_filled_orders()
+        if not orders:
+            return self.open_date_utc
+        return max([self.open_date_utc,
+                    max(o.order_filled_utc for o in orders if o.order_filled_utc)])
+
    @property
    def open_date_utc(self):
        return self.open_date.replace(tzinfo=timezone.utc)
@@ -843,10 +869,14 @@ class LocalTrade():
        close_profit = 0.0
        close_profit_abs = 0.0
        profit = None
-        for o in self.orders:
+        # Reset funding fees
+        self.funding_fees = 0.0
+        funding_fees = 0.0
+        ordercount = len(self.orders) - 1
+        for i, o in enumerate(self.orders):
            if o.ft_is_open or not o.filled:
                continue
-
+            funding_fees += (o.funding_fee or 0.0)
            tmp_amount = FtPrecise(o.safe_amount_after_fee)
            tmp_price = FtPrecise(o.safe_price)

@@ -861,7 +891,11 @@ class LocalTrade():
                    avg_price = current_stake / current_amount

            if is_exit:
-                # Process partial exits
+                # Process exits
+                if i == ordercount and is_closing:
+                    # Apply funding fees only to the last closing order
+                    self.funding_fees = funding_fees
+
                exit_rate = o.safe_price
                exit_amount = o.safe_amount_after_fee
                profit = self.calc_profit(rate=exit_rate, amount=exit_amount,
@@ -871,6 +905,7 @@ class LocalTrade():
                    exit_rate, amount=exit_amount, open_rate=avg_price)
            else:
                total_stake = total_stake + self._calc_open_trade_value(tmp_amount, price)
+        self.funding_fees = funding_fees

        if close_profit:
            self.close_profit = close_profit
@@ -261,11 +261,15 @@ class RPC:
                        profit_str += f" ({fiat_profit:.2f})"
                        fiat_profit_sum = fiat_profit if isnan(fiat_profit_sum) \
                            else fiat_profit_sum + fiat_profit
+                open_order = (trade.select_order_by_order_id(
+                    trade.open_order_id) if trade.open_order_id else None)
+
                detail_trade = [
                    f'{trade.id} {direction_str}',
-                    trade.pair + ('*' if (trade.open_order_id is not None
-                                          and trade.close_rate_requested is None) else '')
-                    + ('**' if (trade.close_rate_requested is not None) else ''),
+                    trade.pair + ('*' if (open_order
+                                  and open_order.ft_order_side == trade.entry_side) else '')
+                    + ('**' if (open_order and
+                                open_order.ft_order_side == trade.exit_side) else ''),
                    shorten_date(arrow.get(trade.open_date).humanize(only_distance=True)),
                    profit_str
                ]
@@ -12,7 +12,7 @@ arrow==1.2.3
 cachetools==4.2.2
 requests==2.28.1
 urllib3==1.26.12
-jsonschema==4.14.0
+jsonschema==4.15.0
 TA-Lib==0.4.24
 technical==1.3.0
 tabulate==0.8.10
@@ -615,21 +615,25 @@ def test_calc_open_close_trade_price(
        is_short=is_short,
        leverage=lev,
        trading_mode=trading_mode,
-        funding_fees=funding_fees
    )
    entry_order = limit_order[trade.entry_side]
    exit_order = limit_order[trade.exit_side]
    trade.open_order_id = f'something-{is_short}-{lev}-{exchange}'

    oobj = Order.parse_from_ccxt_object(entry_order, 'ADA/USDT', trade.entry_side)
-    trade.orders.append(oobj)
+    oobj.trade = trade
+    oobj.update_from_ccxt_object(entry_order)
    trade.update_trade(oobj)

+    trade.funding_fees = funding_fees
+
    oobj = Order.parse_from_ccxt_object(exit_order, 'ADA/USDT', trade.exit_side)
-    trade.orders.append(oobj)
+    oobj.trade = trade
+    oobj.update_from_ccxt_object(exit_order)
    trade.update_trade(oobj)

    assert trade.is_open is False
+    assert trade.funding_fees == funding_fees

    assert pytest.approx(trade._calc_open_trade_value(trade.amount, trade.open_rate)) == open_value
    assert pytest.approx(trade.calc_close_trade_value(trade.close_rate)) == close_value