Document hdf5 dataformat

Matthias 2020-07-25 17:06:58 +02:00
parent edb582e522
commit 119bf2a8ea
3 changed files with 84 additions and 44 deletions

@ -15,61 +15,91 @@ Otherwise `--exchange` becomes mandatory.
### Usage ### Usage
``` ```
usage: freqtrade download-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] [-d PATH] [--userdir PATH] [-p PAIRS [PAIRS ...]] usage: freqtrade download-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[--pairs-file FILE] [--days INT] [--dl-trades] [--exchange EXCHANGE] [-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]] [--pairs-file FILE]
[--days INT] [--dl-trades]
[--exchange EXCHANGE]
[-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...]] [-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...]]
[--erase] [--data-format-ohlcv {json,jsongz}] [--data-format-trades {json,jsongz}] [--erase]
[--data-format-ohlcv {json,jsongz,hdf5}]
[--data-format-trades {json,jsongz,hdf5}]
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...] -p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Show profits for only these pairs. Pairs are space-separated. Show profits for only these pairs. Pairs are space-
separated.
--pairs-file FILE File containing a list of pairs to download. --pairs-file FILE File containing a list of pairs to download.
--days INT Download data for given number of days. --days INT Download data for given number of days.
--dl-trades Download trades instead of OHLCV data. The bot will resample trades to the desired timeframe as specified as --dl-trades Download trades instead of OHLCV data. The bot will
--timeframes/-t. resample trades to the desired timeframe as specified
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no config is provided. as --timeframes/-t.
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided.
-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...], --timeframes {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...] -t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...], --timeframes {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...]
Specify which tickers to download. Space-separated list. Default: `1m 5m`. Specify which tickers to download. Space-separated
--erase Clean all existing data for the selected exchange/pairs/timeframes. list. Default: `1m 5m`.
--data-format-ohlcv {json,jsongz} --erase Clean all existing data for the selected
Storage format for downloaded candle (OHLCV) data. (default: `json`). exchange/pairs/timeframes.
--data-format-trades {json,jsongz} --data-format-ohlcv {json,jsongz,hdf5}
Storage format for downloaded trades data. (default: `jsongz`). Storage format for downloaded candle (OHLCV) data.
(default: `json`).
--data-format-trades {json,jsongz,hdf5}
Storage format for downloaded trades data. (default:
`jsongz`).
Common arguments: Common arguments:
-v, --verbose Verbose mode (-vv for more, -vvv to get all messages). -v, --verbose Verbose mode (-vv for more, -vvv to get all messages).
--logfile FILE Log to the file specified. Special values are: 'syslog', 'journald'. See the documentation for more details. --logfile FILE Log to the file specified. Special values are:
'syslog', 'journald'. See the documentation for more
details.
-V, --version show program's version number and exit -V, --version show program's version number and exit
-c PATH, --config PATH -c PATH, --config PATH
Specify configuration file (default: `config.json`). Multiple --config options may be used. Can be set to `-` Specify configuration file (default:
to read config from stdin. `userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH -d PATH, --datadir PATH
Path to directory with historical backtesting data. Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH --userdir PATH, --user-data-dir PATH
Path to userdata directory. Path to userdata directory.
``` ```
### Data format ### Data format
Freqtrade currently supports 2 dataformats, `json` (plain "text" json files) and `jsongz` (a gzipped version of json files). Freqtrade currently supports 3 data-formats for both OHLCV and trades data:
* `json` (plain "text" json files)
* `jsongz` (a gzip-zipped version of json files)
* `hdf5` (a high performance datastore)
By default, OHLCV data is stored as `json` data, while trades data is stored as `jsongz` data. By default, OHLCV data is stored as `json` data, while trades data is stored as `jsongz` data.
This can be changed via the `--data-format-ohlcv` and `--data-format-trades` parameters respectivly. This can be changed via the `--data-format-ohlcv` and `--data-format-trades` command line arguments respectively.
To persist this change, you should also add the following snippet to your configuration, so you don't have to insert the above arguments each time:
If the default dataformat has been changed during download, then the keys `dataformat_ohlcv` and `dataformat_trades` in the configuration file need to be adjusted to the selected dataformat as well. ``` jsonc
// ...
"dataformat_ohlcv": "hdf5",
"dataformat_trades": "hdf5",
// ...
```
If the default data-format has been changed during download, then the keys `dataformat_ohlcv` and `dataformat_trades` in the configuration file need to be adjusted to the selected dataformat as well.
!!! Note !!! Note
You can convert between data-formats using the [convert-data](#subcommand-convert-data) and [convert-trade-data](#subcommand-convert-trade-data) methods. You can convert between data-formats using the [convert-data](#sub-command-convert-data) and [convert-trade-data](#sub-command-convert-trade-data) methods.
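
To make the difference between `json` and `jsongz` concrete, the following is a minimal sketch of writing and reading a gzipped JSON file with the Python standard library. It is not freqtrade's data handler: the helper names `write_jsongz` and `read_jsongz`, the file name and the sample candle row are all assumptions made for illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path


def write_jsongz(path, rows):
    """Hypothetical helper: store rows as gzip-compressed JSON text."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(rows, fh)


def read_jsongz(path):
    """Hypothetical helper: load rows back from a gzip-compressed JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Illustrative candle row: [timestamp_ms, open, high, low, close, volume]
candles = [[1595631600000, 0.0251, 0.0252, 0.0250, 0.0251, 123.4]]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ETH_BTC-5m.json.gz"  # file name is an assumption
    write_jsongz(target, candles)
    restored = read_jsongz(target)
```

The payload is the same JSON in both cases; `jsongz` only adds the gzip layer, which is why converting between the two formats does not change the data itself.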
#### Subcommand convert data #### Sub-command convert data
``` ```
usage: freqtrade convert-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] usage: freqtrade convert-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[-d PATH] [--userdir PATH] [-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]] --format-from [-p PAIRS [PAIRS ...]] --format-from
{json,jsongz} --format-to {json,jsongz} {json,jsongz,hdf5} --format-to
[--erase] {json,jsongz,hdf5} [--erase]
[-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...]] [-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w} ...]]
optional arguments: optional arguments:
@ -77,9 +107,9 @@ optional arguments:
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...] -p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Show profits for only these pairs. Pairs are space- Show profits for only these pairs. Pairs are space-
separated. separated.
--format-from {json,jsongz} --format-from {json,jsongz,hdf5}
Source format for data conversion. Source format for data conversion.
--format-to {json,jsongz} --format-to {json,jsongz,hdf5}
Destination format for data conversion. Destination format for data conversion.
--erase Clean all existing data for the selected --erase Clean all existing data for the selected
exchange/pairs/timeframes. exchange/pairs/timeframes.
@ -94,9 +124,10 @@ Common arguments:
details. details.
-V, --version show program's version number and exit -V, --version show program's version number and exit
-c PATH, --config PATH -c PATH, --config PATH
Specify configuration file (default: `config.json`). Specify configuration file (default:
Multiple --config options may be used. Can be set to `userdir/config.json` or `config.json` whichever
`-` to read config from stdin. exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH -d PATH, --datadir PATH
Path to directory with historical backtesting data. Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH --userdir PATH, --user-data-dir PATH
@ -112,23 +143,23 @@ It'll also remove original json data files (`--erase` parameter).
freqtrade convert-data --format-from json --format-to jsongz --datadir ~/.freqtrade/data/binance -t 5m 15m --erase freqtrade convert-data --format-from json --format-to jsongz --datadir ~/.freqtrade/data/binance -t 5m 15m --erase
``` ```
#### Subcommand convert-trade data #### Sub-command convert trade data
``` ```
usage: freqtrade convert-trade-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] usage: freqtrade convert-trade-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[-d PATH] [--userdir PATH] [-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]] --format-from [-p PAIRS [PAIRS ...]] --format-from
{json,jsongz} --format-to {json,jsongz} {json,jsongz,hdf5} --format-to
[--erase] {json,jsongz,hdf5} [--erase]
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...] -p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Show profits for only these pairs. Pairs are space- Show profits for only these pairs. Pairs are space-
separated. separated.
--format-from {json,jsongz} --format-from {json,jsongz,hdf5}
Source format for data conversion. Source format for data conversion.
--format-to {json,jsongz} --format-to {json,jsongz,hdf5}
Destination format for data conversion. Destination format for data conversion.
--erase Clean all existing data for the selected --erase Clean all existing data for the selected
exchange/pairs/timeframes. exchange/pairs/timeframes.
@ -140,13 +171,15 @@ Common arguments:
details. details.
-V, --version show program's version number and exit -V, --version show program's version number and exit
-c PATH, --config PATH -c PATH, --config PATH
Specify configuration file (default: `config.json`). Specify configuration file (default:
Multiple --config options may be used. Can be set to `userdir/config.json` or `config.json` whichever
`-` to read config from stdin. exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH -d PATH, --datadir PATH
Path to directory with historical backtesting data. Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH --userdir PATH, --user-data-dir PATH
Path to userdata directory. Path to userdata directory.
``` ```
##### Example converting trades ##### Example converting trades
@ -158,21 +191,21 @@ It'll also remove original jsongz data files (`--erase` parameter).
freqtrade convert-trade-data --format-from jsongz --format-to json --datadir ~/.freqtrade/data/kraken --erase freqtrade convert-trade-data --format-from jsongz --format-to json --datadir ~/.freqtrade/data/kraken --erase
``` ```
### Subcommand list-data ### Sub-command list-data
You can get a list of downloaded data using the `list-data` subcommand. You can get a list of downloaded data using the `list-data` sub-command.
``` ```
usage: freqtrade list-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] [-d PATH] usage: freqtrade list-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] [-d PATH]
[--userdir PATH] [--exchange EXCHANGE] [--userdir PATH] [--exchange EXCHANGE]
[--data-format-ohlcv {json,jsongz}] [--data-format-ohlcv {json,jsongz,hdf5}]
[-p PAIRS [PAIRS ...]] [-p PAIRS [PAIRS ...]]
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no --exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided. config is provided.
--data-format-ohlcv {json,jsongz} --data-format-ohlcv {json,jsongz,hdf5}
Storage format for downloaded candle (OHLCV) data. Storage format for downloaded candle (OHLCV) data.
(default: `json`). (default: `json`).
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...] -p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
@ -194,6 +227,7 @@ Common arguments:
Path to directory with historical backtesting data. Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH --userdir PATH, --user-data-dir PATH
Path to userdata directory. Path to userdata directory.
``` ```
#### Example list-data #### Example list-data
@ -257,7 +291,7 @@ This will download historical candle (OHLCV) data for all the currency pairs you
### Trades (tick) data ### Trades (tick) data
By default, `download-data` subcommand downloads Candles (OHLCV) data. Some exchanges also provide historic trade-data via their API. By default, `download-data` sub-command downloads Candles (OHLCV) data. Some exchanges also provide historic trade-data via their API.
This data can be useful if you need many different timeframes, since it is only downloaded once, and then resampled locally to the desired timeframes. This data can be useful if you need many different timeframes, since it is only downloaded once, and then resampled locally to the desired timeframes.
Since this data is large by default, the files use gzip by default. They are stored in your data-directory with the naming convention of `<pair>-trades.json.gz` (`ETH_BTC-trades.json.gz`). Incremental mode is also supported, as for historic OHLCV data, so downloading the data once per week with `--days 8` will create an incremental data-repository. Since this data is large by default, the files use gzip by default. They are stored in your data-directory with the naming convention of `<pair>-trades.json.gz` (`ETH_BTC-trades.json.gz`). Incremental mode is also supported, as for historic OHLCV data, so downloading the data once per week with `--days 8` will create an incremental data-repository.
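
The idea of downloading trades once and resampling them locally can be sketched in plain Python as follows. This is only an illustration of the principle, not the bot's own `trades_to_ohlcv` implementation: the function name `trades_to_candles` and the `(timestamp_ms, price, amount)` trade layout are assumptions.

```python
def trades_to_candles(trades, timeframe_ms):
    """Group trades into OHLCV candles of the given width (hypothetical helper).

    trades: iterable of (timestamp_ms, price, amount) tuples.
    Returns a list of [open_time_ms, open, high, low, close, volume].
    """
    candles = {}
    # Sort by timestamp only, so trades with equal timestamps keep their order.
    for ts, price, amount in sorted(trades, key=lambda t: t[0]):
        bucket = ts - ts % timeframe_ms  # start of the candle this trade falls into
        candle = candles.get(bucket)
        if candle is None:
            candles[bucket] = [bucket, price, price, price, price, amount]
        else:
            candle[2] = max(candle[2], price)  # high
            candle[3] = min(candle[3], price)  # low
            candle[4] = price                  # close is the latest trade
            candle[5] += amount                # volume
    return [candles[key] for key in sorted(candles)]


trades = [
    (0, 10.0, 1.0),
    (30_000, 12.0, 2.0),
    (59_999, 11.0, 1.0),
    (60_000, 9.0, 0.5),
]

one_minute = trades_to_candles(trades, 60_000)
five_minutes = trades_to_candles(trades, 300_000)
```

The same list of trades yields both the 1m and the 5m candles, which is why a single trades download can serve many timeframes.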

@ -59,6 +59,7 @@ class HDF5DataHandler(IDataHandler):
_data = data.copy() _data = data.copy()
filename = self._pair_data_filename(self._datadir, pair, timeframe) filename = self._pair_data_filename(self._datadir, pair, timeframe)
ds = pd.HDFStore(filename, mode='a', complevel=9, complib='blosc') ds = pd.HDFStore(filename, mode='a', complevel=9, complib='blosc')
ds.put(key, _data.loc[:, self._columns], format='table', data_columns=['date']) ds.put(key, _data.loc[:, self._columns], format='table', data_columns=['date'])
@ -139,6 +140,7 @@ class HDF5DataHandler(IDataHandler):
column sequence as in DEFAULT_TRADES_COLUMNS column sequence as in DEFAULT_TRADES_COLUMNS
""" """
key = self._pair_trades_key(pair) key = self._pair_trades_key(pair)
ds = pd.HDFStore(self._pair_trades_filename(self._datadir, pair), ds = pd.HDFStore(self._pair_trades_filename(self._datadir, pair),
mode='a', complevel=9, complib='blosc') mode='a', complevel=9, complib='blosc')
ds.put(key, pd.DataFrame(data, columns=DEFAULT_TRADES_COLUMNS), ds.put(key, pd.DataFrame(data, columns=DEFAULT_TRADES_COLUMNS),

@ -9,7 +9,8 @@ from pandas import DataFrame
from freqtrade.configuration import TimeRange from freqtrade.configuration import TimeRange
from freqtrade.constants import DEFAULT_DATAFRAME_COLUMNS from freqtrade.constants import DEFAULT_DATAFRAME_COLUMNS
from freqtrade.data.converter import (ohlcv_to_dataframe, from freqtrade.data.converter import (clean_ohlcv_dataframe,
ohlcv_to_dataframe,
trades_remove_duplicates, trades_remove_duplicates,
trades_to_ohlcv) trades_to_ohlcv)
from freqtrade.data.history.idatahandler import IDataHandler, get_datahandler from freqtrade.data.history.idatahandler import IDataHandler, get_datahandler
@ -202,7 +203,10 @@ def _download_pair_history(datadir: Path,
if data.empty: if data.empty:
data = new_dataframe data = new_dataframe
else: else:
data = data.append(new_dataframe) # Run cleaning again to ensure there were no duplicate candles
# Especially between existing and new data.
data = clean_ohlcv_dataframe(data.append(new_dataframe), timeframe, pair,
fill_missing=False, drop_incomplete=False)
logger.debug("New Start: %s", logger.debug("New Start: %s",
f"{data.iloc[0]['date']:%Y-%m-%d %H:%M:%S}" if not data.empty else 'None') f"{data.iloc[0]['date']:%Y-%m-%d %H:%M:%S}" if not data.empty else 'None')
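
Why cleaning is needed after appending can be shown with a small pandas sketch. It is not the implementation of `clean_ohlcv_dataframe`; the helper `merge_candles` and the sample frames are assumptions, and `pd.concat` is used here because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


def merge_candles(existing, new):
    """Hypothetical helper: append new candles and drop duplicate dates.

    When the same candle date appears in both frames, the row from `new` wins.
    """
    merged = pd.concat([existing, new], ignore_index=True)
    merged = merged.drop_duplicates(subset="date", keep="last")
    return merged.sort_values("date").reset_index(drop=True)


existing = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-25 10:00", "2020-07-25 10:05"]),
    "close": [1.0, 1.1],
})
# The 10:05 candle overlaps with the stored data, as happens on incremental downloads.
new = pd.DataFrame({
    "date": pd.to_datetime(["2020-07-25 10:05", "2020-07-25 10:10"]),
    "close": [1.15, 1.2],
})

merged = merge_candles(existing, new)
```

Without the de-duplication step, the overlapping 10:05 candle would be stored twice.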