Merge pull request #7458 from xmatthias/dataformat/feather

new Dataformats feather and Parquet
This commit is contained in:
Matthias
2022-09-25 09:00:22 +02:00
committed by GitHub
11 changed files with 433 additions and 38 deletions

View File

@@ -26,7 +26,7 @@ usage: freqtrade download-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[--timerange TIMERANGE] [--dl-trades]
[--exchange EXCHANGE]
[-t TIMEFRAMES [TIMEFRAMES ...]] [--erase]
[--data-format-ohlcv {json,jsongz,hdf5}]
[--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}]
[--data-format-trades {json,jsongz,hdf5}]
[--trading-mode {spot,margin,futures}]
[--prepend]
@@ -55,7 +55,7 @@ optional arguments:
list. Default: `1m 5m`.
--erase Clean all existing data for the selected
exchange/pairs/timeframes.
--data-format-ohlcv {json,jsongz,hdf5}
--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}
Storage format for downloaded candle (OHLCV) data.
(default: `json`).
--data-format-trades {json,jsongz,hdf5}
@@ -76,7 +76,7 @@ Common arguments:
`userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH
-d PATH, --datadir PATH, --data-dir PATH
Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH
Path to userdata directory.
@@ -179,9 +179,11 @@ freqtrade download-data --exchange binance --pairs ETH/USDT XRP/USDT BTC/USDT --
Freqtrade currently supports 3 data-formats for both OHLCV and trades data:
* `json` (plain "text" json files)
* `jsongz` (a gzip-zipped version of json files)
* `hdf5` (a high performance datastore)
* `json` - plain "text" json files
* `jsongz` - a gzip-zipped version of json files
* `hdf5` - a high performance datastore
* `feather` - a dataformat based on Apache Arrow
* `parquet` - columnar datastore
By default, OHLCV data is stored as `json` data, while trades data is stored as `jsongz` data.
@@ -200,38 +202,74 @@ If the default data-format has been changed during download, then the keys `data
!!! Note
You can convert between data-formats using the [convert-data](#sub-command-convert-data) and [convert-trade-data](#sub-command-convert-trade-data) methods.
#### Dataformat comparison
The following comparisons have been made with the following data, and by using the linux `time` command.
```
Found 6 pair / timeframe combinations.
+----------+-------------+--------+---------------------+---------------------+
| Pair | Timeframe | Type | From | To |
|----------+-------------+--------+---------------------+---------------------|
| BTC/USDT | 5m | spot | 2017-08-17 04:00:00 | 2022-09-13 19:25:00 |
| ETH/USDT | 1m | spot | 2017-08-17 04:00:00 | 2022-09-13 19:26:00 |
| BTC/USDT | 1m | spot | 2017-08-17 04:00:00 | 2022-09-13 19:30:00 |
| XRP/USDT | 5m | spot | 2018-05-04 08:10:00 | 2022-09-13 19:15:00 |
| XRP/USDT | 1m | spot | 2018-05-04 08:11:00 | 2022-09-13 19:22:00 |
| ETH/USDT | 5m | spot | 2017-08-17 04:00:00 | 2022-09-13 19:20:00 |
+----------+-------------+--------+---------------------+---------------------+
```
Timings have been taken in a not very scientific way with the following command, which forces reading the data into memory.
``` bash
time freqtrade list-data --show-timerange --data-format-ohlcv <dataformat>
```
| Format | Size | timing |
|------------|-------------|-------------|
| `json` | 149Mb | 25.6s |
| `jsongz` | 39Mb | 27s |
| `hdf5` | 145Mb | 3.9s |
| `feather` | 72Mb | 3.5s |
| `parquet` | 83Mb | 3.8s |
Size has been taken from the BTC/USDT 1m spot combination for the timerange specified above.
To have a best performance/size mix, we recommend the use of either feather or parquet.
#### Sub-command convert data
```
usage: freqtrade convert-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]] --format-from
{json,jsongz,hdf5} --format-to
{json,jsongz,hdf5} [--erase]
[-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...]]
{json,jsongz,hdf5,feather,parquet} --format-to
{json,jsongz,hdf5,feather,parquet} [--erase]
[--exchange EXCHANGE]
[-t TIMEFRAMES [TIMEFRAMES ...]]
[--trading-mode {spot,margin,futures}]
[--candle-types {spot,,futures,mark,index,premiumIndex,funding_rate} [{spot,,futures,mark,index,premiumIndex,funding_rate} ...]]
[--candle-types {spot,futures,mark,index,premiumIndex,funding_rate} [{spot,futures,mark,index,premiumIndex,funding_rate} ...]]
optional arguments:
-h, --help show this help message and exit
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Limit command to these pairs. Pairs are space-
separated.
--format-from {json,jsongz,hdf5}
--format-from {json,jsongz,hdf5,feather,parquet}
Source format for data conversion.
--format-to {json,jsongz,hdf5}
--format-to {json,jsongz,hdf5,feather,parquet}
Destination format for data conversion.
--erase Clean all existing data for the selected
exchange/pairs/timeframes.
-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...], --timeframes {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...]
Specify which tickers to download. Space-separated
list. Default: `1m 5m`.
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided.
--trading-mode {spot,margin,futures}
-t TIMEFRAMES [TIMEFRAMES ...], --timeframes TIMEFRAMES [TIMEFRAMES ...]
Specify which tickers to download. Space-separated
list. Default: `1m 5m`.
--trading-mode {spot,margin,futures}, --tradingmode {spot,margin,futures}
Select Trading mode
--candle-types {spot,,futures,mark,index,premiumIndex,funding_rate} [{spot,,futures,mark,index,premiumIndex,funding_rate} ...]
--candle-types {spot,futures,mark,index,premiumIndex,funding_rate} [{spot,futures,mark,index,premiumIndex,funding_rate} ...]
Select candle type to use
Common arguments:
@@ -245,7 +283,7 @@ Common arguments:
`userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH
-d PATH, --datadir PATH, --data-dir PATH
Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH
Path to userdata directory.
@@ -267,20 +305,24 @@ freqtrade convert-data --format-from json --format-to jsongz --datadir ~/.freqtr
usage: freqtrade convert-trade-data [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]] --format-from
{json,jsongz,hdf5} --format-to
{json,jsongz,hdf5} [--erase]
{json,jsongz,hdf5,feather,parquet}
--format-to
{json,jsongz,hdf5,feather,parquet}
[--erase] [--exchange EXCHANGE]
optional arguments:
-h, --help show this help message and exit
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Show profits for only these pairs. Pairs are space-
Limit command to these pairs. Pairs are space-
separated.
--format-from {json,jsongz,hdf5}
--format-from {json,jsongz,hdf5,feather,parquet}
Source format for data conversion.
--format-to {json,jsongz,hdf5}
--format-to {json,jsongz,hdf5,feather,parquet}
Destination format for data conversion.
--erase Clean all existing data for the selected
exchange/pairs/timeframes.
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided.
Common arguments:
-v, --verbose Verbose mode (-vv for more, -vvv to get all messages).
@@ -293,7 +335,7 @@ Common arguments:
`userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH
-d PATH, --datadir PATH, --data-dir PATH
Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH
Path to userdata directory.
@@ -318,9 +360,9 @@ This command will allow you to repeat this last step for additional timeframes w
usage: freqtrade trades-to-ohlcv [-h] [-v] [--logfile FILE] [-V] [-c PATH]
[-d PATH] [--userdir PATH]
[-p PAIRS [PAIRS ...]]
[-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...]]
[-t TIMEFRAMES [TIMEFRAMES ...]]
[--exchange EXCHANGE]
[--data-format-ohlcv {json,jsongz,hdf5}]
[--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}]
[--data-format-trades {json,jsongz,hdf5}]
optional arguments:
@@ -328,12 +370,12 @@ optional arguments:
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Limit command to these pairs. Pairs are space-
separated.
-t {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...], --timeframes {1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} [{1m,3m,5m,15m,30m,1h,2h,4h,6h,8h,12h,1d,3d,1w,2w,1M,1y} ...]
-t TIMEFRAMES [TIMEFRAMES ...], --timeframes TIMEFRAMES [TIMEFRAMES ...]
Specify which tickers to download. Space-separated
list. Default: `1m 5m`.
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided.
--data-format-ohlcv {json,jsongz,hdf5}
--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}
Storage format for downloaded candle (OHLCV) data.
(default: `json`).
--data-format-trades {json,jsongz,hdf5}
@@ -351,7 +393,7 @@ Common arguments:
`userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH
-d PATH, --datadir PATH, --data-dir PATH
Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH
Path to userdata directory.
@@ -371,7 +413,7 @@ You can get a list of downloaded data using the `list-data` sub-command.
```
usage: freqtrade list-data [-h] [-v] [--logfile FILE] [-V] [-c PATH] [-d PATH]
[--userdir PATH] [--exchange EXCHANGE]
[--data-format-ohlcv {json,jsongz,hdf5}]
[--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}]
[-p PAIRS [PAIRS ...]]
[--trading-mode {spot,margin,futures}]
[--show-timerange]
@@ -380,13 +422,13 @@ optional arguments:
-h, --help show this help message and exit
--exchange EXCHANGE Exchange name (default: `bittrex`). Only valid if no
config is provided.
--data-format-ohlcv {json,jsongz,hdf5}
--data-format-ohlcv {json,jsongz,hdf5,feather,parquet}
Storage format for downloaded candle (OHLCV) data.
(default: `json`).
-p PAIRS [PAIRS ...], --pairs PAIRS [PAIRS ...]
Limit command to these pairs. Pairs are space-
separated.
--trading-mode {spot,margin,futures}
--trading-mode {spot,margin,futures}, --tradingmode {spot,margin,futures}
Select Trading mode
--show-timerange Show timerange available for available data. (May take
a while to calculate).
@@ -402,7 +444,7 @@ Common arguments:
`userdir/config.json` or `config.json` whichever
exists). Multiple --config options may be used. Can be
set to `-` to read config from stdin.
-d PATH, --datadir PATH
-d PATH, --datadir PATH, --data-dir PATH
Path to directory with historical backtesting data.
--userdir PATH, --user-data-dir PATH
Path to userdata directory.