Journal of Open Source Software (JOSS), ISSN 2475-9066, Open Journals

FreqAI: generalizing adaptive modeling for chaotic time-series market forecasts

Authors: Robert A. Caulk, Ph.D (ORCID 0000-0001-5618-8629); Elin Törnquist, Ph.D (ORCID 0000-0003-3289-8604); Matthias Voppichler; Andrew R. Lawless; Ryan McMullan; Wagner Costa Santos; Timothy C. Pogue; Johan van der Vlugt; Stefan P. Gehring; Pascal Schmidt

Affiliations: Emergent Methods LLC, Arvada, Colorado, 80005, USA; Freqtrade open source project


Copyright 2022 The article authors. Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Keywords: Python, machine learning, adaptive modeling, chaotic systems, time-series forecasting

Statement of need

Forecasting chaotic time-series based systems, such as equity/cryptocurrency markets, requires a broad set of tools geared toward testing a wide range of hypotheses. Fortunately, the recent maturation of robust machine learning libraries (e.g. scikit-learn) has opened up a wide range of research possibilities. Scientists from a diverse range of fields can now easily prototype their studies on an abundance of established machine learning algorithms. Similarly, these user-friendly libraries enable “citizen scientists” to use their basic Python skills for data exploration. However, leveraging these machine learning libraries on historical and live chaotic data sources can be logistically difficult and expensive. Additionally, robust data collection, storage, and handling present a separate challenge. FreqAI aims to provide a generalized and extensible open-source framework geared toward live deployments of adaptive modeling for market forecasting. The FreqAI framework is effectively a sandbox for the rich world of open-source machine learning libraries. Inside the FreqAI sandbox, users can combine a wide variety of third-party libraries to test creative hypotheses on a free, live, 24/7 chaotic data source: cryptocurrency exchange data.

Summary

FreqAI evolved from a desire to test and compare a range of adaptive time-series forecasting methods on chaotic data. Cryptocurrency markets provide a unique data source since they are operational 24/7 and the data is freely available. Luckily, an existing open-source project, Freqtrade, had already matured under a range of talented developers to support robust data collection and storage, as well as robust interactions with the live environment for standard algorithmic trading. Freqtrade also provides a set of data analysis and visualization tools for evaluating historical performance as well as live environmental feedback. FreqAI builds on top of Freqtrade to add a user-friendly, well-tested interface for integrating external machine learning libraries for adaptive time-series forecasting. Beyond enabling the integration of existing libraries, FreqAI hosts a range of custom algorithms and methodologies aimed at improving computational and predictive performance. Thus, FreqAI contains a range of unique features which can be easily tested in combination with the existing Python-accessible machine learning libraries to generate novel research on live and historical data.

The high-level overview of the software is depicted in Figure 1.

Figure 1: Abstracted overview of the FreqAI algorithm.

Connecting machine learning libraries

Although the FreqAI framework is designed to accommodate any Python library in the “Model training” and “Feature set engineering” portions of the software (Figure 1), it already offers a wide range of well-documented examples based on various combinations of:

scikit-learn (Pedregosa et al., 2011), CatBoost (Prokhorenkova et al., 2018), LightGBM (Ke et al., 2017), XGBoost (Chen & Guestrin, 2016), stable_baselines3 (Raffin et al., 2021), OpenAI Gym (Brockman et al., 2016), TensorFlow (Abadi et al., 2015), PyTorch (Paszke et al., 2019), SciPy (Virtanen et al., 2020), NumPy (Harris et al., 2020), and pandas (McKinney & others, 2010).

These mature projects contain a wide range of peer-reviewed and industry-standard methods, including regression, classification, neural networks, reinforcement learning, support vector machines, principal component analysis, and point clustering, all of which are leveraged in FreqAI for users to use as templates or extend with their own methods.

Furnishing novel methods and features

Beyond the industry-standard methods available through external libraries, FreqAI includes novel methods which are not available anywhere else in the open-source (or scientific) world. For example, FreqAI provides:

a custom algorithm/methodology for adaptive modeling

rapid and self-monitored feature engineering tools

unique model features/indicators

optimized data collection algorithms

safely integrated outlier detection methods

websocket communicated forecasts
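To make the idea of adaptive modeling concrete, the following is a minimal sketch of sliding-window retraining, assuming a plain linear trend as the model. It is not FreqAI's algorithm; the function names and the `window` and `retrain_every` parameters are hypothetical and stand in for the training window size and retrain frequency a user would normally configure.

```python
from statistics import fmean


def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*x over x = 0..len(ys)-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def adaptive_forecasts(series, window=20, retrain_every=5):
    """One-step-ahead forecasts from a model refit on a sliding window.

    Hypothetical helper: `window` must be at least 2.
    """
    forecasts = []
    model = None
    fitted_at = None  # index of the first point after the training window
    for t in range(window, len(series)):
        # Retrain on the most recent `window` points when the model is stale.
        if model is None or t - fitted_at >= retrain_every:
            model = fit_line(series[t - window:t])
            fitted_at = t
        intercept, slope = model
        # The training window covers x = 0..window-1, so the point at
        # index t sits at x = window + (number of steps since the refit).
        x = window + (t - fitted_at)
        forecasts.append(intercept + slope * x)
    return forecasts
```

A shorter `retrain_every` lets the model track regime changes more closely, at the cost of more training runs. This is the trade-off that adaptive modeling parameters expose.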

Of particular interest for researchers, FreqAI provides the option of large-scale experimentation via an optimized websocket communications interface.

Optimizing the back-end

FreqAI aims to make it simple for users to combine all of the above tools to run studies in two distinct modules:

backtesting studies

live-deployments

Both of these modules and their respective data management systems are built on top of Freqtrade, a mature and actively developed cryptocurrency trading software. This means that FreqAI benefits from a wide range of tangential/disparate feature developments such as:

FreqUI, a graphical interface for backtesting and live monitoring

Telegram control

robust database handling

futures/leverage trading

dollar cost averaging

trading strategy handling

a variety of free data sources via CCXT (FTX, Binance, Kucoin etc.)

These features derive from a strong external developer community that shares in the benefit and stability of a communal CI (Continuous Integration) system. Beyond the developer community, FreqAI benefits strongly from the userbase of Freqtrade, where most FreqAI beta-testers/developers originated. This symbiotic relationship between Freqtrade and FreqAI enabled a thoroughly tested four-month beta, which in turn demanded comprehensive documentation containing:

numerous example scripts

a full parameter table

methodological descriptions

high-resolution diagrams/figures

detailed parameter setting recommendations

Providing a reproducible foundation for researchers

FreqAI provides an extensible, robust framework for researchers and citizen data scientists. The FreqAI sandbox enables rapid conception and testing of exotic hypotheses. From a research perspective, FreqAI handles the multitude of logistics associated with live deployments, historical backtesting, and feature engineering. With FreqAI, researchers can focus on their primary interests of feature engineering and hypothesis testing rather than figuring out how to collect and handle data. Furthermore, the well-maintained and easily installed open-source framework of FreqAI enables reproducible scientific studies. This reproducibility is essential to general scientific advancement in time-series forecasting for chaotic systems.

Technical details

Typical users configure FreqAI via two files:

A configuration file (--config), which provides access to the full parameter list documented online and lets users:

control high-level feature engineering

customize adaptive modeling techniques

set any model training parameters available in third-party libraries

manage adaptive modeling parameters (retrain frequency, training window size, continual learning, etc.)

A strategy file (--strategy) where users:

list the base training features

set standard technical-analysis strategies

control trade entry/exit criteria
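As an illustration of the configuration side, the sketch below builds a small `freqai` section as a Python dictionary and prints it as JSON. The key names are written from memory of the documented parameter table and the values are arbitrary, so both should be treated as assumptions and checked against the online documentation before use.

```python
import json

# Illustrative values only; consult the FreqAI parameter table for the
# authoritative key names and recommended settings.
freqai_config = {
    "freqai": {
        "enabled": True,
        "identifier": "example-experiment",  # hypothetical run name
        "train_period_days": 30,   # training window size
        "live_retrain_hours": 1,   # retrain frequency in live mode
        "feature_parameters": {
            # High-level feature engineering controls.
            "include_timeframes": ["5m", "1h"],
            "include_corr_pairlist": ["BTC/USDT", "ETH/USDT"],
            "indicator_periods_candles": [10, 20],
        },
        "data_split_parameters": {"test_size": 0.25},
        # Passed through to the third-party training library.
        "model_training_parameters": {"n_estimators": 200},
    }
}

print(json.dumps(freqai_config, indent=2))
```

The printed JSON would be merged into the file passed via --config; the strategy file then defines the base features that these settings expand.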

With these two files, most users can exploit a wide range of pre-existing integrations in Catboost and 7 other libraries with a simple command:

freqtrade trade --config config_freqai.example.json --strategy FreqaiExampleStrategy --freqaimodel CatboostRegressor
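Because the model is selected with a single flag, comparing libraries amounts to changing one argument. The helper below is a hypothetical convenience (not part of Freqtrade) that assembles the same command for several `--freqaimodel` values; it uses only the flags and file names shown above, and the model names mentioned elsewhere in this paper.

```python
import shlex


def freqai_trade_command(model, config="config_freqai.example.json",
                         strategy="FreqaiExampleStrategy"):
    """Return the argument list for `freqtrade trade` with a given model."""
    return [
        "freqtrade", "trade",
        "--config", config,
        "--strategy", strategy,
        "--freqaimodel", model,
    ]


# Print one command line per model under comparison.
for model in ("CatboostRegressor", "LightGBMRegressor", "XGBoostRegressor"):
    print(shlex.join(freqai_trade_command(model)))
```

The returned list can be passed directly to `subprocess.run` on a machine where Freqtrade is installed.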

Advanced users will edit one of the existing --freqaimodel files, which are simply children of IFreqaiModel (details below). Within these files, advanced users can customize training procedures, prediction procedures, outlier detection methods, data preparation, data saving methods, etc. Everything is configured so that they can customize as little or as much as they want. This flexible customization is owed to the foundational architecture of FreqAI, which is composed of three distinct Python objects:

IFreqaiModel

A singular long-lived object containing all the necessary logic to collect data, store data, process data, engineer features, train models, and run inference.

FreqaiDataKitchen

A short-lived object which is uniquely created for each asset/model. Beyond metadata, it also contains a variety of data processing tools.

FreqaiDataDrawer

A singular long-lived object containing all the historical predictions, models, and save/load methods.

These objects interact with one another with one goal in mind: to provide a clean data set to machine learning experts/enthusiasts at the user endpoint. These power-users interact with an inherited IFreqaiModel that allows them to dig as deep or as shallow as they wish into the inheritance tree. Typical power-users focus their efforts on customizing training procedures and testing exotic functionalities available in third-party libraries. Thus, power-users are freed from the algorithmic weight associated with data management, and can instead focus their energy on testing creative hypotheses. Meanwhile, some users choose to override deeper functionalities within IFreqaiModel to help them craft unique data structures and training procedures.
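The division of labor between the three objects can be sketched with stand-in classes. Everything below is a toy illustration under stated assumptions: `ToyModelInterface`, `ToyDataKitchen`, `ToyDataDrawer`, and `MeanModel` are hypothetical names that mirror the roles described above, not the real FreqAI classes or their method signatures.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class ToyDataDrawer:
    """Long-lived store for models and historical predictions."""
    models: dict = field(default_factory=dict)
    predictions: dict = field(default_factory=dict)


@dataclass
class ToyDataKitchen:
    """Short-lived, per-asset helper that prepares a clean data set."""
    pair: str

    def clean(self, prices):
        # Drop missing values so the model only ever sees usable data.
        return [p for p in prices if p is not None]


class ToyModelInterface:
    """Long-lived orchestrator; subclasses override fit/predict only."""

    def __init__(self):
        self.drawer = ToyDataDrawer()

    def fit(self, data):
        raise NotImplementedError

    def predict(self, model, data):
        raise NotImplementedError

    def run(self, pair, prices):
        kitchen = ToyDataKitchen(pair)  # a fresh kitchen for each asset
        data = kitchen.clean(prices)
        model = self.fit(data)
        self.drawer.models[pair] = model
        forecast = self.predict(model, data)
        self.drawer.predictions.setdefault(pair, []).append(forecast)
        return forecast


class MeanModel(ToyModelInterface):
    """A user-level model: forecast the mean of the cleaned prices."""

    def fit(self, data):
        return fmean(data)

    def predict(self, model, data):
        return model


bot = MeanModel()
print(bot.run("BTC/USDT", [1.0, None, 2.0, 3.0]))
```

The user-level class only implements `fit` and `predict`; data cleaning and bookkeeping stay in the base class and its helpers, which is the separation of concerns the paragraph above describes.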

The class structure and algorithmic details are depicted in the following diagram:

Figure: Class diagram summarizing object interactions in FreqAI.

Online documentation

The documentation for FreqAI is available online at https://www.freqtrade.io/en/latest/freqai/ and covers a wide range of materials:

Quick-start with a single command and example files - (beginners)

Introduction to the feature engineering interface and basic configurations - (intermediate users)

Parameter table with in-depth descriptions and default parameter setting recommendations - (intermediate users)

Data analysis and post-processing - (advanced users)

Methodological considerations complemented by high resolution figures - (advanced users)

Instructions for integrating third party machine learning libraries into custom prediction models - (advanced users)

Software architectural description with class diagram - (developers)

File structure descriptions - (developers)

The docs direct users to a variety of pre-made examples which integrate CatBoost, LightGBM, XGBoost, scikit-learn, stable_baselines3, PyTorch, and TensorFlow. Meanwhile, developers will also find thorough docstrings and type hinting throughout the source code to aid in code readability and customization.

FreqAI also benefits from a strong support network of users and developers on the Freqtrade Discord as well as on the FreqAI Discord. Within the FreqAI Discord, users will find a deep and easily searched knowledge base containing common errors. More importantly, users in the FreqAI Discord share anecdotal and quantitative observations which compare performance between various third-party libraries and methods.

State of the field

Two other open-source tools, tensortrade (Tensortrade, 2022) and FinRL (AI4Finance-Foundation, 2022), are geared toward helping users build models for time-series forecasts on market-based data. However, each of these tools suffers from a non-generalized framework that does not permit comparison of methods and libraries. Additionally, they do not permit easy live deployments or adaptive modeling methods: both limit users to the exploration of reinforcement learning on historical data. These tools also do not provide robust live deployments, do not furnish novel feature engineering algorithms, and do not provide custom data analysis tools. FreqAI fills this gap.

On-going research

Emergent Methods, based in Arvada, CO, is actively using FreqAI to perform large-scale experiments aimed at comparing machine learning libraries in live and historical environments. Past projects include backtesting parametric sweeps, while active projects include a 3-week live deployment comparison between CatboostRegressor, LightGBMRegressor, and XGBoostRegressor. Results from these studies are on track for publication in scientific journals as well as more general data science blogs (e.g. Medium).

Installing and running FreqAI

FreqAI is automatically installed with Freqtrade using the following commands on Linux systems:

git clone git@github.com:freqtrade/freqtrade.git
cd freqtrade
./setup.sh -i

However, FreqAI also benefits from Freqtrade docker distributions, and can be run with docker by pulling the stable or develop images from Freqtrade distributions.

Funding sources

FreqAI has had no official sponsors and is entirely grassroots. All donations to the project (e.g. through the GitHub sponsor system) are kept inside the project to help support development of open-source and communally beneficial features.

Acknowledgements

We would like to acknowledge various beta testers of FreqAI:

Richárd Józsa

Juha Nykänen

Salah Lamkadem

We would also like to acknowledge the various Freqtrade developers maintaining tangential, yet essential, modules.

References

Pedregosa; Varoquaux; Gramfort; Michel; Thirion; Grisel; Blondel; Prettenhofer; Weiss; Dubourg; Vanderplas; Passos; Cournapeau; Brucher; Perrot; Duchesnay (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Prokhorenkova, Liudmila; Gusev, Gleb; Vorobev, Aleksandr; Dorogush, Anna Veronika; Gulin, Andrey (2018). CatBoost: Unbiased boosting with categorical features. Proceedings of the 32nd international conference on neural information processing systems, 6639–6649. Curran Associates Inc., Red Hook, NY, USA.

Ke, Guolin; Meng, Qi; Finley, Thomas; Wang, Taifeng; Chen, Wei; Ma, Weidong; Ye, Qiwei; Liu, Tie-Yan (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30, 3146–3154.

Chen, Tianqi; Guestrin, Carlos (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 785–794. ACM, New York, NY, USA. ISBN 978-1-4503-4232-2. http://doi.acm.org/10.1145/2939672.2939785. doi:10.1145/2939672.2939785

Raffin, Antonin; Hill, Ashley; Gleave, Adam; Kanervisto, Anssi; Ernestus, Maximilian; Dormann, Noah (2021). Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268), 1–8. http://jmlr.org/papers/v22/20-1364.html

Brockman, Greg; Cheung, Vicki; Pettersson, Ludwig; Schneider, Jonas; Schulman, John; Tang, Jie; Zaremba, Wojciech (2016). OpenAI gym. https://arxiv.org/abs/1606.01540

Abadi, Martín; Agarwal, Ashish; Barham, Paul; Brevdo, Eugene; Chen, Zhifeng; Citro, Craig; Corrado, Greg S.; Davis, Andy; Dean, Jeffrey; Devin, Matthieu; Ghemawat, Sanjay; Goodfellow, Ian; Harp, Andrew; Irving, Geoffrey; Isard, Michael; Jia, Yangqing; Jozefowicz, Rafal; Kaiser, Lukasz; Kudlur, Manjunath; Levenberg, Josh; Mané, Dandelion; Monga, Rajat; Moore, Sherry; Murray, Derek; Olah, Chris; Schuster, Mike; Shlens, Jonathon; Steiner, Benoit; Sutskever, Ilya; Talwar, Kunal; Tucker, Paul; Vanhoucke, Vincent; Vasudevan, Vijay; Viégas, Fernanda; Vinyals, Oriol; Warden, Pete; Wattenberg, Martin; Wicke, Martin; Yu, Yuan; Zheng, Xiaoqiang (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/

Paszke, Adam; Gross, Sam; Massa, Francisco; Lerer, Adam; Bradbury, James; Chanan, Gregory; Killeen, Trevor; Lin, Zeming; Gimelshein, Natalia; Antiga, Luca; Desmaison, Alban; Kopf, Andreas; Yang, Edward; DeVito, Zachary; Raison, Martin; Tejani, Alykhan; Chilamkurthy, Sasank; Steiner, Benoit; Fang, Lu; Bai, Junjie; Chintala, Soumith (2019). PyTorch: An imperative style, high-performance deep learning library. In Wallach, Larochelle, Beygelzimer, d'Alché-Buc, Fox, & Garnett (Eds.), Advances in neural information processing systems 32, 8024–8035. Curran Associates, Inc. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

Virtanen, Pauli; Gommers, Ralf; Oliphant, Travis E.; Haberland, Matt; Reddy, Tyler; Cournapeau, David; Burovski, Evgeni; Peterson, Pearu; Weckesser, Warren; Bright, Jonathan; van der Walt, Stéfan J.; Brett, Matthew; Wilson, Joshua; Millman, K. Jarrod; Mayorov, Nikolay; Nelson, Andrew R. J.; Jones, Eric; Kern, Robert; Larson, Eric; Carey, C J; Polat, İlhan; Feng, Yu; Moore, Eric W.; VanderPlas, Jake; Laxalde, Denis; Perktold, Josef; Cimrman, Robert; Henriksen, Ian; Quintero, E. A.; Harris, Charles R.; Archibald, Anne M.; Ribeiro, Antônio H.; Pedregosa, Fabian; van Mulbregt, Paul; SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17, 261–272. doi:10.1038/s41592-019-0686-2

Harris, Charles R.; Millman, K. Jarrod; Walt, Stéfan J. van der; Gommers, Ralf; Virtanen, Pauli; Cournapeau, David; Wieser, Eric; Taylor, Julian; Berg, Sebastian; Smith, Nathaniel J.; Kern, Robert; Picus, Matti; Hoyer, Stephan; Kerkwijk, Marten H. van; Brett, Matthew; Haldane, Allan; Río, Jaime Fernández del; Wiebe, Mark; Peterson, Pearu; Gérard-Marchant, Pierre; Sheppard, Kevin; Reddy, Tyler; Weckesser, Warren; Abbasi, Hameer; Gohlke, Christoph; Oliphant, Travis E. (2020-09). Array programming with NumPy. Nature, 585(7825), 357–362. Springer Science and Business Media LLC. https://doi.org/10.1038/s41586-020-2649-2. doi:10.1038/s41586-020-2649-2

McKinney, Wes; others (2010). Data structures for statistical computing in python. Proceedings of the 9th python in science conference, 445, 51–56. Austin, TX.

AI4Finance-Foundation (2022). Accessed 2022-09-30. https://github.com/AI4Finance-Foundation/FinRL

Tensortrade (2022). Accessed 2022-09-30. https://tensortradex.readthedocs.io/en/latest/L