Read time: 3 minutes
Everyone is talking about MLOps pipelines, but no one is telling you exactly how to build one. Let’s address this blind spot today.
Feature backfilling with Prefect
Let’s build a feature backfill pipeline that runs on demand and fills the Feature Store with historical feature values. Backfilling a feature store is a necessary operation in any real-world ML system: you need to do it before you can train any ML model.
Tools 🛠️
I have been wanting to try Prefect for a while. It is a popular open-source orchestration tool that, in the last couple of years, has become a serious contender to the ultra-popular Apache Airflow.
Hopsworks is our managed Feature Store.
GitHub project
Link to the source code. Give it a star ⭐ on GitHub if you got value from it.
I am trying to build a real-time ML system for crypto trading, so I need plenty of historical data in my Feature Store before I can start training any model.
Dumping historical features into your Feature Store is called backfilling. And backfilling is something you can do pretty well with Prefect.
In this project, I built a Prefect flow that can backfill historical BTC/USD OHLC 1-min data using historical trades provided by the Kraken API.
For each day in the past, the backfill pipeline:
Fetches historical trades (aka raw data) from the Kraken API
Transforms raw trade data into OHLC 1-minute candles (Open-High-Low-Close)
Saves these features to a Feature Store.
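To make the transformation step concrete, here is a minimal sketch of how raw trades could be aggregated into 1-minute OHLC candles. It is not the code from the project: the function name `trades_to_ohlc` and the input shape (a list of `(timestamp_in_seconds, price)` tuples) are assumptions made for illustration, and the real Kraken response carries more fields than this.

```python
def trades_to_ohlc(trades, window_seconds=60):
    """Aggregate (timestamp_seconds, price) trades into OHLC candles.

    Hypothetical helper: the input shape is an assumption, not the Kraken schema.
    """
    candles = {}
    # Sort by timestamp so that open/close reflect the first/last trade in each window
    for ts, price in sorted(trades, key=lambda trade: trade[0]):
        # Floor the timestamp to the start of its window (60 s = 1-minute candles)
        bucket = int(ts // window_seconds) * window_seconds
        candle = candles.get(bucket)
        if candle is None:
            # First trade in this window sets all four values
            candles[bucket] = {
                "timestamp": bucket,
                "open": price,
                "high": price,
                "low": price,
                "close": price,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # last trade seen so far in this window
    # Return candles ordered by window start
    return [candles[bucket] for bucket in sorted(candles)]


if __name__ == "__main__":
    sample_trades = [(0, 100.0), (10, 105.0), (59, 99.0), (60, 101.0), (119.5, 102.0)]
    for candle in trades_to_ohlc(sample_trades):
        print(candle)
```

Note that minutes without any trade produce no candle in this sketch. Whether to forward-fill those gaps is a design choice that depends on how the model consumes the features.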
Calling the pipeline is as easy as running the following command:
$ make from_day=2023-01-01 to_day=2023-01-31 backfill
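Behind a command like this, the pipeline has to expand the two dates into the list of days to process. Here is a rough sketch of that step, assuming both `from_day` and `to_day` are inclusive ISO dates; the helper name `days_between` is hypothetical and not taken from the project.

```python
from datetime import date, timedelta


def days_between(from_day: str, to_day: str) -> list[date]:
    """Return every day from from_day to to_day, both inclusive (an assumption)."""
    start = date.fromisoformat(from_day)
    end = date.fromisoformat(to_day)
    if end < start:
        raise ValueError("to_day must not be earlier than from_day")
    # +1 so that the last day is included in the range
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


if __name__ == "__main__":
    days = days_between("2023-01-01", "2023-01-31")
    print(len(days), days[0], days[-1])
```

Each day in the list can then go through the fetch, transform, and save steps independently, which makes it easy to retry a single failed day without redoing the whole range.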
Go play with the code, fork it, and modify it for your needs.
Build something. And share it with me 🤗
Wanna take your ML engineering skills to the next level?
→ Join Serverless ML, a community for real-world ML builders 👨👨👦👦
→ Join the Real-World ML Tutorial, and build your first end-2-end, step-by-step, A-to-Z real-world ML service, to predict taxi demand in NYC. 🚕💰
→ Wanna land your first ML engineering job? I can help. Book a 45-minute session with me, and get a 20% discount using the coupon “ROCKET” 🚀
Keep on learning!
Enjoy your day,
Pau