Machine Learning models are only as good as the input features you feed them at training and inference time.
And for many real-world applications, like financial trading, these features must be generated and served as fast as possible, so the ML system produces the best predictions possible.
Generating and serving features fast is what a real-time feature pipeline does.
How can you implement a real-time feature pipeline?
Python alone is not a language designed for speed 🐢, which makes it unsuitable for real-time processing. Because of this, real-time feature pipelines have usually been written with JVM-based tools like Apache Spark or Apache Flink.
However, things are changing fast with the emergence of Rust 🦀 and libraries like Bytewax 🐝 that expose a pure Python API on top of a highly efficient language like Rust.
So you get the best of both worlds.
Rust's speed and performance, plus
Python's rich ecosystem of libraries.
So you can develop highly performant and scalable real-time pipelines, leveraging top-notch Python libraries.
🦀 + 🐝 + 🐍 = ⚡
Here's an example
In this repository, you will learn how to develop and deploy a real-time feature pipeline in 100% Python that
fetches real-time trade data (aka raw data) from the Coinbase Websocket API
transforms trade data into OHLC data (aka features) in real-time using Bytewax, and
stores these features in the Hopsworks Feature Store
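To make the transformation step concrete, here is a minimal sketch in plain Python of how raw trades can be grouped into fixed time windows and turned into OHLC candles. It does not use the Bytewax API or the repository's code: `Trade`, `Candle`, `trades_to_ohlc` and the 60-second window are hypothetical names and values assumed for illustration, and the function works on a finished batch of trades rather than a live stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Trade:
    # Hypothetical raw-data record; field names are assumptions.
    product_id: str
    price: float
    timestamp: int  # seconds since epoch


@dataclass
class Candle:
    # One OHLC feature row for a product and time window.
    product_id: str
    window_start: int
    open: float
    high: float
    low: float
    close: float


def trades_to_ohlc(trades: Iterable[Trade], window_seconds: int = 60) -> List[Candle]:
    """Aggregate trades into OHLC candles over fixed, non-overlapping windows."""
    candles: Dict[Tuple[str, int], Candle] = {}
    # Process trades in time order so open is the first price and close the last.
    for trade in sorted(trades, key=lambda t: t.timestamp):
        # Align the trade to the start of its window.
        window_start = trade.timestamp - (trade.timestamp % window_seconds)
        key = (trade.product_id, window_start)
        candle = candles.get(key)
        if candle is None:
            # First trade in this window sets all four prices.
            candles[key] = Candle(
                trade.product_id, window_start,
                trade.price, trade.price, trade.price, trade.price,
            )
        else:
            candle.high = max(candle.high, trade.price)
            candle.low = min(candle.low, trade.price)
            candle.close = trade.price
    return sorted(candles.values(), key=lambda c: (c.product_id, c.window_start))


# Example with made-up prices: four trades in the first minute, one in the second.
trades = [
    Trade("ETH-USD", 100.0, 0),
    Trade("ETH-USD", 105.0, 20),
    Trade("ETH-USD", 95.0, 40),
    Trade("ETH-USD", 101.0, 59),
    Trade("ETH-USD", 102.0, 60),
]
candles = trades_to_ohlc(trades)
for candle in candles:
    print(candle)
```

In a real-time pipeline the same per-window state would be updated incrementally as each trade arrives, instead of sorting a complete list up front.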
You will also build a dashboard using Bokeh and Streamlit to visualize the final features, in real-time.
I would also be very grateful if you could give a star ⭐ to the GitHub repository if you like it 🙏.
Let's go real time! ⚡
The only way to learn real-time ML is to get your hands dirty.
→ Go pip install bytewax
→ Support their open-source project and give them a star on GitHub ⭐ and
→ Start building 🛠️
Ready to become a better ML engineer?
I wish you a fantastic day,
Keep on learning,
Peace and Love
Pau