Let's build your first real-world ML app

You can do it 🫵 I can help you 🤗

Dec 30, 2023

Companies hiring ML engineers do not look for ML course completers. They look for problem solvers who can tackle real-world business problems using data, a bit of science and lots of engineering.

And the thing is, the only way to become good at problem solving is by… (drums 🥁🥁🥁) … solving problems!

👉🏽 Subscribe to the Real-World ML YouTube channel for more free lectures like this

What is preventing you from building your first ML app?

A while ago I asked you on Twitter:

Pau Labarta Bajo @paulabartabajo_

What is preventing you from building a *complete* real-world Machine Learning app?

And the answers I got can be bucketed into 4 categories.

“I don’t have data” (lack of data)
“I don’t have any project idea” (lack of inspiration)
“It costs money” (tight budget)
“I don’t know how to do it” (lack of knowledge)

Whatever is blocking you from building an ML product, I have a solution for you 🤗 ↓

Excuse 1 → “I don’t have data”

Junaid Ali @junaid_py_

@paulabartabajo_ The freaking data itself. Unavailability.

Solution: here you have 3 options.

Search Kaggle datasets, one of the largest repositories of public datasets in the world.
Find a public API in this TOP repository.
Find a website you are interested in (for example, the NBA stats page) and build your own scraper. True, it is time-consuming, but you will learn tons of Python in return.
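To make the scraper option less scary, here is a minimal sketch using only the Python standard library. It assumes the page you care about exposes its data as a plain HTML table. The `SAMPLE_HTML` string, the player names and the `parse_table` helper are all made up for illustration; a real page will almost certainly need more cleaning than this.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being built
        self._in_cell = False  # True while we are inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> list[dict]:
    """Turn the first row into column names and the rest into records."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content. In a real scraper this string would come from
# an HTTP request, e.g. urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<table>
  <tr><th>player</th><th>points</th></tr>
  <tr><td>Player A</td><td>31</td></tr>
  <tr><td>Player B</td><td>24</td></tr>
</table>
"""

records = parse_table(SAMPLE_HTML)
print(records)
```

Every value comes back as a string, so converting types (and checking the website's terms of use before scraping it) is still on you.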

Excuse 2 → “I don’t have a project idea”

LaLaLandLefty @ai_sparkle99

@paulabartabajo_ A list of such project ideas would be swell

Solution: here is a list of 20 ML apps built by students at the Royal Institute of Technology in Stockholm (KTH) under the supervision of Jim Dowling.

I bet you will find something that will inspire you.

→ Check them out here.

Excuse 3 → “It costs money”

Deepak Kumar @DeepakK92429856

@paulabartabajo_ Money and Data

Solution: this statement is simply not true.

To run an ML app you need 3 types of services:

Computing services, to run your model training jobs, and your model inference in a real-time ML system.
- Solution: GitHub Actions gives you free VMs you can use to run such jobs.
  And Streamlit offers free computing to run your inference. Boom.
Storage service, to store and serve data (aka features), model artifacts (e.g. pickle files with serialized models), and metadata (e.g. offline validation metrics of your latest model).
- Solution: Hopsworks is a managed feature store with up to 25GB of free storage, which is more than enough to build a serious ML project. Click here to get your feature store for free. If you wanna use a full-featured experimentation tool + model registry you can use Weights&Biases too.
Orchestration, to coordinate the execution of the 3 pipelines of your system (feature pipeline, training pipeline, inference/deployment pipeline). I recommend you read this previous installment of The Real-World ML Newsletter to have the full context.
- Solution: Again, GitHub Actions is flexible enough to build a fully working ML app with the 3-pipeline design.
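If the storage part sounds abstract, here is a rough sketch of what "model artifact + metadata" means in practice. It uses a local temporary folder as a stand-in for a managed service, so it does not call the Hopsworks or Weights&Biases APIs at all. The `save_model` and `load_model` helpers, the folder layout and the toy model are assumptions made for this example.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model(model, metrics: dict, registry_dir: Path, version: str) -> Path:
    """Store the serialized model and its validation metrics side by side."""
    target = registry_dir / version
    target.mkdir(parents=True, exist_ok=True)
    with open(target / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # model artifact
    (target / "metrics.json").write_text(json.dumps(metrics))  # metadata
    return target


def load_model(registry_dir: Path, version: str):
    """Read back the model and the metrics saved under a given version."""
    target = registry_dir / version
    with open(target / "model.pkl", "rb") as f:
        model = pickle.load(f)
    metrics = json.loads((target / "metrics.json").read_text())
    return model, metrics


# Local folder standing in for a hosted model registry (assumption).
registry = Path(tempfile.mkdtemp())

# Toy "model": just the parameters of a line, enough to show the round trip.
toy_model = {"slope": 2.0, "intercept": 1.0}

save_model(toy_model, {"mae": 0.42}, registry, version="v1")
restored_model, restored_metrics = load_model(registry, "v1")
print(restored_model, restored_metrics)
```

A managed service gives you the same two things, plus versioning, access control and a UI, without you babysitting a folder. One caveat: only unpickle files you created yourself, since loading a pickle can execute arbitrary code.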

Excuse 4 → “I don’t know MLOps”

Siddharth Shukla @sid8491

@paulabartabajo_ learning curve and lack of documentation/tutorials for mlops projects.

Solution: I agree. Most of the MLOps resources out there are either

“tool-specific”, meaning it is hard to grasp the underlying structure and essence of an ML app, beyond specific tools or stacks.
“too large to understand”, like most engineering blogs published by Tech Giants like Uber, Binance, or Facebook.

The idea is simple 💡
Is there a universal way to design ML systems that you can learn once and apply every time?
Yes!
Stop thinking in terms of tools, and start thinking of 3 pipelines:
→ Feature pipelines → transform raw data into model features
→ Training pipelines → produce models
→ Inference pipelines → serve models’ predictions.
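To see how the 3 pipelines fit together, here is a toy sketch in plain Python. Two dictionaries play the role of the feature store and the model registry, the "model" is a hand-rolled least-squares line, and the ride counts in `RAW_EVENTS` are invented. None of this is a real stack; it only shows the shape of the design.

```python
from statistics import mean

# In-memory stand-ins for a feature store and a model registry (assumption).
feature_store = {}
model_registry = {}

# Hypothetical raw data: number of rides observed at each hour.
RAW_EVENTS = [
    {"hour": 1, "rides": 10},
    {"hour": 2, "rides": 20},
    {"hour": 3, "rides": 30},
    {"hour": 4, "rides": 40},
]


def feature_pipeline(raw_events):
    """Raw data -> model features and targets, written to the feature store."""
    rows = [{"x": float(e["hour"]), "y": float(e["rides"])} for e in raw_events]
    feature_store["rides_by_hour"] = rows


def training_pipeline():
    """Features -> model. Fits y = slope * x + intercept by least squares."""
    rows = feature_store["rides_by_hour"]
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    model_registry["latest"] = {"slope": slope, "intercept": intercept}


def inference_pipeline(hour: float) -> float:
    """Model + fresh input -> prediction."""
    model = model_registry["latest"]
    return model["slope"] * hour + model["intercept"]


feature_pipeline(RAW_EVENTS)
training_pipeline()
prediction = inference_pipeline(5)
print(prediction)
```

Notice that the three functions never call each other. They only talk through the store and the registry, which is what lets you run each one as a separate scheduled job.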

Now it is YOUR turn 👊

You have no excuses.

It is time to get your hands dirty.

Enjoy the journey.

Pau

Real-World Machine Learning
