Real-World ML #014: It is time to build 🏗️

You won't have an excuse

Pau Labarta Bajo

Mar 04, 2023

Read time: 5 minutes

Last week I asked you on Twitter:

Pau Labarta Bajo @paulabartabajo_

What is preventing you from building a *complete* real-world Machine Learning app?

And the answers you gave me can be bucketed into 4 categories.

“I don’t have data” (lack of data)
“I don’t have any project idea” (lack of inspiration)
“It costs money” (tight budget)
“I don’t know how to do it” (lack of knowledge)

Whatever is blocking you from building an ML product, I have a solution for you 🤗 ↓

Excuse #1. Lack of project ideas

LaLaLandLefty @ai_sparkle99

@paulabartabajo_ A list of such project ideas would be swell

Solution: here is a list of 20 ML apps built by students at the Royal Institute of Technology in Stockholm (KTH) under the supervision of Jim Dowling.

I bet you will find something that will inspire you.

→ Check them out here.

Excuse #2. Lack of data

Junaid Ali @junaid_py_

@paulabartabajo_ The freaking data itself. Unavailability.

Solution: here you have 3 options.

Search in Kaggle datasets, one of the largest repositories of public datasets in the world.
Find a public API in this TOP repository.
Find the website you are interested in (for example, the NBA stats page) and build your own scraper. True, it is time-consuming, but you will learn tons of Python in return.
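
For the third option, a scraper can start very small. The sketch below uses only the Python standard library to pull the rows out of an HTML table. The `SAMPLE_HTML` string and its numbers are made up for illustration, and `TableParser` and `scrape_table` are hypothetical names. On a real site you would first download the page (for example with `urllib.request.urlopen`) and check the site's terms of use before scraping.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text chunks of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for a downloaded stats page.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Player A</td><td>31</td></tr>
  <tr><td>Player B</td><td>27</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

From here, the usual next step is to save `records` to a CSV file or a database on a schedule, so your dataset grows over time.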

Excuse #3. It costs money

Deepak Kumar @DeepakK92429856

@paulabartabajo_ Money and Data

Solution: this statement is simply not true.

To run an ML app you need 3 types of services:

Computing services, to run your model training jobs, and your model inference in a real-time ML system.
- Solution: GitHub Actions gives you free VMs you can use to run such jobs.
  And Streamlit offers free computing to run your inference. Boom.
Storage service, to store and serve data (aka features), model artifacts (e.g. pickle files with serialized models), and metadata (e.g. offline validation metrics of your latest model).
- Solution: Hopsworks is a managed feature store with up to 25GB of free storage, which is more than enough to build a serious ML project. Click here to get your feature store for free. If you wanna use a full-featured experimentation tool + model registry, you can use Weights & Biases too.
Orchestration, to coordinate the execution of the 3 pipelines of your system (feature pipeline, training pipeline, inference/deployment pipeline). I recommend you read this previous installment of The Real-World ML Newsletter to have the full context.
- Solution: Again, GitHub Actions is flexible enough to build a fully working ML app with the 3-pipeline design.
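
To make the 3-pipeline design more concrete, here is a minimal sketch in plain Python. Everything in it is a stand-in made up for illustration: a temporary local folder plays the role of the feature store and model registry, the "model" is just the average of past values, and the ride counts are fake. In a real project, each function would typically live in its own script, run on a schedule (for example from a GitHub Actions workflow), and read from and write to a managed service instead of a folder.

```python
import json
import pickle
import tempfile
from pathlib import Path
from statistics import mean


def feature_pipeline(raw_rides, store):
    """Turn raw data into features and save them to storage."""
    features = [{"hour": hour, "rides": rides} for hour, rides in raw_rides]
    (store / "features.json").write_text(json.dumps(features))


def training_pipeline(store):
    """Read features, fit a model, and save the model artifact plus a metric."""
    features = json.loads((store / "features.json").read_text())
    rides = [row["rides"] for row in features]
    # Toy model: always predict the historical average.
    model = {"avg_rides": mean(rides)}
    # In-sample mean absolute error, computed only to have a metric to store.
    mae = mean([abs(r - model["avg_rides"]) for r in rides])
    (store / "model.pkl").write_bytes(pickle.dumps(model))
    (store / "metrics.json").write_text(json.dumps({"mae": mae}))


def inference_pipeline(store):
    """Load the latest model artifact and produce a prediction."""
    model = pickle.loads((store / "model.pkl").read_bytes())
    return model["avg_rides"]


# Fake hourly ride counts: (hour, number of rides).
raw_rides = [(8, 120), (9, 150), (10, 90)]

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)  # stand-in for a feature store + model registry
    feature_pipeline(raw_rides, store)
    training_pipeline(store)
    prediction = inference_pipeline(store)
    metrics = json.loads((store / "metrics.json").read_text())

print(prediction, metrics)
```

Notice that the three functions only talk to each other through storage. That is what lets each pipeline run as a separate job, on its own schedule.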

Excuse #4. Lack of knowledge

Siddharth Shukla @sid8491

@paulabartabajo_ learning curve and lack of documentation/tutorials for mlops projects.

Solution: I agree. Most of the MLOps resources out there are either

“tool-specific”, meaning it is hard to grasp the underlying structure and essence of an ML app, beyond specific tools or stacks.
“too large to understand”, like most engineering blogs published by Tech Giants like Uber, Binance, or Facebook.

And this is precisely why I created The Real-World ML Tutorial.

It is a hands-on tutorial that teaches you, step by step, how to build this real-world ML app to predict taxi rides in NYC using live data and MLOps best practices.

🎁 Massive discount until next Saturday

As I want to leave you without excuses, I am offering a massive 50% discount on the tutorial until next Saturday.
Use the discount coupon “NIL” at checkout or use this direct link, and get it for only $75.
You will also get lifetime access to my private Discord server, to connect with me and a growing community of ML builders.

Now it is YOUR turn 👊

You have no excuses.

If you have any questions, drop me a line at plabartabajo@gmail.com.

Happy building!

Pau

Real-World Machine Learning
