Building a real-world ML project is the best way to differentiate yourself from other applicants and build a reputation that helps you land ML jobs.
Show.
Not tell.
Most people looking for jobs are not doing this (because it is hard!). Yet it is exactly what employers looking for tech professionals need.
Clear ideas. Straight execution.
Many of you asked me this week on Twitter (sorry X) the following:
“OK, I understand the steps to build a real-world ML product.
Now, how can I find a real-world problem and dataset from which I can build my OWN project?”
Let me help you 🤗 ↓
The 3 meta-ingredients behind a successful ML project
To build a real-world ML project you need 3 meta-ingredients:
1) A real-world problem you care about, so you do not quit when things get tough.
2) A source of data, because there is no Machine Learning without data.
3) A basic understanding of the main steps to follow to develop an ML product (check this article if you don’t know them).
So, assuming you have 3), you only need to decide on 1) and find 2).
Here is what I usually do.
Step 1. Make a list of 3 problems you are *genuinely* interested in
The trick is to work on a real-world problem that genuinely interests you, so you don’t quit when things get tough.
Building a project is way harder than completing an online course. You will go through ups and downs. Working on a problem you are passionate about will make you stick with it and not quit when things get tough.
A few project ideas I would work on, for example, would be:
Predict the outcome of NBA matches. I love basketball, played it for a long time, and have some domain expertise I could leverage here.
Build a chatbot that writes stand-up comedy. I occasionally perform in stand-up clubs and admire the ability of humans to make other humans laugh. Moreover, NLP is an area that is booming and generating lots of job opportunities.
Build a trading bot. I am deeply interested in real-time ML, and financial trading is the perfect playground for that: plenty of data, APIs, and also market interest.
I hope these examples inspired you.
Now, go and think about yours. What do you REALLY WANT to BUILD?
Give yourself a few options, and then choose based on data availability (more on this in the next block).
Step 2. Find a source of (preferably live) data
Without data, there is no Machine Learning.
Luckily, there is still plenty of freely available data to build amazing ML apps. Here I am giving you 3 options.
Option 1. Use a dataset from Kaggle and simulate API calls
Kaggle has plenty of historical datasets you can use to get started. Such datasets help you train an ML model. However, they do not provide access to recent, live data.
Real-world ML systems need to regularly ingest new data to generate new predictions. Otherwise, they do not bring value.
To accomplish this, you can implement a Python function that simulates calls to a production API by sampling historical data from your CSV/Parquet file.
💡 This is the technique we use in The Real-World ML Tutorial to generate live taxi rides in NYC from this website.
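Here is a minimal sketch of one way to simulate such API calls, using only the Python standard library. It is not the exact code from the tutorial. The sample CSV, the column names, the function names and the one-year shift are all assumptions made for illustration: the requested time window is moved back into the historical period, the matching rows are selected, and their timestamps are moved forward so they look like fresh events.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample standing in for a historical CSV file (e.g. from Kaggle).
HISTORICAL_CSV = """pickup_datetime,pickup_location_id
2022-01-01 00:10:00,43
2022-01-01 00:40:00,12
2022-01-01 01:05:00,43
2022-01-01 02:30:00,7
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_history(csv_text):
    """Parse the CSV text into a list of (timestamp, row) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["pickup_datetime"], TS_FORMAT)
        rows.append((ts, row))
    return rows


def fetch_simulated_events(history, from_ts, to_ts, shift):
    """Pretend to call a live API for the window [from_ts, to_ts).

    The window is moved back in time by `shift`, the matching historical
    rows are selected, and their timestamps are moved forward by `shift`.
    """
    events = []
    for ts, row in history:
        if from_ts - shift <= ts < to_ts - shift:
            event = dict(row)
            event["pickup_datetime"] = (ts + shift).strftime(TS_FORMAT)
            events.append(event)
    return events


history = load_history(HISTORICAL_CSV)
events = fetch_simulated_events(
    history,
    from_ts=datetime(2023, 1, 1, 0, 0),
    to_ts=datetime(2023, 1, 1, 1, 0),
    shift=timedelta(days=365),  # assumed offset between "now" and the data
)
print(events)
```

In a real project you would call a function like this on a schedule (every hour, for example), so the rest of your pipeline cannot tell the difference between the simulation and a live API.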
Option 2. Build a web scraper
For example, if I wanted to predict NBA match outcomes, I could build a web scraper to get all the historical AND live data from NBA matches from here.
Building a web scraper in Python is a great exercise that will sharpen your software engineering skills. So, if you haven’t done it before, I encourage you to do it.
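To give you a feel for it, here is a minimal sketch of the parsing half of a scraper, using only the Python standard library. The HTML snippet, the table layout, the team names and the field names are all made up for illustration. A real page will look different, and you would first download it (for example with urllib.request) before parsing.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded results page.
SAMPLE_HTML = """
<table id="scores">
  <tr><th>Home</th><th>Away</th><th>Home pts</th><th>Away pts</th></tr>
  <tr><td>Team A</td><td>Team B</td><td>101</td><td>99</td></tr>
  <tr><td>Team C</td><td>Team D</td><td>88</td><td>95</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_matches(html):
    """Turn the table rows into one dictionary per match."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"home": home.strip(), "away": away.strip(),
         "home_pts": int(home_pts), "away_pts": int(away_pts)}
        for home, away, home_pts, away_pts in parser.rows
    ]


matches = parse_matches(SAMPLE_HTML)
print(matches)
```

Keeping the download step and the parsing step in separate functions makes the scraper easier to test, because you can check the parser against a saved page without hitting the website.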
Option 3. Find a public API
There are tons of public APIs you can use for many real-world problems, like the ones in this repository.
Using a battle-tested API is the ideal option to build a robust feature pipeline for your ML app.
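As a rough sketch, one ingestion step of such a feature pipeline could look like the code below. The URL, the response fields and the function names are hypothetical, so replace them with whatever the API you pick actually returns. The fetch function is passed in as an argument, so the example can run offline with a fake response.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (use this against a real API)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def to_feature_row(payload):
    """Keep only the fields the model needs (hypothetical field names)."""
    return {
        "timestamp": payload["timestamp"],
        "price": float(payload["price"]),
    }


def ingest_once(fetch, url):
    """Run one ingestion step: fetch the raw payload, then transform it."""
    return to_feature_row(fetch(url))


def fake_fetch(url):
    """Offline stand-in for fetch_json, returning a made-up payload."""
    return {"timestamp": "2024-01-01T00:00:00Z", "price": "42.5", "extra": "ignored"}


# Hypothetical endpoint; with a real API you would pass fetch_json instead.
row = ingest_once(fake_fetch, "https://api.example.com/v1/ticker")
print(row)
```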
Now it is YOUR time 🤟
Passion and data are the 2 blockers at the onset of the project. I hope this article gave you the inspiration and ideas to get your thing going.
And if you need technical help during your project adventure, consider joining any of my Real World ML Courses. They are tough. But you learn what real-world ML is.
I would love to hear about your project ideas and progress, so do not hesitate to drop your comment below.
Keep on learning!
Pau
Some of your git repos use outdated tools, and when someone raises an issue it takes forever to get a response. I am not sure you are even going to respond. This does not speak well.
Let's go 2025 🤏