The Small Language Model Revolution is here ✊✊🏾✊🏻
Here’s an example ⬇️
We (developers of the world) have been told so many times that “Bigger models are better” that we have internalised the mantra.
No questions.
Zero.
Cero.
Zéro.
Nula.
ноль.
零
ज़ीरो
Every time we need to build an LLM-powered feature or product, we follow these steps:
go to OpenAI/DeepSeek/Anthropic/Gemini/whatever-LLM-provider-you-like
pick the latest/largest/shiniest Large Language Model they offer
set up the billing (if it is your first time using that API)
generate a new API key,
uv add whatever Python SDK we need to talk to the API (e.g. uv add anthropic), and…
BOOM!
We start streaming tokens from a data center the size of Luxembourg (and paying good cash for it)
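To make that last step concrete, here is a minimal sketch of the usual flow with the anthropic Python SDK. The model name and the prompt are just illustrative placeholders; pick whatever is current:

```python
# Minimal sketch: stream tokens from Anthropic's API.
# Assumes ANTHROPIC_API_KEY is set in your environment.
from anthropic import Anthropic

client = Anthropic()

# Stream the response token by token (every token billed, of course)
with client.messages.stream(
    model="claude-sonnet-4-20250514",  # example model name, swap in the latest
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```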
And the thing is, this approach works well for demos and small projects, but does NOT work when you are building real-world apps and features that need to be used by real people.
Why?
This approach is super valid when your goal is to quickly put together a Proof-of-Concept that you can
showcase to potential clients (hello freelancer!), or
pitch internally in your company.
However, as soon as you get buy-in and need to take the project to the next step...
Your boss 👨🏻💼:
“Hey, we love the demo you built. Can we integrate this super cool feature into our application?”
...an ocean of questions and technical problems opens in front of you…
For example:
What about cost?
Third-party pay-as-you-go LLM providers are great at small scale, but as your application grows, so does your bill (see the back-of-envelope math after this list).
What about privacy?
Some industries, like healthcare and finance, are extremely sensitive about data privacy and security. How do you get good predictions from OpenAI/Anthropic without sharing that data with them?
What about latency?
Is a delay of a few seconds to get model responses acceptable for your users?
What about offline operability?
What if you plan to embed an LLM in a drone deployed in a remote area where internet is not available? Or a self-driving car that needs to make decisions without connectivity?
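On the cost question, a quick back-of-envelope calculation shows how fast pay-as-you-go pricing compounds. Every number below is an illustrative assumption, not any provider's actual list price:

```python
# Back-of-envelope: monthly bill for a pay-as-you-go LLM API.
# All numbers are illustrative assumptions, not current list prices.
requests_per_day = 50_000
tokens_per_request = 2_000          # prompt + completion
price_per_million_tokens = 10.0     # USD, blended input/output (assumed)

monthly_tokens = requests_per_day * tokens_per_request * 30
monthly_cost = monthly_tokens / 1_000_000 * price_per_million_tokens
print(f"${monthly_cost:,.0f} per month")  # -> $30,000 per month
```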
These are not easy problems to solve when the LLM you are using is a hundreds-of-billions-of-parameters model running in a remote data center God knows where.
Instead, you should take a step back, revisit your initial premise, and ask yourself:
"Is bigger REALLY better for my use case?"
And the answer is usually no.
Why Bigger is NOT better in most cases
Going for the biggest and most accurate model out there is like saying:
"I need to travel and I know that planes are faster than cars. Hence I will get a plane"
This argument does not make ANY SENSE until you ask yourself:
"Where the heck do I need to go?"
And the thing is, most trips in this world are short enough to be done more efficiently by bike or car than by plane.
The same goes for Language Models (aka the vehicles) and all the real-world problems companies need to solve with them (aka the destinations).
In the same way you pick the right vehicle for the trip, you need to pick the right LLM for the problem you are trying to solve.
Or, if you prefer to update your mantra, take this one:
The new mantra 🪬
Smaller, task- and device-specific models are better!…
... because they are cheaper, faster, private, and more accurate for the task you are trying to solve.
For example
I don't want to convince you with words. I want to show you with code.
A few weeks ago I built a Chess Game using a tiny 350M parameter model by Liquid AI, and deployed it in an iOS app.
The main steps were:
Download 7k historical chess games played by the great Magnus Carlsen
Process the raw data into an instruction dataset for supervised fine-tuning, which you can find on the Hugging Face Hub (sketched below)
Fine-tune LFM2-350M to imitate Magnus (aka LFM2-350M-MagnusInstruct) (sketched below)
Evaluate the model by playing against another programmatic chess player
Bundle the model with leap-bundle
Create a simple iOS game and embed our fine-tuned LLM using the Leap Edge SDK for iOS
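For the data-processing step, here is a minimal sketch of what turning raw games into instruction pairs can look like, using the python-chess library. The file name and the prompt/completion format are illustrative assumptions, not the exact format of the published dataset:

```python
# Sketch: turn PGN games into (prompt, completion) pairs for supervised fine-tuning.
# The file name and prompt format are illustrative assumptions.
import chess.pgn

def games_to_examples(pgn_path: str) -> list[dict]:
    examples = []
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            moves_so_far: list[str] = []
            for move in game.mainline_moves():
                san = board.san(move)  # move in standard algebraic notation
                # In practice you would keep only positions where Magnus is to move
                examples.append({
                    "prompt": f"Moves so far: {' '.join(moves_so_far) or 'none'}. Play the next move.",
                    "completion": san,
                })
                moves_so_far.append(san)
                board.push(move)
    return examples

examples = games_to_examples("carlsen_games.pgn")
print(len(examples), examples[0])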
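And for the fine-tuning step, a minimal sketch with Hugging Face TRL. The dataset id below is a hypothetical placeholder; LiquidAI/LFM2-350M is the public base checkpoint on the Hub:

```python
# Sketch: supervised fine-tuning of LFM2-350M with TRL's SFTTrainer.
# The dataset id is a hypothetical placeholder; point it at the real dataset.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your-username/magnus-carlsen-instruct", split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2-350M",  # public base checkpoint
    train_dataset=dataset,       # expects prompt/completion columns as built above
    args=SFTConfig(
        output_dir="LFM2-350M-MagnusInstruct",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
    ),
)
trainer.train()
```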
You can find a full article about this project here:
The bigger picture
This isn't just about chess or mobile games.
It's about proving that AI can be:
✅ Privacy-first (data never leaves the device)
✅ Cost-effective (no cloud compute costs)
✅ Accessible (works offline)
✅ Performant (optimized for mobile)
For LLM engineers tired of cloud-only solutions, this is your blueprint for bringing intelligence to the edge.
And for edge developers without AI experience, this is the easiest path to embed cost-effective, privacy-first, and performant AI in your projects.
The trend is clear. Small Language Models are the future (and present).
Let's catch this wave together!
Enjoy the weekend,
Peace and Love,
Pau


