AI lab Inside Out
How AI building REALLY works
The plan for today was to release the 2nd part of the hands-on series on Visual Language Models that we started last week.
I wanted to show you how structured output generation helps you build rock-solid image classifiers using a small Visual Language Model, but…
… I decided to change plans.
Don’t worry!
If you cannot wait until next week to see how structured generation works, you can check out the code implementation yourself. I will release the blog post next week.
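In the meantime, here is the core idea in a nutshell. This is just a sketch with placeholder labels, not the final code from the post: you define a fixed output schema (for example with Pydantic), and a structured-generation library constrains the model's decoding so the classifier can only ever answer with JSON that matches that schema.

```python
# Sketch only: hypothetical labels for an image classifier.
# A structured-generation library takes this JSON schema and, at decoding
# time, masks out every token that would break it.
from enum import Enum
from pydantic import BaseModel

class Label(str, Enum):
    CAT = "cat"
    DOG = "dog"
    OTHER = "other"

class ImageClassification(BaseModel):
    label: Label        # the VLM must pick exactly one allowed label
    confidence: float   # plus a confidence score between 0 and 1

print(ImageClassification.model_json_schema())
```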
So, what happened?
This week I started working at Liquid AI as a Dev Rel Engineer.
My role?
I am sitting in the middle of a magic triangle between:
Liquid AI tech team → the chefs who cook our frontier AI models
Liquid product team → the wizards who build AI-powered products for enterprises using our models and tools.
YOU, my dear developer!
Yes, YOU are a key element of this equation.
Because
I will succeed if you succeed at building AI systems using the resources, hands-on tutorials and examples I will build and share with you every single week.
So, what’s the plan?
I want to help you from 2 angles:
Engineering deep-dives. I am a mathematician by training. I don’t tolerate fluffy content and half-baked ideas. So expect more of what I have been doing for the last 3 years:
End-2-end tutorials on how to build and deploy AI-powered systems that work.
Straight-to-the-point videos, with and without my wife exercising in the background.
A few jokes here and there, because when you laugh and relax, you learn faster and live longer.
Business perspective. There is soooo much noise on social media about AI that it gets super hard to tell which trends are real and which are “hallucinated”. I will try to cut the noise to the minimum and bring you the cleanest signal I can, straight from inside a top industrial AI lab.
Where is the AI industry heading?
What are frontier AI labs working on?
What are the most important challenges that enterprises and startups face when building AI-powered products?
And now, let’s start! 🚀
This week we had 2 big releases…
Release #1 → LFM2-Audio-1.5B is out! 📣
LFM2-Audio-1.5B is a small, open-weights, end-2-end audio foundation model that handles both text and audio inputs and outputs.
👉🏽 You can run inference on this model using this open-source Python SDK
What is unique about it?
First of all, its size.
With only 1.5B parameters, it is small enough to run on phones with production-grade speed and quality.
Second, it can handle both text and audio input and spit out both text and audio output.
What does this mean?
With LFM2-Audio-1.5B you can build all sorts of applications, including:
Conversational chats using audio input.
Text-to-speech synthesis (TTS)
Speech-to-text transcription (ASR), or
(hold my beer) Real-time speech-to-speech conversation!
For example
Imagine a phone call with a Chinese colleague, where you speak in English and she speaks in Chinese, and you both hear each other’s voices in your native languages.
Well, stop imagining. We will build it together soon and deploy it using the Liquid Edge AI Platform (aka LEAP) SDK.
Release #2 → Apollo app for Android 🌙🤖
Apollo is a 100% local, privacy-first iOS and Android app that helps you vibe-check small, local Language Models.
Why is this relevant?
A common mistake I have seen among devs (including myself) is to start every LM project by building a prototype using a frontier model, like Claude Sonnet or GPT-4.
And this is OK, until you get the green light and need to move to production.
Many problems arise:
Costs scale linearly with usage.
Latency cannot be reduced to acceptable levels.
Bye bye, data privacy.
Offline operation is off the table.
And the thing is, you can tackle these problems in 2 ways:
Option 1. The hard way 🏋
You stick to Claude-<PLACEHOLDER> and put in serious engineering effort to mitigate as many of these problems as you can. Some you cannot mitigate, like data privacy.
Option 2. The easy (and smart) way 🧠
Take one step back.
Instead of using a massive frontier model you follow these 3 steps:
Vibe-check different Small Language Models that might work for your use case. You need to keep only the models that can
fit on your target deployment platform (e.g. phone, laptop, robot…), and
give decent performance out-of-the-box on your evaluation dataset.
Customize the best model you found in step 1, using either
autoprompting, or
parameter fine-tuning, like Supervised Fine-Tuning (SFT) plus LoRA adapters to speed things up (see the sketch after this list).
Deploy to your application. This step is getting easier and easier thanks to new tools like the LEAP SDK.
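To give you a taste of the fine-tuning option in step 2, here is a minimal sketch of Supervised Fine-Tuning with a LoRA adapter using the trl and peft libraries. The model name, dataset file and hyperparameters below are placeholders, not recommendations; swap in whatever survived your vibe-check.

```python
# Sketch only: SFT + LoRA on a small open-weights model.
# Placeholder model, dataset and hyperparameters; adjust to your use case.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# A JSONL file with chat-formatted examples, e.g. {"messages": [...]} per line
dataset = load_dataset("json", data_files="my_task_data.jsonl", split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",  # placeholder: any small model you vibe-checked
    train_dataset=dataset,
    args=SFTConfig(output_dir="lfm2-sft-lora", num_train_epochs=1),
    # LoRA trains a small set of low-rank adapter weights instead of the full model
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    ),
)
trainer.train()
```

Because LoRA only updates the adapter weights, a run like this fits on a single consumer GPU and leaves the base model untouched for other tasks.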
Step 1 is what Apollo helps you with.
You can download the app for iOS or Android.
I wish you happy vibe-checking!
That’s it for today!
My kids just came back from kindergarten and they are about to set the house on fire. I have to stop them.
Talk to you next week!
Peace and Love,
Pau



