Let's build a Real-Time Audio Transcription System that runs 100% locally
Say hello to LFM2-Audio-1.5B!
Every time you use Siri, Google Assistant, or any voice-to-text service, your voice gets packaged up and sent to a cloud server somewhere for processing.
Think about that for a second.
Your voice. Your conversations. Your private audio. All traveling through the internet to be processed by someone else's computer.
And we've normalized this so much that we don't even question it anymore.
But here's the thing: it doesn't have to be this way.
What's the problem?
Cloud-based transcription works great for demos and small projects. But when you're building real-world applications that need to handle sensitive audio data, a whole ocean of problems opens up:
What about privacy? Healthcare applications recording patient consultations. Financial advisors taking client meetings. Legal depositions. Therapy sessions. Do you really want all that audio flying through the internet to OpenAI/Google/Amazon servers?
What about latency? Audio travels to a server. Gets processed. Response comes back. That's seconds of delay in a conversation that should feel instant.
What about cost? Third-party transcription APIs are great at small scale, but as your application grows, so does your bill. Processing thousands of hours of audio per month? Get ready for some fat invoices.
What about offline operability? What if you need transcription in a remote area with spotty internet? What about embedded devices, smart home systems, or self-driving cars that need to process voice commands without relying on connectivity?
These are not easy problems to solve when your transcription model is running in a remote data center, God knows where.
So here's the question we should be asking:
Can we do this completely locally?
And the answer is… YES!
Let me show you a hands-on example using the open-weight LFM2-Audio-1.5B model by Liquid AI and llama.cpp for fast local inference.
Let's get to it!
Hands-on example
You can find all the source code in the Liquid AI Cookbook, your one-stop shop for learning how to build local AI that works.
Give it a star on GitHub if you get value from it.
These are the steps to run the CLI on your local desktop:
Clone the cookbook repository and cd into the folder for this example:

git clone https://github.com/Liquid4All/cookbook.git
cd cookbook/examples/audio-transcription-cli

Install uv on your system if you don't have it already.
Mac/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Download a few audio samples
uv run download_audio_samples.py
Run the transcription CLI, and see the transcription of the audio sample in the console:

uv run transcribe --audio "./audio-samples/barackobamafederalplaza.mp3" --play-audio

By passing the --play-audio flag, you will hear the audio in the background during transcription.
Here is an example using the sample audio you downloaded in step 3.
Let's dig deeper into the inner workings of this tool.
Under the hood
The entry point of this tool is the transcribe.py script, which:
Downloads the llama.cpp binaries required to run the audio model.
Loads the config parameters from the config.py file, including:
Paths to the llama.cpp executable that serves the model predictions.
Paths to the GGUF files that make up the audio model.
Audio chunk and overlap durations, in seconds.
Visual UI effects like typewriter speed.
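As a sketch of what such a config might hold, here is a simple dataclass version. The field names and default paths are illustrative, not the actual contents of config.py:

```python
from dataclasses import dataclass

@dataclass
class TranscriptionConfig:
    # Paths to the llama.cpp executable and the GGUF model files
    # (placeholder paths, not the real ones from the repo)
    llama_cpp_binary: str = "./bin/llama-server"
    audio_model_gguf: str = "./models/lfm2-audio-1.5b.gguf"
    # Audio chunking: chunk length and overlap between chunks (seconds)
    chunk_duration: float = 2.0
    overlap: float = 0.5
    # Visual UI effect: delay between printed characters (seconds)
    typewriter_speed: float = 0.02
```

Keeping all of this in one typed object makes it easy to pass the whole configuration to the model wrapper in a single argument.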
Kicks off the transcription by calling:

model = LFM2AudioWrapper(config)
transcription = model.transcribe_with_real_timing(
    audio_file_path=audio_file,
    chunk_duration=2.0,
    overlap=0.5,
    play_audio=play_audio,
    typewriter_effect=typewriter_effect
)
The heavy lifting is done by llama.cpp and the Python wrapper implemented in model_wrapper.py, which:
Chunks the input audio to simulate real-time processing.
Transcribes each chunk with super-low latency using llama.cpp.
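The chunk-and-overlap step can be sketched like this. This is a simplified illustration of the idea, not the actual model_wrapper.py code:

```python
def chunk_audio(samples, sample_rate, chunk_duration=2.0, overlap=0.5):
    """Split audio samples into overlapping chunks.

    Each chunk is chunk_duration seconds long; consecutive chunks
    share `overlap` seconds so words at chunk boundaries are not
    cut in half. Assumes chunk_duration > overlap.
    """
    chunk_size = int(chunk_duration * sample_rate)
    step = int((chunk_duration - overlap) * sample_rate)
    chunks = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once the current chunk reaches the end of the audio
        if start + chunk_size >= len(samples):
            break
    return chunks
```

The overlap is exactly why the raw output contains some duplicated text: the same half second of audio gets transcribed twice, once at the end of one chunk and once at the start of the next.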
What's next?
The CLI works, but the output text has some overlapping text and grammatical issues. To fix this, we could chain LFM2-Audio-1.5B with LFM2-350M into a 2-step low-latency workflow where:
Step 1 → LFM2-Audio-1.5B converts audio → raw text
Step 2 → LFM2-350M cleans raw text → polished transcription
Both models running locally. Both blazingly fast.
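A minimal sketch of how that two-step chain could be wired, with the model calls stubbed out as plain callables. All function names here are hypothetical, not from the cookbook:

```python
def polish_transcript(raw_text, llm_generate):
    """Step 2: clean up a raw transcript with a small local LLM.

    llm_generate is any callable mapping a prompt string to generated
    text, e.g. a wrapper around a locally served LFM2-350M.
    """
    prompt = (
        "Fix grammar and remove duplicated words from this transcript. "
        "Return only the cleaned text:\n" + raw_text
    )
    return llm_generate(prompt)


def transcribe_and_polish(audio_path, transcribe_fn, llm_generate):
    # Step 1: audio -> raw text (e.g. LFM2-Audio-1.5B via llama.cpp)
    raw_text = transcribe_fn(audio_path)
    # Step 2: raw text -> polished transcription (e.g. LFM2-350M)
    return polish_transcript(raw_text, llm_generate)
```

Passing the two models as callables keeps the pipeline modular: you can swap either model, or test the chain with stubs, without touching the orchestration code.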
I did not have time to cover this today, but I plan to do it in an upcoming newsletter.
And before you leave…
A few years ago I was looking for a GOOD MLOps tutorial on the internet, something that explained MLOps from first principles. Cloud-agnostic. Transferable from project to project, no matter whether I (or my client) was using:
AWS SageMaker
Google Vertex AI, or
A Kubernetes cluster using fully open-source tools.
And let me tell you something: it was not easy, until I stumbled upon a collection of open-source, self-paced tutorials on MLOps published by Jim Dowling.
Jim is the CEO and co-founder of Hopsworks, the first open-source Feature Store in the world.
He is also the father of the Feature-Training-Inference pipeline design, the gold standard used by companies of all sizes for building modular AI systems that scale.
A couple of weeks ago Jim published his first O'Reilly book, called "Building Machine Learning Systems with a Feature Store".
In this book he shares, hands-on with code, all the best practices he has learned over more than 20 years working with enterprises, startups and students, building production-ready AI systems that work.
The book speaks to:
Data Scientists who want to transition from training models to building ML systems
ML Engineers who want to learn how to build batch, real-time, and LLM systems in modular parts that you compose into an ML system
Data Engineers who want to learn about the data transformation taxonomy for ML and how badly structured DAGs prevent reuse in ML systems
Architects who want to learn how modularity helps you build faster and better ML systems.
The book is hands-on, full of tips and tricks, and straight to the point.
I have a gift for you!
As a subscriber of the Real-World ML Newsletter you can use the promo code RWML25 and download a digital copy of this book for FREE.
If you prefer a Kindle-friendly or a physical copy of the book, here's the link
Letās keep on building together,
Enjoy your Saturday,
Peace and Love
Pau