Over the past 4 weeks I have talked to several companies that want to develop some sort of agentic platform.
They are all in the same situation.
They managed to build a Proof of Concept for their business case, using third-party services (like the OpenAI API) and duct-taped Python scripts.
For example:
An agent demo to automate customer-support workflows at a gaming company, built as a single Python LangGraph flow running on AWS Lambda.
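Here is a minimal sketch of what that kind of PoC looks like in practice (the node names and the toy classification logic are hypothetical placeholders; the real demo would call an LLM inside the nodes):

```python
# A single LangGraph flow wrapped in an AWS Lambda handler.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class TicketState(TypedDict):
    message: str
    category: str
    reply: str


def classify(state: TicketState) -> dict:
    # A real PoC would make an LLM call here (e.g. via the OpenAI API)
    is_billing = "refund" in state["message"].lower()
    return {"category": "billing" if is_billing else "general"}


def draft_reply(state: TicketState) -> dict:
    return {"reply": f"[{state['category']}] Thanks for reaching out!"}


graph = StateGraph(TicketState)
graph.add_node("classify", classify)
graph.add_node("draft_reply", draft_reply)
graph.set_entry_point("classify")
graph.add_edge("classify", "draft_reply")
graph.add_edge("draft_reply", END)
app = graph.compile()


def lambda_handler(event, context):
    # One cold-started graph invocation per Lambda call
    return app.invoke({"message": event["message"], "category": "", "reply": ""})
```

Perfectly fine for a demo. Not something you can operate at scale.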
What’s the problem?
I have worked as an ML engineer for 10 years, and I have seen this hype-crash-back-to-basics cycle several times.
10 years ago it was the Deep Learning hype, when companies of all sorts (including the one I was working at back then) thought that Deep Learning for Computer Vision was all about training neural networks from scratch, without thinking about deployment and operationalisation (aka fancy ML without Ops).
Needless to say, all these projects ended badly (including the one I was working on xD).
And the thing is, I see a similar trend these days with LLM engineering and LLMOps.
Companies know how to build LLM demos, but they don’t know how to scale and operate them. So their toys never see the light of production.
Either because they are
too slow,
too expensive,
too hard to trust,
or all of the above.
Today I want to share my 2 cents to help you go from LLM toys to LLM products that move the needle for your company.
An LLMOps blueprint
Agentic platforms are not dark magic.
They are just a bunch of applications running as containerised services inside a compute platform.
Let’s go one by one:
Compute platform
When you are a small company, you can start with serverless compute like AWS Lambda or GCP Cloud Functions. However, as you grow and the token volumes in your agentic workflows increase, your cloud bills will start to grow. A LOT.
So, unless you are willing to burn cash, I recommend you get on the Kubernetes train.
Your cost curve will flatten, and your return on investment will grow.
My advice
The Kubernetes learning curve is steep, especially at the beginning. However, when you overcome this first shock, a whole new world of possibilities opens in front of your eyes.
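To give you an idea of the moving parts, here is a minimal sketch of deploying an agent service to Kubernetes with the official Python client (the image name, namespace, and resource numbers are hypothetical):

```python
# Deploy a containerised agent service with the official Kubernetes
# Python client. Image, namespace and resource numbers are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="support-agent"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "support-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "support-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="registry.example.com/support-agent:0.1.0",
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "100m", "memory": "256Mi"},
                            limits={"cpu": "500m", "memory": "512Mi"},
                        ),
                    )
                ]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="agents", body=deployment)
```

Setting explicit resource requests and limits like this is exactly what flattens the cost curve: you decide how much compute each agent gets, instead of paying per invocation.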
Agent workflow logic
This is typically a Python script written using libraries like
LangChain
Pydantic AI
LangGraph
LlamaIndex
or, even better, using a Rust alternative like Rig.
My tip
Rust is a compiled language that produces very small binaries, which translate into super-slim containers running in your cluster. This means you can run 10-50x more agents in Rust than in Python using the same infrastructure. So you build safer, faster, and cheaper agents.
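Whichever language you pick, the workflow logic itself stays small. Here is a minimal sketch of the Python side using Pydantic AI (the model name and prompts are illustrative, and the attribute names follow recent pydantic-ai releases):

```python
# Minimal Pydantic AI agent; model name and prompts are illustrative.
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o",
    system_prompt="You are a support agent for a gaming company. Be concise.",
)

result = agent.run_sync("I want a refund for my last purchase.")
print(result.output)  # the model's text reply (attribute name per recent releases)
```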
LLM servers
LLM servers provide the text completions that the agent workflows use to reason and to output responses.
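For example, if you self-host a model behind an OpenAI-compatible server such as vLLM, your workflow code can talk to it through the standard OpenAI client (the in-cluster URL and model name below are hypothetical):

```python
# Agent workflow calling a self-hosted, OpenAI-compatible LLM server
# (e.g. vLLM) running inside the cluster. URL and model are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm-server.agents.svc.cluster.local:8000/v1",
    api_key="not-needed-for-a-local-server",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful support agent."},
        {"role": "user", "content": "Where is my refund?"},
    ],
)
print(completion.choices[0].message.content)
```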
Tool servers
Tool servers act as gateways between the agents and the external services these agents invoke to accomplish their tasks.
And the thing is, with the emergence of standards for
agent-tool interaction, like the Model Context Protocol (MCP) introduced by Anthropic (sketched below), and
agent-to-agent interaction, like the newly proposed Agent2Agent (A2A) protocol by Google,
this modularisation will take us to yet another era of microservices architectures.
In this case, micro-agent architectures.
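To make the tool-server side concrete, here is a minimal MCP sketch using the official Python SDK (the `lookup_order` tool is a hypothetical stand-in for a real external service):

```python
# Minimal MCP tool server using the official Python SDK (`mcp` package).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("support-tools")


@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order (stubbed for this sketch)."""
    # A real tool server would call your order-management API here
    return f"Order {order_id}: shipped"


if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```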
Which means that, if you and your company want to make the most out of it, you need to go back to good-old software engineering and DevOps best practices.
In a nutshell
IMHO this is the design that unlocks the door to Agentic Architectures that help you either
make more money for your business, or
spend less money in your business.
Wanna learn LLMOps that work in the Real World?
Marius Rugan and I are preparing a hands-on course on LLMOps.
No more toys.
No more demos.
Just the things we learn every day while working for our clients, building LLM and agent-based systems.
Which means we will share with you all the tips and tricks we discover at work, building production-ready LLM solutions.
Hope this helps,
Enjoy the weekend,
And talk to you next week
Pau