In last Monday's Coffee at the Real World ML Community we discussed enterprise-grade RAG systems, and we got our hands dirty with some NVIDIA blueprints.
You can watch the full recording at this link.
The good (best?) thing about talking to smart people is that you learn a lot, and you start asking yourself questions you never thought about before. Things that lie a bit outside your comfort zone, that can be a bit scary, but that are also very interesting. Things that can take you to the next level.
What the heck am I talking about?
Infrastructure
As AI engineers we live our lives in the Python high-level world, abstracted away from most of the low-level details that power our systems, starting from the infrastructure (aka the computers) where our Python scripts (hopefully as Docker containers) run.
But the thing is, in the world of an AI engineer (as in life), the most amazing things happen when you step out of your comfort zone.
In my experience
Working as a data scientist/ML engineer/AI engineer (whatever this thing will be called next year), the people I have learned the most from were either data engineers or infrastructure engineers.
Now, in the Real World ML Community there is a guy called Marius Rugan, who happens to be a world-class infrastructure engineer.
He was the one demoing the NVIDIA blueprints last Monday, and the guy I am working with on the upcoming LLMOps bootcamp we will start in October.
During our Monday Coffee, Marius kicked off a sample deployment of this RAG system blueprint by NVIDIA, which combines best-in-class LLMs for different tasks with a vector database.
Unfortunately, we did not have time to finish the deployment, and see the system in action.
This is what triggered me today to say:
"Hey Pau, you should write a blog post about this".
So I tried to reproduce the deployment steps from scratch, like a cowboy, and share my experience with you here.
The goal
I like to learn new things by setting myself a very precise goal.
In this case, the goal was to run the system in this GitHub repo on a single GPU instance.
You start with one goal, but then reality hits you hard. You find little rocks on your way that you need to circumvent to move forward, and along the way you learn a lot.
What was the first rock I encountered?
I don't have an NVIDIA GPU instance
I am a Mac user, so I don't have an NVIDIA GPU instance.
So I have to rent one.
Mind the gap
There are many GPU providers out there, but you need to be careful with the price gaps between them. You can find up to 4x price differences for the same GPU instance from two different providers.
For example, last week renting 8x H100 80GB instances from Lambda-labs was 28 USD per hour, while the same instance on Google Cloud was 110 USD per hour.
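To see what that gap means in practice, here is a quick back-of-the-envelope calculation in plain shell, using the two hourly prices above:

```shell
# Weekly cost of keeping an 8x H100 80GB instance running 24/7,
# at the hourly prices mentioned above (USD).
lambda_weekly=$((28 * 24 * 7))   # Lambda Labs
gcp_weekly=$((110 * 24 * 7))     # Google Cloud
echo "Lambda Labs: \$${lambda_weekly}/week vs Google Cloud: \$${gcp_weekly}/week"
# → Lambda Labs: $4704/week vs Google Cloud: $18480/week
```

That is almost 14,000 USD per week of difference for the same hardware.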
In my case, I used Paperspace (now part of DigitalOcean) to fire up an A4000 machine (less than 1 USD per hour) from the command line using my API key.
```shell
curl --silent -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PAPERSPACE_API_KEY" \
  -d '{
    "name": "a4000-7e8b",
    "machineType": "A4000",
    "templateId": "t0nspur5",
    "diskSize": 50,
    "region": "NY2",
    "publicIpType": "dynamic"
  }' \
  "https://api.paperspace.com/v1/machines"
```
You will see the machine start up, and then you can ssh into it.
```shell
ssh -i ~/.ssh/whatever-key.pem paperspace@INSTANCE_IP
```
where `INSTANCE_IP` is the IP of the machine you just created, that you can find in the Paperspace console.
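Instead of watching the console, you can also poll the machine's state from the same API until it is ready. Here is a small sketch; note that the `GET /v1/machines/{id}` endpoint and the JSON `state` field are my assumptions about the response shape, so check the Paperspace API docs for the exact schema:

```shell
# Print the current state of a Paperspace machine (e.g. "ready").
# NOTE: the endpoint path and the "state" field are assumptions;
# verify them against the Paperspace API documentation.
machine_state() {
  curl --silent \
    -H "Authorization: Bearer $PAPERSPACE_API_KEY" \
    "https://api.paperspace.com/v1/machines/$1" |
    grep -o '"state": *"[^"]*"' | head -n1 | cut -d'"' -f4
}

# Example: block until the machine is up
# while [ "$(machine_state YOUR_MACHINE_ID)" != "ready" ]; do sleep 10; done
```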
Nice, we got the machine. Are we ready to run the system?
Next rock: the VM has no Docker!
As I said, learning hands-on is all about finding unexpected rocks on your way, and solving them.
In this case, the machine we rented is super bare-bones. In particular, there is no Docker installed, not to mention the NVIDIA Container Toolkit and the other tools we need to run the Docker containers of this blueprint.
Now, I could install all these things manually from the command line. But I am lazy, and I wanted to use a tool that can help me with this.
This is where my path took a detour, and stopped for a bit in the world of Ansible.
What is Ansible?
Ansible is a tool that helps you (among other things) automate the installation of software on your machines. It is a very lightweight tool that only requires:

- Python installed on your machine, and
- SSH access to the machines you want to install software on.
In this case, I want to install Docker, the NVIDIA Container Toolkit, and some other tools on the Paperspace GPU instance I rented.
As a starter, let's focus only on Docker.
Basic step-by-step Ansible example
To use Ansible, you first need to install it on your machine.
```shell
pip install ansible
```
Then you need to create an `inventory` file that contains the list of machines you want to install software on.
```ini
[paperspace]
your-instance-ip ansible_user=paperspace ansible_ssh_private_key_file=~/.ssh/your-key.pem
```
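Before running any playbook, it is worth checking that Ansible can actually reach the machine over SSH. Ansible ships a built-in `ping` module for exactly this; the tiny wrapper function below is just my own convenience, not part of Ansible:

```shell
# Run Ansible's ping module against a host group in an inventory.
# A reachable host answers with "SUCCESS" and "pong".
check_hosts() {
  ansible "$1" -i "$2" -m ping
}

# Example: check_hosts paperspace inventory.ini
```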
Then you need to create a `playbook` that contains the list of tasks you want to run on those machines.
This is the one I used to install Docker, batteries included, on the Paperspace GPU instance.
```yaml
---
- name: Install Docker on Paperspace GPU instance
  hosts: paperspace
  become: yes
  vars:
    docker_packages:
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
  tasks:
    - name: Update apt package index
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install required packages
      apt:
        name: "{{ docker_packages }}"
        state: present

    - name: Add Docker's official GPG key
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        state: present

    - name: Add Docker repository
      apt_repository:
        repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        state: present

    - name: Update apt package index (after adding Docker repo)
      apt:
        update_cache: yes

    - name: Install Docker CE
      apt:
        name:
          - docker-ce
          - docker-ce-cli
          - containerd.io
        state: present

    - name: Start and enable Docker service
      systemd:
        name: docker
        state: started
        enabled: yes

    - name: Add user to docker group
      user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes

    - name: Install Docker Compose
      get_url:
        url: "https://github.com/docker/compose/releases/download/v2.20.2/docker-compose-linux-x86_64"
        dest: /usr/local/bin/docker-compose
        mode: '0755'

    - name: Verify Docker installation
      command: docker --version
      register: docker_version

    - name: Display Docker version
      debug:
        msg: "Docker installed successfully: {{ docker_version.stdout }}"
```
To run the playbook, you need to run the following command:
```shell
ansible-playbook -i inventory.ini playbook.yml
```
And BOOM, you got the machine ready for Docker.
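Before moving on, it is worth sanity-checking the install from inside the machine. Note that you need to log out and SSH back in first, so the `docker` group membership added by the playbook takes effect. A minimal check, wrapped in a little helper function of my own (the `hello-world` image is Docker's standard smoke test):

```shell
# Verify the Docker setup the playbook installed:
# the CLI is on PATH and the daemon can actually run a container.
verify_docker() {
  docker --version &&
  docker run --rm hello-world
}

# Run it on the instance after re-logging in: verify_docker
```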
Time to wrap up
Today I showed you some of the things I have worked on and thought about this week. In particular, how to get your hands dirty setting up some minimal infrastructure to deploy a RAG system.
My takeaways here are:
- Scratch the surface of infrastructure engineering. It will make you a better AI engineer.
- Set yourself one specific goal and start walking (aka building) towards it.
- Along the way you will encounter many rocks that slow you down, but they will teach you tons of stuff if you decide to take short detours, like I did today with Ansible.
Talk to you next week,
Pau