Say you want to build an ML service to extract crypto market signals from financial news.
The output of such a service can be piped into a predictive ML model, together with other predictive features like real-time market prices, to produce the best price prediction possible.
This is, by the way, what we will do in the next cohort of Building a Real-Time ML System. Together.
Let me show you how to solve this real-world problem, using LLMs and a bit of prompt engineering.
The problem
We want to build a Python microservice that
given a news headline
news = "FED to increase interest rates"
outputs a market signal and the reasoning behind this signal.
{
"signal": "bearish",
"reasoning": "The news about FED increasing interest rates is typically bearish for crypto markets for several reasons:\n1. Higher interest rates make borrowing more expensive, reducing liquidity in the market\n2. Higher rates make traditional yield-bearing investments more attractive compared to crypto\n3. Risk assets like cryptocurrencies tend to perform poorly in high interest rate environments\n4. Historically, crypto prices have shown negative correlation with interest rate hikes"
}
The signal is a categorical variable with 3 possible values:
bullish (positive market impact)
neutral (neutral/unclear impact), or
bearish (negative market impact)
Transforming raw input into a structured output is something LLMs excel at, so it makes sense to give them a try here.
Let me show you how to implement this in Python, using a powerful LLM and a bit of prompt engineering.
You can find all the source code in this repo.
👉 Github repo
The solution 🧠
We want our model to ingest raw textual data, and output a structured response.
The fastest way I know to solve this problem is by
using a strong LLM with function calling capabilities (for example Claude), and
a library like LlamaIndex, to make sure the output has the desired format.
In this example I will be using Anthropic’s Claude, but feel free to replace it with another LLM, for example
OpenAI, or
Llama 3.2 running locally with Ollama.
If you want to follow along with me, you will need to get your API key here and spend less than $0.01.
These are the steps to build this.
1. Bootstrap the project 🏗️
I started using uv as my Python build tool. If you haven’t, I recommend you give it a try.
Install uv on Linux/Mac
curl -LsSf https://astral.sh/uv/install.sh | sh
or on Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Then bootstrap your project from the command line with this command
uv init crypto-sentiment-extractor --lib
cd crypto-sentiment-extractor
2. Install Python dependencies 🛠️
which are
llama-index and llama-index-llms-anthropic to interact with Claude easily, and control the output format
uv add llama-index
uv add llama-index-llms-anthropic
pydantic to supercharge our Python code with type validations
uv add pydantic
pydantic-settings to load and validate configuration parameters easily
uv add "pydantic-settings>=2.6.1"
3. Set up the configuration 🔑
Create an `anthropic.env` file with your API key and other configuration parameters
API_KEY="YOUR_API_KEY_GOES_HERE"
MAX_TOKENS=300
MODEL_NAME=claude-3-5-sonnet-20241022
and load them easily into your Python runtime with this pydantic-settings class.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AnthropicConfig(BaseSettings):
    model_config = SettingsConfigDict(env_file='anthropic.env', env_file_encoding='utf-8')

    api_key: str
    model_name: str = "claude-3-opus-20240229"
    max_tokens: int = 300

anthropic_config = AnthropicConfig()
4. Define the LLM output with Pydantic 📤
We create a pydantic class with the field names and descriptions we want the LLM to output.
from typing import Literal
from pydantic import BaseModel, Field

class NewsMarketSignal(BaseModel):
    """
    Market signal of a news article about the crypto market
    """
    signal: Literal["bullish", "bearish", "neutral"] = Field(description="""
        The market signal of the news article about the crypto market.
        Set it as neutral if the text is not related to the crypto market, or if
        there is not enough information to determine the signal.
        """)
    reasoning: str = Field(description="""
        The reasoning for the market signal.
        Set it as non-relevant if the text is not related to the crypto market.
        Set it as not-enough-info if there is not enough information to determine the signal.
        """)
It is very important you add Field descriptions. These provide the context your LLM needs to solve the task effectively.
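You can verify what the LLM will actually see: the pydantic model is turned into a JSON schema for the function call, and the Field descriptions travel with it. A quick check using pydantic alone (my condensed version of the class above):

```python
from typing import Literal
from pydantic import BaseModel, Field

class NewsMarketSignal(BaseModel):
    """Market signal of a news article about the crypto market"""
    signal: Literal["bullish", "bearish", "neutral"] = Field(
        description="The market signal of the news article about the crypto market."
    )
    reasoning: str = Field(description="The reasoning for the market signal.")

schema = NewsMarketSignal.model_json_schema()

# The Literal becomes an enum constraint, and the description is attached,
# so the LLM sees both the allowed labels and your instructions
print(schema["properties"]["signal"]["enum"])
print(schema["properties"]["signal"]["description"])
```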
5. Define the prompt template 📜
This system prompt instructs the language model on its expected behavior and functionality. For example
from llama_index.core import PromptTemplate

self.prompt = PromptTemplate(
    """
    You are an expert at analyzing news articles related to cryptocurrencies and
    extracting market signal from the text in a structured format.
    Here is the news article:
    {text}
    """
)
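The template's job here is simple: {text} is a placeholder filled with the news article on each request. A plain-Python sketch of that substitution (str.format stands in for the llama-index API):

```python
# {text} is a placeholder filled per request; llama-index's
# PromptTemplate adds more machinery on top of this idea
TEMPLATE = (
    "You are an expert at analyzing news articles related to cryptocurrencies "
    "and extracting market signal from the text in a structured format.\n"
    "Here is the news article:\n"
    "{text}"
)

prompt = TEMPLATE.format(text="FED to increase interest rates")
print(prompt)
```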
6. Create a client object to interact with your LLM 🤝
from llama_index.llms.anthropic import Anthropic

from .config import AnthropicConfig

config = AnthropicConfig()

self.llm = Anthropic(
    model=config.model_name,
    max_tokens=config.max_tokens,
    api_key=config.api_key,
)
7. Send a request to your LLM 📨
We want to
generate a structured output as specified in the NewsMarketSignal class,
using the system prompt self.prompt, and
for a given news text
response = self.llm.structured_predict(
    NewsMarketSignal, self.prompt, text=text
)
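Under the hood, structured_predict asks Claude for a function call whose arguments follow the NewsMarketSignal schema, then validates the returned JSON back into the class. The validation half can be sketched with pydantic alone (the raw payload below is made up):

```python
import json
from typing import Literal
from pydantic import BaseModel, ValidationError

class NewsMarketSignal(BaseModel):
    signal: Literal["bullish", "bearish", "neutral"]
    reasoning: str

# Hypothetical raw JSON, as the LLM's function call would return it
raw = '{"signal": "bearish", "reasoning": "Rate hikes drain liquidity."}'
response = NewsMarketSignal.model_validate(json.loads(raw))
print(response.signal)  # bearish

# A label outside the schema never reaches your code unvalidated
try:
    NewsMarketSignal.model_validate({"signal": "moonish", "reasoning": "hype"})
except ValidationError:
    print("rejected")
```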
You can encapsulate and organize this logic with a custom Python object
# claude.py
from llama_index.core import PromptTemplate
from llama_index.llms.anthropic import Anthropic

from .config import AnthropicConfig
# plus the NewsMarketSignal model defined in step 4


class ClaudeMarketSignalExtractor:
    """
    A class to extract market signal from text using Claude's API
    """

    def __init__(self):
        """
        Initialize the ClaudeMarketSignalExtractor.
        """
        config = AnthropicConfig()
        self.llm = Anthropic(
            model=config.model_name,
            max_tokens=config.max_tokens,
            api_key=config.api_key,
        )
        self.prompt = PromptTemplate(
            """
            You are an expert at analyzing news articles related to cryptocurrencies and
            extracting market signal from the text in a structured format.
            Here is the news article:
            {text}
            """
        )

    def get_signal(self, text: str) -> NewsMarketSignal:
        """
        Extract the market signal from the text.

        Args:
            text (str): The text to extract the market signal from.

        Returns:
            NewsMarketSignal: The extracted market signal and its reasoning.
        """
        response = self.llm.structured_predict(
            NewsMarketSignal, self.prompt, text=text
        )
        return response
Finally, to see it in action run
uv run python -m crypto_sentiment_extractor.claude
For example:
Trump Said to Appoint Crypto Lawyer Teresa Goody Guillén to Lead SEC
{
"signal": "bullish",
"reasoning": "The appointment of Teresa Goody Guill\u00e9n, a crypto lawyer, to lead the SEC under Trump's potential administration is likely bullish for the crypto market. This is significant because:\n1. Having a crypto lawyer in charge of the SEC suggests a more crypto-friendly regulatory approach\n2. This could potentially lead to more favorable policies and regulations for the cryptocurrency industry\n3. The appointment of someone with crypto expertise indicates a shift from the current SEC's relatively strict stance on crypto\n4. This could potentially lead to clearer regulatory frameworks and better institutional adoption"
}
FED to increase interest rates
{
"signal": "bearish",
"reasoning": "The news about FED increasing interest rates is typically bearish for crypto markets because:\n1. Higher interest rates make risk assets like cryptocurrencies less attractive to investors\n2. It reduces market liquidity as borrowing becomes more expensive\n3. Historically, crypto prices have shown negative correlation with interest rate hikes\n4. Investors tend to move capital from speculative assets to safer yield-bearing instruments during rate hike cycles"
}
The grass is green
{
"signal": "neutral",
"reasoning": "The text \"The grass is green\" is completely unrelated to the cryptocurrency market or any financial markets. It's a simple statement about nature/landscaping. Therefore, there is no relevant market signal to extract."
}
Next steps 👣
What we have built is a proof of concept, which means it “seems to work”. But that is not enough.
To make it work in a robust manner, you need to evaluate the quality of these outputs. But this is something that we will leave for another day 😉
Wanna build this system with me? 🫵
On December 2nd, 229 brave students and I will start building a real-time ML system to predict crypto prices, in my live course Building a Real-Time ML System. Together.
For that, we will build a real time ML pipeline to parse crypto market sentiment from raw news.
No pre-recorded session.
Everything is live.
You and me.
Step by step.
From zero to SYSTEM.
It will take us at least 4 weeks and more than 40 hours of live coding sessions to go from idea to a fully working system that we will deploy to Kubernetes.
Along the way you will learn
Universal MLOps design principles
Tons of Python tricks
Feature engineering in real time
Using LLMs to extract market signals from unstructured data
Some Rust magic
.. and more
Gift 🎁
As a subscriber to the Real World ML Newsletter you have exclusive access to a 40% discount. For a few more hours you can still access it at a special price.
Talk to you next week,
Wish you a great weekend,
Pau