Say you want to build an ML service to extract crypto market signals from financial news.
The output of such a service can be piped into a predictive ML model, together with other predictive features like real-time market prices, to produce the best price prediction possible.
This is, by the way, what we will do in the next cohort of Building a Real-Time ML System. Together.
Let me show you how to solve this real-world problem, using LLMs and a bit of prompt engineering.
The problem
We want to build a Python microservice that
given a news headline
news = "FED to increase interest rates"
outputs a market signal and the reasoning behind this signal.
{
"signal": "bearish",
"reasoning": "The news about FED increasing interest rates is typically bearish for crypto markets for several reasons:\n1. Higher interest rates make borrowing more expensive, reducing liquidity in the market\n2. Higher rates make traditional yield-bearing investments more attractive compared to crypto\n3. Risk assets like cryptocurrencies tend to perform poorly in high interest rate environments\n4. Historically, crypto prices have shown negative correlation with interest rate hikes"
}
The signal is a categorical variable with 3 possible values:
bullish (positive market impact)
neutral (neutral/unclear impact), or
bearish (negative market impact)
Transforming raw input into a structured output is something LLMs excel at, so it makes sense to give them a try here.
Let me show you how to implement this in Python, using a powerful LLM and a bit of prompt engineering.
You can find all the source code in this repo.
👉 Github repo
The solution 🧠
We want our model to ingest raw textual data, and output a structured response.
The fastest way I know to solve this problem is by
using a strong LLM with function calling capabilities (for example Claude), and
a library like LlamaIndex, to make sure the output has the desired format.
In this example I will be using Anthropic’s Claude, but feel free to replace it with another LLM, for example
OpenAI, or
Llama 3.2 running locally with Ollama.
If you want to follow along with me, you will need to get your API key here and spend less than $0.01.
These are the steps to build this.
1. Bootstrap the project 🏗️
I started using uv as my Python build tool. If you haven’t, I recommend you give it a try.
Install uv on Linux/Mac
curl -LsSf https://astral.sh/uv/install.sh | sh
or on Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Then bootstrap your project from the command line with this command
uv init crypto-sentiment-extractor --lib
cd crypto-sentiment-extractor
2. Install Python dependencies 🛠️
which are
llama-index and llama-index-llms-anthropic to interact with Claude easily, and control the output format
uv add llama-index
uv add llama-index-llms-anthropic
pydantic to supercharge our Python code with type validations
uv add pydantic
pydantic-settings to load and validate configuration parameters easily
uv add "pydantic-settings>=2.6.1"
3. Set up the configuration 🔑
Create an `anthropic.env` file with your API key and other configuration parameters
API_KEY="YOUR_API_KEY_GOES_HERE"
MAX_TOKENS=300
MODEL_NAME=claude-3-5-sonnet-20241022
and load them easily into your Python runtime with this pydantic-settings class.
from pydantic_settings import BaseSettings, SettingsConfigDict

class AnthropicConfig(BaseSettings):
    model_config = SettingsConfigDict(env_file='anthropic.env', env_file_encoding='utf-8')

    api_key: str
    model_name: str = "claude-3-opus-20240229"
    max_tokens: int = 300

anthropic_config = AnthropicConfig()
4. Define the LLM output with Pydantic 📤
We create a pydantic class with the field names and descriptions we want the LLM to output.
from typing import Literal
from pydantic import BaseModel, Field

class NewsMarketSignal(BaseModel):
    """
    Market signal of a news article about the crypto market
    """
    signal: Literal["bullish", "bearish", "neutral"] = Field(description="""
        The market signal of the news article about the crypto market.
        Set it as neutral if the text is not related to the crypto market, or if
        there is not enough information to determine the signal.
        """)
    reasoning: str = Field(description="""
        The reasoning for the market signal.
        Set it as non-relevant if the text is not related to the crypto market.
        Set it as not-enough-info if there is not enough information to determine the signal.
        """)
It is very important you add Field descriptions. These provide the context your LLM needs to solve the task effectively.
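You can verify what the LLM will actually see: the pydantic model is turned into a JSON schema for the function call, and the Field descriptions travel with it. A quick check using pydantic alone (my condensed version of the class above):

```python
from typing import Literal
from pydantic import BaseModel, Field

class NewsMarketSignal(BaseModel):
    """Market signal of a news article about the crypto market"""
    signal: Literal["bullish", "bearish", "neutral"] = Field(
        description="The market signal of the news article about the crypto market."
    )
    reasoning: str = Field(description="The reasoning for the market signal.")

schema = NewsMarketSignal.model_json_schema()

# The Literal becomes an enum constraint, and the description is attached,
# so the LLM sees both the allowed labels and your instructions
print(schema["properties"]["signal"]["enum"])
print(schema["properties"]["signal"]["description"])
```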
5. Define the prompt template 📜
This system prompt instructs the language model on its expected behavior and functionality. For example
from llama_index.core import PromptTemplate

self.prompt = PromptTemplate(
    """
    You are an expert at analyzing news articles related to cryptocurrencies and
    extracting market signal from the text in a structured format.
    Here is the news article:
    {text}
    """
)
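The template's job here is simple: {text} is a placeholder filled with the news article on each request. A plain-Python sketch of that substitution (str.format stands in for the llama-index API):

```python
# {text} is a placeholder filled per request; llama-index's
# PromptTemplate adds more machinery on top of this idea
TEMPLATE = (
    "You are an expert at analyzing news articles related to cryptocurrencies "
    "and extracting market signal from the text in a structured format.\n"
    "Here is the news article:\n"
    "{text}"
)

prompt = TEMPLATE.format(text="FED to increase interest rates")
print(prompt)
```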
6. Create a client object to interact with your LLM 🤝
from llama_index.llms.anthropic import Anthropic

from .config import AnthropicConfig

config = AnthropicConfig()

self.llm = Anthropic(
    model=config.model_name,
    max_tokens=config.max_tokens,
    api_key=config.api_key,
)
7. Send a request to your LLM 📨
We want to
generate a structured output as specified in the NewsMarketSignal class,
using the system prompt self.prompt, and
for a given news text
response = self.llm.structured_predict(
    NewsMarketSignal, self.prompt, text=text
)
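Under the hood, structured_predict asks Claude for a function call whose arguments follow the NewsMarketSignal schema, then validates the returned JSON back into the class. The validation half can be sketched with pydantic alone (the raw payload below is made up):

```python
import json
from typing import Literal
from pydantic import BaseModel, ValidationError

class NewsMarketSignal(BaseModel):
    signal: Literal["bullish", "bearish", "neutral"]
    reasoning: str

# Hypothetical raw JSON, as the LLM's function call would return it
raw = '{"signal": "bearish", "reasoning": "Rate hikes drain liquidity."}'
response = NewsMarketSignal.model_validate(json.loads(raw))
print(response.signal)  # bearish

# A label outside the schema never reaches your code unvalidated
try:
    NewsMarketSignal.model_validate({"signal": "moonish", "reasoning": "hype"})
except ValidationError:
    print("rejected")
```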
You can encapsulate and organize this logic with a custom Python object
# claude.py
from llama_index.core import PromptTemplate
from llama_index.llms.anthropic import Anthropic

from .config import AnthropicConfig
# plus the NewsMarketSignal model defined in step 4


class ClaudeMarketSignalExtractor:
    """
    A class to extract market signal from text using Claude's API
    """

    def __init__(self):
        """
        Initialize the ClaudeMarketSignalExtractor.
        """
        config = AnthropicConfig()
        self.llm = Anthropic(
            model=config.model_name,
            max_tokens=config.max_tokens,
            api_key=config.api_key,
        )
        self.prompt = PromptTemplate(
            """
            You are an expert at analyzing news articles related to cryptocurrencies and
            extracting market signal from the text in a structured format.
            Here is the news article:
            {text}
            """
        )

    def get_signal(self, text: str) -> NewsMarketSignal:
        """
        Extract the market signal from the text.

        Args:
            text (str): The text to extract the market signal from.

        Returns:
            NewsMarketSignal: The extracted market signal and its reasoning.
        """
        response = self.llm.structured_predict(
            NewsMarketSignal, self.prompt, text=text
        )
        return response
Finally, to see it in action run
uv run python -m crypto_sentiment_extractor.claude
For example:
Trump Said to Appoint Crypto Lawyer Teresa Goody Guillén to Lead SEC
{
"signal": "bullish",
"reasoning": "The appointment of Teresa Goody Guill\u00e9n, a crypto lawyer, to lead the SEC under Trump's potential administration is likely bullish for the crypto market. This is significant because:\n1. Having a crypto lawyer in charge of the SEC suggests a more crypto-friendly regulatory approach\n2. This could potentially lead to more favorable policies and regulations for the cryptocurrency industry\n3. The appointment of someone with crypto expertise indicates a shift from the current SEC's relatively strict stance on crypto\n4. This could potentially lead to clearer regulatory frameworks and better institutional adoption"
}
FED to increase interest rates
{
"signal": "bearish",
"reasoning": "The news about FED increasing interest rates is typically bearish for crypto markets because:\n1. Higher interest rates make risk assets like cryptocurrencies less attractive to investors\n2. It reduces market liquidity as borrowing becomes more expensive\n3. Historically, crypto prices have shown negative correlation with interest rate hikes\n4. Investors tend to move capital from speculative assets to safer yield-bearing instruments during rate hike cycles"
}
The grass is green
{
"signal": "neutral",
"reasoning": "The text \"The grass is green\" is completely unrelated to the cryptocurrency market or any financial markets. It's a simple statement about nature/landscaping. Therefore, there is no relevant market signal to extract."
}
Next steps 👣
What we have built is a proof of concept, which means it “seems to work”. But that is not enough.
To make it work in a robust manner, you need to evaluate the quality of these outputs. But this is something that we will leave for another day 😉
Wanna build this system with me? 🫵
On December 2nd, 229 brave students and I will start building a real-time ML system to predict crypto prices, in my live course Building a Real-Time ML System. Together.
For that, we will build a real time ML pipeline to parse crypto market sentiment from raw news.
No pre-recorded session.
Everything is live.
You and me.
Step by step.
From zero to SYSTEM.
It will take us at least 4 weeks and more than 40 hours of live coding sessions to go from idea to a fully working system that we will deploy to Kubernetes.
Along the way you will learn
Universal MLOps design principles
Tons of Python tricks
Feature engineering in real time
Using LLMs to extract market signals from unstructured data
Some Rust magic
.. and more
Gift 🎁
As a subscriber to the Real World ML Newsletter you have exclusive access to a 40% discount. For a few more hours you can still access it at a special price.
Talk to you next week,
Wish you a great weekend,
Pau