Testing and Inference
This repository, co-authored by Andreas Traczyk, is designed specifically for testing and inference against various models. You can access the code on GitHub. Use this repository to effectively compare different models and analyze their performance.
https://github.com/seandearnaley/llama_3_8b_sentiment_analysis_tests
This repository contains a Python project for testing and comparing the performance of various models on sentiment analysis tasks. The project utilizes the Ollama library for local model inference and includes scripts for running sentiment tests, generating comparison reports, and visualizing the results.
Follow the README.md for install instructions.
Specialized Prompting
This is an important step, the dataset was partially built using these prompting techniques (synthetic data), and often you can use this instead of fine tuning, we really want to evaluate whether we’re actually getting better performance in our fine tunes and whether its even worth doing. The goal is to get reliable JSON results back that pass the pydantic validation, we want JSON because it’s easy to pass into python functions (eg function calling).
Here is a special system prompt:
You are an advanced AI assistant created to perform sentiment analysis on financial news articles. I need you to classify each article you receive and provide your analysis using the following JSON schema:
{
"reasoning": {
"type": "string",
"description": "A brief description explaining the logic used to determine the numeric sentiment value.",
"required": true
},
"sentiment": {
"type": "number",
"description": "A floating-point representation of the sentiment of the news article, rounded to two decimal places. Scale ranges from -1.0 (negative) to 1.0 (positive), where 0.0 represents neutral sentiment.",
"required": true
},
"confidence": {
"type": "number",
"description": "A floating-point representation of how confident the analysis is, rounded to two decimal places. Scale ranges from 0.0 (not confident) to 1.0 (very confident).",
"required": true
}
}
Always respond with a valid JSON object adhering to this schema. Do not include any other text or messages in your response. Exclude markdown.
and we initialize the thread with 5 examples (known as 5-shot prompting, the fine tunes give us 0-shot, but specialized to this one specific task):
You will be provided with a financial news article enclosed within the following XML tags:
<article>{$ARTICLE}</article>
Your task is to carefully read the article and analyze the sentiment it expresses towards the potential future stock value of the company mentioned.
First, write out your reasoning and analysis of the article's sentiment inside the "reasoning" property. Explain the key points in the article that influence your assessment of the sentiment and how they would likely impact the stock price.
Then, output a numeric score between -1.0 and 1.0 representing the sentiment, where -1.0 is the most negative, 0 is neutral, and 1.0 is the most positive. Put this score inside the "sentiment" property.
Provide a sentiment value as a function of how positive or negative the sentiment is. If no conclusion can be drawn, provide a sentiment value of 0.0.
Provide a confidence value as a function of how confident you are in the sentiment value. If you are very confident, provide a confidence value of 1.0. If you are unsure, provide a confidence value of 0.0.
Make no alterations to the schema. This is important for our company.
Examples:
1. <article>NVDA shares rise 5% on earnings beat.</article>
Output:
{
"reasoning": "The news article reports a positive earnings beat, which is likely to increase investor confidence and, consequently, the stock value of NVDA.",
"sentiment": 0.75,
"confidence": 0.9
}
2. <article>NVDA shares may be affected by a drop in oil prices. Analysts predict a 5% drop in stock value due to NVDA's exposure to the energy sector.</article>
Output:
{
"reasoning": "The article suggests a potential negative impact on NVDA stock due to falling oil prices, which could lead to decreased investor confidence.",
"sentiment": -0.25,
"confidence": 0.8
}
3. <article>Apple's recent launch of its innovative AR glasses has not met expected sales targets.</article>
Output:
{
"reasoning": "Despite the innovative product launch, the failure to meet sales targets could lead to negative market reactions and a potential drop in Apple's stock value.",
"sentiment": -0.5,
"confidence": 0.6
}
4. <article>Boeing secures a $5 billion contract for new aircrafts from Emirates, signaling strong future revenues.</article>
Output:
{
"reasoning": "Securing a large contract suggests positive future revenue prospects for Boeing, likely boosting investor sentiment and stock value.",
"sentiment": 0.85,
"confidence": 0.9
}
5. Determine the sentiment towards the stock value of Tesla from the following article:
<article>Tesla recalls 100,000 vehicles due to safety concerns.</article>
Output:
{
"reasoning": "A significant recall due to safety issues could harm Tesla's brand reputation and negatively impact investor confidence, likely decreasing its stock value.",
"sentiment": -0.65,
"confidence": 0.7
}
Code Overview
generate_model_sentiments.py : This script runs sentiment analysis tests on a specified company using the models defined in the config.yaml file. It retrieves news articles related to the company, extracts relevant content, and analyzes the sentiment of each article using the specified models. The results are saved as JSON files in the sentiments folder.
generate_model_comparison_report.py : This script generates a comparison report based on the sentiment analysis results generated by generate_model_sentiments.py . It calculates various metrics for each model, including inference rate, sentiment variance, mean sentiment, and mean confidence. It also performs statistical comparisons between models using ANOVA and t-tests. The report is saved as an Excel spreadsheet with CSV’s in the reports folder.
: This script generates a comparison report based on the sentiment analysis results generated by . It calculates various metrics for each model, including inference rate, sentiment variance, mean sentiment, and mean confidence. It also performs statistical comparisons between models using ANOVA and t-tests. The report is saved as an Excel spreadsheet with CSV’s in the folder. utils : This folder contains utility modules used by the main scripts.
: This folder contains utility modules used by the main scripts. analysis_utils.py : Provides functions for cleaning company names, filtering news articles, testing models, and analyzing content.
: Provides functions for cleaning company names, filtering news articles, testing models, and analyzing content. context.py : Defines the AnalysisContext dataclass, which encapsulates the context for sentiment analysis.
: Defines the dataclass, which encapsulates the context for sentiment analysis. error_decorator.py : Provides a decorator for handling errors gracefully.
: Provides a decorator for handling errors gracefully. file_utils.py : Provides functions for reading and writing files, including JSON and YAML files.
: Provides functions for reading and writing files, including JSON and YAML files. validation_utils.py : Provides functions for validating JSON data and parsing numeric values.
: Provides functions for validating JSON data and parsing numeric values. web_scraper.py : Provides functions for scraping content from websites.
generate_model_sentiments.py will do a configurable number of iterations and we are going to run tests over the averages for variance etc. The JSON file for each iteration looks like this, each sentiment is hashed from the article which is pre-cached so we’re evaluating the same thing for each iteration:
{
"average_sentiment": 0.57,
"time_taken": 53.91,
"sentiments": {
...
"91ba90ac": {
"reasoning": "The article reports that the market is holding near record highs, with several companies such as BYD, Nvidia, and Walmart flashing buy signals, indicating a positive sentiment towards these stocks.",
"sentiment": 0.6,
"confidence": 0.8,
"valid": true,
"url": "https://finance.yahoo.com/m/ae28caa6-3ead-3745-aece-9ddb64e2ea1d/dow-jones-futures%3A-walmart%2C.html?.tsrc=rss",
"published": "Thu, 16 May 2024 23:52:02 +0000",
"time_taken": 3.17
},
"bf372e87": {
"reasoning": "The article reports that Nvidia's stock finished lower on Thursday, despite being on track to set a record high due to optimism around the chip maker ahead of its earnings report next week. The marketwide rally sparked by April's inflation data and upbeat analyst estimates lifted Nvidia shares, but ultimately led to a 0.3% decline.",
"sentiment": -0.15,
"confidence": 0.8,
"valid": true,
"url": "https://finance.yahoo.com/m/6ab7d488-38e1-3ef1-beef-bf75a726d6c2/nvidia-stock-couldn%E2%80%99t-close.html?.tsrc=rss",
"published": "Thu, 16 May 2024 20:30:00 +0000",
"time_taken": 5.12
},
"d4c4ccc1": {
"reasoning": "The article discusses Wolfe Research's positive outlook on Nvidia (NVDA) and Advanced Micro Devices (AMD), with a price target increase for Nvidia to $1,200. The addition of AMD to the Wolfe Alpha List highlights its robust AI product lineup, indicating potential growth opportunities. The analyst's tactical shift in priority towards AMD suggests a more balanced approach considering both stocks' performance.",
"sentiment": 0.75,
"confidence": 0.85,
"valid": true,
"url": "https://finance.yahoo.com/video/chip-stocks-wolfe-research-bullish-201319388.html?.tsrc=rss",
"published": "Thu, 16 May 2024 20:13:19 +0000",
"time_taken": 6.24
},
...
}
}
Methodology
We are using a Mac Pro M2 with 32gb. Ollama 0.1.38. We do 15 iterations on the same set of news articles from Yahoo Finance.
default_temperature: 0.2
context_window_size: 8192
num_tokens_to_predict: 1024
Ollama
Ollama is an tool that enables users to run open LLMs locally on their machines, eliminating the need for cloud services. Its a front end for llama.cpp and can load GGUF models. Designed for ease of use, it offers a simple API, OpenAI endpoint compatibility (eg can work with anything that supports OpenAI) and a library of pre-built models. Ollama runs on macOS, Linux, and Windows, can use CPU and GPU, it integrates seamlessly with popular frameworks like LangChain, LiteLLM and more. By providing local execution, , it ensures data privacy and reduces latency, making it an ideal choice for developers and researchers looking to leverage advanced NLP capabilities efficiently.
You can download our GGUF fine tuned models @ HuggingFace
Loading GGUFs into Ollama needs a custom Modelfile with the system message and template, remember to substitute the GGUF file for the quantization level you are using here we’re using llama3-8b-sentiment-may-3-2024-unsloth.Q4_K_M.gguf , you can name it whatever you want when importing into Ollama:
ollama create llama3:8b-instruct-sentiment_analysis-q4_K_M -f Modelfile