---
license: mit
tags:
- sentiment analysis
- financial sentiment analysis
- bert
- text-classification
- finance
- finbert
- financial
---
|
# Trading Hero Financial Sentiment Analysis |
|
|
|
Model Description: This model is a fine-tuned version of [FinBERT](https://huggingface.co./yiyanghkust/finbert-pretrain), a BERT model pre-trained on financial texts. Fine-tuning adapts the pre-trained model to financial sentiment analysis, improving its performance on this domain-specific task.
|
|
|
## Model Use |
|
|
|
Primary Users: Financial analysts, NLP researchers, and developers working on financial data. |
|
|
|
## Training Data |
|
|
|
Training Dataset: The model was fine-tuned on a custom dataset of financial communication texts, split into training, validation, and test sets as follows:

* Training set: 10,918,272 tokens
* Validation set: 1,213,184 tokens
* Test set: 1,347,968 tokens
|
|
|
Pre-training Dataset: FinBERT was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:

* Corporate reports (10-K & 10-Q): 2.5 billion tokens
* Earnings call transcripts: 1.3 billion tokens
* Analyst reports: 1.1 billion tokens
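
Token counts like those above depend on the tokenizer used. As a point of reference, here is a minimal sketch of counting subword tokens with this model's own tokenizer; the example texts are placeholders, and the assumption that the counts refer to subword tokens is mine:

```python
# Minimal sketch: counting subword tokens in a small corpus with the
# model's tokenizer (assumes the dataset sizes above are subword counts).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")

texts = [  # placeholder examples, not the actual training data
    "Earnings per share rose 12% year over year.",
    "The company issued downbeat guidance for Q3.",
]

total_tokens = sum(len(tokenizer.encode(t, add_special_tokens=False)) for t in texts)
print(f"Total tokens: {total_tokens}")
```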
|
|
|
## Evaluation |
|
|
|
* Test Accuracy = 0.908469 |
|
* Test Precision = 0.927788 |
|
* Test Recall = 0.908469 |
|
* Test F1 = 0.913267 |
|
* **Labels**: 0 -> Neutral; 1 -> Positive; 2 -> Negative |
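
The precision, recall, and F1 values above are consistent with weighted per-class averaging (with weighted averaging, recall equals accuracy). A minimal sketch of computing such metrics with scikit-learn, assuming `y_true` and `y_pred` hold the gold and predicted label ids; the arrays below are placeholders, not the actual test-set predictions:

```python
# Sketch: accuracy plus weighted precision/recall/F1 with scikit-learn.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 1, 0]  # gold labels (0=neutral, 1=positive, 2=negative)
y_pred = [0, 1, 2, 0, 0]  # model predictions (placeholder values)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"Accuracy: {accuracy:.6f}  Precision: {precision:.6f}  "
      f"Recall: {recall:.6f}  F1: {f1:.6f}")
```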
|
|
|
|
|
## Usage |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")
model = AutoModelForSequenceClassification.from_pretrained("fuchenru/Trading-Hero-LLM")
model.eval()

def predict_sentiment(input_text):
    # Tokenize the input text
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True)

    # Perform inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Take the highest-scoring class index
    predicted_label = torch.argmax(outputs.logits, dim=1).item()

    # Map the class index back to its sentiment label
    label_map = {0: 'neutral', 1: 'positive', 2: 'negative'}
    return label_map[predicted_label]

stock_news = [
    "Market analysts predict a stable outlook for the coming weeks.",
    "The market remained relatively flat today, with minimal movement in stock prices.",
    "Investor sentiment improved following news of a potential trade deal.",
    # ...
]

for news in stock_news:
    print("Predicted Sentiment:", predict_sentiment(news))
```
|
```
Predicted Sentiment: neutral
Predicted Sentiment: neutral
Predicted Sentiment: positive
```
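
For quick experiments, the same checkpoint can also be run through the high-level `pipeline` API, which bundles tokenization, inference, and label mapping into one call. A brief sketch; note that the returned label strings follow the checkpoint's `id2label` config, which I assume maps indices 0/1/2 to the labels listed above:

```python
# Sketch: batch classification with the high-level pipeline API.
from transformers import pipeline

nlp = pipeline("text-classification", model="fuchenru/Trading-Hero-LLM")

results = nlp([
    "Investor sentiment improved following news of a potential trade deal.",
    "The market remained relatively flat today.",
])
for r in results:
    # Each result is a dict with the predicted label and its softmax score
    print(r["label"], round(r["score"], 4))
```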
|
|
|
## Citation |
|
|
|
```bibtex
@misc{yang2020finbert,
      title={FinBERT: A Pretrained Language Model for Financial Communications},
      author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
      year={2020},
      eprint={2006.08097},
      archivePrefix={arXiv},
}
```