fuchenru committed on
Commit
51ec3c2
1 Parent(s): 077edb9

Update README.md

Files changed (1)
  1. README.md +89 -3
README.md CHANGED
@@ -1,3 +1,89 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - sentiment analysis
5
+ - financial sentiment analysis
6
+ - bert
7
+ - text-classification
8
+ - finance
9
+ - finbert
10
+ - financial
11
+ ---
12
+ # Trading Hero Financial Sentiment Analysis
13
+
14
+ Model Description: This model is a fine-tuned version of [FinBERT](https://huggingface.co/yiyanghkust/finbert-pretrain), a BERT model pre-trained on financial texts. It was fine-tuned for financial sentiment analysis and classifies input text as neutral, positive, or negative.
15
+
16
+ ## Model Use
17
+
18
+ Primary Users: Financial analysts, NLP researchers, and developers working on financial data.
19
+
20
+ ## Training Data
21
+
22
+ Training Dataset: The model was fine-tuned on a custom dataset of financial communication texts. The dataset was split into training, validation, and test sets as follows:
23
+
24
+ * Training Set: 10,918,272 tokens
25
+ * Validation Set: 1,213,184 tokens
26
+ * Test Set: 1,347,968 tokens
27
+
28
+ Pre-training Dataset: FinBERT was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:
29
+ * Corporate Reports (10-K & 10-Q): 2.5 billion tokens
30
+ * Earnings Call Transcripts: 1.3 billion tokens
31
+ * Analyst Reports: 1.1 billion tokens
32
+
33
+ ## Evaluation
34
+
35
+ * Test Accuracy = 0.908469
36
+ * Test Precision = 0.927788
37
+ * Test Recall = 0.908469
38
+ * Test F1 = 0.913267
39
+ * **Labels**: 0 -> Neutral; 1 -> Positive; 2 -> Negative
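The README does not say how precision, recall, and F1 were averaged across the three classes. The identical accuracy and recall values (0.908469) are consistent with support-weighted averaging, where weighted recall always equals accuracy, but that is an assumption. A minimal pure-Python sketch of weighted metrics, using made-up toy labels and a hypothetical helper name `weighted_metrics`:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with support-weighted averaging.

    Assumption: this mirrors the common "weighted" averaging mode; the README
    does not confirm which mode produced its reported numbers.
    """
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label]
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / n  # class weight = share of true examples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Toy labels (invented for illustration): 0 = neutral, 1 = positive, 2 = negative
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

With this averaging, each class's recall is weighted by its share of the true labels, so the weighted sum reduces to the fraction of correct predictions.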
40
+
41
+
42
+ ## Usage
43
+
44
+ ```python
45
+ import torch
46
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
47
+ tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")
48
+ model = AutoModelForSequenceClassification.from_pretrained("fuchenru/Trading-Hero-LLM")
49
+ nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
50
+ # Preprocess the input text
51
+ def preprocess(text, tokenizer, max_length=128):
52
+     inputs = tokenizer(text, truncation=True, padding='max_length', max_length=max_length, return_tensors='pt')
53
+     return inputs
54
+
55
+ # Function to perform prediction
56
+ def predict_sentiment(input_text):
57
+     # Tokenize the input text
58
+     inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True)
59
+
60
+     # Perform inference
61
+     with torch.no_grad():
62
+         outputs = model(**inputs)
63
+
64
+     # Get predicted label
65
+     predicted_label = torch.argmax(outputs.logits, dim=1).item()
66
+
67
+     # Map the predicted label to the original labels
68
+     label_map = {0: 'neutral', 1: 'positive', 2: 'negative'}
69
+     predicted_sentiment = label_map[predicted_label]
70
+
71
+     return predicted_sentiment
72
+
73
+ stock_news = [
74
+     "Market analysts predict a stable outlook for the coming weeks.",
75
+     "The market remained relatively flat today, with minimal movement in stock prices.",
76
+     "Investor sentiment improved following news of a potential trade deal.",
77
+     # ....... (further headlines elided)
78
+ ]
79
+
80
+
81
+ for i in stock_news:
82
+     predicted_sentiment = predict_sentiment(i)
83
+     print("Predicted Sentiment:", predicted_sentiment)
84
+ ```
85
+ ```
86
+ Predicted Sentiment: neutral
87
+ Predicted Sentiment: neutral
88
+ Predicted Sentiment: positive
89
+ ```
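The usage snippet above returns only the argmax label. Where a confidence score is also wanted, the logits can be passed through a softmax first. The sketch below is a pure-Python illustration under stated assumptions: `logits_to_sentiment` is a hypothetical helper, the example logits are invented, and in practice the input row would come from something like `outputs.logits[0].tolist()`.

```python
import math

# Label order taken from the README: 0 -> Neutral; 1 -> Positive; 2 -> Negative
LABEL_MAP = {0: "neutral", 1: "positive", 2: "negative"}


def softmax(logits):
    """Convert raw logits to probabilities (max is subtracted for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits):
    """Return (label, probability) for the highest-scoring class.

    Hypothetical helper: `logits` is one row of three raw scores.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Invented logits for illustration only, not real model output
label, confidence = logits_to_sentiment([0.1, 2.3, -1.0])
print(label, round(confidence, 3))
```

The predicted label is the same one argmax alone would give; the softmax only adds a probability that can be thresholded to flag low-confidence predictions.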