dnozza committed on
Commit
ce0490b
·
1 Parent(s): f4a4071

README extended

  1. README.md +27 -5
README.md CHANGED
@@ -10,8 +10,8 @@ tags:
10
  ## Abstract
11
 
12
  Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad?
13
- An abundance of approaches has been introduced for tackling both tasks. However, at least for Italian, they all treat only one of the tasks at a time. We introduce FEEL-IT, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: anger, fear, joy, sadness. By collapsing them, we can also do sentiment analysis. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results.
14
- We release an open-source Python library, so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text.
15
 
16
  | Model | Download |
17
  | ------ | -------------------------|
@@ -21,15 +21,18 @@ We release an open-source Python library, so researchers can use a model trained
21
 
22
  ## Model
23
 
24
- The feel-it-italian-sentiment model performs sentiment analysis. We fine-tuned the [UmBERTo model](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1) on our new dataset (i.e., FEEL-IT) obtaining state-of-the-art performances on different data sets.
25
 
26
  ## Data
27
 
28
- Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper.
29
 
30
  ## Performance
31
 
32
- We evaluate our performance using [SENTIPOLC16 Evalita](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/data.html). This dataset comes with a training set and a testing set and thus we can compare the performance of different training datasets on the SENTIPOLC test set We collapsed the FEEL-IT classes into 2 by mapping joy to the positive class and anger, fear and sadness into the negative class.
33
 
34
  We use the fine-tuned UmBERTo model. The results show that FEEL-IT can provide better results on the SENTIPOLC test set than those that can be obtained with the SENTIPOLC training set.
35
 
@@ -42,10 +45,29 @@ We use the fine-tuned UmBERTo model. The results show that FEEL-IT can provide b
42
  ## Usage
43
 
44
  ```python
45
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
46
 
47
  tokenizer = AutoTokenizer.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
48
  model = AutoModelForSequenceClassification.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
49
  ```
50
 
51
  ## Citation
 
10
  ## Abstract
11
 
12
  Sentiment analysis is a common task to understand people's reactions online. Still, we often need more nuanced information: is the post negative because the user is angry or because they are sad?
13
+ An abundance of approaches has been introduced for tackling both tasks. However, at least for Italian, they all treat only one of the tasks at a time. We introduce *FEEL-IT*, a novel benchmark corpus of Italian Twitter posts annotated with four basic emotions: **anger, fear, joy, sadness**. By collapsing them, we can also do **sentiment analysis**. We evaluate our corpus on benchmark datasets for both emotion and sentiment classification, obtaining competitive results.
14
+ We release an [open-source Python library](https://github.com/MilaNLProc/feel-it), so researchers can use a model trained on FEEL-IT for inferring both sentiments and emotions from Italian text.
15
 
16
  | Model | Download |
17
  | ------ | -------------------------|
 
21
 
22
  ## Model
23
 
24
+ The *feel-it-italian-sentiment* model performs **sentiment analysis** on Italian. We fine-tuned the [UmBERTo model](https://huggingface.co/Musixmatch/umberto-commoncrawl-cased-v1) on our new dataset (i.e., FEEL-IT), obtaining state-of-the-art performance on different benchmark corpora.
25
 
26
  ## Data
27
 
28
+ Our data has been collected by annotating tweets from a broad range of topics. In total, we have 2037 tweets annotated with an emotion label. More details can be found in our paper (preprint available soon).
29
 
30
  ## Performance
31
 
32
+ We evaluate our performance using [SENTIPOLC16 Evalita](http://www.di.unito.it/~tutreeb/sentipolc-evalita16/). We collapsed the FEEL-IT classes into 2 by mapping joy to the *positive* class and anger, fear and sadness into the *negative* class. We compare three different training dataset combinations to understand whether it is better to train on FEEL-IT, SP16, or both by testing on the SP16 test set.
33
+
34
+
35
+ This dataset comes with a training set and a testing set and thus we can compare the performance of different training datasets on the SENTIPOLC test set.
36
 
37
  We use the fine-tuned UmBERTo model. The results show that FEEL-IT can provide better results on the SENTIPOLC test set than those that can be obtained with the SENTIPOLC training set.
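
The class collapse described in the Performance section (joy to positive; anger, fear and sadness to negative) can be sketched as a simple lookup. This is only an illustration: the lowercase label strings and the names `EMOTION_TO_SENTIMENT` and `collapse_emotions` are assumptions, not taken from the FEEL-IT code or data files.

```python
# Mapping as stated in the README text:
# joy -> positive; anger, fear, sadness -> negative.
# The exact label strings are an assumption made for this sketch.
EMOTION_TO_SENTIMENT = {
    "joy": "positive",
    "anger": "negative",
    "fear": "negative",
    "sadness": "negative",
}


def collapse_emotions(emotion_labels):
    """Map a sequence of emotion labels to binary sentiment labels."""
    return [EMOTION_TO_SENTIMENT[label] for label in emotion_labels]


print(collapse_emotions(["joy", "anger", "sadness", "fear"]))
# prints ['positive', 'negative', 'negative', 'negative']
```

An unknown label raises `KeyError` here, which may be preferable to silently assigning a default sentiment.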
38
 
 
45
  ## Usage
46
 
47
  ```python
48
+ import torch
49
+ import numpy as np
50
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
51
 
52
+ # Load model and tokenizer
53
  tokenizer = AutoTokenizer.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
54
  model = AutoModelForSequenceClassification.from_pretrained("MilaNLProc/feel-it-italian-sentiment")
55
+
56
+ sentence = 'Oggi sono proprio contento!'
57
+ inputs = tokenizer(sentence, return_tensors="pt")
58
+
59
+ # Call the model and get the logits
60
+ labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
61
+ outputs = model(**inputs, labels=labels)
62
+ loss, logits = outputs[:2]
63
+ logits = logits.squeeze(0)
64
+
65
+ # Extract probabilities
66
+ proba = torch.nn.functional.softmax(logits, dim=0)
67
+
68
+ # Unpack the tensor to obtain negative and positive probabilities
69
+ negative, positive = proba
70
+ print(f"Probabilities: Negative {np.round(negative.item(),4)} - Positive {np.round(positive.item(),4)}")
71
  ```
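
The probability-extraction step in the added usage example amounts to a softmax over the two logits. Below is a dependency-free sketch of that step. The helper names `softmax` and `to_sentiment` are illustrative and not part of any library, and the index order (negative first, positive second) follows the unpacking shown in the example above. Note also that the `labels` argument in the example is only needed to compute a loss; for plain inference the logits alone should suffice.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_sentiment(logits, labels=("negative", "positive")):
    """Return the most likely label and the per-label probabilities.

    The label order is an assumption based on the unpacking
    `negative, positive = proba` in the usage example.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Hypothetical logits, used only to show the call shape.
label, probabilities = to_sentiment([-1.0, 2.0])
print(label, probabilities)
```

With real model output, the list of logits would come from something like `outputs.logits.squeeze(0).tolist()`, assuming a single input sentence.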
72
 
73
  ## Citation