|
--- |
|
language: en |
|
tags: |
|
- text-classification |
|
- tensorflow |
|
- roberta |
|
datasets: |
|
- go_emotions |
|
license: mit |
|
--- |
|
|
|
Connect me on LinkedIn |
|
- [linkedin.com/in/arpanghoshal](https://www.linkedin.com/in/arpanghoshal) |
|
|
|
|
|
## What is GoEmotions |
|
|
|
Dataset labelled 58000 Reddit comments with 28 emotions |
|
|
|
- admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral |
|
|
|
|
|
## What is RoBERTa |
|
|
|
RoBERTa builds on BERT’s language masking strategy and modifies key hyperparameters in BERT, including removing BERT’s next-sentence pretraining objective, and training with much larger mini-batches and learning rates. RoBERTa was also trained on an order of magnitude more data than BERT, for a longer amount of time. This allows RoBERTa representations to generalize even better to downstream tasks compared to BERT. |
|
|
|
|
|
## Hyperparameters |
|
|
|
| Parameter | | |
|
| ----------------- | :---: | |
|
| Learning rate | 5e-5 | |
|
| Epochs | 10 | |
|
| Max Seq Length | 50 | |
|
| Batch size | 16 | |
|
| Warmup Proportion | 0.1 | |
|
| Epsilon | 1e-8 | |
|
|
|
|
|
## Results |
|
|
|
Best Result of `Macro F1` - 49.30% |
|
|
|
## Usage |
|
|
|
```python |
|
|
|
from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline |
|
|
|
tokenizer = RobertaTokenizerFast.from_pretrained("arpanghoshal/EmoRoBERTa") |
|
model = TFRobertaForSequenceClassification.from_pretrained("arpanghoshal/EmoRoBERTa") |
|
|
|
emotion = pipeline('sentiment-analysis', |
|
model='arpanghoshal/EmoRoBERTa') |
|
|
|
emotion_labels = emotion("Thanks for using it.") |
|
print(emotion_labels) |
|
|
|
``` |
|
Output |
|
|
|
``` |
|
[{'label': 'gratitude', 'score': 0.9964383244514465}] |
|
``` |
|
|
|
|