cardiffnlp/twitter-roberta-base-2019-90m-tweet-topic-multi-2020
This model is a fine-tuned version of cardiffnlp/twitter-roberta-base-2019-90m on the tweet_topic_multi dataset. It was fine-tuned on the train_2020 split and validated on the test_2021 split of tweet_topic.
The fine-tuning script can be found here. The model achieves the following results on the test_2021 set:
- F1 (micro): 0.7367104440275171
- F1 (macro): 0.5656244617373364
- Accuracy: 0.5134008338296605
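The gap between the three numbers above is typical for multi-label classification. The following is a minimal sketch of how such metrics are commonly computed, not the evaluation code used for this model. It assumes that "Accuracy" means exact-match accuracy (every label of a tweet must be predicted correctly), which the card does not state; the function names and the toy data are made up for illustration.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true, y_pred):
    """y_true / y_pred: lists of equal-length 0/1 lists, one row per tweet."""
    n_labels = len(y_true[0])
    counts = []  # per-label (tp, fp, fn)
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Exact match: a tweet counts only if its whole label vector is right.
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {"f1_micro": micro, "f1_macro": macro, "accuracy": exact}


# Toy data (made up for illustration): 4 tweets, 3 labels.
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0]]
scores = multilabel_scores(y_true, y_pred)
print(scores)
```

In this toy case the frequent label is predicted perfectly and the two rare labels are missed entirely, so micro F1 stays high while macro F1 drops. That is one plausible reading of the gap between the micro and macro scores reported above.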
Usage
import math
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "cardiffnlp/twitter-roberta-base-2019-90m-tweet-topic-multi-2020"

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, problem_type="multi_label_classification")
model.eval()
class_mapping = model.config.id2label  # maps class index to topic name

text = "#NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}}"

with torch.no_grad():
    tokens = tokenizer(text, return_tensors='pt')
    output = model(**tokens)
    # One independent sigmoid per class; keep every class scoring above 0.5.
    flags = [sigmoid(s) > 0.5 for s in output.logits[0].tolist()]
    topic = [class_mapping[n] for n, i in enumerate(flags) if i]
print(topic)
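The decoding step in the snippet above can be tried without downloading the model. The sketch below isolates it; the logits, the label names in id2label, and the decode_topics helper are all hypothetical stand-ins for what the model and its config would provide.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def decode_topics(logits, id2label, threshold=0.5):
    """Return every label whose sigmoid probability exceeds the threshold."""
    return [id2label[i] for i, s in enumerate(logits) if sigmoid(s) > threshold]


# Hypothetical values for illustration; real ones come from the model and its config.
id2label = {0: "music", 1: "sports", 2: "news"}
logits = [2.0, -1.0, 0.3]

default_topics = decode_topics(logits, id2label)
strict_topics = decode_topics(logits, id2label, threshold=0.7)
print(default_topics)
print(strict_topics)
```

Because each class is scored independently, a tweet can receive several topics or none. The 0.5 cutoff comes from the usage snippet; raising it, as in the second call, trades recall for precision and is a choice to validate on held-out data rather than a recommendation.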
Reference
@inproceedings{dimosthenis-etal-2022-twitter,
    title = "{T}witter {T}opic {C}lassification",
    author = "Antypas, Dimosthenis and
      Ushio, Asahi and
      Camacho-Collados, Jose and
      Neves, Leonardo and
      Silva, Vitor and
      Barbieri, Francesco",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics"
}