roberta-base_topic_classification_nyt_news
This model is a fine-tuned version of roberta-base on the NYT News dataset, which contains 256,000 news titles from articles published from 2000 to the present (https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present). On a held-out test set of 51,200 titles, it achieves the following results:
- Accuracy: 0.91
- F1: 0.91
- Precision: 0.91
- Recall: 0.91
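The card does not say how precision, recall and F1 were averaged across the eight classes. In the epoch-by-epoch results table, accuracy and recall agree in every row, and support-weighted recall is always equal to accuracy, so weighted averaging is a plausible guess. The sketch below assumes that and computes all four numbers from scratch in plain Python. The helper `weighted_metrics` and the toy labels are hypothetical, for illustration only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: each class is weighted by its number of true examples
    ("weighted" averaging); the card does not state the averaging used.
    """
    n = len(y_true)
    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predicted examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class

    accuracy = sum(hits.values()) / n
    precision = recall = f1 = 0.0
    for cls, sup in support.items():
        p = hits[cls] / predicted[cls] if predicted[cls] else 0.0
        r = hits[cls] / sup
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = sup / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical labels using class ids from the training-data table
# (0 = Sports, 2 = Business and Finance, 6 = Politics).
y_true = [0, 0, 2, 2, 6, 6]
y_pred = [0, 0, 2, 6, 6, 2]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(round(acc, 4), round(rec, 4))  # 0.6667 0.6667: weighted recall equals accuracy
```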
Training data
Each training title is labeled with one of the following eight classes:
Class | Description |
---|---|
0 | Sports |
1 | Arts, Culture, and Entertainment |
2 | Business and Finance |
3 | Health and Wellness |
4 | Lifestyle and Fashion |
5 | Science and Technology |
6 | Politics |
7 | Crime |
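For code that handles raw class ids (for example, argmax indices over the model's logits), the table can be written as an id-to-label dictionary. This is a sketch transcribed from the table above; whether the checkpoint's own `config.id2label` uses exactly these strings is an assumption worth checking against the model's config. The names `ID2LABEL`, `LABEL2ID` and `decode` are illustrative.

```python
# Class ids and names transcribed from the training-data table.
ID2LABEL = {
    0: "Sports",
    1: "Arts, Culture, and Entertainment",
    2: "Business and Finance",
    3: "Health and Wellness",
    4: "Lifestyle and Fashion",
    5: "Science and Technology",
    6: "Politics",
    7: "Crime",
}

# Inverse mapping, e.g. for encoding string labels before training.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_ids):
    """Map a sequence of integer class ids to their topic names."""
    return [ID2LABEL[i] for i in class_ids]


print(decode([0, 6, 7]))  # ['Sports', 'Politics', 'Crime']
```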
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
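With `lr_scheduler_type: linear` and 500 warmup steps, the learning rate ramps from 0 up to 5e-05 over the first 500 optimizer steps and then decays linearly back to 0 at the final step. The sketch below reproduces that shape in plain Python, assuming the schedule spans the 102,400 total steps shown in the training results table and follows the usual linear-warmup-then-linear-decay definition. `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
PEAK_LR = 5e-05        # learning_rate
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 102_400  # assumption: 5 epochs x 20,480 steps, from the results table


def linear_warmup_lr(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < warmup:
        # Warmup: grow linearly from 0 to the peak rate.
        return peak * step / max(1, warmup)
    # Decay: shrink linearly from the peak rate to 0 at the final step.
    return peak * max(0.0, (total - step) / max(1, total - warmup))


# Start, mid-warmup, end of warmup, mid-decay, and final step.
for s in (0, 250, 500, 51_450, 102_400):
    print(s, linear_warmup_lr(s))
```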
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|---|---|---|
0.3192 | 1.0 | 20480 | 0.4078 | 0.8865 | 0.8859 | 0.8892 | 0.8865 |
0.2863 | 2.0 | 40960 | 0.4271 | 0.8972 | 0.8970 | 0.8982 | 0.8972 |
0.1979 | 3.0 | 61440 | 0.3797 | 0.9094 | 0.9092 | 0.9098 | 0.9094 |
0.1239 | 4.0 | 81920 | 0.3981 | 0.9117 | 0.9113 | 0.9114 | 0.9117 |
0.1472 | 5.0 | 102400 | 0.4033 | 0.9137 | 0.9135 | 0.9134 | 0.9137 |
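The step column can be cross-checked against the dataset size. Assuming a single device and no gradient accumulation (neither is stated on this card), 20,480 steps per epoch at `train_batch_size: 8` implies roughly 163,840 training titles. Subtracting that and the 51,200-title test set from the 256,000 titles leaves 40,960, which would be consistent with a separate validation split, although the card does not describe one. The arithmetic is spelled out below.

```python
TOTAL_TITLES = 256_000    # dataset size stated on the card
TEST_TITLES = 51_200      # test-set size stated on the card
BATCH_SIZE = 8            # train_batch_size
STEPS_PER_EPOCH = 20_480  # step count at epoch 1.0 in the results table
EPOCHS = 5                # num_epochs

# Assumption: one device and no gradient accumulation, so each step uses one batch.
train_titles = STEPS_PER_EPOCH * BATCH_SIZE
remaining = TOTAL_TITLES - train_titles - TEST_TITLES

print(train_titles)                  # 163840
print(remaining)                     # 40960
print(STEPS_PER_EPOCH * EPOCHS)      # 102400, matching the final step in the table
print(train_titles / TOTAL_TITLES)   # 0.64
print(TEST_TITLES / TOTAL_TITLES)    # 0.2
```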
Model performance (per class, test set)
Class | Precision | Recall | F1 | Support |
---|---|---|---|---|
Sports | 0.97 | 0.98 | 0.97 | 6400 |
Arts, Culture, and Entertainment | 0.94 | 0.95 | 0.94 | 6400 |
Business and Finance | 0.85 | 0.84 | 0.84 | 6400 |
Health and Wellness | 0.90 | 0.93 | 0.91 | 6400 |
Lifestyle and Fashion | 0.95 | 0.95 | 0.95 | 6400 |
Science and Technology | 0.89 | 0.83 | 0.86 | 6400 |
Politics | 0.93 | 0.88 | 0.90 | 6400 |
Crime | 0.85 | 0.93 | 0.89 | 6400 |
Accuracy | | | 0.91 | 51200 |
macro avg | 0.91 | 0.91 | 0.91 | 51200 |
weighted avg | 0.91 | 0.91 | 0.91 | 51200 |
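Every class has the same support (6,400 titles), so the macro average (unweighted mean over classes) and the support-weighted average coincide. The sketch below recomputes both from the per-class values in the table above. Because those inputs are already rounded to two decimals, the recomputed averages only match the reported 0.91 after rounding; `macro_avg` and `weighted_avg` are illustrative helpers.

```python
# Per-class (precision, recall, f1, support) copied from the table above.
PER_CLASS = {
    "Sports": (0.97, 0.98, 0.97, 6400),
    "Arts, Culture, and Entertainment": (0.94, 0.95, 0.94, 6400),
    "Business and Finance": (0.85, 0.84, 0.84, 6400),
    "Health and Wellness": (0.90, 0.93, 0.91, 6400),
    "Lifestyle and Fashion": (0.95, 0.95, 0.95, 6400),
    "Science and Technology": (0.89, 0.83, 0.86, 6400),
    "Politics": (0.93, 0.88, 0.90, 6400),
    "Crime": (0.85, 0.93, 0.89, 6400),
}


def macro_avg(column):
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in PER_CLASS.values()) / len(PER_CLASS)


def weighted_avg(column):
    """Support-weighted mean of one metric column across classes."""
    total = sum(row[3] for row in PER_CLASS.values())
    return sum(row[column] * row[3] for row in PER_CLASS.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name, round(macro_avg(col), 2), round(weighted_avg(col), 2))  # 0.91 0.91 each
```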
How to use roberta-base_topic_classification_nyt_news with Hugging Face Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "dstefa/roberta-base_topic_classification_nyt_news"

# Download the fine-tuned tokenizer and classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline that returns the top label and its score.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his innocence and vowing."
print(pipe(text))
# [{'label': 'Sports', 'score': 0.9989326596260071}]
```
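The `score` in the pipeline output is a probability over the eight classes: for a single-label classifier like this one, the text-classification pipeline applies a softmax to the model's logits and, by default, reports only the top class. The sketch below reproduces that final step in plain Python on made-up logits (the numbers are hypothetical, not real model output), with label names taken from the training-data table.

```python
import math

# Label order follows the class ids in the training-data table.
LABELS = [
    "Sports", "Arts, Culture, and Entertainment", "Business and Finance",
    "Health and Wellness", "Lifestyle and Fashion", "Science and Technology",
    "Politics", "Crime",
]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return the best label and its probability, in the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


# Hypothetical logits for one title; index 0 (Sports) is clearly the largest.
logits = [7.5, 0.2, -0.4, 0.1, -1.0, 0.3, -0.2, 0.5]
print(top_prediction(logits))
```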
Framework versions
- Transformers 4.32.1
- Pytorch 2.1.0+cu121
- Datasets 2.12.0
- Tokenizers 0.13.2
Base model: FacebookAI/roberta-base
Evaluation results (self-reported, on New_York_Times_Topics)
- F1: 0.910
- Accuracy: 0.910
- Precision: 0.910
- Recall: 0.910