---
license: cc-by-sa-4.0
datasets:
- cjvt/sentinews
language:
- sl
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: sloberta-sentinews-sentence
  results:
  - task:
      type: text-classification
      name: Sentiment classification
    dataset:
      type: cjvt/sentinews
      name: SentiNews
      config: sentence_level
    metrics:
    - type: f1
      value: 0.6851357247321056
      name: Test macro F1
    - type: accuracy
      value: 0.7158081705150977
      name: Test accuracy
    - type: f1
      value: 0.6934678744913757
      name: Validation macro F1
    - type: accuracy
      value: 0.7207815275310835
      name: Validation accuracy
---
# sloberta-sentinews-sentence
Slovenian 3-class sentiment classifier - SloBERTa fine-tuned on the sentence-level config of the SentiNews dataset.
The model is intended as:
(1) an out-of-the-box sentence-level sentiment classifier, or
(2) a sentence-level sentiment classification baseline.
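
A minimal usage sketch with the 🤗 Transformers `pipeline` is given below. The model ID is an assumption (the path of this repository); replace it with the actual repository name or a local checkpoint directory if it differs.

```python
from transformers import pipeline

# Assumed model ID - substitute the actual repository path or a local checkpoint directory
classifier = pipeline("text-classification", model="cjvt/sloberta-sentinews-sentence")

# Returns one sentiment label per input sentence; the label names
# (negative/neutral/positive) come from the model config
print(classifier("Danes je lep dan."))  # "Today is a nice day."
```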
## Fine-tuning details
The model was fine-tuned on a random 90%/5%/5% train-validation-test split of the `sentence_level` configuration of the [`cjvt/sentinews`](https://huggingface.co/datasets/cjvt/sentinews) dataset, using the following hyperparameters:
```python
max_length = 79  # 99th percentile of encoded training sequence lengths; sequences are padded/truncated to this length
batch_size = 128
optimizer = "adamw_torch"
learning_rate = 2e-5
num_epochs = 10
validation_metric = "macro_f1"
```
Feel free to inspect `training_args.bin` for more details.
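
As a rough guide, the listed hyperparameters map onto `transformers.TrainingArguments` roughly as sketched below. Everything not listed above (output directory, evaluation and save strategy) is an assumption; the authoritative values are those stored in `training_args.bin`.

```python
from transformers import TrainingArguments

# A sketch only: values not listed above are assumptions;
# the authoritative configuration is stored in training_args.bin.
training_args = TrainingArguments(
    output_dir="sloberta-sentinews-sentence",  # assumed
    per_device_train_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=10,
    optim="adamw_torch",
    evaluation_strategy="epoch",               # assumed
    metric_for_best_model="macro_f1",          # needs a compute_metrics fn that reports "macro_f1"
    load_best_model_at_end=True,               # assumed; model selection by validation macro F1
)
```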
If you wish to directly compare your model to this one, use the same data split. It can be reconstructed with the following code:
```python
import json

import datasets

# You can find split_indices.json in the 'Files and versions' tab of this repository
with open("split_indices.json", "r") as f_split:
    split = json.load(f_split)

data = datasets.load_dataset("cjvt/sentinews", "sentence_level", split="train")
train_data = data.select(split["train_indices"])
dev_data = data.select(split["dev_indices"])
test_data = data.select(split["test_indices"])
```
## Evaluation results
Best validation set results:
```json
{
    "eval_accuracy": 0.7207815275310835,
    "eval_f1_macro": 0.6934678744913757,
    "eval_f1_negative": 0.7042136003337507,
    "eval_f1_neutral": 0.759215853398679,
    "eval_f1_positive": 0.6169741697416974,
    "eval_loss": 0.6337869167327881,
    "eval_precision_negative": 0.6685148514851486,
    "eval_precision_neutral": 0.7752393385552655,
    "eval_precision_positive": 0.6314199395770392,
    "eval_recall_negative": 0.74394006170119,
    "eval_recall_neutral": 0.7438413361169103,
    "eval_recall_positive": 0.6031746031746031
}
```
Test set results:
```json
{
    "test_loss": 0.6395984888076782,
    "test_accuracy": 0.7158081705150977,
    "test_precision_negative": 0.6570397111913358,
    "test_recall_negative": 0.7292965271593945,
    "test_f1_negative": 0.6912850812407682,
    "test_precision_neutral": 0.7748017998714377,
    "test_recall_neutral": 0.7418957734919983,
    "test_f1_neutral": 0.7579918247563149,
    "test_precision_positive": 0.6155642023346304,
    "test_recall_positive": 0.5969811320754717,
    "test_f1_positive": 0.6061302681992337,
    "test_f1_macro": 0.6851357247321056
}
```
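
If you want to verify these numbers, the sketch below evaluates the model on `test_data` from the split-reconstruction snippet above using scikit-learn metrics. The model ID, the dataset column names (`content`, `sentiment`), and the assumption that the dataset's string labels match the keys of the model's `label2id` mapping are all unverified here; adjust them as needed.

```python
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "cjvt/sloberta-sentinews-sentence"  # assumed repository path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# Assumed column names; check the dataset card for the exact schema
texts = test_data["content"]
true_ids = [model.config.label2id[label] for label in test_data["sentiment"]]

pred_ids = []
with torch.no_grad():
    for start in range(0, len(texts), 64):
        batch = tokenizer(
            texts[start:start + 64],
            padding=True,
            truncation=True,
            max_length=79,  # same maximum length as used during fine-tuning
            return_tensors="pt",
        )
        logits = model(**batch).logits
        pred_ids.extend(logits.argmax(dim=-1).tolist())

print("accuracy:", accuracy_score(true_ids, pred_ids))
print("macro F1:", f1_score(true_ids, pred_ids, average="macro"))
```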