Model Overview

A fine-tuned DistilBERT model for Named Entity Recognition (NER) in bias detection.

Model Details

We fine-tuned distilbert-base-uncased on the vector-institute/NMB-Plus-Named-Entities dataset.

How to Get Started with the Model

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "vector-institute/nmb-plus-bias-ner-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)

label_list = ["O", "B-BIAS", "I-BIAS"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    id2label=id2label, 
    label2id=label2id
)


ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Fox News reported that Joe Biden met with CNN executives."
predictions = ner_pipeline(text)
print(predictions)
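The pipeline as configured above returns one prediction per token rather than per phrase. The following is a minimal sketch of merging consecutive B-BIAS / I-BIAS tokens into character spans. It assumes each prediction is a dict with the `entity`, `index`, `start`, and `end` keys that the `transformers` token-classification pipeline emits when no aggregation is applied. The helper name `merge_bias_spans` is hypothetical, and the sample predictions are hand-written for illustration, not real model output.

```python
def merge_bias_spans(text, predictions):
    """Merge token-level B-BIAS / I-BIAS predictions into (start, end, phrase) spans."""
    spans = []
    current = None      # [start, end] of the span being built, or None
    last_index = None   # token index of the previous prediction
    for pred in sorted(predictions, key=lambda p: p["index"]):
        label, idx = pred["entity"], pred["index"]
        # An I-BIAS token extends the open span only if it directly follows it.
        if label == "I-BIAS" and current is not None and idx == last_index + 1:
            current[1] = pred["end"]
        else:
            # Anything else closes the open span (if there is one).
            if current is not None:
                spans.append(tuple(current))
                current = None
            # Only B-BIAS may open a new span; a stray I-BIAS is dropped.
            if label == "B-BIAS":
                current = [pred["start"], pred["end"]]
        last_index = idx
    if current is not None:
        spans.append(tuple(current))
    return [(start, end, text[start:end]) for start, end in spans]


# Hand-written predictions (illustrative only, NOT real model output).
sample_text = "The so-called experts pushed a radical agenda."
sample_predictions = [
    {"entity": "B-BIAS", "index": 2, "start": 4, "end": 6},
    {"entity": "I-BIAS", "index": 3, "start": 6, "end": 7},
    {"entity": "I-BIAS", "index": 4, "start": 7, "end": 13},
    {"entity": "B-BIAS", "index": 8, "start": 31, "end": 38},
    {"entity": "I-BIAS", "index": 9, "start": 39, "end": 45},
]
bias_spans = merge_bias_spans(sample_text, sample_predictions)
print(bias_spans)
```

With real output, `merge_bias_spans(text, predictions)` could be called directly on the variables from the snippet above. The pipeline's own `aggregation_strategy` argument is an alternative if built-in grouping is preferred.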

Training Hyperparameters

  • Training regime: we used the following training arguments:
from transformers import TrainingArguments

training_args = TrainingArguments(
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    output_dir="./results",
    logging_dir="./logs",
    logging_steps=50,
    group_by_length=True,
)

Evaluation

We split the data into train (80%), validation (10%), and test (10%) sets.
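As an illustration of an 80/10/10 split, here is a minimal sketch that shuffles example indices and slices them into three parts. The model card does not say how the split was actually made (random, stratified, or by document), so the shuffling, the seed, and the helper name `split_indices` are all assumptions.

```python
import random


def split_indices(n_examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle indices and cut them into train / validation / test lists."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumed)
    n_train = int(n_examples * train_frac)
    n_val = int(n_examples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index lists can then be used to select rows from whatever container holds the dataset.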

Results

We used common classification metrics:

  • precision
  • recall
  • f1-score

Overall Results:

Metric        Precision  Recall  F1-Score  Support
Macro Avg     0.6405     0.5589  0.5922    48710
Weighted Avg  0.9330     0.9418  0.9366    48710

Per-class Results:

Label   Precision  Recall  F1-Score  Support
O       0.9615     0.9792  0.9703    45921
B-BIAS  0.5314     0.4183  0.4681    930
I-BIAS  0.4286     0.2792  0.3381    1859
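The gap between the macro and weighted averages follows from how each is computed. The sketch below recomputes both from the per-class rows, assuming the usual convention (as in scikit-learn's `classification_report`) that the macro average is the unweighted mean over classes and the weighted average is the support-weighted mean. The model card does not name the tool used, so this is a consistency check rather than the authors' evaluation code.

```python
# Per-class results copied from the table above.
per_class = {
    "O":      {"precision": 0.9615, "recall": 0.9792, "f1": 0.9703, "support": 45921},
    "B-BIAS": {"precision": 0.5314, "recall": 0.4183, "f1": 0.4681, "support": 930},
    "I-BIAS": {"precision": 0.4286, "recall": 0.2792, "f1": 0.3381, "support": 1859},
}


def macro_average(rows, metric):
    """Unweighted mean of a metric over all classes."""
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted_average(rows, metric):
    """Mean of a metric over classes, weighted by each class's support."""
    total = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total


for metric in ("precision", "recall", "f1"):
    macro = macro_average(per_class, metric)
    weighted = weighted_average(per_class, metric)
    print(f"{metric}: macro={macro:.4f} weighted={weighted:.4f}")
```

Because the O class accounts for 45921 of the 48710 tokens, the weighted averages are dominated by its scores, while the macro averages give the two BIAS classes equal say and are a better indicator of how well biased tokens are detected.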

Environmental Impact

Total energy consumption for fine-tuning was 0.032804 kWh.

Local CO2 Emission: Approximately 3.12 grams of CO₂ equivalent.

License

CC BY 4.0 (Creative Commons Attribution 4.0): Allows sharing and adaptation with proper credit.
