---
datasets:
- jcblaise/fake_news_filipino
- SEACrowd/ph_fake_news_corpus
language:
- tl
- en
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
tags:
- fake-news-detection
- text-classification
- tagalog
- filipino
metrics:
- accuracy
- f1
- precision
- recall
---

# Tagalog Fake News Detection Model

## Overview
This project fine-tunes the XLM-RoBERTa base model (`FacebookAI/xlm-roberta-base`) to detect fake news in Tagalog/Filipino text. It reaches **95.46%** accuracy on the evaluation set.

### Dataset
- Total Size: 18,522 samples
- Composition: 50/50 split of real and fake news
- Languages: Filipino, English

#### Dataset Split
- Train Set: ~12,968 samples
- Validation Set: ~2,784 samples
- Test Set: ~2,770 samples
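
The three split sizes above add up to the stated total and work out to roughly a 70/15/15 partition. The sketch below only checks that arithmetic from the approximate counts reported in this card. The variable names are illustrative, and it does not reproduce the actual splitting procedure, which is not described here.

```python
# Split sizes as reported in this model card (approximate values).
split_sizes = {"train": 12_968, "validation": 2_784, "test": 2_770}

# Total number of samples across all three splits.
total = sum(split_sizes.values())

# Fraction of the full dataset that falls into each split.
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count:,} samples ({fractions[name]:.1%})")
print(f"total: {total:,} samples")
```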

### Performance Metrics (on Evaluation Set)
- Accuracy: 95.46%
- F1 Score: 95.40%
- Precision: 95.40%
- Recall: 95.40%
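
For readers unfamiliar with these four metrics, the following minimal sketch shows how they are defined for a binary classifier. It runs on a small set of made-up toy labels, not on this model's predictions. It also assumes that label `1` (fake) is the positive class; this card does not state the label mapping or the averaging method behind the reported scores.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example (hypothetical labels): 1 = fake, 0 = real (assumed mapping).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` is usually preferable; the hand-written version here only makes the definitions explicit.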

## Data Sources
The model was trained on a combined dataset from two primary sources:

1. [Fake News Filipino Dataset](https://huggingface.co/datasets/jcblaise/fake_news_filipino)
   - 3,206 rows used

2. [Philippine Fake News Corpus](https://huggingface.co/datasets/SEACrowd/ph_fake_news_corpus)
   - 15,312 rows used out of 22,458 available