---
datasets:
- jcblaise/fake_news_filipino
- SEACrowd/ph_fake_news_corpus
language:
- tl
- en
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
tags:
- fake-news-detection
- text-classification
- tagalog
- filipino
metrics:
- accuracy
- f1
- precision
- recall
---

# Tagalog Fake News Detection Model

## Overview
This project implements a fake news detection model for Tagalog/Filipino text, built on the XLM-RoBERTa base model (`FacebookAI/xlm-roberta-base`). The model reaches an accuracy of **95.46%** on the evaluation set.

### Dataset
- Total Size: 18,522 samples
- Composition: 50/50 split of real and fake news
- Languages: Filipino, English

#### Dataset Split
- Train Set: ~12,968 samples
- Validation Set: ~2,784 samples
- Test Set: ~2,770 samples
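
The split sizes above work out to roughly a 70/15/15 partition. A minimal sketch that checks this arithmetic, using only the approximate counts listed in this card (the variable names are illustrative):

```python
# Approximate split sizes as listed in this model card.
splits = {"train": 12_968, "validation": 2_784, "test": 2_770}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, frac in fractions.items():
    # Print each split's share of the full dataset as a percentage.
    print(f"{name}: {frac:.1%}")
print(f"total: {total}")
```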

### Performance Metrics (on Evaluation Set)
- Accuracy: 95.46%
- F1 Score: 95.40%
- Precision: 95.40%
- Recall: 95.40%
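
For reference, the four metrics above can be derived from confusion-matrix counts. The sketch below shows the standard binary definitions; the counts are hypothetical and are not taken from this model's evaluation, and the card does not state which averaging method was used for its reported F1, precision, and recall.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (assumption, not real results).
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```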


## Data Sources
The model was trained on a combined dataset from two primary sources:

1. [Fake News Filipino Dataset](https://huggingface.co/datasets/jcblaise/fake_news_filipino)
   - 3,206 rows used

2. [Philippine Fake News Corpus](https://huggingface.co/datasets/SEACrowd/ph_fake_news_corpus)
   - 15,312 rows used out of 22,458 available