iceman2434/xlm-roberta-base-fake-news-detection-tl

	---
	datasets:
	- jcblaise/fake_news_filipino
	- SEACrowd/ph_fake_news_corpus
	language:
	- tl
	- en
	base_model:
	- FacebookAI/xlm-roberta-base
	pipeline_tag: text-classification
	tags:
	- fake-news-detection
	- text-classification
	- tagalog
	- filipino
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# Tagalog Fake News Detection Model

	## Overview
	This model detects fake news in Tagalog/Filipino text. It is built on XLM-RoBERTa base (`FacebookAI/xlm-roberta-base`) and reaches 95.46% accuracy on the evaluation set.
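A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub under the repository name shown on this page and loads with the standard `transformers` text-classification pipeline. The helper name `classify` is hypothetical, and the label names in the output depend on the model's configuration, which this card does not document.

```python
MODEL_ID = "iceman2434/xlm-roberta-base-fake-news-detection-tl"


def classify(texts, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per input text."""
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use (requires network access).
    classifier = pipeline("text-classification", model=model_id)
    # truncation=True keeps long articles within the model's maximum input length.
    return classifier(list(texts), truncation=True)
```

Calling `classify(["..."])` with a list of article texts should return a list of dictionaries with `label` and `score` keys. Check the model's `id2label` mapping before interpreting which label means "fake".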

	### Dataset
	- Total Size: 18,522 samples
	- Composition: 50/50 split of real and fake news
	- Languages: Filipino, English

	#### Dataset Split
	- Train Set: ~12,968 samples
	- Validation Set: ~2,784 samples
	- Test Set: ~2,770 samples
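The split sizes above can be checked against the stated total with a few lines of arithmetic. This sketch only uses the counts reported in this card; the names `SPLITS`, `TOTAL`, and `split_fractions` are illustrative.

```python
# Split sizes as reported in this model card (approximate counts).
SPLITS = {"train": 12_968, "validation": 2_784, "test": 2_770}
TOTAL = 18_522


def split_fractions(splits):
    """Return each split's share of the summed sample count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


fractions = split_fractions(SPLITS)
for name, share in fractions.items():
    print(f"{name}: {share:.2%}")
```

The three counts sum to the stated total of 18,522 and correspond to roughly a 70/15/15 train/validation/test split.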

	### Performance Metrics (on Evaluation Set)
	- Accuracy: 95.46%
	- F1 Score: 95.40%
	- Precision: 95.40%
	- Recall: 95.40%
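For reference, the four metrics can be computed from predictions as sketched below. This is a generic binary-classification calculation, not the author's evaluation script: the card does not state which class is treated as positive or whether the scores are averaged across classes, so the encoding `1` = fake is an assumption, and the labels are toy values rather than model outputs.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (assumed encoding: 1 = fake, 0 = real).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced test set such as the one described here, accuracy and F1 tend to sit close together, which is consistent with the reported figures.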


	## Data Sources
	The model was trained on a combined dataset from two primary sources:

	1. [Fake News Filipino Dataset](https://huggingface.co/datasets/jcblaise/fake_news_filipino)
	   - 3,206 rows used

	2. [Philippine Fake News Corpus](https://huggingface.co/datasets/SEACrowd/ph_fake_news_corpus)
	   - 15,312 rows used out of 22,458 available
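The card does not say how the 15,312 rows were selected from the 22,458 available. Because the combined dataset is described as a 50/50 split of real and fake news, one plausible approach is per-class downsampling. The sketch below illustrates that technique on made-up rows; the function `balanced_sample`, the `label` key, the label encoding, and the seed are all hypothetical and do not describe the author's actual procedure.

```python
import random
from collections import defaultdict


def balanced_sample(rows, per_class, label_key="label", seed=42):
    """Draw `per_class` rows from each label without replacement."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)
    sample = []
    for label in sorted(by_label):
        # rng.sample raises ValueError if a class has fewer than per_class rows.
        sample.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(sample)
    return sample


# Made-up imbalanced corpus: 10 "fake" rows (label 1) and 6 "real" rows (label 0).
corpus = [{"text": f"fake {i}", "label": 1} for i in range(10)]
corpus += [{"text": f"real {i}", "label": 0} for i in range(6)]

subset = balanced_sample(corpus, per_class=4)
counts = {label: sum(r["label"] == label for r in subset) for label in (0, 1)}
print(counts)
```

Fixing the seed keeps the subset reproducible, which matters if the same rows need to be recovered later for the train/validation/test split.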