Model Card for DistilBERT-PhishGuard

Model Overview

URLShield-DistilBERT (referred to in this card as DistilBERT-PhishGuard) is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as safe or phishing. It is designed for real-time use in web and email security, where it helps users identify malicious links.

Intended Use

Use Cases: URL classification for phishing detection in emails, websites, and chat applications.
Limitations: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
Intended Users: Security researchers, application developers, and cybersecurity engineers.
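
For the email and chat use cases, URLs first have to be pulled out of free text before they can be classified. The following is a minimal sketch of that step, not part of the model itself; the `extract_urls` helper and its regular expression are hypothetical and deliberately simple (they only match `http://` and `https://` links).

```python
import re

# Hypothetical, deliberately simple pattern: matches http(s) links up to the
# next whitespace, angle bracket, or quote character.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the unique http(s) URLs found in a message, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        cleaned = match.rstrip(".,;:!?)")
        if cleaned and cleaned not in urls:
            urls.append(cleaned)
    return urls


message = "Please verify your account at http://example.com/login. Thanks!"
for url in extract_urls(message):
    print(url)
```

Each extracted URL can then be passed to the tokenizer and model as shown in the Usage section.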

🔍 What Sets PhishGuard Apart?

High Accuracy 📈 – Achieved up to 99.6% accuracy and 0.997 AUC on validation datasets.
Optimized for Speed 🚀 – Leveraging a distilled transformer model for faster predictions without compromising accuracy.
Real-World Data 🌐 – Trained and evaluated on diverse phishing and safe URLs, ensuring robust performance across domains.

📊 Performance Metrics (Averaged Across Epochs)

Accuracy: 99.6%
AUC (Area Under Curve): 0.997
Training Loss: 0.054
Validation Loss: 0.047

Support the Project

If you find this project useful, consider buying me a coffee to support further development! ☕️

Usage

This model can be loaded and used with Hugging Face's transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer (replace the placeholder with the actual repository ID)
tokenizer = AutoTokenizer.from_pretrained("your-username/DistilBERT-PhishGuard")
model = AutoModelForSequenceClassification.from_pretrained("your-username/DistilBERT-PhishGuard")
model.eval()  # inference mode: disables dropout

# Sample URL for classification
url = "http://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Class index 1 is treated as phishing, index 0 as safe
predictions = torch.argmax(outputs.logits, dim=-1)
print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
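
The argmax above yields only a hard label. In a security setting it can be useful to look at a probability and choose a decision threshold instead. The sketch below shows the underlying softmax arithmetic in plain Python, assuming the two logits are ordered `[safe, phishing]` as in the snippet above; the helper names and the threshold values are illustrative, not part of the model. With the model loaded, the input could be obtained as `outputs.logits[0].tolist()`.

```python
import math


def phishing_probability(logits):
    """Softmax probability of the phishing class for logits ordered [safe, phishing]."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits: [safe, phishing]")
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    return exps[1] / sum(exps)


def label_for(logits, threshold=0.5):
    """Label a URL as phishing only if its probability exceeds the threshold."""
    return "Phishing" if phishing_probability(logits) > threshold else "Safe"


example_logits = [-1.0, 2.0]  # made-up values for illustration
print(round(phishing_probability(example_logits), 3))
print(label_for(example_logits))
print(label_for(example_logits, threshold=0.99))
```

A threshold of 0.5 reproduces the argmax decision; a higher threshold trades missed phishing URLs for fewer false alarms.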

Performance

Accuracy stays above 98% across the different chunks of training data, and AUC approaches or reaches 1.00 in the later training stages. These results suggest reliable phishing detection on the data the model was trained and validated on.
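
For readers who want to check these two metrics on their own labeled URLs, here is a minimal sketch of how accuracy and AUC can be computed from labels and phishing scores. It is not the evaluation code used for this model; the function names and the toy data are illustrative, and the pairwise AUC is written for clarity rather than speed.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for truth, guess in zip(labels, predictions) if truth == guess)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random phishing URL (label 1) scores higher than
    a random safe URL (label 0); ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only: 1 = phishing, 0 = safe.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [1 if s > 0.5 else 0 for s in scores]
print(accuracy(labels, predictions))  # 0.75
print(roc_auc(labels, scores))        # 0.75
```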

Limitations and Biases

The model's performance may degrade on URLs containing obfuscated or novel phishing techniques. It may be less effective on non-English URLs and may need further fine-tuning for different languages or domain-specific URLs.
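
One way to act on these limitations is to flag URLs that fall into the weaker categories, so that a prediction on them can be treated with more caution. The sketch below is a heuristic pre-check under stated assumptions, not part of the model: the `reliability_flags` helper, its flag names, and the cutoff of three percent signs are all hypothetical choices.

```python
import ipaddress
from urllib.parse import urlsplit


def reliability_flags(url):
    """Return heuristic flags for URLs on which predictions may be less reliable."""
    flags = []
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # urlsplit rejects some malformed URLs (for example broken IPv6 brackets).
        return ["unparseable"]

    if not url.isascii():
        flags.append("non_ascii")
    if any(label.startswith("xn--") for label in host.split(".")):
        flags.append("punycode_host")  # internationalized domain name
    if url.count("%") >= 3:  # arbitrary cutoff, chosen for illustration
        flags.append("heavy_percent_encoding")
    try:
        ipaddress.ip_address(host)
        flags.append("ip_host")  # raw IP address instead of a domain name
    except ValueError:
        pass
    return flags


for candidate in ["http://example.com", "http://192.168.0.1/login"]:
    print(candidate, reliability_flags(candidate))
```

A flagged URL is not necessarily malicious; the flags only indicate inputs that the limitations above describe as harder for the model.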

Contact and Support

For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.

Adnan-AI-Labs/URLShield-DistilBERT