Tags: Safetensors, English, distilbert, phishing_url

Model Card for DistilBERT-PhishGuard

Model Overview

URLShield-DistilBERT (the repository name of the model this card calls DistilBERT-PhishGuard) is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as either safe or phishing. It is designed for real-time use in web and email security, helping users identify malicious links.

Intended Use

  • Use Cases: URL classification for phishing detection in emails, websites, and chat applications.
  • Limitations: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
  • Intended Users: Security researchers, application developers, and cybersecurity engineers.
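The use cases above involve links embedded in larger text (an email body, a web page, a chat message), while the model classifies one URL at a time. Below is a minimal sketch of the glue code. It assumes a simple regular expression is good enough to pull out http(s) links. `extract_urls`, `classify_message`, and the stand-in classifier are hypothetical names and are not part of this model or of transformers.

```python
import re
from typing import Callable, Dict, List

# Rough pattern for http(s) links in free text (assumption: plain-text input;
# HTML hrefs, bare domains, or defanged links would need more handling).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> List[str]:
    """Return http(s) URLs found in a message, stripping trailing punctuation."""
    return [m.rstrip(".,;:!?)") for m in URL_PATTERN.findall(text)]


def classify_message(text: str, classify_url: Callable[[str], str]) -> Dict[str, str]:
    """Map every URL in `text` to the label returned by `classify_url`.

    `classify_url` is a hypothetical callable, e.g. a wrapper around the
    tokenizer/model code in the Usage section that returns "Safe" or "Phishing".
    """
    return {url: classify_url(url) for url in extract_urls(text)}


if __name__ == "__main__":
    # Stand-in classifier for demonstration only; swap in the real model call.
    def fake_classifier(url: str) -> str:
        return "Phishing" if "login-verify" in url else "Safe"

    email_body = (
        "Your invoice is at https://example.com/invoice. "
        "Please confirm your account: http://login-verify.example.net/reset?id=42."
    )
    print(classify_message(email_body, fake_classifier))
    # {'https://example.com/invoice': 'Safe', 'http://login-verify.example.net/reset?id=42': 'Phishing'}
```

In practice `classify_url` would wrap the tokenizer and model calls from the Usage section. Passing all extracted URLs to the tokenizer in one batch would reduce per-link overhead.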

🔍 What Sets PhishGuard Apart?

  • High Accuracy 📈: Achieved up to 99.6% accuracy and 0.997 AUC on validation datasets.
  • Optimized for Speed 🚀: Uses a distilled transformer model (DistilBERT) for faster predictions while maintaining high accuracy.
  • Real-World Data 🌐: Trained and evaluated on diverse phishing and safe URLs, with the aim of robust performance across domains.

📊 Performance Metrics (Averaged Across Epochs)

  • Accuracy: 99.6%
  • AUC (Area Under the ROC Curve): 0.997
  • Training Loss: 0.054
  • Validation Loss: 0.047
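To make the reported numbers concrete, the sketch below shows how accuracy and ROC AUC are defined for a binary safe/phishing classifier. The labels and scores are made-up toy values, not this model's validation data, and this is not the author's evaluation code. For large datasets a library routine such as scikit-learn's `roc_auc_score` is the usual choice, since this pairwise version is quadratic in the number of examples.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels (1 = phishing, 0 = safe)."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("labels and preds must be non-empty and the same length")
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random phishing URL scores higher than
    a random safe URL (ties count as half): the Mann-Whitney U statistic divided
    by the number of (phishing, safe) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one phishing and one safe example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                    # 1 = phishing, 0 = safe (toy values)
    scores = [0.95, 0.80, 0.30, 0.60, 0.55, 0.10]  # phishing probabilities (toy values)
    preds = [int(s > 0.5) for s in scores]         # threshold the scores at 0.5
    print(f"accuracy={accuracy(labels, preds):.3f}, auc={roc_auc(labels, scores):.3f}")
    # accuracy=0.833, auc=0.889
```

AUC depends only on how the scores rank phishing URLs above safe ones, so it does not change with the 0.5 threshold. Accuracy, by contrast, is measured at one chosen threshold.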

Support the Project

If you find this project useful, consider buying me a coffee to support further development! ☕️

Buy Me a Coffee

Usage

This model can be loaded and used with Hugging Face's transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer (repository ID from this card's model tree)
model_id = "Adnan-AI-Labs/URLShield-DistilBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Sample URL for classification
url = "http://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # gradients are not needed at inference time
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
# Label 1 is treated as phishing and label 0 as safe
print("Prediction:", "Phishing" if prediction == 1 else "Safe")
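The snippet above takes an argmax over the two output logits. If you want a phishing probability instead, for example to tune how aggressively links are flagged, a softmax over the logits provides one. Below is a minimal pure-Python sketch. Like the snippet, it assumes that index 1 is the phishing class. `phishing_probability` and `label_from_logits` are hypothetical helper names.

```python
import math
from typing import Sequence


def phishing_probability(logits: Sequence[float], phishing_index: int = 1) -> float:
    """Softmax over per-class logits, returning the phishing-class probability.

    Assumption: class index 1 is phishing, matching the Usage snippet.
    """
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[phishing_index] / sum(exps)


def label_from_logits(logits: Sequence[float], threshold: float = 0.5) -> str:
    """Return "Phishing" when the phishing probability exceeds `threshold`.

    With two classes and threshold=0.5 this agrees with argmax, including ties
    (which resolve to "Safe", as argmax picks the first maximal index).
    """
    return "Phishing" if phishing_probability(logits) > threshold else "Safe"


if __name__ == "__main__":
    print(label_from_logits([2.0, -1.0]))                # strongly favors class 0
    print(label_from_logits([0.0, 0.0], threshold=0.4))  # lower threshold flags more URLs
```

With the model loaded as above, you could call `label_from_logits(outputs.logits[0].tolist(), threshold=0.3)`. `torch.softmax(outputs.logits, dim=-1)` computes the same probabilities directly on the tensor. Lowering the threshold catches more phishing links, at the cost of more safe URLs being flagged.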

Performance

Across the different chunks of training data, the model kept accuracy above 98%, and its AUC was close to or at 1.00 in the later training stages. This indicates consistent phishing detection across the varied data the model was trained and evaluated on.

Limitations and Biases

The model's performance may degrade on URLs containing obfuscated or novel phishing techniques. It may be less effective on non-English URLs and may need further fine-tuning for different languages or domain-specific URLs.
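The card does not describe any preprocessing for these cases. One option is to flag URLs that appear to fall into the weaker areas named above and send them for additional review, rather than relying on the model's label alone. The sketch below uses illustrative heuristics: internationalized hostnames, heavy percent-encoding, and `user@host` decoys. The function name and the escape threshold are assumptions and are not part of the model.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Illustrative threshold (assumption): more than this many %XX escapes counts as "heavy".
MAX_ESCAPES = 5
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def limitation_flags(url: str, max_escapes: int = MAX_ESCAPES) -> List[str]:
    """Return heuristic reasons a URL may fall into the model's weaker areas."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    flags = []
    # Non-English domains appear either as raw Unicode or in punycode ("xn--" labels).
    if any(ord(ch) > 127 for ch in host) or any(
        label.startswith("xn--") for label in host.split(".")
    ):
        flags.append("internationalized hostname")
    # Many percent-escapes can hide the real path or query from a casual reader.
    if len(PERCENT_ESCAPE.findall(url)) > max_escapes:
        flags.append("heavy percent-encoding")
    # "trusted.example@evil.example" puts a decoy name before the real host.
    if "@" in parts.netloc:
        flags.append("userinfo before host")
    return flags


if __name__ == "__main__":
    for u in [
        "https://example.com/login",
        "http://xn--e1afmkfd.xn--p1ai/",
        "http://paypal.com@evil.example/%41%42%43%44%45%46",
    ]:
        print(u, limitation_flags(u))
```

These checks are deliberately coarse. They mark URLs for closer inspection and do not decide whether a URL is phishing.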

Contact and Support

For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.

Model size: 67M params (Safetensors, tensor type F32)

Model tree: Adnan-AI-Labs/URLShield-DistilBERT is fine-tuned from DistilBERT.

Dataset used to train Adnan-AI-Labs/URLShield-DistilBERT