My LoRA Fine-Tuned AI-generated Detector

This is an e5-small model fine-tuned with LoRA for sequence classification. It classifies text as either AI-generated or human-written, reaching 89.0% accuracy in the evaluation reported below.

  • Label_0: Represents human-written content.
  • Label_1: Represents AI-generated content.
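
As a rough illustration of how the two labels might be read off the classifier output, the sketch below converts a pair of logits into a label name and a probability. The helper names (`softmax`, `decode`, `classify`) are hypothetical, and `classify` assumes that the repository `MayZhou/e5-small-lora-ai-generated-detector` loads directly with `AutoModelForSequenceClassification`, i.e. that the LoRA weights were merged before upload; this card does not confirm that.

```python
import math

# Label names come from the list above; the index order (0 = human, 1 = AI)
# is assumed from the "Label_0" / "Label_1" naming.
ID2LABEL = {0: "human-written", 1: "AI-generated"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for one pair of class logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(text, model_id="MayZhou/e5-small-lora-ai-generated-detector"):
    """Score one text with the published checkpoint (needs network access).

    Assumption: the checkpoint is a full sequence-classification model,
    not a bare LoRA adapter that would have to be loaded through PEFT.
    """
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

Calling `decode([0.0, 2.0])` picks Label_1 because the second logit is larger; `classify("some text")` would run the same decoding on real model output.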

Model Details

  • Base Model: intfloat/e5-small
  • Fine-Tuning Technique: LoRA (Low-Rank Adaptation)
  • Task: Sequence Classification
  • Use Cases: Detecting AI-generated text.
  • Hyperparameters:
    • Learning rate: 5e-5
    • Epochs: 3
    • LoRA rank: 8
    • LoRA alpha: 16
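
To make the hyperparameters above concrete, here is a minimal sketch of how they could be expressed in code. The class and function names are hypothetical, and the actual training script is not part of this card. The hidden size of 384 is the e5-small embedding dimension, used here only to illustrate how few parameters a rank-8 adapter adds per weight matrix. `build_peft_config` assumes the Hugging Face PEFT library and leaves `target_modules` and dropout at their defaults, since the card does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraHyperparams:
    """Values copied from the hyperparameter list above."""
    learning_rate: float = 5e-5
    epochs: int = 3
    rank: int = 8
    alpha: int = 16

    @property
    def scaling(self):
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


def lora_trainable_params(d_in, d_out, rank):
    """Parameters added to one weight matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def build_peft_config(hp):
    """Build a PEFT LoraConfig from the hyperparameters (requires `peft`).

    Assumption: target modules and dropout are left at library defaults
    because the card does not specify them.
    """
    from peft import LoraConfig, TaskType

    return LoraConfig(task_type=TaskType.SEQ_CLS, r=hp.rank, lora_alpha=hp.alpha)


hp = LoraHyperparams()
HIDDEN = 384  # e5-small hidden size; illustrative only
added = lora_trainable_params(HIDDEN, HIDDEN, hp.rank)
full = HIDDEN * HIDDEN
print(f"scaling={hp.scaling}, adapter params={added}, full matrix params={full}")
```

With rank 8 and alpha 16 the update is scaled by 2.0, and a single 384 x 384 projection gains 6,144 trainable parameters instead of the 147,456 that full fine-tuning of that matrix would touch.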

Training Details

  • Dataset:
    • 10,000 tweets and 10,000 tweets rewritten with GPT-4o-mini.
    • 80,000 human-written texts from RAID-train.
    • 128,000 AI-generated texts from RAID-train.
  • Hardware: Fine-tuned on a single NVIDIA A100 GPU.
  • Training Time: Approximately 2 hours.
  • Evaluation Metrics:
    Metric      E5-small (raw)   Fine-tuned
    Accuracy    65.2%            89.0%
    F1 Score    0.653            0.887
    AUC         0.697            0.976
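
For readers who want to compute the same three kinds of metrics on their own predictions, the following is a small pure-Python sketch. It treats Label_1 (AI-generated) as the positive class and uses binary F1. Both choices are assumptions, since the card does not say which averaging or positive class was used for the reported numbers. The toy labels and scores are made up for illustration and are not drawn from the evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 0 = human-written, 1 = AI-generated; scores are P(AI-generated).
y_true = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8, 0.9]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an illustrative choice

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sanity check; for large evaluation sets a library implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.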

Collaborators

  • Menglin Zhou
  • Jiaping Liu
  • Xiaotian Zhan

Citation

If you use this model, please cite the RAID dataset as follows:

@inproceedings{dugan-etal-2024-raid,
    title = "{RAID}: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors",
    author = "Dugan, Liam  and
      Hwang, Alyssa  and
      Trhl{\'\i}k, Filip  and
      Zhu, Andrew  and
      Ludan, Josh Magnus  and
      Xu, Hainiu  and
      Ippolito, Daphne  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.674",
    pages = "12463--12492",
}
