ViBidLAQA_base: A Vietnamese Bidding Legal Abstractive Question Answering Model

Overview

ViBidLAQA_base is an abstractive question-answering (AQA) model developed for the Vietnamese bidding law domain. It is built on VietAI/vit5-base and fine-tuned on a specialized bidding law dataset. Given a question and a context passage, it generates a natural-language answer, scoring 75.09 ROUGE-1 and 86.65 BERT-Score (see Performance).

Model Description

Downstream task: Abstractive Question Answering
Domain: Vietnamese Bidding Law
Base Model: VietAI/vit5-base
Approach: Fine-tuning
Language: Vietnamese

Dataset

The ViBidLQA dataset features:

Training set: 5,300 samples
Test set: 1,000 samples
Data Creation Process:
- Training data was automatically generated by Claude 3.5 Sonnet and validated by two legal experts
- The test set was created manually by two Vietnamese legal experts

Performance

Metric	Score
ROUGE-1	75.09
ROUGE-2	63.43
ROUGE-L	65.72
ROUGE-L-SUM	65.79
BLEU-1	53.61
BLEU-2	47.51
BLEU-3	43.40
BLEU-4	39.54
METEOR	64.38
BERT-Score	86.65
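To make the ROUGE-1 row concrete, here is a minimal sketch of a unigram-overlap F1 score. It is not the evaluation script behind the table: the function name `rouge1_f1`, the lowercasing, and the whitespace tokenization are all assumptions made for illustration, and an established ROUGE implementation may tokenize Vietnamese differently and give different numbers.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings (illustrative only)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it occurs in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Two of four tokens match on each side, so precision = recall = 0.5.
score = rouge1_f1("đấu thầu hạn chế", "đấu thầu rộng rãi")
print(f"{100 * score:.2f}")
```

The sketch multiplies by 100 when printing on the assumption that the table reports scores on a 0-100 scale.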

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ntphuc149/ViBidLAQA_base")
model = AutoModelForSeq2SeqLM.from_pretrained("ntphuc149/ViBidLAQA_base")

# Example usage
question = "Thế nào là đấu thầu hạn chế?"
context = "Đấu thầu hạn chế là phương thức lựa chọn nhà thầu trong đó chỉ một số nhà thầu đáp ứng yêu cầu về năng lực và kinh nghiệm được bên mời thầu mời tham gia."

# Prepare input
inputs = tokenizer(f"question: {question} context: {context}", return_tensors="pt", max_length=512, truncation=True)

# Generate answer
# Unpacking `inputs` passes the attention mask along with the input IDs
outputs = model.generate(**inputs, max_length=128, min_length=10, num_beams=4)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
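When answering many questions, the steps above can be wrapped in small helpers. The following is a sketch only: `build_prompt` and `answer_question` are hypothetical names, the whitespace normalization is an assumption rather than a documented requirement of the model, and the generation settings simply repeat those in the snippet above.

```python
def build_prompt(question: str, context: str) -> str:
    """Format a question/context pair in the "question: ... context: ..." layout used above."""
    # Collapsing runs of whitespace is an assumption of this sketch.
    question = " ".join(question.split())
    context = " ".join(context.split())
    return f"question: {question} context: {context}"


def answer_question(question, context, tokenizer, model, max_input_tokens=512):
    """Hypothetical helper: generate one answer with the settings from the usage snippet."""
    inputs = tokenizer(
        build_prompt(question, context),
        return_tensors="pt",
        max_length=max_input_tokens,
        truncation=True,  # inputs longer than max_input_tokens are cut off
    )
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=128,
        min_length=10,
        num_beams=4,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the `tokenizer` and `model` loaded as above, `answer_question(question, context, tokenizer, model)` would return the decoded answer string. Because truncation is enabled, any part of the context beyond the token limit is dropped before generation.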

Applications

This model is well suited to:

Bidding law information retrieval systems
Legal advisory chatbots in the bidding domain
Automated question-answering systems for bidding law queries

Limitations

The model is specifically trained for the Vietnamese bidding law domain and may not perform well on other legal domains
Performance may vary depending on the complexity and specificity of the questions
The model should be used as a reference tool and not as a replacement for professional legal advice

Citation

If you use this model in your research, please cite:

Coming soon...

Contact

For questions, feedback, or collaborations:

Email: [email protected]
GitHub Issues: @ntphuc149
HuggingFace: @ntphuc149

License

This project is licensed under the MIT License - see the LICENSE file for details.
