NaverHustQA/viLegal_cross_Quang

This is a cross-encoder model for the Vietnamese legal domain. It returns a relevance score for a query-context pair and can be used for information retrieval.
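In a retrieval pipeline, a cross-encoder is usually applied to a shortlist of candidate passages, such as the top hits of a first-stage retriever, and used to reorder them by relevance. The sketch below shows one way to do that. `make_cross_encoder_scorer` and `rerank` are hypothetical helper names, and the batch size is arbitrary. The sketch assumes the model has a single output logit, as the `.item()` call in the usage example implies. Queries and contexts must already be word-segmented.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer maps (query, contexts) -> one relevance score per context.
Scorer = Callable[[str, Sequence[str]], List[float]]


def make_cross_encoder_scorer(model_name: str = "NaverHustQA/viLegal_cross_Quang",
                              batch_size: int = 16) -> Scorer:
    """Wrap the cross-encoder as a batch scorer (downloads the model when called)."""
    # Imported lazily so that `rerank` below works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def score(query: str, contexts: Sequence[str]) -> List[float]:
        scores: List[float] = []
        for start in range(0, len(contexts), batch_size):
            batch = list(contexts[start:start + batch_size])
            # Pair the same query with every context in the batch.
            inputs = tokenizer([query] * len(batch), batch, return_tensors="pt",
                               padding=True, truncation=True)
            with torch.no_grad():
                logits = model(**inputs).logits  # assumed shape: (batch, 1)
            scores.extend(logits.squeeze(-1).tolist())
        return scores

    return score


def rerank(query: str, contexts: Sequence[str], scorer: Scorer,
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (context, score) pairs sorted from most to least relevant."""
    scores = scorer(query, contexts)
    ranked = sorted(zip(contexts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

For example, `rerank(query, candidates, make_cross_encoder_scorer(), top_k=5)` keeps the five highest-scoring candidates. Because `rerank` only needs a callable, you can check the ranking logic with a toy scorer before downloading the model.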

We use vinai/phobert-base-v2 as the pre-trained backbone.
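PhoBERT's pre-training data was word-segmented: the syllables of a multi-syllable word are joined by underscores, as in `lái_xe` and `bao_nhiêu`. The PhoBERT authors use VnCoreNLP's RDRSegmenter for this step. This card does not state which segmenter was used for this model's training data. The toy greedy longest-match segmenter below only illustrates the expected output format. It is not RDRSegmenter, and the lexicon passed to it is hypothetical.

```python
from typing import Iterable, List


def longest_match_segment(text: str, lexicon: Iterable[str], max_syllables: int = 4) -> str:
    """Join multi-syllable lexicon words with underscores (greedy longest match).

    Toy stand-in for a real Vietnamese word segmenter; `lexicon` is a
    hypothetical collection of space-separated multi-syllable words.
    """
    words = {w.lower() for w in lexicon}
    syllables = text.split()
    out: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest span first; a single syllable always matches.
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate.lower() in words:
                out.append("_".join(syllables[i:i + n]))
                i += n
                break
    return " ".join(out)


print(longest_match_segment("Uống rượu lái xe bị phạt bao nhiêu tiền ?", ["lái xe", "bao nhiêu"]))
# prints: Uống rượu lái_xe bị phạt bao_nhiêu tiền ?
```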

Usage (HuggingFace Transformers)

You can use the model as shown below (remember to word-segment the inputs first):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the cross-encoder and its tokenizer
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define a word-segmented query and context
# Query: "How much is the fine for drunk driving?"
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
# Context: "Drunk driving is fined 500,000 VND."
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."

# Tokenize input (cross-encoder format: query and context encoded together as one pair)
inputs = tokenizer(query, context, return_tensors="pt", padding=True, truncation=True)

# Run the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # the single output logit is the relevance score

print(f"Relevance Score: {score}")
```
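The cross-encoder reads the query and the context as one sequence. If the tokenizer keeps PhoBERT's RoBERTa-style special tokens (`<s>` and `</s>`), a pair is laid out as `<s> query </s></s> context </s>`. With `truncation=True`, tokens are trimmed from the longer sequence first until the pair fits the maximum length. The sketch below mimics that layout on pre-split tokens. `build_pair` and `longest_first_truncate` are hypothetical helpers. The 256-token default is an assumption based on PhoBERT's usual maximum length, so check `tokenizer.model_max_length`. To see the real layout, print `tokenizer.decode(inputs["input_ids"][0])` after running the usage snippet.

```python
from typing import List, Tuple


def longest_first_truncate(query: List[str], context: List[str],
                           budget: int) -> Tuple[List[str], List[str]]:
    """Drop tokens from the end of whichever sequence is currently longer
    until both fit in `budget` content tokens (ties trim the context)."""
    query, context = list(query), list(context)
    budget = max(budget, 0)
    while len(query) + len(context) > budget:
        if len(query) > len(context):
            query.pop()
        else:
            context.pop()
    return query, context


def build_pair(query: List[str], context: List[str], max_length: int = 256,
               cls: str = "<s>", sep: str = "</s>") -> List[str]:
    """Lay out a pair RoBERTa-style: <s> query </s></s> context </s>."""
    # Four positions are taken by the special tokens <s>, </s>, </s>, </s>.
    q, c = longest_first_truncate(query, context, max_length - 4)
    return [cls, *q, sep, sep, *c, sep]
```

Long legal passages are the usual reason a context gets cut. Any text beyond the limit is never seen by the model, so splitting long documents into passages before scoring is often worthwhile.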

Training

Full details of our training methods and datasets are available in our reports.

Authors

Le Thanh Huong, Nguyen Nhat Quang.

Model size: 135M parameters (Safetensors, F32 tensors).