---
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- sentence-similarity
- transformers
- legal
- reranker
library_name: generic
language:
- vi
---
# NaverHustQA/viLegal_cross_Quang
This is a cross-encoder model for the Vietnamese legal domain: it returns a relevance score for a query-context pair and can be used for information retrieval.
We use [vinai/phobert-base-v2](https://huggingface.co./vinai/phobert-base-v2) as the pre-trained backbone.
## Usage (HuggingFace Transformers)
You can use the model as shown below. Since the backbone is PhoBERT, remember to word-segment the inputs first (e.g. with a Vietnamese word segmenter such as VnCoreNLP):
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
# Load cross-encoder
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Define query and context
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."
# Tokenize input (Cross-encoder format: query and context as a single input)
inputs = tokenizer(query, context, return_tensors="pt", padding=True, truncation=True)
# Run through model
with torch.no_grad():
    outputs = model(**inputs)
score = outputs.logits.item()  # Extract the single relevance logit
print(f"Relevance Score: {score}")
```
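In a retrieval pipeline, the score above is typically computed for each candidate context and used to rerank them. Below is a minimal, hypothetical sketch of that final step: the `rerank` helper and the placeholder scores are illustrative, and in practice each score would come from running the model on a (query, context) pair as shown above.

```python
# Hypothetical reranking helper: sort candidate contexts by cross-encoder score.
def rerank(contexts, scores):
    """Return contexts ordered by descending relevance score."""
    ranked = sorted(zip(contexts, scores), key=lambda pair: pair[1], reverse=True)
    return [context for context, _ in ranked]

contexts = [
    "Uống rượu lái_xe bị phạt 500,000 đồng .",
    "Luật giao_thông quy_định tốc_độ tối_đa .",
]
scores = [3.2, -1.5]  # Placeholder logits; obtain real ones from the model above

print(rerank(contexts, scores)[0])  # The most relevant context comes first
```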
## Training
Full details of our training methods and datasets can be found in our reports.
## Authors
Le Thanh Huong, Nguyen Nhat Quang.