coang committed · Commit 30d0306 · verified · 1 Parent(s): 7c9a77f

Create README.md

  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
---
pipeline_tag: cross-encoder
tags:
- cross-encoder
- sentence-similarity
- transformers
- legal
- reranker
library_name: generic
language:
- vi
---

# NaverHustQA/viLegal_cross_Quang

This is a cross-encoder model for the Vietnamese legal domain. It returns a relevance score for a query-context pair and can be used for information retrieval.

We use [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2) as the pre-trained backbone.

<!--- Describe your model here -->

## Usage (HuggingFace Transformers)

You can use the model as shown below (remember to word-segment the inputs first):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the cross-encoder and its tokenizer
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define query and context (both already word-segmented)
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."

# Tokenize input (cross-encoder format: query and context encoded as a single pair)
inputs = tokenizer(query, context, return_tensors="pt", padding=True, truncation=True)

# Run through the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # Extract the single-logit classification score

print(f"Relevance Score: {score}")
```
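
For retrieval, the scoring call above is usually repeated for several candidate contexts, which are then sorted by score. The following is a minimal sketch of that reranking step, assuming one logit per query-context pair as in the snippet above. `rank_contexts` and `sigmoid` are hypothetical helpers, not part of this model or of Transformers, and the logits are made-up placeholders rather than real model outputs. Squashing the logit with a sigmoid is only an assumed convenience for reading scores on a 0 to 1 scale; the ranking itself depends only on the raw order.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(logit: float) -> float:
    """Map a raw logit to the range [0, 1] in a numerically stable way."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def rank_contexts(
    contexts: Sequence[str], logits: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each context with its logit and sort from most to least relevant."""
    if len(contexts) != len(logits):
        raise ValueError("contexts and logits must have the same length")
    return sorted(zip(contexts, logits), key=lambda pair: pair[1], reverse=True)


# Placeholder inputs: in practice each logit would come from
# model(**inputs).logits.item() for one (query, context) pair.
contexts = ["context_a", "context_b", "context_c"]
logits = [0.3, 2.1, -1.4]  # hypothetical values, not real results

ranked = rank_contexts(contexts, logits)
for text, logit in ranked:
    print(f"{sigmoid(logit):.3f}  {text}")
```

Keeping the ranking logic separate from the model call makes it easy to swap in batched scoring later without touching the sort.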
## Training
Full details of our training methods and datasets are available in our reports.

## Authors
Le Thanh Huong, Nguyen Nhat Quang.