library_name: transformers
base_model: meta-llama/Llama-3.1-8B
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - airesearch/WangchanX-Legal-ThaiCCL-RAG
model-index:
  - name: llama3.1-8b-legal-combine-ccl-16
    results: []

Llama-3.1-Legal-ThaiCCL-8B

Llama-3.1-Legal-ThaiCCL-8B is a large language model built on Llama-3.1-8B and designed to answer Thai legal questions. It was fully fine-tuned on the WangchanX Thai Legal dataset using the WangchanX fine-tuning pipeline. The model is intended to be used with a Retrieval-Augmented Generation (RAG) system that retrieves relevant legal documents for the model to reference when answering questions.

Model description

Model Usage

from transformers import pipeline
import torch

EN_QA_TEMPLATE = "Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.Context information is below.\n\n---------------------\nContext: Thai legal domain\nQuery: {query_str}\nRetrieved Documents: {context_str}\n---------------------\n\n Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.You must answer in Thai.\nAnswer:"

EN_SYSTEM_PROMPT_STR = """You are a legal assistant named Sommai (สมหมาย in Thai). You provide legal advice in a friendly, clear, and approachable manner. When answering questions, you reference the relevant law sections, including the name of the act or code they are from. You explain what these sections entail, including any associated punishments, fees, or obligations. Your tone is polite yet informal, making users feel comfortable, like consulting a trusted friend. If a question falls outside your knowledge, you must respond with the exact phrase: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You avoid making up information and guide users based on accurate legal references relevant to their situation. Where applicable, you provide practical advice, such as preparing documents, seeking medical attention, or contacting authorities. If asked about past Supreme Court judgments, you must state that you do not have information on those judgments at this time."""

query = "การร้องขอให้ศาลสั่งให้บุคคลเป็นคนไร้ความสามารถมีหลักเกณฑ์การพิจารณาอย่างไร"

context = """ประมวลกฎหมายแพ่งและพาณิชย์ มาตรา 33 ในคดีที่มีการร้องขอให้ศาลสั่งให้บุคคลใดเป็นคนไร้ความสามารถเพราะวิกลจริต ถ้าทางพิจารณาได้ความว่าบุคคลนั้นไม่วิกลจริต แต่มีจิตฟั่นเฟือนไม่สมประกอบ เมื่อศาลเห็นสมควรหรือเมื่อมีคำขอของคู่ความหรือของบุคคลตามที่ระบุไว้ในมาตรา 28 ศาลอาจสั่งให้บุคคลนั้นเป็นคนเสมือนไร้ความสามารถก็ได้ หรือในคดีที่มีการร้องขอให้ศาลสั่งให้บุคคลใดเป็นคนเสมือนไร้ความสามารถเพราะมีจิตฟั่นเฟือนไม่สมประกอบ ถ้าทางพิจารณาได้ความว่าบุคคลนั้นวิกลจริต เมื่อมีคำขอของคู่ความหรือของบุคคลตามที่ระบุไว้ในมาตรา 28 ศาลอาจสั่งให้บุคคลนั้นเป็นคนไร้ความสามารถก็ได้"""


model_id = "airesearch/LLaMa3.1-8B-Legal-ThaiCCL-Combine"

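# Build the text-generation pipeline; bfloat16 roughly halves memory versus float32.
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

sample = [
    {"role": "system", "content": EN_SYSTEM_PROMPT_STR},
    {"role": "user", "content": EN_QA_TEMPLATE.format(context_str=context, query_str=query)},
]

# Render the messages into a single prompt string using the model's chat template.
prompt = pipe.tokenizer.apply_chat_template(
    sample,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the tokenizer's EOS token or the Llama 3.1 end-of-turn token.
terminators = [
    pipe.tokenizer.eos_token_id,
    pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipe(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    return_full_text=False,  # return only the newly generated answer, not the prompt
)

print(outputs[0]["generated_text"])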

Training Data

The model was trained on the WangchanX Legal ThaiCCL RAG dataset, a Thai legal question-answering dataset built with a RAG system that retrieves supporting legal documents for each question so that the LLM can reference them in its answer. For more information on how the dataset was created, please refer to this blog.

To emulate a real-world use case, we incorporated both the positive and negative contexts (if available) into the prompt during training. We found that this makes the model more robust when the RAG system passes in irrelevant contexts mixed with the correct context (refer to the evaluation section for results).
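As a rough illustration of the mixing described above, the sketch below builds one context string from positive and negative passages. The field name `negative_contexts`, the shuffle, the newline separator, and the helper name `build_mixed_context` are all assumptions made for illustration; this card does not state how the two sets were ordered or joined during training.

```python
import random


def build_mixed_context(example, max_docs=5, seed=42):
    """Join positive and (if present) negative passages into one context string.

    Assumption: `negative_contexts` is a hypothetical field name, and each
    passage is a dict with a "text" key, like the entries in `positive_contexts`.
    """
    positives = example.get("positive_contexts") or []
    negatives = example.get("negative_contexts") or []
    # Positives come first so that truncation to max_docs keeps them.
    passages = [p["text"] for p in positives + negatives][:max_docs]
    # Shuffle so the relevant passage is not always first (an assumption,
    # not a documented step of the training pipeline).
    random.Random(seed).shuffle(passages)
    return "\n".join(passages)


example = {
    "positive_contexts": [{"text": "relevant section"}],
    "negative_contexts": [
        {"text": "unrelated section A"},
        {"text": "unrelated section B"},
    ],
}
context_str = build_mixed_context(example)
print(context_str)
```

The resulting string could then be passed as `context_str` to the question template, so the model sees distractor passages alongside the relevant one.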

Prompt Format

We recommend using the same chat template (system prompt and question template with context, query, and retrieved documents) when using the provided weights, since the model was trained with this specific system prompt and question template. Example input prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a legal assistant named Sommai (สมหมาย in Thai), you provide legal advice to users in a friendly and understandable manner. When answering questions, you specifically reference the law sections relevant to the query, including the name of the act or code they originated from, an explanation of what those sections entail, and any associated punishments or fees. Your tone is approachable and informal yet polite, making users feel as if they are seeking advice from a friend. If a question arises that does not match the information you possess, you must acknowledge your current limitations by stating this exactly sentence: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You will not fabricate information but rather guide users based on actual law sections relevant to their situation. Additionally, you offer practical advice on next steps, such as gathering required documents, seeking medical attention, or visiting a police station, as applicable. If inquired about past Supreme Court judgments, you must reply that you do not have information on those judgments yet.<|eot_id|>
<|start_header_id|>user<|end_header_id|>

Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.

Context information is below.
---------------------
Context: Thai legal domain
Query: {question}
Retrieved Documents: {retrieved legal documents}
---------------------

Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.
Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.
You must answer in Thai.
Answer:
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Here is a Python code snippet on how to apply the chat template with the provided system prompt and question template on the WangchanX Legal Thai CCL dataset:

EN_QA_TEMPLATE = "Given the user's query in the context of Thai legal matters, the RAG system retrieves the top_n related documents. From these documents, it's crucial to identify and utilize only the most relevant ones to craft an accurate and informative response.Context information is below.\n\n---------------------\nContext: Thai legal domain\nQuery: {query_str}\nRetrieved Documents: {context_str}\n---------------------\n\n Using the provided context information and the list of retrieved documents, you will focus on selecting the documents that are most relevant to the user's query. This selection process involves evaluating the content of each document for its pertinency to the query, ensuring that the response is based on accurate and contextually appropriate information.Based on the selected documents, you will synthesize a response that addresses the user's query, drawing directly from the content of these documents to provide a precise, legally informed answer.You must answer in Thai.\nAnswer:"

EN_SYSTEM_PROMPT_STR = """You are a legal assistant named Sommai (สมหมาย in Thai). You provide legal advice in a friendly, clear, and approachable manner. When answering questions, you reference the relevant law sections, including the name of the act or code they are from. You explain what these sections entail, including any associated punishments, fees, or obligations. Your tone is polite yet informal, making users feel comfortable, like consulting a trusted friend. If a question falls outside your knowledge, you must respond with the exact phrase: 'สมหมายไม่สามารถตอบคำถามนี้ได้ครับ'. You avoid making up information and guide users based on accurate legal references relevant to their situation. Where applicable, you provide practical advice, such as preparing documents, seeking medical attention, or contacting authorities. If asked about past Supreme Court judgments, you must state that you do not have information on those judgments at this time."""

from datasets import load_dataset

# Dataset ID taken from this model card's metadata; assumes it loads with its default configuration.
dataset = load_dataset("airesearch/WangchanX-Legal-ThaiCCL-RAG")

def format_example(example):
    # Strip the leading "คำตอบ: " ("Answer: ") marker from the reference answer.
    if "คำตอบ: " in example["positive_answer"]:
        example["positive_answer"] = example["positive_answer"].replace("คำตอบ: ", "")
    # Concatenate up to five positive contexts; fall back to a blank context if there are none.
    if example["positive_contexts"]:
        context = "".join(v["text"] for v in example["positive_contexts"][:5])
    else:
        context = " "
    message = [
        {"content": EN_SYSTEM_PROMPT_STR, "role": "system"},
        {"content": EN_QA_TEMPLATE.format(query_str=example["question"], context_str=context), "role": "user"},
    ]
    return dict(messages=message)

dataset = dataset.map(format_example, batched=False)

Training hyperparameters

We fully fine-tuned Llama-3.1-8B using the following hyperparameters:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4

Total training time: 2:15:14.66
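The batch-size totals in the list above follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below works through both. The function name `lr_at` and the step count of 100 are illustrative assumptions (the card does not state the total number of optimizer steps), and the schedule is the common linear-warmup plus cosine-decay form, which may differ in detail from the trainer's implementation.

```python
import math

per_device_batch = 4   # train_batch_size
num_devices = 4        # num_devices
grad_accum_steps = 16  # gradient_accumulation_steps

# Samples contributing to one optimizer update.
effective_batch = per_device_batch * num_devices * grad_accum_steps
print(effective_batch)  # 256, matching total_train_batch_size


def lr_at(step, total_steps, peak_lr=2e-4, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Illustrative run length of 100 steps (hypothetical).
for step in (0, 5, 10, 55, 100):
    print(step, lr_at(step, total_steps=100))
```

With a warmup ratio of 0.1, the first 10% of steps ramp the learning rate up linearly, and the remaining 90% decay it along a half cosine.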

Evaluation

We evaluated our model on the test set of the WangchanX Legal Thai CCL dataset using both traditional machine reading comprehension (MRC) metrics and an LLM-as-a-judge technique based on the paper CHIE: Generative MRC Evaluation for in-context QA with Correctness, Helpfulness, Irrelevancy, and Extraneousness Aspects.

Note: LLaMa3.1-8B-Legal-ThaiCCL is trained on only positive contexts, while LLaMa3.1-8B-Legal-ThaiCCL-Combine is trained on both positive and negative contexts.
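For readers unfamiliar with the error-rate metrics reported below, here is a minimal sketch of character and word error rate as edit distance divided by reference length. It is not the authors' evaluation script, which this card does not show. The helper names are hypothetical, and the sketch takes pre-split token lists because Thai text has no spaces between words, so a real WER computation would need a word segmenter.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Character level: one substitution over four reference characters.
cer = error_rate(list("abcd"), list("abxd"))
# Word level: three extra words against a two-word reference.
wer = error_rate(["a", "b"], ["a", "b", "c", "d", "e"])
print(cer, wer)
```

Because the denominator is the reference length, an answer much longer than the reference can score above 1, which is why some CER and WER values in Table 1 exceed 1.0. Lower is better for both.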

Table 1: MRC Results

| Model | Context Type | Answer Type | ROUGE-L | Character Error Rate (CER) | Word Error Rate (WER) | BERT Score | F1-score XQuAD | Exact Match XQuAD |
|---|---|---|---|---|---|---|---|---|
| Zero-shot LLaMa3.1-8B-Instruct | Golden Passage | Only Positive | 0.553 | 1.181 | 1.301 | 0.769 | 48.788 | 0.0 |
| LLaMa3.1-8B-Legal-ThaiCCL | Golden Passage | Only Positive | 0.603 | 0.667 | 0.736 | 0.821 | 60.039 | 0.053 |
| LLaMa3.1-8B-Legal-ThaiCCL-Combine | Golden Passage | Only Positive | 0.715 | 0.695 | 0.758 | 0.833 | 64.578 | 0.614 |
| Zero-shot LLaMa3.1-70B-Instruct | Golden Passage | Only Positive | 0.830 | 0.768 | 0.848 | 0.830 | 61.497 | 0.0 |
| Zero-shot LLaMa3.1-8B-Instruct | Retrieval Passage | Only Positive | 0.422 | 1.631 | 1.773 | 0.757 | 39.639 | 0.0 |
| LLaMa3.1-8B-Legal-ThaiCCL | Retrieval Passage | Only Positive | 0.366 | 1.078 | 1.220 | 0.779 | 44.238 | 0.03 |
| LLaMa3.1-8B-Legal-ThaiCCL-Combine | Retrieval Passage | Only Positive | 0.516 | 0.884 | 0.884 | 0.816 | 54.948 | 0.668 |
| Zero-shot LLaMa3.1-70B-Instruct | Retrieval Passage | Only Positive | 0.616 | 0.934 | 1.020 | 0.816 | 54.930 | 0.0 |

Table 2: CHIE Results

| Model | Context Type | Answer Type | Q1: Correctness [H] | Q2: Helpfulness [H] | Q3: Irrelevancy [L] | Q4: Out-of-Context [L] |
|---|---|---|---|---|---|---|
| Zero-shot LLaMa3.1-8B-Instruct | Golden Passage | Only Positive | 0.740 | 0.808 | 0.480 | 0.410 |
| LLaMa3.1-8B-Legal-ThaiCCL | Golden Passage | Only Positive | 0.705 | 0.486 | 0.294 | 0.208 |
| LLaMa3.1-8B-Legal-ThaiCCL-Combine | Golden Passage | Only Positive | 0.565 | 0.468 | 0.405 | 0.325 |
| Zero-shot LLaMa3.1-70B-Instruct | Golden Passage | Only Positive | 0.870 | 0.658 | 0.316 | 0.247 |
| Zero-shot LLaMa3.1-8B-Instruct | Retrieval Passage | Only Positive | 0.480 | 0.822 | 0.557 | 0.248 |
| LLaMa3.1-8B-Legal-ThaiCCL | Retrieval Passage | Only Positive | 0.274 | 0.470 | 0.720 | 0.191 |
| LLaMa3.1-8B-Legal-ThaiCCL-Combine | Retrieval Passage | Only Positive | 0.532 | 0.445 | 0.508 | 0.203 |
| Zero-shot LLaMa3.1-70B-Instruct | Retrieval Passage | Only Positive | 0.748 | 0.594 | 0.364 | 0.202 |

License and use

The model is released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc.