---
language: en
tags:
- summarization
- legal
- t5
license: apache-2.0
datasets: custom
---

# Legal Document Summarizer

This model is a fine-tuned version of `t5-base` that summarizes long legal documents such as constitutions and finance bills. It restates complex legal language in simpler terms so that non-experts can follow it.

## Training Data

The model was trained on a custom dataset of legal documents and their corresponding summaries.

## Intended Use

- **Task**: Legal document summarization.
- **Target audience**: Legal professionals, researchers, and non-experts who need quick summaries of complex legal texts.
- **Input**: A long legal document.
- **Output**: A concise, simplified summary.
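
Since the input is a long document and the model reads a bounded number of tokens per call, one common workaround is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below is an assumption and not part of this model's documented pipeline. `split_into_chunks`, `summarize_long`, and the 400-word default are hypothetical names and values chosen for illustration, and `summarize_fn` stands in for any callable that wraps the model. Words are only a rough proxy for tokens, so the chunk size may need tuning.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: words are whitespace-separated, which only
    approximates the model's token count.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize_fn: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk with summarize_fn and join the partial summaries.

    summarize_fn is an assumed callable mapping a chunk of text to its summary.
    """
    partial = [summarize_fn(chunk) for chunk in split_into_chunks(text, max_words)]
    return " ".join(partial)
```

In practice, `summarize_fn` could wrap the tokenizer and `model.generate` calls from the Usage section. The joined output may itself be passed through the model once more if a shorter summary is needed.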

## Usage

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = T5Tokenizer.from_pretrained("VincentMuriuki/legal-summarizer")
model = T5ForConditionalGeneration.from_pretrained("VincentMuriuki/legal-summarizer")

text = "Your long legal document here..."

# Prepend the T5 task prefix; input beyond 1024 tokens is truncated.
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True)

# Beam search with 4 beams; the summary is kept between 50 and 150 tokens.
summary_ids = model.generate(
    **inputs, max_length=150, min_length=50,
    length_penalty=2.0, num_beams=4, early_stopping=True,
)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```