---
language:
- ko
metrics:
- accuracy
library_name: transformers
---

# KLUE RoBERTa-base for legal documents

- This model was obtained by further pre-training klue/roberta-base on `legal_text_merged02_light.txt`, a corpus of Korean court rulings.

## Model Details

### Model Description

- **Developed by:** J.Park @ KETI
- **Model type:** klue/roberta-base
- **Language(s) (NLP):** Korean
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** klue/roberta-base

### Training procedure

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

base_model = 'klue/roberta-base'
base_tokenizer = 'klue/roberta-base'

model = RobertaForMaskedLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)

# fpath_dataset: path to the training corpus (legal_text_merged02_light.txt)
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=fpath_dataset,
    block_size=512,
)

# Mask 15% of input tokens for masked-language-model (MLM) training
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15
)

# output_dir: directory for checkpoints and the final model
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=18,
    save_steps=100,
    save_total_limit=2,
    seed=1
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

train_metrics = trainer.train()
trainer.save_model(output_dir)
trainer.push_to_hub()
```
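The collator configured with `mlm=True, mlm_probability=0.15` applies BERT-style masking: roughly 15% of tokens are selected as prediction targets, and of those about 80% are replaced with `[MASK]`, 10% with a random token, and 10% left unchanged. A minimal plain-Python sketch of that rule (the `mlm_mask` helper and its names are illustrative, not part of `transformers`):

```python
import random

MASK_TOKEN = "[MASK]"

def mlm_mask(tokens, vocab, mlm_probability=0.15, seed=1):
    """Illustrative re-implementation of the 80/10/10 masking rule used by
    DataCollatorForLanguageModeling. Returns (inputs, labels); labels are
    None at positions the loss should ignore."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mlm_probability:        # select ~15% of tokens
            labels.append(tok)                    # model must recover the original
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_TOKEN)         # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)                   # not a prediction target
    return inputs, labels
```

The real collator works on token-id tensors and re-samples the mask every epoch (dynamic masking), but the selection logic is the same.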