---
tags:
- generated_from_trainer
model-index:
- name: Qra-1b-dolly-instruction-0.1
  results: []
datasets:
- s3nh/alpaca-dolly-instruction-only-polish
language:
- pl
base_model: OPI-PG/Qra-1b
pipeline_tag: text-generation
inference: true
widget:
- messages:
  - role: user
    content: Napisz kod w pythonie?
---

# Qra-1b-dolly-instruction-0.1

This model is a fine-tuned version of [OPI-PG/Qra-1b](https://huggingface.co./OPI-PG/Qra-1b) on the [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co./datasets/s3nh/alpaca-dolly-instruction-only-polish) dataset.

## Model Description

Trained from [OPI-PG/Qra-1b](https://huggingface.co./OPI-PG/Qra-1b).

## Intended uses & limitations

This model has been fine-tuned for the question-answering task. It can be used as a chatbot, but it does not work well in that role because the dataset does not contain multi-turn conversations.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "nie3e/Qra-1b-dolly-instruction-0.1"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=device
)


def get_answer(system_prompt: str, user_prompt: str) -> str:
    # Build the conversation and render it with the model's chat template.
    input_msg = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        input_msg, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(
        prompt, max_new_tokens=512, do_sample=False, temperature=0.1,
        top_k=50, top_p=0.1, eos_token_id=pipe.tokenizer.eos_token_id,
        pad_token_id=pipe.tokenizer.pad_token_id
    )
    # Return only the newly generated text, without the prompt.
    return outputs[0]['generated_text'][len(prompt):].strip()


print(
    get_answer(
        system_prompt="Jesteś przyjaznym chatbotem",
        user_prompt="Napisz czym jest dokument architectural decision record."
    )
)
```

## Training and evaluation data

Dataset: [s3nh/alpaca-dolly-instruction-only-polish](https://huggingface.co./datasets/s3nh/alpaca-dolly-instruction-only-polish)

Each row has been converted into a conversation using this function:

```py
system_message = """Jesteś przyjaznym chatbotem"""


def create_conversation(sample) -> dict:
    # Strip surrounding quote characters from the raw fields and wrap them
    # into a system/user/assistant conversation.
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }
```

Train/test split: 90%/10%

## Training procedure

GPU: 2x RTX 4060 Ti 16 GB

Training time: ~1 hour

Trained with accelerate + deepspeed using this config:

```yml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 2
  zero3_init_flag: false
  zero_stage: 1
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

### Training hyperparameters

LoRA config:

```py
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=128,
    lora_dropout=0.05,
    r=256,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM"
)
```

Training arguments:

```py
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Qra-1b-dolly-instruction-0.1",
    num_train_epochs=3,
    per_device_train_batch_size=3,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    tf32=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    push_to_hub=False,
    report_to=["tensorboard"],
)
```

### Framework versions

- PEFT 0.10.0
- Transformers 4.39.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
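### Putting it together

The full training script is not part of this card. Below is a minimal sketch of how the pieces above (the `create_conversation` function, `peft_config`, and `args`) could be combined; the use of trl's `SFTTrainer`, the sequence length, packing, and the script name are assumptions, not a record of the actual run.

```py
# train_sketch.py - illustrative only; trl usage and the file name are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer  # assumption: trl is not listed under framework versions

# Convert every row into a conversation and split 90%/10%,
# as described under "Training and evaluation data".
dataset = load_dataset("s3nh/alpaca-dolly-instruction-only-polish", split="train")
dataset = dataset.map(create_conversation, remove_columns=dataset.column_names)
dataset = dataset.train_test_split(test_size=0.1)

model = AutoModelForCausalLM.from_pretrained("OPI-PG/Qra-1b", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("OPI-PG/Qra-1b")

# peft_config and args are the LoraConfig and TrainingArguments shown above.
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=1024,  # assumption: not stated in the card
    packing=True,         # assumption
)
trainer.train()
trainer.save_model()
```

With the accelerate/deepspeed config above saved to a YAML file, a script like this would be launched on both GPUs with `accelerate launch --config_file <config>.yaml train_sketch.py`.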