---
license: apache-2.0
language:
- en
base_model: FourOhFour/Tulu-3.69-DPO-8B
tags:
- llama-cpp
- gguf-my-repo
---

# Triangle104/Tulu-3.69-DPO-8B-Q4_K_S-GGUF

This model was converted to GGUF format from [`FourOhFour/Tulu-3.69-DPO-8B`](https://huggingface.co./FourOhFour/Tulu-3.69-DPO-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co./spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co./FourOhFour/Tulu-3.69-DPO-8B) for more details on the model.

---
Model details:
-
This is a DPO applied over Tulu-3.69-8B. This model is designed to roleplay and converse like a human chat partner. It follows instructions well and excels at playing characters in a realistic and entertaining manner.

For ease of use, try the Llama 3 instruct format. You may need to set a custom stop string for `<|end_of_text|>`.

For optimal performance, I have found that a modified Tulu 3 instruct format is quite effective:

<|system|>
This is an instruction.
<|end_of_text|>
<|user|>
This is the user input.
<|assistant|>
This is model output.
<|end_of_text|>

Further, if you want your bot to have a sense of time, you can set the last output prefix as follows:

<|system|>
{{time}} {{weekday}} {{date}}
<|end_of_text|>
<|assistant|>

Note: these macros may differ in your chosen inferencing frontend. Please correct accordingly.
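The modified Tulu 3 layout above can be assembled with a small shell helper. This is a minimal sketch, not part of the original card: `build_prompt` is a hypothetical name, and it assumes each tag and each message sits on its own line, since the card does not state the exact whitespace.

```bash
# Hypothetical helper: build a prompt in the modified Tulu 3 layout.
# Assumption: tags and messages are separated by single newlines.
build_prompt() {
  # $1: system instruction, $2: user input
  # The prompt ends after <|assistant|> so the model writes the reply.
  printf '<|system|>\n%s\n<|end_of_text|>\n<|user|>\n%s\n<|assistant|>\n' "$1" "$2"
}

build_prompt "This is an instruction." "This is the user input."
```

The resulting text can be passed to a frontend or CLI as the raw prompt, with `<|end_of_text|>` set as a stop string as noted above.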
base_model: jeiku/Tulu-3.69-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/tuludpo
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: llama3

rl: dpo
datasets:
  - path: antiven0m/physical-reasoning-dpo
    type: llama3.prompt_pairs
  - path: nbeerbower/Purpura-DPO
    type: llama3.prompt_pairs
  - path: FourOhFour/Human_DPO_Emojis_Removed
    type: llama3.prompt_pairs

shuffle_merged_datasets: true
val_set_size: 0.005
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: evil
wandb_entity:
wandb_watch:
wandb_name: evil
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Tulu-3.69-DPO-8B-Q4_K_S-GGUF --hf-file tulu-3.69-dpo-8b-q4_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Tulu-3.69-DPO-8B-Q4_K_S-GGUF --hf-file tulu-3.69-dpo-8b-q4_k_s.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Tulu-3.69-DPO-8B-Q4_K_S-GGUF --hf-file tulu-3.69-dpo-8b-q4_k_s.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Tulu-3.69-DPO-8B-Q4_K_S-GGUF --hf-file tulu-3.69-dpo-8b-q4_k_s.gguf -c 2048
```
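
Once `llama-server` is running, it can be queried over HTTP. The following is a minimal sketch rather than an official recipe. It assumes the server listens on `http://localhost:8080` and that your build exposes the `/completion` endpoint with the `prompt`, `n_predict`, and `stop` fields. `build_payload` and `RUN_REQUEST` are hypothetical names introduced here for illustration.

```bash
# Hypothetical helper: print a JSON body for llama-server's /completion endpoint.
# Assumption: the prompt contains no quotes, backslashes, or newlines, because
# this sketch performs no JSON escaping.
build_payload() {
  # $1: prompt text, $2: maximum number of tokens to generate
  printf '{"prompt": "%s", "n_predict": %d, "stop": ["<|end_of_text|>"]}\n' "$1" "$2"
}

# Send the request only when RUN_REQUEST=1, so the helper can be loaded safely.
# Assumption: the server runs locally on port 8080.
if [ "${RUN_REQUEST:-0}" = "1" ]; then
  build_payload "The meaning to life and the universe is" 64 |
    curl -s http://localhost:8080/completion \
      -H "Content-Type: application/json" \
      -d @-
fi
```

The `stop` entry mirrors the custom `<|end_of_text|>` stop string suggested in the model details. For prompts containing newlines or quotes, build the body with a JSON-aware tool instead of `printf`.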