---
library_name: transformers
license: llama3.1
base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-8B-Chat-v0.1
  results: []
---

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-8B-Chat-v0.1

[Visualize in Weights & Biases](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*

This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co./meta-llama/Meta-Llama-3.1-8B). It achieves state-of-the-art performance among open-aligned SLMs and even outperforms strong open-weight models of comparable or larger size, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply a standard alignment pipeline with two carefully crafted synthetic datasets. We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co./datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).

* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co./Magpie-Align/MagpieLM-8B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co./datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

## 🔥 Benchmark Performance

Greedy Decoding

- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
- **Arena Hard: 48.4**
- **WildBench WB Score (v2.0625): 44.72**

**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

## 👀 Other Information

**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

**Conversation Template**: Please use the Llama 3 chat template for the best performance.

**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or reflect biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks and may perform suboptimally on them. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co./spaces/flydust/MagpieLM-8B)

Please update transformers to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function, as shown below.

```python
import transformers
import torch

model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
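Alternatively, here is a minimal sketch of the Auto-classes route mentioned above, using `apply_chat_template` to build the Llama 3 chat prompt (the generation settings are illustrative, not the card's recommended values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Build and tokenize the Llama 3 chat prompt in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```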
---

# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co./Magpie-Align/MagpieLM-8B-SFT-v0.1) and the configuration below for details.

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

</details>
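The `datasets` entry in the config above consumes the SFT data in ShareGPT format. As a quick sanity check, a sketch like the following prints the first few turns of one record; the `conversations`/`from`/`value` field names are the standard ShareGPT layout that `type: sharegpt` expects, assumed here rather than confirmed against the dataset card:

```python
from datasets import load_dataset

# Load the SFT dataset referenced in the Axolotl config above.
ds = load_dataset("Magpie-Align/MagpieLM-SFT-Data-v0.1", split="train")

# ShareGPT-style records keep a list of turns, each with a role ("from")
# and the message text ("value"); field names per the ShareGPT convention.
for turn in ds[0]["conversations"]:
    print(f'{turn["from"]}: {turn["value"][:100]}')
```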

## Stage 2: Direct Preference Optimization

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.686 | 0.0653 | 100 | 0.6856 | -0.0491 | -0.0616 | 0.6480 | 0.0125 | -471.3315 | -478.8181 | -0.7034 | -0.7427 |
| 0.6218 | 0.1306 | 200 | 0.6277 | -0.6128 | -0.7720 | 0.6960 | 0.1591 | -542.3653 | -535.1920 | -0.7771 | -0.8125 |
| 0.5705 | 0.1959 | 300 | 0.5545 | -2.4738 | -3.0052 | 0.7270 | 0.5314 | -765.6894 | -721.2881 | -0.7894 | -0.8230 |
| 0.4606 | 0.2612 | 400 | 0.5081 | -2.6780 | -3.3782 | 0.7560 | 0.7002 | -802.9893 | -741.7116 | -0.6813 | -0.7247 |
| 0.4314 | 0.3266 | 500 | 0.4787 | -3.6697 | -4.6026 | 0.7630 | 0.9329 | -925.4283 | -840.8740 | -0.6189 | -0.6691 |
| 0.449 | 0.3919 | 600 | 0.4533 | -3.7414 | -4.8019 | 0.7820 | 1.0604 | -945.3563 | -848.0514 | -0.6157 | -0.6681 |
| 0.4538 | 0.4572 | 700 | 0.4350 | -4.3858 | -5.6549 | 0.7890 | 1.2690 | -1030.6561 | -912.4920 | -0.5789 | -0.6331 |
| 0.35 | 0.5225 | 800 | 0.4186 | -4.7129 | -6.1662 | 0.8010 | 1.4533 | -1081.7843 | -945.1964 | -0.5778 | -0.6347 |
| 0.4153 | 0.5878 | 900 | 0.4108 | -4.9836 | -6.5320 | 0.7970 | 1.5484 | -1118.3677 | -972.2631 | -0.5895 | -0.6474 |
| 0.3935 | 0.6531 | 1000 | 0.3999 | -4.4303 | -5.9370 | 0.8110 | 1.5067 | -1058.8646 | -916.9379 | -0.6016 | -0.6598 |
| 0.3205 | 0.7184 | 1100 | 0.3950 | -5.1884 | -6.8827 | 0.8010 | 1.6943 | -1153.4371 | -992.7452 | -0.5846 | -0.6452 |
| 0.3612 | 0.7837 | 1200 | 0.3901 | -5.0426 | -6.7179 | 0.8040 | 1.6753 | -1136.9619 | -978.1701 | -0.6046 | -0.6637 |
| 0.3058 | 0.8490 | 1300 | 0.3877 | -5.1224 | -6.8428 | 0.8040 | 1.7204 | -1149.4465 | -986.1475 | -0.6087 | -0.6690 |
| 0.3467 | 0.9144 | 1400 | 0.3871 | -5.2335 | -6.9809 | 0.8090 | 1.7474 | -1163.2629 | -997.2610 | -0.6071 | -0.6672 |
| 0.3197 | 0.9797 | 1500 | 0.3867 | -5.1502 | -6.8793 | 0.8080 | 1.7291 | -1153.0979 | -988.9237 | -0.6120 | -0.6722 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
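For reference, the `Rewards/*` columns in the table above are DPO's implicit rewards. With policy $\pi_\theta$ and the frozen SFT model as reference $\pi_{\mathrm{ref}}$, DPO minimizes

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],
$$

where $y_w$ and $y_l$ are the chosen and rejected responses and $\beta = 0.01$ (see the configs below). `Rewards/chosen` and `Rewards/rejected` are batch means of the implicit reward $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$ for each side, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs whose chosen reward exceeds the rejected one. This is the standard DPO formulation; the exact logging conventions are those of the trainer.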
<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch
torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```

</details>
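The run itself was launched with the alignment-handbook recipe above, which wraps TRL's `DPOTrainer`. As an illustrative reduction only, a roughly equivalent standalone setup might look like the sketch below; it assumes a TRL version contemporary with the card (where `DPOTrainer` still accepts `tokenizer=`) and a preference dataset exposing `prompt`/`chosen`/`rejected` columns:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start DPO from the SFT checkpoint, as in the config above.
model_id = "Magpie-Align/MagpieLM-8B-SFT-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference pairs; prompt/chosen/rejected column names are assumed here.
dataset = load_dataset("Magpie-Align/MagpieLM-DPO-Data-v0.1", split="train")

# Mirror the key hyperparameters from the config above.
args = DPOConfig(
    output_dir="alignment_handbook_out/MagpieLM-8B-Chat-v0.1",
    beta=0.01,
    learning_rate=2.0e-7,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    max_length=2048,
    max_prompt_length=1800,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
)

# ref_model=None tells TRL to keep a frozen copy of the starting policy
# as the reference model.
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```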
## 📚 Citation

If you find the model, data, or code useful, please cite:

```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]