---
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
- transformers
- ORPO
- RLHF
- notus
- argilla
---

# Model Overview

## Model Name: ElEmperador

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e8ea3892d9db9a93580fe3/gkDcpIxRCjBlmknN_jzWN.png)

## Model Description

ElEmperador is an ORPO finetune of the Mistral-7B-v0.1 base model, trained on the argilla/ultrafeedback-binarized-preferences-cleaned preference dataset.

## Evals

BLEU: 0.209

## Inference Script

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_response(model_name, input_text, max_new_tokens=50):
    # Load the tokenizer and model from the Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Tokenize the input text
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids

    # Generate a response using the model
    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)

    # Decode the generated tokens into text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return generated_text


if __name__ == "__main__":
    # Set the model name from the Hugging Face Hub
    model_name = "AINovice2005/ElEmperador"
    input_text = "Hello, how are you?"

    # Generate and print the model's response
    output = generate_response(model_name, input_text)
    print(f"Input: {input_text}")
    print(f"Output: {output}")
```

## Results

First, ORPO is a viable preference-alignment method in the RLHF family and can improve model performance alongside SFT finetuning. Second, it helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results. A sketch of what such a training run can look like is shown below.
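## Example: ORPO Finetuning Sketch

For readers who want to run a similar finetune themselves, below is a minimal sketch using the TRL library's `ORPOTrainer` with the preference dataset listed in this card's metadata. All hyperparameters and the output path are illustrative assumptions, not the exact recipe used to train ElEmperador; also note that recent TRL versions rename the `tokenizer` argument to `processing_class`, and chat-format `chosen`/`rejected` columns may need flattening to plain text for a base model without a chat template.

```python
# Minimal ORPO finetuning sketch with TRL (assumes trl, datasets, and
# transformers are installed). Hyperparameters, the beta value, and the
# output path are illustrative assumptions, not ElEmperador's exact recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral-7B-v0.1 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Preference dataset with "prompt", "chosen", and "rejected" columns
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

config = ORPOConfig(
    output_dir="./elemperador-orpo",   # assumed output path
    beta=0.1,                          # weight of the odds-ratio penalty term
    max_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # use processing_class= on recent TRL versions
)
trainer.train()
```

Because ORPO folds the preference penalty into the supervised loss, a single trainer over chosen/rejected pairs replaces the separate reward-model and RL stages used in classic RLHF pipelines.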