---
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
library_name: transformers
tags:
- transformers

---

# Model Overview

- **Model Name:** ElEmperador

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e8ea3892d9db9a93580fe3/gkDcpIxRCjBlmknN_jzWN.png)


## Model Description:

ElEmperador is an ORPO-based finetune of the Mistral-7B-v0.1 base model.

The argilla/ultrafeedback-binarized-preferences-cleaned dataset was used for training, although only a small portion of it was used due to GPU constraints.
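
For reference, the sketch below shows what an ORPO run over this dataset could look like with the `trl` library's `ORPOTrainer`. It is an illustration only: the hyperparameters are placeholders rather than the settings used for ElEmperador, and the dataset may need light preprocessing into the plain-text `prompt`/`chosen`/`rejected` columns the trainer expects.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# Illustrative only: take a small slice, mirroring the GPU-constrained setup.
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train[:1%]"
)

base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder hyperparameters, not the values used for ElEmperador.
config = ORPOConfig(
    output_dir="elemperador-orpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    beta=0.1,               # weight of the odds-ratio preference term
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,    # `processing_class=` in newer trl releases
)
trainer.train()
```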

## Evals:
BLEU: 0.209
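
The evaluation corpus is not documented here; as an illustration only, a corpus-level BLEU score can be computed with the `evaluate` library roughly as follows (the prediction/reference pairs below are placeholders, not the actual eval data).

```python
import evaluate

# Placeholder data: in practice, predictions come from the model and
# references from the held-out evaluation set.
predictions = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]

bleu = evaluate.load("bleu")
results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])  # corpus-level BLEU in [0, 1]
```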

## Inference Script:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def generate_response(model_name, input_text, max_new_tokens=50):
    # Load the tokenizer and model from Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    
    # Tokenize the input text
    input_ids = tokenizer(input_text, return_tensors='pt').input_ids
    
    # Generate a response using the model
    with torch.no_grad():
        generated_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    
    # Decode the generated tokens into text
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    
    return generated_text

if __name__ == "__main__":
    # Set the model name from Hugging Face Hub
    model_name = "AINovice2005/ElEmperador" 
    input_text = "Hello, how are you?"

    # Generate and print the model's response
    output = generate_response(model_name, input_text)
    
    print(f"Input: {input_text}")
    print(f"Output: {output}")
```

## Results

ORPO is a viable preference-optimization algorithm for improving model performance alongside SFT finetuning. It also helps align the model's outputs more closely with human preferences, leading to more user-friendly and acceptable results.