---
library_name: transformers
tags:
- hindi
- bilingual
license: llama3
language:
- hi
- en
---

# Eli: A Bilingual Hindi-English Large Language Model

## Introduction

Eli is an open-source bilingual Hindi-English Large Language Model (LLM) designed to bridge the linguistic gap between Hindi and English. Developed with careful attention to detail, Eli represents a pioneering effort to broaden the scope of LLMs to diverse languages.

## Purpose Behind Eli

**Why We Built Eli:**

- **Language Adaptation:** Enhance language adaptability within LLMs for Hindi and English.
- **Efficient Training:** Train and fine-tune on a compact dataset of 1 billion tokens.
- **Optimized Processes:** Identify and implement the most efficient training processes.
- **World Knowledge Acquisition:** Observe how the model acquires and processes world knowledge.
- **Training Method Optimization:** Optimize training methods tailored to each development stage.

## Development Stages

### Pre-training

- **Objective:** Familiarize Eli with a newly enriched vocabulary.
- **Method:** Full-weight pre-training on a 500-million-token corpus using 2x A100 GPUs, taking about 25 hours.
- **Outcome:** Improved Hindi token prediction and generation capabilities.

### Bilingual Next Token Prediction and Translation

- **Inspired By:** The OpenHathi series by Sarvam.ai.
- **Dataset:** 200,000 tokens, translated using IndicTrans2.
- **Method:** Alternating sentences between Hindi and English for enhanced alignment and balanced exposure (a minimal sketch of this interleaving appears at the end of this section).

### Bilingual Instruct Fine-tuning

- **Objective:** Enhance model responsiveness in both English and Hindi.
- **Method:** Supervised fine-tuning with low-rank adaptation (LoRA) using various instruction datasets.
- **Outcome:** A fine-tuned model available on Hugging Face, along with a 4-bit quantized version for hands-on experimentation.

### DPO Fine-tuning

- **Objective:** Refine model preferences using Direct Preference Optimization (DPO).
- **Method:** Translation and fine-tuning with the Anthropic/hh-rlhf dataset.
- **Outcome:** Comprehensive evaluation is ongoing.
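The exact data-preparation pipeline for the bilingual next-token-prediction stage is not published here; the snippet below is only a minimal sketch of the sentence-interleaving idea described above, with a hypothetical helper name and toy data. In the actual pipeline, the translated halves would come from IndicTrans2.

```python
# Minimal sketch (not the released pipeline): alternate sentences from a
# Hindi document with their English translations so that each training
# example exposes the model to both languages in the same context window.

def interleave_bilingual(hindi_sentences, english_sentences):
    """Interleave sentence-aligned Hindi and English sentences into one text."""
    interleaved = []
    for hi_sent, en_sent in zip(hindi_sentences, english_sentences):
        interleaved.append(hi_sent)
        interleaved.append(en_sent)
    return " ".join(interleaved)

# Toy example; a real run would use IndicTrans2 to produce the translations.
hindi = ["एली एक द्विभाषी भाषा मॉडल है।", "यह हिंदी और अंग्रेज़ी दोनों में उत्तर दे सकता है।"]
english = ["Eli is a bilingual language model.", "It can answer in both Hindi and English."]

print(interleave_bilingual(hindi, english))
```

Alternating at the sentence level keeps the two languages aligned within a single training sequence, which is the "balanced exposure" this stage aims for.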
## Learnings and Future Directions

**Challenges:**

- **World Knowledge:** Occasional hallucinations in response to specific queries.
- **Translation:** Requires more training data for nuanced translations.
- **Fine-tuning:** Future iterations will balance full-weight and LoRA fine-tuning based on further tests.

**What's Next:**

- **Romanized Hindi:** Incorporate Romanized Hindi for added linguistic versatility.
- **Continuous Learning:** Refine data pipelines, increase the training dataset to 10-15 billion Hindi tokens, and improve efficiency.

## Generate

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Neohumans-ai/Eli", torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)

messages = [
    {
        "role": "system",
        "content": "You are Eli, an AI assistant created by NeoHumans-ai and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.",
    },
    {"role": "user", "content": "Who are you?"},
]

# Build the Llama 3 chat prompt and move it to the GPU
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens (everything after the prompt)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
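The instruct fine-tuning stage above also mentions a 4-bit quantized version of Eli. If you would rather quantize the full-precision weights on the fly (for example, to fit the model on a smaller GPU), a minimal sketch using the `bitsandbytes` integration in `transformers` is shown below. This assumes `bitsandbytes` is installed and loads the same `Neohumans-ai/Eli` repository used above; it is not necessarily how the published 4-bit checkpoint was produced.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# On-the-fly NF4 quantization; an illustrative alternative to downloading a
# pre-quantized checkpoint, not a reproduction of the published one.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Neohumans-ai/Eli",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)
```

The generation calls from the surrounding examples work unchanged; only the `from_pretrained` call differs.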
## Multi-turn Chat

To use the Eli model in a multi-turn conversation, you can follow the example code below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, TextStreamer

model = AutoModelForCausalLM.from_pretrained(
    "Neohumans-ai/Eli", torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("Neohumans-ai/Eli", trust_remote_code=True)

# Conversation history, starting with the system prompt
messages = [
    {
        "role": "system",
        "content": "You are Eli, an AI assistant created by NeoHumans-ai and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.",
    },
]


# Add the user's input to the history, generate a streamed reply,
# and append the assistant's answer back to the history
def process_user_input(user_input):
    global messages

    # Add the user's input to the conversation history
    messages.append({"role": "user", "content": user_input})

    # Render the history into the Llama 3 chat format
    prompt_formatted_message = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )

    # Configure generation parameters
    generation_config = GenerationConfig(
        repetition_penalty=1.2,
        max_new_tokens=8000,
        temperature=0.2,
        top_p=0.95,
        top_k=40,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        pad_token_id=tokenizer.pad_token_id,
        do_sample=True,
        use_cache=True,
        return_dict_in_generate=True,
        output_attentions=False,
        output_hidden_states=False,
        output_scores=False,
    )

    streamer = TextStreamer(tokenizer)
    # The template already contains <|begin_of_text|>, so skip adding special tokens
    batch = tokenizer(
        prompt_formatted_message.strip(), return_tensors="pt", add_special_tokens=False
    )

    print("\033[32mResponse: \033[0m")  # Green "Response:" label before the streamed output

    # Generate the response (streamed to stdout as it is produced)
    generated = model.generate(
        inputs=batch["input_ids"].to("cuda"),
        generation_config=generation_config,
        streamer=streamer,
    )

    # Decode the full sequence (prompt + response)
    assistant_response = tokenizer.decode(generated["sequences"].cpu().tolist()[0])

    # Locate the last assistant header and the final <|eot_id|> token
    assistant_start_index = assistant_response.rfind("<|start_header_id|>assistant<|end_header_id|>")
    eot_index = assistant_response.rfind("<|eot_id|>")

    # Extract the text between the last assistant header and the final <|eot_id|>
    if assistant_start_index != -1 and eot_index != -1:
        final_response = assistant_response[
            assistant_start_index + len("<|start_header_id|>assistant<|end_header_id|>") : eot_index
        ]
    else:
        # Fall back to the full decoded text if the markers are not found
        final_response = assistant_response

    # Append the extracted response to the conversation history
    messages.append({"role": "assistant", "content": final_response})


# Main interaction loop: an empty input ends the chat
while True:
    print("=================================================================================")
    user_input = input("Input: ")  # Prompt the user for input
    if not user_input.strip():
        break
    process_user_input(user_input)
```

## Prompt Format

System prompt:

`You are Eli, an AI assistant created by NeoHumans-ai and trained on top of Llama 3 Large language model (LLM), proficient in English and Hindi. You can respond in both languages based on the user's request.`

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

## Benchmarks

Coming soon.

## Conclusion

Eli is designed to handle multi-turn chat conversations and understands Hinglish, making it well suited to bilingual and code-mixed language contexts. Explore Eli's capabilities on Hugging Face and experience the model firsthand at [chat.cognitivelab.in](https://chat.cognitivelab.in/).

Weights and datasets are available on Hugging Face:

- [Base Model](https://huggingface.co./Cognitive-Lab/LLama3-Gaja-Hindi-8B-base-v0.1)
- [Hindi Instruct Dataset](https://huggingface.co./datasets/Cognitive-Lab/Hindi-Instruct-dataset)

Stay tuned for more updates as we continue to evolve and enrich Eli.