---
license: apache-2.0
language:
- en
library_name: transformers
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---
<pre align="center">
 ____  ____  __    __      __   ____  ____  ____  _  _ 
(  _ \( ___)(  )  (  )    /__\ (_  _)(  _ \(_  _)( \/ )
 ) _ < )__)  )(__  )(__  /(__)\  )(   )   / _)(_  )  ( 
(____/(____)(____)(____)(__)(__)(__) (_)\_)(____)(_/\_)
</pre>

# **Bellatrix-1.5B-xElite**

Bellatrix-1.5B-xElite is a reasoning-oriented model, based on Qwen/Qwen2.5-1.5B-Instruct and designed for QWQ synthetic dataset entries. The pipeline's instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. These models outperform many of the available open-source options. Bellatrix is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).

# **Quickstart with Transformers**

The following snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate text.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Bellatrix-1.5B-xElite"

# Load the model and tokenizer; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into a single prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

# **Intended Use:**

1. **Multilingual Dialogue Systems:**
   - Designed for conversational AI applications, capable of handling dialogue across multiple languages.
   - Useful in customer service, chatbots, and other dialogue-centric use cases.

2. **Reasoning and QWQ Dataset Applications:**
   - Optimized for tasks requiring logical reasoning and contextual understanding, particularly in synthetic datasets like QWQ.

3. **Agentic Retrieval:**
   - Supports retrieval-augmented generation tasks, helping systems fetch and synthesize information effectively.

4. **Summarization Tasks:**
   - Summarizes long or complex text while maintaining coherence and relevance.

5. **Instruction-Following Tasks:**
   - Can execute tasks based on specific user instructions, as a result of instruction tuning during training.

6. **Language Generation:**
   - Suitable for generating coherent and contextually relevant text in various domains and styles.

# **Limitations:**

1. **Synthetic Dataset Bias:**
   - Optimization for QWQ and similar datasets may make the model less effective on real-world or less structured data.

2. **Data Dependency:**
   - Performance may degrade on tasks or languages that are not well represented in the training dataset.

3. **Computational Requirements:**
   - The optimized transformer architecture may demand significant computational resources, especially for fine-tuning or large-scale deployments.

4. **Potential Hallucinations:**
   - Like most auto-regressive models, it may generate plausible-sounding but factually incorrect or nonsensical outputs.

5. **RLHF-Specific Biases:**
   - Reinforcement learning from human feedback (RLHF) can introduce biases based on the preferences of the annotators involved in the feedback process.

6. **Limited Domain Adaptability:**
   - While effective in reasoning and dialogue tasks, it may struggle with highly specialized domains or out-of-distribution tasks.

7. **Multilingual Limitations:**
   - Although optimized for multilingual use, certain low-resource languages may show poorer performance than high-resource ones.

8. **Ethical Concerns:**
   - May inadvertently generate inappropriate or harmful content if safeguards are not applied, particularly in sensitive applications.

9. **Real-Time Usability:**
   - Inference latency could limit its effectiveness in real-time applications or when scaling to large user bases.
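
The real-time usability concern can be checked empirically before deployment. The sketch below is a minimal, model-agnostic timing helper, not part of this model card or of `transformers`: `measure_throughput` and its `generate_fn` argument are hypothetical names introduced here for illustration, and the numbers it reports depend entirely on your hardware and generation settings.

```python
import time
from typing import Callable, Dict, Sequence


def measure_throughput(generate_fn: Callable[[], Sequence], runs: int = 3) -> Dict[str, float]:
    """Time repeated calls to `generate_fn` and summarize latency and throughput.

    `generate_fn` is a hypothetical zero-argument callable that performs one
    generation request and returns only the newly generated token ids.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    total_time = 0.0
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate_fn()
        total_time += time.perf_counter() - start
        total_tokens += len(new_tokens)

    return {
        "runs": runs,
        "avg_latency_s": total_time / runs,
        "avg_new_tokens": total_tokens / runs,
        # Guard against a zero elapsed time on very coarse clocks.
        "tokens_per_s": total_tokens / total_time if total_time > 0 else float("inf"),
    }
```

With the Quickstart objects already loaded, one possible call is `measure_throughput(lambda: model.generate(**model_inputs, max_new_tokens=64)[0][model_inputs.input_ids.shape[1]:])`, which slices off the prompt tokens so that only new tokens are counted. Running a warm-up generation first is usually advisable, since the first call often includes one-time setup cost.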