---
language: en
tags:
- merge
- mergekit
- deepseek
- opt
- code-generation
datasets:
- openai_humaneval
base_model: deepseek-ai/deepseek-coder-1.3b-base
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: DeepSeek-OPT-Merged-1.3B
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0
      verified: false
license: apache-2.0
---

# DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M using a linear interpolation (weight averaging) merge.

## 🔍 Model Description

This model was created by merging two foundation models:

- Primary: DeepSeek Coder 1.3B (code generation capabilities)
- Secondary: OPT-350M (general language understanding)

## 🛠️ Training/Merging Process

1. **Base Model Selection**:
   - DeepSeek Coder 1.3B for code understanding
   - OPT-350M for general language capabilities
2. **Merge Technique**:
   - Method: Linear interpolation (an illustrative sketch is included at the end of this card)
   - Weight ratio: α = 0.5 (50% each model)
   - No additional training, pure weight merging
3. **Technical Process**:
   - Used PyTorch for model handling
   - Applied float16 precision
   - Implemented memory-efficient merging
   - Used automatic device mapping

## 🧩 Configuration

```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # Base model
  - model: facebook/opt-350m                     # Target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```

## 💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Technical Details

- Architecture: Transformer-based language model
- Parameters: ~1.3B
- Precision: float16
- Merge Method: Linear interpolation (α = 0.5)
- Device Support: CPU/GPU (automatic device mapping)
- Memory Requirements: ~4 GB GPU RAM or ~8 GB CPU RAM

## 📊 Model Evaluation

- Dataset: HumanEval (code generation benchmark)
- Metric: pass@1 (functional correctness)
- Status: Pending evaluation (see the evaluation sketch at the end of this card)
- Expected Capabilities:
  - Code completion
  - Function generation
  - Technical documentation
  - General text generation

## 📝 License

Apache 2.0

## 🚀 Intended Use

- Code generation and completion
- Technical documentation
- Programming assistance
- General text generation tasks

## ⚠️ Limitations

- Inherits limitations from both parent models
- May show inconsistencies in code generation
- Limited by the context window of the base models
- Performance varies by task type
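
## 🧪 Illustrative Merge Sketch

The exact merge script is not shipped with this repository. The snippet below is a minimal sketch of the linear interpolation (α = 0.5, float16) described in the merging process above, under the simplifying assumption that only tensors with matching names and shapes are averaged and everything else is kept from the primary model; it is not the exact procedure used to produce this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.5  # interpolation weight: contribution of the primary model

# Load both models in float16 on CPU to keep memory usage modest
primary = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base", torch_dtype=torch.float16
)
secondary = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", torch_dtype=torch.float16
)

primary_sd = primary.state_dict()
secondary_sd = secondary.state_dict()

merged_sd = {}
for name, tensor in primary_sd.items():
    other = secondary_sd.get(name)
    if other is not None and other.shape == tensor.shape:
        # Linear interpolation of tensors that line up between the two models
        merged_sd[name] = ALPHA * tensor + (1.0 - ALPHA) * other
    else:
        # No compatible counterpart: keep the primary model's weights
        merged_sd[name] = tensor

primary.load_state_dict(merged_sd)
primary.save_pretrained("deepseek-opt-merged-1.3b")
```

Because DeepSeek Coder and OPT use different architectures and tensor shapes, few parameters line up directly; in practice a tool such as mergekit handles that bookkeeping, and this sketch only illustrates the α-weighted averaging itself.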
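
## 🧮 Evaluation Sketch (pass@1 on HumanEval)

The pass@1 score above is still pending. The snippet below shows one common way such a number could be measured with the Hugging Face `evaluate` library's `code_eval` metric; the greedy-decoding `generate_completion` helper and the `max_new_tokens` setting are illustrative assumptions, not the protocol that will be used for the official result. Note that `code_eval` executes model-generated code, so it must be explicitly enabled and should only be run in a sandboxed environment.

```python
import os
os.environ["HF_ALLOW_CODE_EVAL"] = "1"  # code_eval runs generated code; opt in explicitly

import torch
from datasets import load_dataset
from evaluate import load
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grozmart1/deepseek-opt-merged-1.3b"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def generate_completion(prompt: str) -> str:
    # Assumed decoding setup: greedy, 256 new tokens; returns only the new text
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

humaneval = load_dataset("openai_humaneval", split="test")
code_eval = load("code_eval")

# One candidate per problem -> pass@1; each candidate is the prompt plus the completion,
# and each reference is the problem's test code followed by a call to its check function
predictions = [[ex["prompt"] + generate_completion(ex["prompt"])] for ex in humaneval]
references = [ex["test"] + f"\ncheck({ex['entry_point']})" for ex in humaneval]

pass_at_k, _ = code_eval.compute(references=references, predictions=predictions, k=[1])
print(pass_at_k)  # e.g. {'pass@1': ...}
```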