---
language: en
tags:
- merge
- mergekit
- deepseek
- opt
- code-generation
datasets:
- openai_humaneval
base_model: deepseek-ai/deepseek-coder-1.3b-base
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: DeepSeek-OPT-Merged-1.3B
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0
      verified: false
license: apache-2.0
---

# DeepSeek-OPT-Merged-1.3B

A merged model combining DeepSeek Coder 1.3B and OPT-350M using a linear interpolation (weight averaging) merge.

## 🔍 Model Description

This model was created by merging two foundation models:

- Primary: DeepSeek Coder 1.3B (code generation capabilities)
- Secondary: OPT-350M (general language understanding)

## 🛠️ Training/Merging Process

1. **Base Model Selection**:
   - DeepSeek Coder 1.3B for code understanding
   - OPT-350M for general language capabilities
2. **Merge Technique**:
   - Method: Linear interpolation (an illustrative sketch is included at the end of this card)
   - Weight ratio: α = 0.5 (50% each model)
   - No additional training, pure weight merging
3. **Technical Process**:
   - Used PyTorch for model handling
   - Applied float16 precision
   - Implemented memory-efficient merging
   - Used automatic device mapping

## 🧩 Configuration

```yaml
models:
  - model: deepseek-ai/deepseek-coder-1.3b-base  # Base model
  - model: facebook/opt-350m                     # Target model
merge_method: linear
parameters:
  alpha: 0.5
dtype: float16
```

## 💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "grozmart1/deepseek-opt-merged-1.3b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("grozmart1/deepseek-opt-merged-1.3b")

# Example usage
text = "Write a Python function to sort a list:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Technical Details

- Architecture: Transformer-based language model
- Parameters: ~1.3B
- Precision: float16
- Merge Method: Linear interpolation (α = 0.5)
- Device Support: CPU/GPU (automatic device mapping)
- Memory Requirements: ~4 GB GPU RAM or ~8 GB CPU RAM

## 📊 Model Evaluation

- Dataset: HumanEval (code generation benchmark)
- Metric: pass@1 (functional correctness)
- Status: Pending evaluation (see the evaluation sketch at the end of this card)
- Expected Capabilities:
  - Code completion
  - Function generation
  - Technical documentation
  - General text generation

## 📝 License

Apache 2.0

## 🚀 Intended Use

- Code generation and completion
- Technical documentation
- Programming assistance
- General text generation tasks

## ⚠️ Limitations

- Inherits limitations from both parent models
- May show inconsistencies in code generation
- Limited by the context window of the base models
- Performance varies by task type
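
## 🧪 Illustrative Merge Sketch

The exact merge script is not shipped with this repository. The snippet below is a minimal sketch of the linear interpolation (α = 0.5, float16) described in the merging process above, under the simplifying assumption that only tensors with matching names and shapes are averaged and everything else is kept from the primary model; it is not the exact procedure used to produce this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM

ALPHA = 0.5  # interpolation weight: contribution of the primary model

# Load both models in float16 on CPU to keep memory usage modest
primary = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-base", torch_dtype=torch.float16
)
secondary = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", torch_dtype=torch.float16
)

primary_sd = primary.state_dict()
secondary_sd = secondary.state_dict()

merged_sd = {}
for name, tensor in primary_sd.items():
    other = secondary_sd.get(name)
    if other is not None and other.shape == tensor.shape:
        # Linear interpolation of tensors that line up between the two models
        merged_sd[name] = ALPHA * tensor + (1.0 - ALPHA) * other
    else:
        # No compatible counterpart: keep the primary model's weights
        merged_sd[name] = tensor

primary.load_state_dict(merged_sd)
primary.save_pretrained("deepseek-opt-merged-1.3b")
```

Because DeepSeek Coder and OPT use different architectures and tensor shapes, few parameters line up directly; in practice a tool such as mergekit handles that bookkeeping, and this sketch only illustrates the α-weighted averaging itself.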
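
## 🧮 Evaluation Sketch (pass@1 on HumanEval)

The pass@1 score above is still pending. The snippet below shows one common way such a number could be measured with the Hugging Face `evaluate` library's `code_eval` metric; the greedy-decoding `generate_completion` helper and the `max_new_tokens` setting are illustrative assumptions, not the protocol that will be used for the official result. Note that `code_eval` executes model-generated code, so it must be explicitly enabled and should only be run in a sandboxed environment.

```python
import os
os.environ["HF_ALLOW_CODE_EVAL"] = "1"  # code_eval runs generated code; opt in explicitly

import torch
from datasets import load_dataset
from evaluate import load
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grozmart1/deepseek-opt-merged-1.3b"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def generate_completion(prompt: str) -> str:
    # Assumed decoding setup: greedy, 256 new tokens; returns only the new text
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

humaneval = load_dataset("openai_humaneval", split="test")
code_eval = load("code_eval")

# One candidate per problem -> pass@1; each candidate is the prompt plus the completion,
# and each reference is the problem's test code followed by a call to its check function
predictions = [[ex["prompt"] + generate_completion(ex["prompt"])] for ex in humaneval]
references = [ex["test"] + f"\ncheck({ex['entry_point']})" for ex in humaneval]

pass_at_k, _ = code_eval.compute(references=references, predictions=predictions, k=[1])
print(pass_at_k)  # e.g. {'pass@1': ...}
```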