|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
base_model: |
|
- meta-llama/Llama-3.1-8B-instruct |
|
pipeline_tag: text-generation |
|
tags: |
|
- lora |
|
- adapter |
|
- writing |
|
- CoT |
|
--- |
|
# Merged-Llama-Adapters-317-320 |
|
|
|
A merged LoRA adapter combining four fine-tuned adapters (317-320) for the Llama-3.1-8B language model. |
|
|
|
## Model Details |
|
|
|
- Base Model: meta-llama/Llama-3.1-8B-instruct |
|
- Adaptation Method: Merged LoRA |
|
- Source Adapters: |
|
- https://huggingface.co./kevin009/merge-17-2 |
|
- https://huggingface.co./kevin009/llama318 |
|
- https://huggingface.co./kevin009/llama319 |
|
- https://huggingface.co./kevin009/llama320 |
|
- https://huggingface.co./kevin009/llama1720-base |
|
- https://huggingface.co./kevin009/llama324 |
|
|
|
## Merger Configuration |
|
|
|
### Source Adapters |
|
|
|
All source adapters share the following configuration: |
|
- Rank (r): 16 |
|
- Alpha: 16 |
|
- Target Modules: |
|
- q_proj (Query projection) |
|
- k_proj (Key projection) |
|
- v_proj (Value projection) |
|
- o_proj (Output projection) |
|
- up_proj (Upsampling projection) |
|
- down_proj (Downsampling projection) |
|
- gate_proj (Gate projection) |
|
|
|
### Merger Details |
|
|
|
- Merger Method: Linear interpolation |
|
- Merger Weights: Equal weights (0.25) for each adapter |
|
- Combined Rank: 16 (maintained from source adapters) |
|
|
|
## Usage |
|
|
|
This merged adapter must be used with the base Llama-3.1-8B-instruct model. |
|
|
|
### Loading the Model |
|
|
|
```python |
|
# Initialize with first F32 model |
|
peft_model = PeftModel.from_pretrained(model, "llama319", adapter_name="llama319") |
|
# Load F32 models (higher precision) |
|
peft_model.load_adapter("llama324", adapter_name="llama324") |
|
peft_model.load_adapter("llama320", adapter_name="llama320") |
|
peft_model.load_adapter("llama318", adapter_name="llama318") |
|
peft_model.load_adapter("llama1720-base", adapter_name="llama1720-base") |
|
peft_model.load_adapter("merge-17-20", adapter_name="merge-17-20") |
|
|
|
adapters = ["llama319", "llama320", "llama1720-base", "merge-17-20", "llama324", "llama318"] |
|
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0] |
|
peft_model.add_weighted_adapter(adapters, weights, "merge", combination_type="ties", density=0.2) |
|
|
|
peft_model.set_adapter("merge") |
|
peft_model.save_pretrained("merged") |
|
|
|
``` |
|
|
|
## Limitations and Biases |
|
|
|
- This merged adapter inherits limitations and biases from: |
|
- The base Llama-3.1-8B-instruct model |
|
- All four source adapters |
|
- The merging process may result in: |
|
- Potential loss of specialized capabilities from individual adapters |
|
- Averaged behavior across different adapter specializations |
|
- Possible interference between adapter weights |
|
|
|
## Merging Process |
|
|
|
The adapters were merged using the following approach: |
|
1. Linear interpolation of adapter weights |
|
2. Equal weighting (0.25) applied to each source adapter |
|
3. Preservation of original LoRA rank and architecture |
|
|
|
### Method Used |
|
|
|
The adapters were merged using PEFT (Parameter-Efficient Fine-Tuning) library's weighted adapter combination feature. The process combines multiple LoRA adapters using linear interpolation with specified weights. |
|
|
|
|
|
### Key Parameters |
|
|
|
- `combination_type="ties"`: Uses the TIES (Task Interference Edge Selection) method for combining adapters |
|
- `density=0.2`: Controls the sparsity of the merged weights |
|
|
|
|
|
### Notes |
|
|
|
- The order of loading adapters may affect the final result |
|
- Equal weights were chosen to maintain balanced influence from each adapter |
|
- The merged adapter maintains the same architecture and rank as the original adapters |
|
- While this adapter merges multiple fine-tunes, each component was developed as part of independent research efforts to explore and language model capabilities as part of R&D process. |
|
|
|
|
|
## Datasets |
|
|
|
- Not yet released, but should be released after evaluation has completed. |
|
- Creating dataset alone tooks more than 3 month for creating 30k pairs dataset. |
|
- Only 1k pairs example considered to be synthetic dataset, the rest half synthetic and human written text. |
|
|
|
### Use Cases |
|
|
|
- This merged adapter can be used for a wide range of tasks, including but not limited to: |
|
- Accessibility |
|
- Revision & Editing |
|
- instruction-following use with xml tags |
|
- Thinking & reasoning with xml tag of <thinking> and </thinking>, if being asked i the instructions. |
|
|
|
|
|
These Models not optimized for code, math, or other specialized tasks that need Perefence Optimization. |
|
|
|
## Why SFT Instead of RLHF/DPO? |
|
- RLHF and DPO approaches often lead to vocabulary limitations and overfitting due to their optimization objectives |
|
|
|
|
|
## Why Multiple Adapters? |
|
- Resource Issue: Placing the training into smaller adapters requires less GPU memory and compute time while gives more control over the training process. |
|
- Iterative Development: Each adapter can be developed and tested independently |
|
- Training Infrastructure: The complete fine-tuning process was conducted across multiple sessions, totaling over 100 hours on high-end GPUs (H100, H200, or L40s) |
|
- Flexibility: Multiple adapters allow for different combinations or weightings |
|
|
|
|
|
## License |
|
|
|
Licensed under Apache 2.0 License. |
|
|
|
This merged adapter is part of independent individual research work. While the code is open-source under the Apache 2.0 license, please note: |
|
|
|
- You are free to use, modify, and distribute this adapter following the Apache 2.0 license terms |
|
- This work is provided "as is" without warranties or conditions of any kind |
|
- This is an independent research project and not affiliated with any organization |
|
- Attribution is appreciated but not required |
|
- For full license details, see: https://www.apache.org/licenses/LICENSE-2.0 |
|
|