merge-bt1 / README.md

Create README.md

92e67f5 verified about 1 month ago

5.51 kB

	---
	license: apache-2.0
	language:
	- en
	base_model:
	- meta-llama/Llama-3.1-8B-instruct
	pipeline_tag: text-generation
	tags:
	- lora
	- adapter
	- writing
	- CoT
	---
	# Merged-Llama-Adapters-317-320

	A merged LoRA adapter combining four fine-tuned adapters (317-320) for the Llama-3.1-8B language model.

	## Model Details

	- Base Model: meta-llama/Llama-3.1-8B-instruct
	- Adaptation Method: Merged LoRA
	- Source Adapters:
	- https://huggingface.co./kevin009/merge-17-2
	- https://huggingface.co./kevin009/llama318
	- https://huggingface.co./kevin009/llama319
	- https://huggingface.co./kevin009/llama320
	- https://huggingface.co./kevin009/llama1720-base
	- https://huggingface.co./kevin009/llama324

	## Merger Configuration

	### Source Adapters

	All source adapters share the following configuration:
	- Rank (r): 16
	- Alpha: 16
	- Target Modules:
	- q_proj (Query projection)
	- k_proj (Key projection)
	- v_proj (Value projection)
	- o_proj (Output projection)
	- up_proj (Upsampling projection)
	- down_proj (Downsampling projection)
	- gate_proj (Gate projection)

	### Merger Details

	- Merger Method: Linear interpolation
	- Merger Weights: Equal weights (0.25) for each adapter
	- Combined Rank: 16 (maintained from source adapters)

	## Usage

	This merged adapter must be used with the base Llama-3.1-8B-instruct model.

	### Loading the Model

	```python
	# Initialize with first F32 model
	peft_model = PeftModel.from_pretrained(model, "llama319", adapter_name="llama319")
	# Load F32 models (higher precision)
	peft_model.load_adapter("llama324", adapter_name="llama324")
	peft_model.load_adapter("llama320", adapter_name="llama320")
	peft_model.load_adapter("llama318", adapter_name="llama318")
	peft_model.load_adapter("llama1720-base", adapter_name="llama1720-base")
	peft_model.load_adapter("merge-17-20", adapter_name="merge-17-20")

	adapters = ["llama319", "llama320", "llama1720-base", "merge-17-20", "llama324", "llama318"]
	weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
	peft_model.add_weighted_adapter(adapters, weights, "merge", combination_type="ties", density=0.2)

	peft_model.set_adapter("merge")
	peft_model.save_pretrained("merged")

	```

	## Limitations and Biases

	- This merged adapter inherits limitations and biases from:
	- The base Llama-3.1-8B-instruct model
	- All four source adapters
	- The merging process may result in:
	- Potential loss of specialized capabilities from individual adapters
	- Averaged behavior across different adapter specializations
	- Possible interference between adapter weights

	## Merging Process

	The adapters were merged using the following approach:
	1. Linear interpolation of adapter weights
	2. Equal weighting (0.25) applied to each source adapter
	3. Preservation of original LoRA rank and architecture

	### Method Used

	The adapters were merged using PEFT (Parameter-Efficient Fine-Tuning) library's weighted adapter combination feature. The process combines multiple LoRA adapters using linear interpolation with specified weights.


	### Key Parameters

	- `combination_type="ties"`: Uses the TIES (Task Interference Edge Selection) method for combining adapters
	- `density=0.2`: Controls the sparsity of the merged weights


	### Notes

	- The order of loading adapters may affect the final result
	- Equal weights were chosen to maintain balanced influence from each adapter
	- The merged adapter maintains the same architecture and rank as the original adapters
	- While this adapter merges multiple fine-tunes, each component was developed as part of independent research efforts to explore and language model capabilities as part of R&D process.


	## Datasets

	- Not yet released, but should be released after evaluation has completed.
	- Creating dataset alone tooks more than 3 month for creating 30k pairs dataset.
	- Only 1k pairs example considered to be synthetic dataset, the rest half synthetic and human written text.

	### Use Cases

	- This merged adapter can be used for a wide range of tasks, including but not limited to:
	- Accessibility
	- Revision & Editing
	- instruction-following use with xml tags
	- Thinking & reasoning with xml tag of <thinking> and </thinking>, if being asked i the instructions.


	These Models not optimized for code, math, or other specialized tasks that need Perefence Optimization.

	## Why SFT Instead of RLHF/DPO?
	- RLHF and DPO approaches often lead to vocabulary limitations and overfitting due to their optimization objectives


	## Why Multiple Adapters?
	- Resource Issue: Placing the training into smaller adapters requires less GPU memory and compute time while gives more control over the training process.
	- Iterative Development: Each adapter can be developed and tested independently
	- Training Infrastructure: The complete fine-tuning process was conducted across multiple sessions, totaling over 100 hours on high-end GPUs (H100, H200, or L40s)
	- Flexibility: Multiple adapters allow for different combinations or weightings


	## License

	Licensed under Apache 2.0 License.

	This merged adapter is part of independent individual research work. While the code is open-source under the Apache 2.0 license, please note:

	- You are free to use, modify, and distribute this adapter following the Apache 2.0 license terms
	- This work is provided "as is" without warranties or conditions of any kind
	- This is an independent research project and not affiliated with any organization
	- Attribution is appreciated but not required
	- For full license details, see: https://www.apache.org/licenses/LICENSE-2.0