Solar-DUS
Model Name: solar-DUS
Model Type: Transformer-based model
Architecture: Based on the Llama-2-7B architecture, with the DUS method applied to match the 48-layer structure of SOLAR-10.7B
Training Data: 5,000 examples from the OpenOrca dataset
Training Parameters:
- Batch Size: 1
- Epochs: 3
- Optimizer: AdamW
- Learning Rate: 5e-5
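For reference, below is a minimal fine-tuning sketch matching these settings with the Hugging Face Trainer. The OpenOrca field names ("question"/"response"), the 1024-token cutoff, and the output directory are illustrative assumptions, not details from this card.

```python
# Minimal fine-tuning sketch matching the listed hyperparameters.
# Assumptions (not from this card): OpenOrca's "question"/"response"
# fields, a 1024-token cutoff, and the output directory name.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"  # the depth-up-scaled checkpoint in practice
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# 5,000-example subset of OpenOrca, as described above.
data = load_dataset("Open-Orca/OpenOrca", split="train[:5000]")

def tokenize(example):
    text = example["question"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=1024)

data = data.map(tokenize, remove_columns=data.column_names)

args = TrainingArguments(
    output_dir="solar-dus-finetune",
    per_device_train_batch_size=1,  # Batch Size: 1
    num_train_epochs=3,             # Epochs: 3
    learning_rate=5e-5,             # Learning Rate: 5e-5; AdamW is the Trainer default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```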
Model Overview
The solar-DUS model is a transformer-based architecture built on Llama-2-7B, using the DUS (Depth Up-Scaling) method to expand the network to 48 layers. The goal is to closely match the architecture of Upstage's SOLAR-10.7B while leveraging DUS to improve generalization and training efficiency.
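DUS, as described in the SOLAR-10.7B paper, duplicates the 32-layer base model, drops the top 8 layers from one copy and the bottom 8 from the other, and concatenates the two 24-layer stacks into a 48-layer model (2 * (32 - 8) = 48). The sketch below follows that published recipe; it is not the exact script used to build this checkpoint.

```python
# Sketch of Depth Up-Scaling (DUS) per the SOLAR-10.7B paper:
# 2 * (n - m) layers with n = 32, m = 8 -> 48 layers.
# This follows the published recipe, not this model's exact build script.
import copy
import torch.nn as nn
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
layers = base.model.layers  # 32 decoder layers
n, m = len(layers), 8

# First copy keeps layers 0..n-m-1, second copy keeps layers m..n-1.
upscaled = nn.ModuleList(
    [copy.deepcopy(layer) for layer in layers[: n - m]]
    + [copy.deepcopy(layer) for layer in layers[m:]]
)
for i, layer in enumerate(upscaled):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = i  # keep KV-cache layer indices consistent

base.model.layers = upscaled
base.config.num_hidden_layers = len(upscaled)  # 48
base.save_pretrained("solar-dus-48-layer-init")
```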
Model Performance
This model was trained on a subset of 5,000 examples from the OpenOrca dataset, with the goal of testing whether the DUS method improves performance relative to other configurations. Performance may vary by use case, and further evaluation is recommended.
Intended Use
- Primarily intended for natural language processing (NLP) tasks, including but not limited to text generation, classification, and summarization.
- Suitable for fine-tuning with smaller datasets such as OpenOrca, particularly when task-specific adjustments are necessary.
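A minimal generation example with the standard transformers pipeline (the prompt and generation settings are placeholders):

```python
# Illustrative usage only; prompt and generation settings are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="leftfooted/solar-DUS")
output = generator("Summarize: Large language models are", max_new_tokens=64)
print(output[0]["generated_text"])
```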
Limitations
- The model was trained on only 5,000 examples from the OpenOrca dataset, which may limit its ability to generalize.
- Further fine-tuning with larger datasets could improve performance for more complex tasks.
- The batch size was set to 1, which may slow training and limit scalability when working with larger datasets.
Base Model: meta-llama/Llama-2-7b-hf