---
license: mit
datasets:
- allenai/c4
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- anto18671/lumenspark
---
# Linformer-based Language Model
A language model built on the Linformer architecture for efficient processing of long sequences. Its reduced memory and compute overhead makes it well suited to text generation tasks.
## Table of Contents
- [Introduction](#introduction)
- [Architecture](#architecture)
- [Installation](#installation)
- [Training Progress](#training-progress)
- [Quick Start](#quick-start)
- [Inference Parameters](#inference-parameters)
- [Hyperparameters](#hyperparameters)
- [Acknowledgements](#acknowledgements)
- [Sponsorship](#sponsorship)
- [License](#license)
## Introduction
The **Linformer-based Language Model** uses the Linformer architecture to handle long sequences efficiently in text generation and other language tasks. Linformer replaces full self-attention with a low-rank approximation, which lowers memory and compute cost and makes the model suitable for applications such as text completion and generation.
## Architecture
Built upon the **Linformer Transformer**, the model incorporates several key innovations:
1. **Efficient Attention**: Reduces self-attention complexity from quadratic to linear in sequence length by projecting the keys and values along the sequence dimension to a fixed length `k`.
2. **Low-Rank Linear Projections**: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness.
3. **Self-Attention Mechanism**: Implements multi-head self-attention with full expressivity by avoiding low-rank projections in this module.
4. **Factorized Feed-Forward Layers**: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters.
5. **PreNorm with LayerNorm and LayerScale**: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability.
6. **Dropout & Residual Connections**: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients.
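The efficient-attention idea in item 1 can be illustrated with a minimal single-head, unmasked sketch in plain Python. This is not the `lumenspark` implementation: the helper names (`matmul`, `softmax_rows`, `linformer_attention`) and the random projection matrices are assumptions made for illustration, and a real model would use learned projections and tensor operations. The point is only that the score matrix has shape `(n, k)` instead of `(n, n)`.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    result = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def linformer_attention(q, keys, values, proj_k, proj_v):
    """Linformer-style attention (illustrative, single head, no masking).

    q, keys, values: (n, d) matrices; proj_k, proj_v: (k, n) projections.
    Returns the (n, d) output and the (n, k) attention weights.
    """
    d = len(q[0])
    keys_small = matmul(proj_k, keys)      # (k, d): sequence axis compressed
    values_small = matmul(proj_v, values)  # (k, d)
    scores = matmul(q, transpose(keys_small))  # (n, k) instead of (n, n)
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, values_small), weights


def rand_matrix(rows, cols):
    """Hypothetical helper: matrix of standard-normal samples."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d, k = 6, 4, 3  # toy sizes; the model card lists seq_length=768 and k=384
q, keys, values = rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d)
proj_k, proj_v = rand_matrix(k, n), rand_matrix(k, n)

output, weights = linformer_attention(q, keys, values, proj_k, proj_v)
print(len(weights), len(weights[0]))  # 6 3
print(len(output), len(output[0]))    # 6 4
```

Because `k` is fixed, the cost of the score matrix grows linearly with the sequence length `n` rather than quadratically.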
## Installation
Install the `lumenspark` package via pip:
```bash
pip install lumenspark
```
This installs the `lumenspark` package along with its dependencies.
## Training Progress
The plot below shows the training loss over the course of training:
![Training Loss Plot](assets/training_loss_plot.png)
## Quick Start
Load the pre-trained model from Hugging Face and generate text. In this example, `generate` takes the raw prompt string directly:
```python
from lumenspark import LumensparkModel
import torch
# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)
# 3. Example input text
input_text = "Once upon a time"
# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,          # Maximum length of the generated sequence
    temperature=0.7,         # Controls randomness in predictions
    top_k=50,                # Sample only from the 50 most probable tokens
    top_p=0.9,               # Nucleus sampling to control diversity
    repetition_penalty=1.2,  # Penalize repetition
)
# 5. Print the generated text
print(output_text)
```
## Inference Parameters
Customize text generation using the following parameters:
- **`max_length`**: Maximum length of the generated sequence.
- **`temperature`**: Controls randomness (lower = more deterministic).
- **`top_k`**: Limits sampling to the `k` most probable tokens.
- **`top_p`**: Nucleus sampling based on cumulative probability `p`.
- **`repetition_penalty`**: Penalizes repeated tokens or phrases.
- **`no_repeat_ngram_size`**: Prevents repeated n-grams of specified size.
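To make the effect of these parameters concrete, here is a minimal sketch of how temperature, top-k, top-p, and a repetition penalty are commonly applied to next-token logits. It is an illustration under stated assumptions, not the code `lumenspark` runs: the function names `apply_repetition_penalty` and `filter_probs` are hypothetical, and the penalty rule (divide positive logits, multiply negative ones) follows a widely used convention that the package may or may not share.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Lower the logits of already generated tokens (common convention)."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        value = adjusted[token_id]
        # Dividing a positive logit or multiplying a negative one both reduce it.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most probable tokens

    # Nucleus step: keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break

    mass = sum(probs[token_id] for token_id in kept)
    return {token_id: probs[token_id] / mass for token_id in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # toy vocabulary of four tokens
top_two = filter_probs(logits, temperature=0.7, top_k=2)
print(sorted(top_two))  # [0, 1]
```

A sampler would then draw the next token from the returned distribution, for example with `random.choices`.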
## Hyperparameters
The model uses the following configuration:
- **`vocab_size`**: 50,257
- **`embed_dim`**: 768
- **`depth`**: 8 layers
- **`heads`**: 8 attention heads
- **`seq_length`**: 768 tokens
- **`dropout`**: 1/17 (≈ 0.059)
- **`k`**: 384 (attention projection)
- **`rank`**: 256 (low-rank projections)
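The values above allow a rough back-of-the-envelope comparison. The sketch below assumes a single attention head's score matrix and a square `embed_dim` × `embed_dim` linear layer with biases ignored; which layers in the model actually use the `rank` factorization is not stated here, so treat the second comparison as illustrative only.

```python
# Hyperparameters taken from the list above.
seq_length = 768
k = 384
embed_dim = 768
rank = 256

# Attention scores per head: full attention vs. Linformer-style projection.
full_scores = seq_length * seq_length  # n x n entries
projected_scores = seq_length * k      # n x k entries

# Weights of one square linear layer: dense vs. two low-rank factors.
dense_params = embed_dim * embed_dim
low_rank_params = embed_dim * rank + rank * embed_dim

print(full_scores, projected_scores)  # 589824 294912
print(dense_params, low_rank_params)  # 589824 393216
```

With these settings the projected score matrix is half the size of the full one, and the factorized layer holds two thirds of the weights of its dense counterpart.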
## Acknowledgements
We would like to extend our gratitude to [RunPod](https://www.runpod.io) for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward.
![RunPod Logo](assets/RunPod.webp)
## Sponsorship
Support the ongoing development of Lumenspark!
### How to Sponsor
Visit [GitHub Sponsors](https://github.com/sponsors/anto18671) and choose a sponsorship tier that suits you. Thank you for your support!
## License
This project is licensed under the [MIT License](LICENSE).