---
license: mit
datasets:
- allenai/c4
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- anto18671/lumenspark
---
# Linformer-based Language Model

An efficient language model for long sequences built on the Linformer architecture. By reducing the memory and computational overhead of self-attention, it is well suited to text generation and completion tasks.

## Table of Contents
- [Introduction](#introduction)
- [Architecture](#architecture)
- [Installation](#installation)
- [Training Progress](#training-progress)
- [Quick Start](#quick-start)
- [Inference Parameters](#inference-parameters)
- [Hyperparameters](#hyperparameters)
- [Acknowledgements](#acknowledgements)
- [Sponsorship](#sponsorship)
- [License](#license)

## Introduction
The **Linformer-based Language Model** leverages the Linformer architecture to efficiently handle long sequences in text generation and other language tasks. By optimizing the self-attention mechanism, this model maintains high performance while reducing resource consumption, making it suitable for applications like text completion and generation.

## Architecture
Built upon the **Linformer Transformer**, the model incorporates several key innovations (an illustrative sketch follows the list):

1. **Efficient Attention**: Reduces self-attention complexity from quadratic to linear in the sequence length by projecting the keys and values along the sequence dimension into a fixed, lower-dimensional space.
2. **Low-Rank Linear Projections**: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness.
3. **Self-Attention Mechanism**: Implements multi-head self-attention with full expressivity; the query, key, and value projections are kept as standard linear layers rather than LowRankLinear layers.
4. **Factorized Feed-Forward Layers**: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters.
5. **PreNorm with LayerNorm and LayerScale**: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability.
6. **Dropout & Residual Connections**: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients.
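
The exact layers live inside the `lumenspark` package; the sketch below is only an illustration of points 1–3 above, not the model's actual code. Class names, initialization, and the default `rank`, `k`, and dimension values are assumptions (taken from the hyperparameter list below), and causal masking and dropout are omitted for brevity.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Factorized linear layer: x -> (x @ U) @ V with rank r << min(d_in, d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 256):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> rank
        self.up = nn.Linear(rank, d_out)               # rank -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class LinformerSelfAttention(nn.Module):
    """Multi-head attention whose keys/values are compressed from seq_len to k positions."""

    def __init__(self, embed_dim: int = 768, heads: int = 8, seq_len: int = 768, k: int = 384):
        super().__init__()
        self.heads, self.head_dim = heads, embed_dim // heads
        self.to_q = nn.Linear(embed_dim, embed_dim, bias=False)
        self.to_k = nn.Linear(embed_dim, embed_dim, bias=False)
        self.to_v = nn.Linear(embed_dim, embed_dim, bias=False)
        # Learned projections along the sequence axis (seq_len -> k): the Linformer trick.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / k**0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / k**0.5)
        self.to_out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        q = self.to_q(x).view(b, n, self.heads, self.head_dim).transpose(1, 2)
        # Compress the sequence axis of K and V: (b, n, d) -> (b, k, d).
        k = torch.einsum("bnd,nk->bkd", self.to_k(x), self.proj_k[:n])
        v = torch.einsum("bnd,nk->bkd", self.to_v(x), self.proj_v[:n])
        k = k.view(b, -1, self.heads, self.head_dim).transpose(1, 2)
        v = v.view(b, -1, self.heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5  # (b, heads, n, k): linear in n
        out = scores.softmax(dim=-1) @ v                       # (b, heads, n, head_dim)
        return self.to_out(out.transpose(1, 2).reshape(b, n, d))
```

With the hyperparameters listed below (`seq_length = 768`, `k = 384`), each head's attention map shrinks from 768 × 768 to 768 × 384 scores.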

## Installation
Install the `lumenspark` package via pip:

```bash
pip install lumenspark
```

This command installs the Linformer-based language model along with all necessary dependencies.
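
A quick way to confirm the installation succeeded is to import the package in Python (`LumensparkModel` is the class used in the Quick Start below):

```python
# Minimal import check; raises ImportError if the installation did not succeed.
from lumenspark import LumensparkModel

print("lumenspark installed and importable")
```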

## Training Progress
The plot below shows the training loss recorded over the course of training:

![Training Loss Plot](assets/training_loss_plot.png)

## Quick Start
Load the pre-trained model from Hugging Face and generate text from a prompt:

```python
from lumenspark import LumensparkModel
import torch

# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)

# 3. Example input text
input_text = "Once upon a time"

# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,        # Maximum length of the generated sequence
    temperature=0.7,       # Controls randomness in predictions
    top_k=50,              # Top-k sampling to filter high-probability tokens
    top_p=0.9,             # Nucleus sampling to control diversity
    repetition_penalty=1.2 # Penalize repetition
)

# 5. Print the generated text
print(output_text)
```

## Inference Parameters
Customize text generation using the following parameters (a short comparison example follows the list):

- **`max_length`**: Maximum length of the generated sequence.
- **`temperature`**: Controls randomness (lower = more deterministic).
- **`top_k`**: Limits sampling to top `k` tokens.
- **`top_p`**: Nucleus sampling based on cumulative probability `p`.
- **`repetition_penalty`**: Penalizes repeated tokens or phrases.
- **`no_repeat_ngram_size`**: Prevents repeated n-grams of specified size.
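
The same prompt can be steered toward more deterministic or more varied output purely through these parameters. In the sketch below, `model` is the object loaded in the Quick Start, the prompt and values are illustrative only, and any parameter omitted from a call is assumed to fall back to a default.

```python
prompt = "The key advantage of linear attention is"

# More deterministic: low temperature and a narrow top-k.
focused = model.generate(
    prompt,
    max_length=80,
    temperature=0.3,
    top_k=10,
    repetition_penalty=1.2,
)

# More varied: higher temperature plus nucleus sampling and n-gram blocking.
creative = model.generate(
    prompt,
    max_length=80,
    temperature=1.0,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
    no_repeat_ngram_size=3,
)

print("Focused :", focused)
print("Creative:", creative)
```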

## Hyperparameters
Optimized for performance and efficiency (a parameter-count comparison follows the list):

- **`vocab_size`**: 50,257
- **`embed_dim`**: 768
- **`depth`**: 8 layers
- **`heads`**: 8 attention heads
- **`seq_length`**: 768 tokens
- **`dropout`**: 1/17 (≈ 0.059)
- **`k`**: 384 (attention projection)
- **`rank`**: 256 (low-rank projections)
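
To see why the low-rank factorization matters at these sizes, the quick calculation below compares a dense feed-forward block with a factorized one using `embed_dim = 768` and `rank = 256` from the list above. The 4× hidden expansion is an assumption not stated in this card, and bias terms are ignored, so the numbers are indicative only.

```python
embed_dim = 768
rank = 256
ffn_dim = 4 * embed_dim  # assumed expansion factor; not confirmed by this model card

# Dense FFN: two full projections (embed_dim -> ffn_dim -> embed_dim), biases ignored.
dense = embed_dim * ffn_dim + ffn_dim * embed_dim

# Factorized FFN: each projection split into (d_in -> rank) and (rank -> d_out) factors.
factorized = (embed_dim * rank + rank * ffn_dim) + (ffn_dim * rank + rank * embed_dim)

print(f"dense FFN weights:      {dense:,}")       # 4,718,592
print(f"factorized FFN weights: {factorized:,}")  # 1,966,080
print(f"reduction:              {1 - factorized / dense:.1%}")  # 58.3%
```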

## Acknowledgements

We would like to extend our gratitude to [RunPod](https://www.runpod.io) for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward.

![RunPod Logo](assets/RunPod.webp)

## Sponsorship
Support the ongoing development of Lumenspark!

### How to Sponsor
Visit [GitHub Sponsors](https://github.com/sponsors/anto18671) and choose a sponsorship tier that suits you. Thank you for your support!

## License
This project is licensed under the [MIT License](LICENSE).