---
title: "GPTWithMoE + MCTS for Text Generation"
summary: "A GPT model enhanced with Mixture of Experts and FlashAttention, incorporating Monte Carlo Tree Search (MCTS) for controlled text generation."
tags:
- text-generation
- gpt
- mixture-of-experts
- mcts
- flashattention
license: "apache-2.0"
datasets:
- finewebedu
library_name: "pytorch"
language: "en"
---

# **GPTWithMoE + MCTS for Text Generation**

## Model Summary
This model is a custom implementation of GPT enhanced with a Mixture of Experts (MoE) architecture and FlashAttention for efficient computation. The model incorporates Monte Carlo Tree Search (MCTS) for decoding, making it suitable for tasks that require controlled and exploratory text generation.

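The model was trained on the **FinewebEdu dataset**, reaching a training loss of **1.579923** and a validation loss of **7.792485**. The large gap between the two suggests substantial overfitting, so quality on unseen text is likely to be well below what the training loss implies.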

### Key Features
- **Mixture of Experts (MoE)**: Dynamically selects the most relevant experts for each input, improving efficiency and specialization.
- **FlashAttention**: Optimized attention mechanism for long-sequence processing.
- **MCTS Decoding**: Uses Monte Carlo Tree Search to explore possible outputs, providing fine-grained control over text generation.
- **Custom Configurations**:
  - 6 Transformer layers
  - 4 attention heads
  - Embedding dimension: 256
  - Block size: 512 tokens
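
To make the expert-selection idea above concrete, here is a minimal pure-Python sketch of top-k gating for a single token. It is an illustration under assumptions, not the routing code in `q_star.py`: the function names `softmax` and `route_top_k`, the toy gate scores, and the two-dimensional expert outputs are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, expert_outputs, k=1):
    """Mix the outputs of the k highest-scoring experts for one token.

    gate_logits: one gate score per expert (hypothetical values).
    expert_outputs: one output vector per expert, all of equal length.
    Returns (indices of chosen experts, weighted mixture of their outputs).
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for i in top:
        weight = probs[i] / norm
        for d in range(dim):
            mixed[d] += weight * expert_outputs[i][d]
    return top, mixed


# Toy example with three experts
gate_logits = [0.2, 2.0, -1.0]
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
chosen, mixed = route_top_k(gate_logits, expert_outputs, k=1)
print(chosen, mixed)
```

With `k=1` only the highest-scoring expert contributes, which is where the efficiency gain comes from: the other experts never need to be evaluated for that token. Larger `k` trades compute for a smoother mixture.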

---

## Intended Use
This model is designed for text generation tasks such as:
- Story generation
- Dialogue systems
- Content creation with controlled exploration

---



## Load Files from the Repository
The weights are stored as a plain PyTorch state dict (`moe_mcts_new.pt`), so load the tokenizer with `from_pretrained` and fetch the weights file with `huggingface_hub`:
```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# Load the GPT-2 tokenizer, the same one the usage example in this card relies on
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Download the weights file from the Hub and read the state dict
weights_path = hf_hub_download(repo_id="RobbiePasquale/gpt-moe-mcts", filename="moe_mcts_new.pt")
state_dict = torch.load(weights_path, map_location="cpu")

# Tokenize a prompt
prompt = "Once upon a time in a distant galaxy,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# To generate text, build GPTWithMoE from q_star.py, call load_state_dict(state_dict),
# and decode with generate_text_with_mcts from mcts_text_gen.py (see "How to Use").
```

### Using huggingface_hub for Direct Access to Files

The `huggingface_hub` library lets you download individual files or the whole repository.

#### 1. Install the Library

```bash
pip install huggingface_hub
```

#### 2. Download the Repository

You can download a snapshot of the repository to access all of its files:

```python
from huggingface_hub import snapshot_download

# Download repository files
repo_path = snapshot_download(repo_id="RobbiePasquale/gpt-moe-mcts")

print(f"Repository downloaded to {repo_path}")
```

This downloads all files from the repository, including:

- `moe_mcts_new.pt`
- `q_star.py`
- `mcts_text_gen.py`

The files are saved locally in a directory structure matching the repository.

#### 3. Load Specific Files

To load individual files programmatically:

```python
from huggingface_hub import hf_hub_download

# Download a specific file
weights_path = hf_hub_download(repo_id="RobbiePasquale/gpt-moe-mcts", filename="moe_mcts_new.pt")
print(f"Downloaded weights to {weights_path}")
```

### Command-Line Access

You can also clone the repository using Git:

```bash
git lfs install
git clone https://huggingface.co/RobbiePasquale/gpt-moe-mcts
```

## How to Use

### Installation

Ensure you have the following libraries installed:

- PyTorch (`pip install torch`)
- Transformers (`pip install transformers`)

### Usage Instructions

Place the following three files in your working directory:

1. `moe_mcts_new.pt`: The pre-trained weights.
2. `q_star.py`: Model definition, training, and validation logic.
3. `mcts_text_gen.py`: Script for MCTS-based text generation.

### Example: Generating Text

1. Load the weights and model configuration:

   ```python
   from q_star import GPTConfig, GPTWithMoE
   from transformers import GPT2Tokenizer
   import torch

   device = "cuda" if torch.cuda.is_available() else "cpu"
   tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
   tokenizer.pad_token = tokenizer.eos_token

   config = GPTConfig(vocab_size=50304, block_size=512, n_layer=6, n_head=4, n_embd=256)
   model = GPTWithMoE(
       config,
       num_experts=3,
       expert_layers=3,
       block_size_q=32,
       block_size_kv=32,
       num_blocks_kv=4,
       device=device,
   )
   model.load_state_dict(torch.load("moe_mcts_new.pt", map_location=device))
   model.eval()
   ```

2. Use the `mcts_text_gen.py` script for text generation:

   ```python
   from mcts_text_gen import generate_text_with_mcts

   prompt = "Once upon a time in a distant galaxy,"
   generated_text = generate_text_with_mcts(
       model=model,
       tokenizer=tokenizer,
       prompt=prompt,
       max_length=100,
       num_simulations=50,
       c_puct=1.5,
       top_k=5,
       device=device,
   )

   print("Generated Text:")
   print(generated_text)
   ```
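
The `c_puct` argument suggests that child selection follows a PUCT-style rule, as popularised by AlphaZero. The sketch below shows how such a rule balances a token's estimated value against its prior probability and visit count. This is an assumption about how `mcts_text_gen.py` works, not a copy of it: the helpers `puct_score` and `select_child`, the node dictionary layout, and the numbers are all hypothetical.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT score: exploitation term plus a prior-weighted exploration bonus."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration


def select_child(children, parent_visits, c_puct=1.5):
    """Return the candidate token with the highest PUCT score.

    children maps a token string to a dict with keys
    "q" (mean value so far), "prior" (model probability), "visits".
    """
    return max(
        children,
        key=lambda tok: puct_score(
            children[tok]["q"],
            children[tok]["prior"],
            parent_visits,
            children[tok]["visits"],
            c_puct,
        ),
    )


# Hypothetical statistics for three candidate next tokens
children = {
    " the": {"q": 0.50, "prior": 0.60, "visits": 10},
    " a": {"q": 0.40, "prior": 0.30, "visits": 2},
    " far": {"q": 0.00, "prior": 0.10, "visits": 0},
}
parent_visits = 12

print(select_child(children, parent_visits, c_puct=1.5))  # exploration favours the less-visited " a"
print(select_child(children, parent_visits, c_puct=0.0))  # pure exploitation picks " the"
```

Under this rule, raising `c_puct` pushes the search toward rarely visited tokens, while raising `num_simulations` gives the value estimates more samples at a proportional cost in forward passes.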
    

## Training Details

- Dataset: FinewebEdu
- Batch Size: 16
- Sequence Length: 512
- Optimizer: AdamW with weight decay
- Learning Rate: 3e-3
- Gradient Accumulation: Adapted for a total batch size of 262144 tokens.
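
The number of gradient-accumulation steps implied by these settings can be worked out directly. The short calculation below assumes training on a single device (`num_devices = 1`), which the card does not state; with more devices the step count shrinks proportionally.

```python
micro_batch_size = 16        # sequences per forward/backward pass
sequence_length = 512        # tokens per sequence
total_batch_tokens = 262144  # target tokens per optimizer step
num_devices = 1              # assumption: not stated in the card

tokens_per_micro_step = micro_batch_size * sequence_length * num_devices
# The target must divide evenly, otherwise the effective batch size would drift
assert total_batch_tokens % tokens_per_micro_step == 0

grad_accum_steps = total_batch_tokens // tokens_per_micro_step
print(f"{tokens_per_micro_step} tokens per micro-step, {grad_accum_steps} accumulation steps")
```

Under the single-device assumption, gradients from 32 micro-batches of 8192 tokens are summed before each optimizer update.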

### Loss Metrics

- Training Loss: 1.579923
- Validation Loss: 7.792485
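
To put these numbers on a more intuitive scale, the losses can be converted to perplexities and compared against a uniform-guess baseline. The sketch assumes the reported values are mean per-token cross-entropy in nats, which is the PyTorch default but is not confirmed by the card; the vocabulary size of 50304 is taken from the model configuration.

```python
import math

train_loss = 1.579923
val_loss = 7.792485
vocab_size = 50304  # from the GPTConfig used in this card

# Perplexity is the exponential of the mean cross-entropy (assumed to be in nats)
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

# Loss of a model that assigns equal probability to every token
uniform_loss = math.log(vocab_size)

print(f"train perplexity: {train_ppl:.2f}")
print(f"validation perplexity: {val_ppl:.1f}")
print(f"uniform-baseline loss: {uniform_loss:.2f}")
```

Read this way, the validation loss is still better than uniform guessing over the vocabulary, but the validation perplexity is several hundred times the training perplexity.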

## Limitations

- The model may struggle with highly domain-specific language not represented in FinewebEdu.
- Computationally intensive due to MCTS decoding.

## Citation

If you use this model, please cite:

```bibtex
@article{gptmoe_mcts,
  title={GPTWithMoE + MCTS for Controlled Text Generation},
  author={Robbie Pasquale},
  year={2024}
}
```