Context-aware Biases for Length Extrapolation

The source code for Context-aware Biases for Length Extrapolation (Cable).

🚀 News

  • [2025.02.03] Code release

Upcoming

  • Cleaning the codebase
  • Adding scripts for training ALiBi, RoPE, and T5-bias

Datasets and Models

Download the datasets from Hugging Face and use dataset_preparation.py to save the tokenized dataset.
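
The exact preprocessing lives in dataset_preparation.py; as a rough sketch of the general idea, assuming a Hugging Face dataset, the GPT-2 BPE tokenizer, and a flat token array on disk (the dataset name, text field, and output filename below are hypothetical placeholders, not the script's actual arguments), tokenization could look like this:

    # Minimal sketch: tokenize a Hugging Face dataset and save the token stream.
    # Dataset name, text field, and output filename are hypothetical placeholders.
    import numpy as np
    import tiktoken
    from datasets import load_dataset

    enc = tiktoken.get_encoding("gpt2")              # GPT-2 BPE tokenizer
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

    tokens = []
    for example in ds:
        text = example["text"]
        if text.strip():                             # skip empty lines
            tokens.extend(enc.encode_ordinary(text))
            tokens.append(enc.eot_token)             # segment separator
    np.save("wikitext103_train_tokens.npy", np.array(tokens, dtype=np.uint16))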

Some of the trained models:

Dataset           | Model      | Parameters | Sequence Length | Checkpoint
Fineweb-Edu (10B) | GPT-Medium | 334M       | 1024            | Link
Fineweb-Edu (10B) | GPT-Medium | 334M       | 512             | Link
WikiText-103      | GPT-Tiny   | 44M        | 1024            | Link
WikiText-103      | GPT-Tiny   | 44M        | 512             | Link
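
The checkpoint format is not documented here; assuming the links point to standard PyTorch checkpoint files (an assumption, and the filename below is a hypothetical placeholder for a downloaded file), a quick way to inspect one is:

    import torch

    # Load a downloaded checkpoint on CPU and list its top-level keys,
    # e.g. a model state dict plus training metadata (assumed layout).
    ckpt = torch.load("cable_gpt_medium_1024.pt", map_location="cpu")
    if isinstance(ckpt, dict):
        print(list(ckpt.keys()))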

Training

  • Single GPU

    python Cable.py --dataset-dir "path to dataset" --model "medium or small or tiny" --save-dir "dir for logs"
    
  • Multiple GPUs

    torchrun --standalone --nproc_per_node=2 Cable.py
    

For the HellaSwag benchmark and for evaluating length extrapolation, please use the evaluation.ipynb notebook.
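
The evaluation details are in evaluation.ipynb; as a minimal sketch of how length extrapolation is typically measured, the snippet below computes perplexity over non-overlapping windows longer than the training context (the model interface and token tensor are hypothetical placeholders, not objects from this repository):

    import math
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def perplexity(model, token_ids, seq_len, device="cuda"):
        """Average next-token perplexity over non-overlapping windows of length seq_len."""
        model.eval().to(device)
        losses = []
        for start in range(0, token_ids.numel() - seq_len, seq_len):
            x = token_ids[start:start + seq_len].unsqueeze(0).to(device)
            y = token_ids[start + 1:start + seq_len + 1].unsqueeze(0).to(device)
            logits = model(x)            # assumed to return (1, seq_len, vocab) logits
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
            losses.append(loss.item())
        return math.exp(sum(losses) / len(losses))

    # Hypothetical usage: a model trained on T=1024 evaluated on T=8192 windows.
    # print(perplexity(model, val_tokens, seq_len=8192))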

Length Extrapolation

A Cable model trained on T=1024 can extrapolate to T=8192, achieving better performance (PPL = 22.22) than a sinusoidal model trained on T=8192 (PPL = 22.81).

Runtime and Memory Overhead

Cable significantly improves the model's extrapolation ability with negligible time and memory overhead compared to the vanilla transformer. Furthermore, compared to existing RPE methods, our approach maintains nearly identical training time and GPU memory usage, while its inference overhead remains negligible or comparable, depending on the sequence length.
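
Intuitively, the cost stays low because an additive positional bias only changes the attention logits by an elementwise add before the softmax. The sketch below shows a generic additive relative-position bias inside causal scaled dot-product attention; it illustrates the general mechanism shared by ALiBi/T5-style biases and is not the exact Cable formulation:

    import torch
    import torch.nn.functional as F

    def attention_with_additive_bias(q, k, v, bias):
        """Causal scaled dot-product attention with an additive positional bias.

        q, k, v: (batch, heads, seq_len, head_dim)
        bias:    (heads, seq_len, seq_len) bias added to the attention logits
        """
        T = q.size(-2)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # (B, H, T, T)
        scores = scores + bias                                   # one extra add per logit
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
        scores = scores.masked_fill(~causal, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

Because the bias enters as a single add on the already-materialized score matrix, its runtime and memory footprint stays close to the vanilla attention baseline.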

Citation

If you use this repository in your research or wish to refer to our method, please use the following BibTeX entry:


Acknowledgement

This repo is based on Karpathy/Build-NanoGPT. Thanks for their excellent work.
