ARWKV 🪿

Paper Link 👁️ | GitHub ✅

ARWKV-7B-FROM-32B (Preview 0.1)

ARWKV Hybrid Architecture

Preview version with RWKV-7 time mixing and Transformer MLP

This version does not include the parameter "g". The MLP is frozen during training, and the 7B teacher logits are used for distillation alignment.
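As a rough illustration of this frozen-teacher logit alignment, the sketch below shows a generic token-level KL distillation loss in PyTorch. It is a simplified sketch under assumed names (student_logits, teacher_logits, temperature), not the actual ARWKV training code or schedule.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Align the student's next-token distribution with the frozen teacher's.
    # Generic illustration only; the real recipe (stages, losses, temperature) may differ.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2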

📌 Overview

ALL YOU NEED IS RWKV

This is an early preview of our 7B-parameter hybrid RNN-Transformer model, trained at 2k context length (only stage 2 applied, without SFT or DPO) through 3-stage knowledge distillation from Qwen2.5-7B-Instruct. While this is a foundational version, it already demonstrates:

  • ✅ RWKV-7's efficient recurrence mechanism
  • ✅ No self-attention, fully O(n) (see the sketch after this list)
  • ✅ Constant VRAM usage
  • ✅ Single-GPU trainability
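The efficiency claims above come from replacing self-attention with a recurrent state update: each token reads from and writes to a fixed-size state, so compute grows linearly with sequence length and memory stays constant. The loop below is a schematic linear-recurrence sketch of that idea only; it is not the actual RWKV-7 time-mixing update, whose decay and in-context transition terms are more involved.

import torch

def recurrent_step(state, q, k, v, decay=0.95):
    # One token of a generic matrix-valued state recurrence (schematic only).
    # The state keeps a fixed (head_dim, head_dim) shape regardless of sequence length.
    state = decay * state + torch.outer(k, v)
    out = q @ state
    return state, out

head_dim = 64
state = torch.zeros(head_dim, head_dim)
for _ in range(2048):                       # O(n) time, constant memory over the sequence
    q, k, v = torch.randn(3, head_dim)
    state, out = recurrent_step(state, q, k, v)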

Roadmap Notice: We will soon open-source several enhanced versions with:

  • 🚀 16k+ context capability
  • 🧮 Math-specific improvements
  • 📚 RL-enhanced reasoning model

How to use

Install the dependencies:

pip3 install --upgrade rwkv-fla transformers

Then load the model and tokenizer:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


model = AutoModelForCausalLM.from_pretrained(
    "RWKV-Red-Team/ARWKV-7B-Preview-0.1-NoG",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "RWKV-Red-Team/ARWKV-7B-Preview-0.1-NoG"
)
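Once loaded, generation works through the standard transformers API. The snippet below is a minimal sketch that assumes the tokenizer carries a Qwen2.5-style chat template; the prompt and sampling settings are illustrative, not recommended defaults.

messages = [
    {"role": "user", "content": "Explain the difference between an RNN and a Transformer in two sentences."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))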

🔑 Key Features

Component        Specification                                      Note
Architecture     RWKV-7 TimeMix + SwiGLU                            Hybrid design
Context Window   2048 training CTX                                  Preview limitation
Training Tokens  40M                                                Distillation-focused
Precision        FP16 inference recommended (16 GB VRAM required)   15% ↑ vs BF16

๐Ÿ—๏ธ Architecture Highlights

Core Modification Flow

Qwen2.5 Decoder Layer:
- Grouped Query Attention
+ RWKV-7 Time Mixing (Eq.3)
- RoPE Positional Encoding
+ State Recurrence
= Hybrid Layer Output
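
To make the modification flow concrete, here is a schematic PyTorch sketch of such a hybrid decoder layer: an RWKV-7 time-mixing block takes the place of grouped-query attention (with positions handled by state recurrence instead of RoPE), while the SwiGLU MLP is kept from the Qwen2.5 layer. Module names, norm choices, and the time_mixing callable are assumptions for illustration, not the released implementation.

import torch.nn as nn

class HybridDecoderLayer(nn.Module):
    def __init__(self, hidden_size, intermediate_size, time_mixing):
        super().__init__()
        self.norm1 = nn.RMSNorm(hidden_size)
        self.time_mixing = time_mixing      # RWKV-7 TimeMix in place of grouped-query attention
        self.norm2 = nn.RMSNorm(hidden_size)
        # SwiGLU MLP kept from the original Qwen2.5 decoder layer (frozen during distillation)
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.SiLU()

    def forward(self, x):
        x = x + self.time_mixing(self.norm1(x))     # token mixing via state recurrence (no RoPE)
        h = self.norm2(x)
        return x + self.down_proj(self.act(self.gate_proj(h)) * self.up_proj(h))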