pranay-j's Collections
LLM_architectures
Nemotron-4 15B Technical Report • Paper • 2402.16819 • Published • 42
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • Paper • 2402.19427 • Published • 52
RWKV: Reinventing RNNs for the Transformer Era • Paper • 2305.13048 • Published • 14
Reformer: The Efficient Transformer • Paper • 2001.04451 • Published
Attention Is All You Need • Paper • 1706.03762 • Published • 44
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published • 14
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer • Paper • 1910.10683 • Published • 8
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts • Paper • 2112.06905 • Published • 1
UL2: Unifying Language Learning Paradigms • Paper • 2205.05131 • Published • 5
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model • Paper • 2211.05100 • Published • 28
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning • Paper • 2301.13688 • Published • 8
Llama 2: Open Foundation and Fine-Tuned Chat Models • Paper • 2307.09288 • Published • 242
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • 2312.00752 • Published • 138
Textbooks Are All You Need • Paper • 2306.11644 • Published • 142
Mistral 7B • Paper • 2310.06825 • Published • 47
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • Paper • 2312.15166 • Published • 56
Gemini: A Family of Highly Capable Multimodal Models • Paper • 2312.11805 • Published • 45
Mixtral of Experts • Paper • 2401.04088 • Published • 157
The Falcon Series of Open Language Models • Paper • 2311.16867 • Published • 12
Gemma: Open Models Based on Gemini Research and Technology • Paper • 2403.08295 • Published • 47
Jamba: A Hybrid Transformer-Mamba Language Model • Paper • 2403.19887 • Published • 104
ReALM: Reference Resolution As Language Modeling • Paper • 2403.20329 • Published • 21
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • Paper • 2404.05892 • Published • 31
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models • Paper • 2404.07839 • Published • 41
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • Paper • 2404.08801 • Published • 63
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper • 2404.07143 • Published • 103
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone • Paper • 2404.14219 • Published • 251
You Only Cache Once: Decoder-Decoder Architectures for Language Models • Paper • 2405.05254 • Published • 9
TransformerFAM: Feedback attention is working memory • Paper • 2404.09173 • Published • 43
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation • Paper • 2303.08302 • Published
Kolmogorov-Arnold Transformer • Paper • 2409.10594 • Published • 38