Collections
Collections including paper arxiv:2402.09371
Collection 1:
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 3
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12

Collection 2:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12
- A Thorough Examination of Decoding Methods in the Era of LLMs
  Paper • 2402.06925 • Published • 1

Collection 3:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12
- Triple-Encoders: Representations That Fire Together, Wire Together
  Paper • 2402.12332 • Published • 2

Collection 4:
- Measuring the Effects of Data Parallelism on Neural Network Training
  Paper • 1811.03600 • Published • 2
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Paper • 1804.04235 • Published • 2
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
  Paper • 1905.11946 • Published • 3
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

Collection 5:
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 12
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 29
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 40
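
A listing like the one above can also be reproduced programmatically. Below is a minimal sketch using the official huggingface_hub client; the "papers/&lt;arxiv-id&gt;" item-filter string passed to list_collections is an assumption about the filter format, so verify it against the library's reference documentation before relying on it.

```python
# Illustrative sketch, not part of the page above: list Hugging Face Hub
# collections that include a given paper via the huggingface_hub client.
from huggingface_hub import list_collections

# Assumption: the item filter for papers takes the form "papers/<arxiv-id>".
for collection in list_collections(item="papers/2402.09371", limit=5):
    print(collection.title)
    # Listed collections may carry only a truncated preview of their items;
    # get_collection(collection.slug) retrieves the full item list.
    for item in collection.items:
        print(f"  - {item.item_type} • {item.item_id}")
```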