Collections including paper arxiv:2311.03301

- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 21
- LangBridge: Multilingual Reasoning Without Multilingual Supervision
  Paper • 2401.10695 • Published • 5
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- YAYI 2: Multilingual Open-Source Large Language Models
  Paper • 2312.14862 • Published • 13
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Rethinking Optimization and Architecture for Tiny Language Models
  Paper • 2402.02791 • Published • 12
- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 45
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 15
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 61
- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 2
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 16
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 12
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Co-training and Co-distillation for Quality Improvement and Compression of Language Models
  Paper • 2311.02849 • Published • 3
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
  Paper • 2311.02303 • Published • 4
- ADaPT: As-Needed Decomposition and Planning with Language Models
  Paper • 2311.05772 • Published • 10
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 37
- Time is Encoded in the Weights of Finetuned Language Models
  Paper • 2312.13401 • Published • 19
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
  Paper • 2310.10134 • Published • 1
- TiC-CLIP: Continual Training of CLIP Models
  Paper • 2310.16226 • Published • 8
- In-Context Pretraining: Language Modeling Beyond Document Boundaries
  Paper • 2310.10638 • Published • 28
- Controlled Decoding from Language Models
  Paper • 2310.17022 • Published • 14
- Approximating Two-Layer Feedforward Networks for Efficient Transformers
  Paper • 2310.10837 • Published • 10
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 26
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 13