Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2307.03172

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 16
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 9
Vision-Language Models as a Source of Rewards

Paper • 2312.09187 • Published Dec 14, 2023 • 11
StemGen: A music generation model that listens

Paper • 2312.08723 • Published Dec 14, 2023 • 47

LM Context Length

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36

papers-context-length

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36

Must Reads On Transformers and Diffusers

Explore the cutting-edge of AI with our curated list of must reads on Transformers & Diffusers, driving innovation in generative-AI and beyond.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Paper • 1701.06538 • Published Jan 23, 2017 • 4
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Paper • 2005.11401 • Published May 22, 2020 • 12
Language Model Evaluation Beyond Perplexity

Paper • 2106.00085 • Published May 31, 2021

To read... eventually

about 17 hours ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 50
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6 • 12
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 242

Lee's RoPE Tricks / Context Extension Reads

Set of Long Context (RoPE or otherwise) I'm collecting off of HF

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 111
Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 21
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Paper • 2402.11550 • Published Feb 18 • 15
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Paper • 2401.07872 • Published Jan 15 • 2

Lost in the Middle: How Language Models Use Long Contexts

Paper • 2307.03172 • Published Jul 6, 2023 • 36

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs