Collections including paper arxiv:2410.13166

- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 245
- Learning an evolved mixture model for task-free continual learning
  Paper • 2207.05080 • Published • 1
- EVOLvE: Evaluating and Optimizing LLMs For Exploration
  Paper • 2410.06238 • Published • 1
- Smaller Language Models Are Better Instruction Evolvers
  Paper • 2412.11231 • Published • 27

- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 40
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 117
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 48
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 43

- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 45
- LiteSearch: Efficacious Tree Search for LLM
  Paper • 2407.00320 • Published • 38
- Cut Your Losses in Large-Vocabulary Language Models
  Paper • 2411.09009 • Published • 44
- LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
  Paper • 2411.09595 • Published • 72

- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
  Paper • 2404.11912 • Published • 17
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 24
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 3

- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 106
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 35
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 3