- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 9
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
  Paper • 2406.04333 • Published • 36

Collections including paper arxiv:2405.05254

- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 143
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602
- LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
  Paper • 2404.16710 • Published • 73
- Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
  Paper • 2405.08707 • Published • 27

- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 11
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 9
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126

- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 11
- FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
  Paper • 2404.00987 • Published • 21
- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
  Paper • 2404.02893 • Published • 20

- Can large language models explore in-context?
  Paper • 2403.15371 • Published • 32
- GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
  Paper • 2403.19655 • Published • 18
- WavLLM: Towards Robust and Adaptive Speech Large Language Model
  Paper • 2404.00656 • Published • 10
- Enabling Memory Safety of C Programs using LLMs
  Paper • 2404.01096 • Published • 1

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 124
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- Nemotron-4 15B Technical Report
  Paper • 2402.16819 • Published • 42
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 14
- Reformer: The Efficient Transformer
  Paper • 2001.04451 • Published

- AutoMix: Automatically Mixing Language Models
  Paper • 2310.12963 • Published • 14
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 12
- MatFormer: Nested Transformer for Elastic Inference
  Paper • 2310.07707 • Published • 1
- DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  Paper • 2310.08461 • Published • 1