- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 2
- Efficient Monotonic Multihead Attention
  Paper • 2312.04515 • Published • 6
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Exploring Format Consistency for Instruction Tuning
  Paper • 2307.15504 • Published • 7
Collections including paper arxiv:2306.13575

- MLP Can Be A Good Transformer Learner
  Paper • 2404.05657 • Published • 1
- Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
  Paper • 2404.07200 • Published • 1
- An inclusive review on deep learning techniques and their scope in handwriting recognition
  Paper • 2404.08011 • Published • 1
- Long-form music generation with latent diffusion
  Paper • 2404.10301 • Published • 24

- I-Design: Personalized LLM Interior Designer
  Paper • 2404.02838 • Published • 2
- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Fast Feedforward Networks
  Paper • 2308.14711 • Published • 2
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 44

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 16
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 10
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 9

- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Trap of Feature Diversity in the Learning of MLPs
  Paper • 2112.00980 • Published • 1
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics
  Paper • 2301.05816 • Published • 1
- RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Paper • 2108.04384 • Published • 1

- TheBirdLegacy/FreeLoaderLM
  Text Generation • Updated
- CofeAI/FLM-101B
  Text Generation • Updated • 48 • 92
- FLM-101B: An Open LLM and How to Train It with $100K Budget
  Paper • 2309.03852 • Published • 43
- Composable Function-preserving Expansions for Transformer Architectures
  Paper • 2308.06103 • Published • 19