- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 2
- Efficient Monotonic Multihead Attention
  Paper • 2312.04515 • Published • 6
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Exploring Format Consistency for Instruction Tuning
  Paper • 2307.15504 • Published • 7
Collections including paper arxiv:2306.13575

- MLP Can Be A Good Transformer Learner
  Paper • 2404.05657 • Published • 1
- Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
  Paper • 2404.07200 • Published • 1
- An inclusive review on deep learning techniques and their scope in handwriting recognition
  Paper • 2404.08011 • Published • 1
- Long-form music generation with latent diffusion
  Paper • 2404.10301 • Published • 24

- I-Design: Personalized LLM Interior Designer
  Paper • 2404.02838 • Published • 2
- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Fast Feedforward Networks
  Paper • 2308.14711 • Published • 2
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 44

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 16
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 10
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 9

- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Trap of Feature Diversity in the Learning of MLPs
  Paper • 2112.00980 • Published • 1
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics
  Paper • 2301.05816 • Published • 1
- RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
  Paper • 2108.04384 • Published • 1

- TheBirdLegacy/FreeLoaderLM
  Text Generation • Updated
- CofeAI/FLM-101B
  Text Generation • Updated • 48 • 92
- FLM-101B: An Open LLM and How to Train It with $100K Budget
  Paper • 2309.03852 • Published • 43
- Composable Function-preserving Expansions for Transformer Architectures
  Paper • 2308.06103 • Published • 19