LLM Augmented LLMs: Expanding Capabilities through Composition • arXiv:2401.02412 • Published Jan 4, 2024
TinyGSM: achieving >80% on GSM8k with small language models • arXiv:2312.09241 • Published Dec 14, 2023
Weight subcloning: direct initialization of transformers using larger pretrained ones • arXiv:2312.09299 • Published Dec 14, 2023
IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation • arXiv:2203.03759 • Published Mar 7, 2022
One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning • arXiv:2306.07967 • Published Jun 13, 2023