Training & Architectures
Attention Is All You Need
Paper • 1706.03762 • Published • 44
Note 🔖 GPT-2: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
Note 🔖 GH: https://github.com/Dao-AILab/flash-attention 🔖 TGI Docs: https://huggingface.co./docs/text-generation-inference https://benjaminwarner.dev/2023/08/16/flash-attention-compile 🔖 FlashAttention-3: https://www.together.ai/blog/flashattention-3
Mixtral of Experts
Paper • 2401.04088 • Published • 157
Mistral 7B
Paper • 2310.06825 • Published • 47
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 121
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 242
Code Llama: Open Foundation Models for Code
Paper • 2308.12950 • Published • 22
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 70
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Paper • 2304.12244 • Published • 13
The Falcon Series of Open Language Models
Paper • 2311.16867 • Published • 12
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
TeleChat Technical Report
Paper • 2401.03804 • Published • 7
Note 🔖 Dataset: https://huggingface.co./datasets/Tele-AI/TeleChat-PTD
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 89
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 69
Note Check their findings and reward models.
Foundation Models for Generalist Geospatial Artificial Intelligence
Paper • 2310.18660 • Published • 8
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 126
Note 🔖 Input: (Text, Image); Output: (Text, Image)
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 45
Note Latest (updated 2024): https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
Gemini 1.5: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf
OpenAI: 📜 GPT-4V System Card: https://cdn.openai.com/papers/GPTV_System_Card.pdf 📜 GPT-4 System Card: https://cdn.openai.com/papers/gpt-4-system-card.pdf
Anthropic: 🔖 Claude 3 Model Card: https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
google/gemma-7b
Text Generation • Updated • 334k • 3.05k
Note 🔖 Series: https://huggingface.co./collections/google/gemma-release-65d5efbccdbb8c4202ec078b 🔖 Details: https://ai.google.dev/gemma/docs/model_card 🔖 Paper: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 36
Note 🔖 https://largeworldmodel.github.io/ Context scaling: 1M tokens
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 111
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Paper • 2312.17661 • Published • 13
An In-depth Look at Gemini's Language Abilities
Paper • 2312.11444 • Published • 1
Question Aware Vision Transformer for Multimodal Reasoning
Paper • 2402.05472 • Published • 8
Note Requires somewhat grounded data or product-specific knowledge.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 118
Training Transformers Together
Paper • 2207.03481 • Published • 5
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Paper • 2401.00448 • Published • 28
FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 31
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88
m-a-p/ChatMusician
Text Generation • Updated • 293 • 116
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 10
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Paper • 2207.10551 • Published
Recurrent Linear Transformers
Paper • 2310.15719 • Published
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 134