Multimodal - a yamayou Collection

Multimodal

updated Sep 22

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126

Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 30

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4 • 29

Note: The input tokens (instruction + script) are converted into speech tokens by a Transformer, after which a DM (diffusion model) fleshes them out with finer detail.

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published Jul 4 • 22

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 116

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published Sep 13 • 46