LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published 8 days ago • 152
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients Paper • 2406.17660 • Published Jun 25, 2024 • 5 • 3
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients Paper • 2406.17660 • Published Jun 25, 2024 • 5
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 57 • 20
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 57