Accelerating Diffusion Transformers with Token-wise Feature Caching Paper • 2410.05317 • Published Oct 5, 2024
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 19
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Paper • 2412.06782 • Published Dec 9, 2024 • 6
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Paper • 2412.06782 • Published Dec 9, 2024 • 6
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction Paper • 2412.06782 • Published Dec 9, 2024 • 6 • 2
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 19
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Paper • 2411.17686 • Published Nov 26, 2024 • 19 • 2
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 12
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 12
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 12 • 2
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21, 2024 • 34
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval Paper • 2211.12764 • Published Nov 23, 2022
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning Paper • 2303.15230 • Published Mar 27, 2023
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation Paper • 2311.15841 • Published Nov 27, 2023 • 2
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation Paper • 2311.15773 • Published Nov 27, 2023 • 4
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21, 2024 • 34