Siteng Huang

huangsiteng

https://kyonhuang.top/

AI & ML interests

vision-language models

Recent Activity

authored a paper about 2 months ago

Accelerating Diffusion Transformers with Token-wise Feature Caching

authored a paper about 2 months ago

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

authored a paper about 2 months ago

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

Organizations

None yet

huangsiteng's activity

authored 3 papers about 2 months ago

Accelerating Diffusion Transformers with Token-wise Feature Caching

Paper • 2410.05317 • Published Oct 5, 2024

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Paper • 2411.17686 • Published Nov 26, 2024 • 19

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

Paper • 2412.06782 • Published Dec 9, 2024 • 6

upvoted a paper about 2 months ago

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

Paper • 2412.06782 • Published Dec 9, 2024 • 6

commented on a paper about 2 months ago

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction

Paper • 2412.06782 • Published Dec 9, 2024 • 6

upvoted a paper 2 months ago

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Paper • 2411.17686 • Published Nov 26, 2024 • 19

commented on a paper 2 months ago

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Paper • 2411.17686 • Published Nov 26, 2024 • 19

authored a paper 5 months ago

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Paper • 2409.07239 • Published Sep 11, 2024 • 12

upvoted a paper 5 months ago

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Paper • 2409.07239 • Published Sep 11, 2024 • 12

commented on a paper 5 months ago

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

Paper • 2409.07239 • Published Sep 11, 2024 • 12

liked a Space 10 months ago

🐍

Cobra

Cobra: Extending Mamba to MLLM for Efficient Inference

authored 5 papers 11 months ago

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21, 2024 • 34

VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

Paper • 2211.12764 • Published Nov 23, 2022

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

Paper • 2303.15230 • Published Mar 27, 2023

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Paper • 2311.15841 • Published Nov 27, 2023 • 2

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Paper • 2311.15773 • Published Nov 27, 2023 • 4

upvoted a paper 11 months ago

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21, 2024 • 34