Zhongpai Gao

gaozhongpai

AI & ML interests

3D computer vision

gaozhongpai's activity

upvoted a paper about 10 hours ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published 3 days ago • 61

upvoted a paper 7 days ago

MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning

Paper • 2502.19634 • Published 11 days ago • 56

upvoted a paper 14 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 17 days ago • 128

upvoted a paper 19 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 21 days ago • 141

upvoted a paper 21 days ago

Latent Radiance Fields with 3D-aware 2D Representations

Paper • 2502.09613 • Published 24 days ago • 6

upvoted a paper 27 days ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published 30 days ago • 121

upvoted a paper 28 days ago

DynVFX: Augmenting Real Videos with Dynamic Content

Paper • 2502.03621 • Published Feb 5 • 29

upvoted 2 papers about 1 month ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 109

Relightable Full-Body Gaussian Codec Avatars

Paper • 2501.14726 • Published Jan 24 • 10

upvoted a paper 3 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140

upvoted 4 papers 5 months ago

Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats

Paper • 2410.12781 • Published Oct 16, 2024 • 6

MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

Paper • 2410.02458 • Published Oct 3, 2024 • 9

MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis

Paper • 2410.02103 • Published Oct 2, 2024 • 8

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

Paper • 2409.17145 • Published Sep 25, 2024 • 15

upvoted 6 papers 6 months ago

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published Sep 23, 2024 • 29

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Paper • 2409.12892 • Published Sep 19, 2024 • 5

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76

Single-Layer Learnable Activation for Implicit Neural Representation (SL^{2}A-INR)

Paper • 2409.10836 • Published Sep 17, 2024 • 5

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Paper • 2409.11211 • Published Sep 17, 2024 • 9

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Paper • 2409.08353 • Published Sep 12, 2024 • 13