SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 8 days ago • 118
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published 9 days ago • 25
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published 24 days ago • 22
Ovis2 Collection Our latest advancement in multimodal large language models (MLLMs) • 8 items • Updated 12 days ago • 52
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 22 days ago • 13
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 83
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 83
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 90
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 18 days ago • 64
Inf-CL Collection The corresponding demos, checkpoints, papers, and datasets of Inf-CL. • 2 items • Updated Jan 24 • 3
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published Oct 30, 2024 • 20
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper • 2410.12490 • Published Oct 16, 2024 • 8