SRMT: Shared Memory for Multi-agent Lifelong Pathfinding (arXiv:2501.13200)
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (arXiv:2501.13106)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model (arXiv:2501.12368)
Eagle 2 Collection — a family of frontier vision-language models with a vision-centric design, supporting 4K HD input, long-context video, and grounding (9 items)
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks (arXiv:2501.08326)
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI (arXiv:2411.14522, published Nov 21, 2024)
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation (arXiv:2501.09755)
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding (arXiv:2501.07888)
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens (arXiv:2406.11271, published Jun 17, 2024)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313)
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs (arXiv:2501.06186)
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (arXiv:2411.19146, published Nov 28, 2024)
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching (arXiv:2412.17153, published Dec 22, 2024)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs (arXiv:2412.21187)
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM (arXiv:2501.01904)
Are Vision-Language Models Truly Understanding Multi-vision Sensor? (arXiv:2412.20750)